Huaqiang Zhou, Yaxiong Zhang, Jiaqing Liu, Yunpeng Yang, Wenfeng Fang, Shaodong Hong, Gang Chen, Shen Zhao, Zhonghan Zhang, Jiayi Shen, Wei Xian, Yan Huang, Hongyun Zhao, Li Zhang, Education and lung cancer: a Mendelian randomization study, International Journal of Epidemiology, Volume 48, Issue 3, June 2019, Pages 743–750, https://doi.org/10.1093/ije/dyz121
Abstract
We aimed to investigate whether more years spent in education are causally associated with a lower risk of lung cancer, through a two-sample Mendelian randomization study.
The main analysis used publicly available genetic summary data from two large consortia [International Lung Cancer Consortium (ILCCO) and Social Science Genetic Association Consortium (SSGAC)]. Genetic variants used as instrumental variables for years of education were derived from SSGAC. Finally, genetic data from three additional consortia (TAG, GLGC, GIANT) were analysed to investigate whether education could causally alter common lung cancer risk factors. The exposure was the genetic predisposition to higher levels of education, measured by 73 single nucleotide polymorphisms from SSGAC. The primary outcome was the risk of lung cancer (11 348 events in ILCCO). Secondary outcomes based on different histological subtypes were also examined. Analyses were performed using the package TwoSampleMR in R.
Genetic predisposition towards 3.6 years of additional education was associated with a 52% lower risk of lung cancer (odds ratio 0.48, 95% confidence interval 0.34 to 0.66; P = 1.02 × 10⁻⁵). Sensitivity analyses were consistent with a causal interpretation in which major bias from genetic pleiotropy was unlikely. The Mendelian randomization assumptions did not seem to be violated. Genetic predisposition towards longer education was additionally associated with less smoking, lower body mass index and a favourable blood lipid profile.
Our study indicated that low education is a causal risk factor in the development of lung cancer. Further work is needed to elucidate the potential mechanisms.
Genetic predisposition towards 3.6 years of additional education was associated with a 52% reduced risk of lung cancer.
Genetic predisposition towards longer education was additionally associated with less smoking, lower body mass index and a favourable blood lipid profile.
Our study indicated that low education is a causal risk factor in the development of lung cancer.
Background
Lung cancer is the leading cause of cancer death and the second most common cancer in the USA, with approximately 154 050 estimated deaths and 234 030 new cancer cases in 2018.1 The overall 5-year survival rate for lung cancer patients is 18%.1 With the increasing burden of lung cancer, it is important to identify potentially modifiable risk factors for better prevention. For instance, smoking is the most common established cause of lung cancer.2 Cigarette smoking rates are at historically low levels in the USA due to tobacco control policies, and lung cancer incidence and mortality have declined significantly. Although the causal roles of risk factors such as smoking and radon are generally accepted, the causality of other potential factors, for example education, remains unclear.
Education has been shown to be inversely associated with the incidence of lung cancer in several conventional observational studies.3–7 In general, the higher the level of educational attainment, the lower the lung cancer risk. Even when adjusted for smoking, education is still associated with lung cancer in observational studies.3 However, this association may be biased by the methodological limitations of traditional observational studies, including confounding, reverse causation and measurement error.8 Furthermore, we cannot determine causality through these observational studies. Clarifying the relationship between education and lung cancer would improve our understanding of the causes of lung cancer and inform the development of prevention policies. Given the long latency period between exposure and outcome, and because restricting education in childhood would be unethical, randomized controlled trials (RCTs), the gold standard in causal inference, are not feasible for this issue. Therefore, we urgently need to assess causality through other study designs.
Mendelian randomization (MR) is a novel genetic epidemiological study design that uses genetic variants as instrumental variables (‘proxies’) to assess causal inferences between risk factors and disease outcomes.9 The fundamental principle of MR is that genetic variants, which indicate the level or the biological effects of a modifiable environmental exposure that alters disease risk, should be causally related to disease risk to the extent predicted by their influence on the modifiable environmental exposure.10 These genetic variants could serve as unconfounded proxies for modifiable risk factors because they are randomly allocated before birth and fixed at conception, which provides an analogy to randomized controlled trials (RCTs) in a non-experimental (observational) setting. The MR design can prevent the potential limitations that are common in conventional observational studies, such as confounding, reverse causation and measurement error.11 In addition, we can implement the MR design using two-sample MR analysis, based on the published summary data from large-scale genome-wide association studies (GWAS), which greatly increases the scope and statistical power of MR.12,13 The MR method has been successfully applied in many studies. Several researchers have found that education is causally associated with myopia and coronary heart disease, using the MR approach.14,15 However, there are no related reports about educational attainment in the field of lung cancer.
In the present study, we applied two-sample MR analyses to identify the potential causal association between education and risk of lung cancer.
Methods
Genetic variants associated with educational attainment
Okbay et al. reported a large meta-analysis of GWAS for educational attainment in 293 723 people of European descent. We defined educational attainment in the same manner as that GWAS,16 in which educational attainment was defined by whether the participant attained a given level of schooling based on the International Standard Classification of Education 1997 classification scale. Okbay et al. identified 74 single nucleotide polymorphisms (SNPs) robustly associated with educational attainment in the Social Science Genetic Association Consortium (SSGAC) at the GWAS threshold of statistical significance [293 723 participants; P < 5 × 10⁻⁸; linkage disequilibrium (LD) r² < 0.1] (Table 1).17 The results were directionally consistent with the UK Biobank replication set of 111 000 individuals. These 74 SNPs explain 0.43% of the variation in educational attainment across individuals. The F statistic is larger than the conventional threshold of 10, which indicates that the instruments strongly predict educational attainment.18 The sample size required for 80% power to detect an odds ratio (OR) of 0.48 for lung cancer is at least 15 206 subjects (see Table 2 for power calculations). Thus, these 74 SNPs are sufficient to generate a strong genetic instrument (Table 2). Therefore, we used only SNPs and summary data from SSGAC in this study. One SNP (rs12772375) was not in the Haplotype Reference Consortium panel.19 We therefore used the remaining set of 73 independent SNPs as instrumental variables for educational attainment.
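As an illustration of the instrument-strength check described above, the following sketch computes an approximate F statistic from the reported variance explained. It assumes the widely used approximation F = [R²/(1 − R²)] × [(n − k − 1)/k]; the text does not state which formula the authors applied, and the function name `f_statistic` is hypothetical.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Approximate first-stage F statistic.

    r2: proportion of exposure variance explained by the instruments
    n:  sample size of the exposure GWAS
    k:  number of SNPs used as instruments
    """
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


# Values taken from the text: 74 SNPs explain 0.43% of the variance
# in educational attainment among 293 723 participants.
f_value = f_statistic(r2=0.0043, n=293_723, k=74)
print(f"Approximate F statistic: {f_value:.1f}")
```

Under this approximation the value lies above the conventional threshold of 10, in line with the statement in the text. A different formula or a per-SNP calculation could give a different number.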
Table 1. Details of studies included in Mendelian randomization analyses
| Consortium | Phenotype | Participants | Web source |
|---|---|---|---|
| SSGAC | Years of education | 293 723 | www.thessgac.org/data |
| ILCCO | Lung cancer | 27 209 | ilcco.iarc.fr |
| TAG | Smoking | 74 053 | www.med.unc.edu/pgc/results-and-downloads |
| GLGC | Triglycerides and total cholesterol | 188 557 | csg.sph.umich.edu/willer/public/lipids2013/ |
| GIANT | Body mass index | 339 224 | portals.broadinstitute.org/collaboration/giant |
SSGAC, Social Science Genetic Association Consortium; ILCCO, International Lung Cancer Consortium; TAG, Tobacco and Genetics consortium; GLGC, Global Lipids Genetics Consortium; GIANT, Genetic Investigation of ANthropometric Traits consortium.
Table 2. Power for conventional Mendelian randomization analysis (two-sided α = 0.05)
| Exposure/genetic instrument | R-squared (of variance in education phenotype) | Actual n (ILCCO) | Proportion of cases (ILCCO) | Observational OR | n required for 80% power | Power at actual n |
|---|---|---|---|---|---|---|
| Education/73 SNPs | 0.0043 | 27 209 | 0.417 | 0.48 | 15 206 | 0.96 |
Power calculation was based on the method developed by Brion et al.
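To make the inputs of the power table concrete, the sketch below applies a commonly used normal approximation for MR power with a binary outcome, in which the non-centrality term is |ln(OR)| × √(n × R² × p × (1 − p)), with p the proportion of cases. This is an assumption chosen for illustration and is not the Brion et al. method used by the authors, so it is not expected to reproduce the tabulated 15 206 and 0.96 exactly. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def mr_power_binary(n, case_fraction, r2, odds_ratio, alpha=0.05):
    """Approximate power to detect a causal OR (per SD of exposure)."""
    ncp = abs(log(odds_ratio)) * sqrt(n * r2 * case_fraction * (1.0 - case_fraction))
    return STD_NORMAL.cdf(ncp - STD_NORMAL.inv_cdf(1.0 - alpha / 2.0))


def n_for_power(case_fraction, r2, odds_ratio, power=0.8, alpha=0.05):
    """Approximate total sample size needed to reach the requested power."""
    z_sum = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + STD_NORMAL.inv_cdf(power)
    return z_sum ** 2 / (log(odds_ratio) ** 2 * r2 * case_fraction * (1.0 - case_fraction))


# Inputs taken from Table 2.
power = mr_power_binary(n=27_209, case_fraction=0.417, r2=0.0043, odds_ratio=0.48)
n_required = n_for_power(case_fraction=0.417, r2=0.0043, odds_ratio=0.48)
print(f"power at actual n: {power:.2f}; n for 80% power: {n_required:.0f}")
```

Under this approximation the available sample exceeds the size needed for 80% power, which agrees qualitatively with the table.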
GWAS summary data on lung cancer
We retrieved GWAS summary data on lung cancer from the International Lung Cancer Consortium (ILCCO) (totalling 11 348 lung cancer cases and 15 861 controls; European ancestry) (Table 1).20 For each of the 73 SNPs associated with educational attainment, we retrieved summary data (the effect of each SNP on lung cancer; effect sizes and standard errors) from ILCCO, where 71 of the 73 SNPs were available. Two exposure SNPs (rs11130222, rs12534506) were absent from the ILCCO dataset. Proxy SNPs (LD at r² > 0.8) from the 1000 Genomes Project were used in place of these two SNPs.
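The lookup-and-proxy step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function `resolve_outcome_snps`, the proxy identifiers, the LD values and all effect sizes are hypothetical placeholders, and only the two missing rsIDs come from the text. A real analysis would also need to align effect alleles between the exposure and outcome datasets.

```python
def resolve_outcome_snps(exposure_snps, outcome_stats, proxies, min_r2=0.8):
    """Match each exposure SNP to outcome summary statistics.

    outcome_stats: {snp_id: (beta, se)} from the outcome GWAS
    proxies:       {snp_id: [(proxy_id, ld_r2), ...]} candidate proxy SNPs
    Returns (resolved, unresolved), where resolved maps each exposure SNP
    to the SNP actually used in the outcome data and its statistics.
    """
    resolved, unresolved = {}, []
    for snp in exposure_snps:
        if snp in outcome_stats:
            resolved[snp] = (snp, outcome_stats[snp])
            continue
        # Keep proxies that exceed the LD threshold and exist in the outcome data.
        candidates = [
            (proxy_id, r2)
            for proxy_id, r2 in proxies.get(snp, [])
            if r2 > min_r2 and proxy_id in outcome_stats
        ]
        if candidates:
            best_id, _ = max(candidates, key=lambda item: item[1])
            resolved[snp] = (best_id, outcome_stats[best_id])
        else:
            unresolved.append(snp)
    return resolved, unresolved


# Hypothetical toy inputs; none of these numbers come from ILCCO.
exposure_snps = ["rs_example_1", "rs11130222", "rs12534506"]
outcome_stats = {
    "rs_example_1": (-0.012, 0.010),
    "proxy_a": (-0.015, 0.011),
    "proxy_b": (-0.009, 0.012),
}
proxies = {
    "rs11130222": [("proxy_a", 0.93), ("proxy_low_ld", 0.55)],
    "rs12534506": [("proxy_b", 0.85)],
}
resolved, unresolved = resolve_outcome_snps(exposure_snps, outcome_stats, proxies)
print(resolved, unresolved)
```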
Statistical analyses
To determine MR estimates of educational attainment for lung cancer, we used several MR approaches. We conducted a random effects inverse variance weighted (IVW) meta-analysis of the Wald ratio for individual SNPs. We also estimated the effects using other statistical tests (the weighted median and MR-Egger regression methods). We additionally performed the same analyses for two different histological subtypes [adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC)]. The results were presented as odds ratios (OR) and 95% confidence intervals (CI), which provided an estimate of relative risk caused by each standard deviation (SD) (3.6 years) increase in the years of schooling.
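The IVW meta-analysis of Wald ratios can be sketched as below. This is one common formulation (first-order standard errors for the Wald ratio and a multiplicative random-effects correction based on Cochran's Q), written in plain Python for illustration; it is not the TwoSampleMR implementation, and the function names and toy summary statistics are hypothetical.

```python
import math


def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP Wald ratio (outcome effect / exposure effect) and first-order SE."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_exp, se_out)]
    return ratios, ses


def ivw_random_effects(beta_exp, beta_out, se_out):
    """Inverse variance weighted estimate with multiplicative random effects."""
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    # Cochran's Q measures heterogeneity between the per-SNP estimates.
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1
    # Inflate the SE when there is over-dispersion; never shrink it.
    scale = max(1.0, math.sqrt(q_stat / dof))
    return estimate, scale * math.sqrt(1.0 / total_weight), q_stat


# Hypothetical toy data: SNP effects on education (SD units) and on
# lung cancer (log odds), with outcome standard errors.
beta_exp = [0.020, 0.015, 0.025, 0.018]
beta_out = [-0.016, -0.010, -0.017, -0.015]
se_out = [0.010, 0.012, 0.011, 0.009]

est, se, q = ivw_random_effects(beta_exp, beta_out, se_out)
odds_ratio = math.exp(est)  # OR per SD increase in the exposure
print(f"log OR = {est:.3f} (SE {se:.3f}), OR = {odds_ratio:.2f}, Q = {q:.2f}")
```

Exponentiating the pooled log odds gives the OR per SD (3.6 years) of schooling, which is the scale on which the results are reported.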
The MR method was based on the following three assumptions: (i) the instrumental variables are strongly associated with the educational attainment; (ii) the instrumental variables affect cancer only through their effect on educational attainment and not through any alternative causal pathway; and (iii) the instrumental variables are independent of any confounders.21 To assess the potential violation of these assumptions, we evaluated the directional pleiotropy based on the intercept obtained from the MR-Egger analysis.22 We also performed a leave-one-out analysis in which we sequentially omitted one SNP at a time, to evaluate whether the MR estimate was driven or biased by a single SNP.
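The two sensitivity checks named above can be illustrated with a short sketch. It assumes that MR-Egger is fitted as a weighted linear regression of SNP-outcome effects on SNP-exposure effects with an unconstrained intercept (weights 1/SE²), and that the leave-one-out analysis re-estimates a fixed-effect IVW slope after dropping each SNP in turn. Function names and data are hypothetical, and standard errors and P-values for the intercept are omitted for brevity.

```python
def egger_regression(beta_exp, beta_out, se_out):
    """Weighted least squares with intercept; returns (intercept, slope)."""
    # Orient every SNP so that its exposure effect is positive.
    x = [abs(bx) for bx in beta_exp]
    y = [by if bx >= 0 else -by for bx, by in zip(beta_exp, beta_out)]
    w = [1.0 / s ** 2 for s in se_out]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW slope (weighted regression through the origin)."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den


def leave_one_out(beta_exp, beta_out, se_out):
    """IVW estimate recomputed with each SNP omitted in turn."""
    return [
        ivw_estimate(
            beta_exp[:j] + beta_exp[j + 1:],
            beta_out[:j] + beta_out[j + 1:],
            se_out[:j] + se_out[j + 1:],
        )
        for j in range(len(beta_exp))
    ]


# Hypothetical toy data built with a known intercept (0.002) and slope (-0.7).
beta_exp = [0.010, 0.020, 0.030, 0.040]
beta_out = [0.002 - 0.7 * bx for bx in beta_exp]
se_out = [0.010, 0.010, 0.010, 0.010]

intercept, slope = egger_regression(beta_exp, beta_out, se_out)
loo = leave_one_out(beta_exp, beta_out, se_out)
print(intercept, slope, loo)
```

An intercept close to zero is read as no evidence of directional pleiotropy, and leave-one-out estimates that stay close to one another suggest that no single SNP drives the result.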
To investigate potential mechanisms from education to lung cancer, we applied conventional MR to investigate whether genetic predisposition towards higher educational attainment could be associated with the common risk factors of lung cancer. Tillmann et al. comprehensively performed MR of education and traditional cardiovascular risk factors.14 We selected the potential mediators (apart from smoking) based on existing literature, such as lipids and body mass index (BMI) (Table 1).23,24 Genetic effects on cigarette smoking status (ever vs never smoked; former vs current smoker; age of smoking initiation; cigarettes smoked per day) were obtained from the Tobacco and Genetics consortium (TAG).25 We also assessed the association of triglycerides and total cholesterol based on the GWAS summary data from the Global Lipids Genetics Consortium (GLGC).26 Genetic instruments for BMI were obtained from the Genetic Investigation of ANthropometric Traits consortium (GIANT).27
Analyses were performed using the package TwoSampleMR (version 0.3.4) in R (version 3.4.2).28
Patient involvement
No patients were involved in the study design, recruitment or conduct and the need for ethical approval was waived. No patients were asked to advise on interpretation or writing up of results. There are no plans to disseminate the results of the research to study participants or the relevant patient community.
Results
Causal effect from education to lung cancer
Genetically predicted higher educational attainment was associated with significantly lower odds of lung cancer (Table 3). Using conventional MR analysis, one SD longer education (due to genetic predisposition across 73 SNPs) was associated with a 52% lower risk of lung cancer (OR 0.48, 95% CI 0.34, 0.66; P = 1.02 × 10⁻⁵; Q statistic 114.94, P = 0.001). Power calculations on MR analysis were performed according to Brion et al.29 With a sample size of 27 209, our sample provided sufficient statistical power (greater than 80%) to detect a causal effect of educational attainment on lung cancer. Supplementary Figure 1, available as Supplementary data at IJE online, shows individual causal estimates of each of the 73 SNPs. As expected, the associations were consistent in sensitivity analyses using weighted median (OR 0.52, 95% CI 0.35, 0.77; P = 1.08 × 10⁻³) and MR-Egger method (OR 0.61, 95% CI 0.12, 3.17; P = 0.56), but provided less precise estimates than with conventional MR (IVW method). Nonetheless, their causal estimates were similar in terms of direction and magnitude, and they were unlikely to occur by chance alone. The MR regression slopes are illustrated in Supplementary Figure 2, available as Supplementary data at IJE online. In a leave-one-out sensitivity analysis, we found that no single SNP was strongly driving the overall effect of education on lung cancer (Supplementary Figure 3, available as Supplementary data at IJE online). There was no evidence for the presence of directional pleiotropy in the MR-Egger regression analysis (Table 4). We found that the P-values for the intercept were large and the estimates adjusted for pleiotropy suggested null effects (intercept β = −0.004, P = 0.775). These results were consistent with the hypothesis that genetic pleiotropy was not driving the result.
We observed similar causal trends in both the LUSC and LUAD subgroups (LUSC: OR 0.41, 95% CI 0.27, 0.62; P = 2.57 × 10⁻⁵; LUAD: OR 0.64, 95% CI 0.41, 1.00; P = 4.97 × 10⁻²) (Table 3; Supplementary Figures 5–12, available as Supplementary data at IJE online). Heterogeneity was not observed in the subgroup analyses (LUSC: Q statistic 79.19, P = 0.26; LUAD: Q statistic 91.86, P = 0.06).
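To show how the reported odds ratios relate to the quoted percentages and P-values, the sketch below converts an OR and its 95% CI into a percentage change in odds, a back-calculated standard error on the log scale, and an approximate two-sided P-value. It assumes a normal approximation on the log-odds scale and uses the rounded published values, so the P-value only approximates the reported one. The function name `summarize_odds_ratio` is hypothetical.

```python
from math import log
from statistics import NormalDist


def summarize_odds_ratio(odds_ratio, ci_low, ci_high, level=0.95):
    """Back-calculate log-scale quantities from a reported OR and its CI."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    beta = log(odds_ratio)
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z_score = beta / se
    p_value = 2.0 * std_normal.cdf(-abs(z_score))
    return {
        "percent_lower": (1.0 - odds_ratio) * 100.0,
        "log_or": beta,
        "se": se,
        "z": z_score,
        "p": p_value,
    }


# IVW estimate for lung cancer overall, as reported in the text.
summary = summarize_odds_ratio(0.48, 0.34, 0.66)
print(summary)
```

An OR of 0.48 corresponds to 52% lower odds, and the back-calculated P-value is of the same order of magnitude as the one reported.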
Table 3. Mendelian randomization estimates of the associations between education attainment and risk of lung cancer overall and histological types
| Outcome | IVW OR (95% CI) | IVW P-value | MR-Egger OR (95% CI) | MR-Egger P-value | Weighted median OR (95% CI) | Weighted median P-value |
|---|---|---|---|---|---|---|
| Lung cancer overall | 0.48 (0.34, 0.66) | 1.02e-05* | 0.61 (0.12, 3.17) | 0.56 | 0.52 (0.35, 0.77) | 1.08e-03* |
| Adenocarcinoma | 0.64 (0.41, 1.00) | 4.97e-02* | 0.89 (0.09, 8.48) | 0.92 | 0.59 (0.32, 1.08) | 8.94e-02 |
| Squamous cell carcinoma | 0.41 (0.27, 0.62) | 2.57e-05* | 0.32 (0.04, 2.63) | 0.29 | 0.55 (0.30, 1.01) | 5.24e-02 |
*P-value <0.05.
Table 4. MR-Egger pleiotropy test of the associations between education attainment and risk of lung cancer overall and histological types
| Outcome | MR-Egger intercept | P-value |
|---|---|---|
| Lung cancer overall | −0.004 | 0.78 |
| Adenocarcinoma | −0.006 | 0.77 |
| Squamous cell carcinoma | 0.004 | 0.82 |
Causal effect from education on potential lung cancer risk factors
To identify potential risk factors that could mediate the association between education and lung cancer, we investigated whether higher educational attainment was associated with several potential cancer risk factors. Table 5 shows that one SD longer education was associated with 37% lower odds of smoking, 1.97 times higher odds of smoking cessation among smokers, later age of smoking initiation [OR 1.05 (1.01, 1.09) log years], lower smoking intensity [OR 0.13 (0.03, 0.57) cigarettes per day], 0.28 lower body mass index, 0.24 mmol/L lower triglycerides and 0.17 mmol/L higher total cholesterol (all P < 0.05).
Table 5. Causal effects from 3.6 years of education to common risk factors
| Outcomes | Causal effect (95% CI) | P-value |
|---|---|---|
| Ever vs never smoker | 0.63 (0.53, 0.79) | 7.02e-05* |
| Former vs current smoker | 1.97 (1.47, 2.64) | 6.20e-06* |
| Age of smoking initiation | 1.05 (1.01, 1.09) | 0.016* |
| Cigarettes smoked per day | 0.13 (0.03, 0.57) | 0.007* |
| Body mass index | −0.28 (−0.38, −0.17) | 8.48e-07* |
| Triglycerides | −0.24 (−0.44, −0.04) mmol/L | 0.021* |
| Total cholesterol | 0.17 (0.01, 0.33) mmol/L | 0.043* |
*P-value <0.05.
Discussion
Higher educational attainment was causally associated with a lower risk of lung cancer in this two-sample MR study. More specifically, 3.6 years of additional education predicted a reduction in the risk of lung cancer by approximately one-half. Furthermore, to investigate the potential mediator mechanisms between education and lung cancer, we observed that genetic predisposition towards higher education attainment was associated with established potential lung cancer risk factors, such as smoking status, obesity and blood lipid profiles.
The results were consistent with previous observational studies showing an inverse association of educational attainment with the risk of lung cancer, in which people with lower educational attainment have the highest incidence rates. In fact, educational inequalities in lung cancer incidence have long been noted. Mouw and colleagues identified this negative relationship between education and the risk of lung cancer in a prospective NIH-AARP cohort of 498 455 participants from the USA [relative risk (RR) 3.67, 95% CI 3.25, 4.15].3 Data from the SYNERGY study and the Canadian Census Cohort have also consistently shown that socioeconomic status (SES) remained a risk factor for lung cancer, and the inverse association between lung cancer risk and SES was the strongest for education.5,7 Furthermore, our findings are in line with a systematic review and meta-analysis of published epidemiological observational studies conducted by Sidorchuk et al.4 Their research provided strong evidence that the group with the lowest educational attainment has a 61% higher lung cancer incidence compared with the highest education group. However, most of the evidence mentioned above came from conventional observational studies, and few studies have clearly investigated the causality of this association. First, Leuven et al.6 conducted an analysis of natural experiments using the education reform which expanded compulsory schooling during the 1960s in Norway, to establish causal effects of education on cancers. They observed that compulsory school reform could lower the risk of lung cancer. Another source of causal evidence comes from a sibling study. Researchers used information on millions of siblings born in Denmark between 1950 and 1979, and observed that low education was associated with an increased risk of lung cancer.
Furthermore, family factors shared by siblings can explain some of the association between education and lung cancer.30 Some recent studies have also begun to use the MR approach, and found that genetic variants for education are strongly causally associated with other disease outcomes, such as myopia and coronary heart disease.14,15 To date, our work is the first MR study to focus on the issue of education in lung cancer risk.
Importantly, the association between education and lung cancer is mediated by many intermediate phenotypes, but the overall mechanisms that mediate this association remain unknown. Previous research suggested that education was a strong predictor of income, occupation and health behaviours, as it is often achieved early in life.31,32 College graduates usually live a healthier lifestyle than those with less education, e.g. less smoking, which is consistent with our results.33 Our further detailed analysis indicated that higher educational attainment leads to reduced risk of smoking, later smoking initiation, reduced heaviness of smoking among smokers and greater likelihood of smoking cessation among smokers. Given that smoking is an established cause of lung cancer and is clearly related to education, it is a key intermediate factor on the education-lung cancer pathway.6,34,35 Interestingly, the MR causal effect in the LUSC group was larger than the effect in LUAD. This phenomenon might be associated with the greater smoking rates in the LUSC group. However, the borderline association of education with LUAD (P = 0.0497) in comparison with LUSC might point to a confounding effect between smoking and education. Given the nature of the summary data we used, stratified analysis according to smoking status was not possible, and we could not disentangle educational attainment from confounders in the present study. Previous studies have shown that the inverse association between education and lung cancer persisted after adjustment for confounders such as smoking.3,36 This finding might indicate that smoking cannot completely account for this association and that causal factors other than smoking are present.
Notably, our study found that higher educational attainment was associated with lower body mass index and improved blood lipid profiles, which are potential risk factors for lung cancer.37 However, further studies are still needed to quantify the degree of their mediation. Without more extensive mediation analysis, the notion of education acting via any mechanism other than reducing different aspects of smoking remains far-fetched. However, a lack of smoking-independent mediators should not detract from the importance of improving education as a means to reduce lung cancer risk, as smoking already provides a mechanistic interpretation of this relation. Meanwhile, we should keep in mind that there are still possible confounders of education, including SES, the fees for education in different countries and others. High SES populations may achieve better academic attainment, healthier lifestyles and improved access to health care with regular medical examinations, which may lead to a spurious association between education and lung cancer.38,39 Theoretically, confounding could still occur in MR if genotypes differed consistently between groups differing by confounders, such as low vs high socioeconomic position (SEP). But the MR approach, which is the closest approximation to RCTs, still offers one of the most compelling methods to determine causation in the presence of confounders, because it can minimize the influence of these confounding factors and provide enough statistical power for causal estimation.
RCTs are widely accepted as the way to answer questions of causality, but at a high cost. Given the consistently long latency between exposures and the occurrence of disease, it is neither practical nor possible to investigate all these causal associations through RCTs. Instead, we must collect evidence from many other study designs to reveal these relationships comprehensively. From this perspective, our study provides evidence from a new type of study design (MR), which also supports an inverse relationship between education and lung cancer. Undoubtedly, cancer prevention is the key to reducing the incidence and mortality of cancers. It is critical for us to identify more modifiable risk factors associated with cancer, so that we can employ interventions to reduce the disease burden. For example, past decades have seen the substantial effect of the smoking ban on lung cancer in the USA.1 There is no doubt that poor educational provision will be a seriously ‘expensive’ socioeconomic burden in the long run. Therefore, it is urgent for us to extend educational attainment and initiate popular science education programmes about healthy behaviours, particularly in developing countries such as China, which is the world’s largest tobacco consumer and has a high lung cancer burden.
Our study has several important strengths. We conducted the first MR study to investigate the causality between education and lung cancer. Participants were grouped based on their randomly allocated genotype, and this procedure mimics an RCT. The MR design can prevent the reverse causation and potential confounding factors that are commonly present in conventional observational studies. Given the large sample sizes of these studies (over 320 000) and the use of robustly associated instrumental variables (F statistics >10), our study had sufficient power (96%) to detect robust causal effect estimates with high precision.29
As with any MR analysis, several limitations should also be considered in our study. First, all the participants included in our study were of European origin. Thus, whether our findings are generalizable to other populations still needs to be confirmed. Furthermore, the currently used genetic variants explained only a small amount of variance in educational attainment across individuals. However, the variance is sufficient to be useful in social epidemiology studies, which focus on average behaviour in the population rather than individual outcomes. Indeed, our large sample can provide sufficient statistical power (over 80%) to detect the effect of educational attainment on lung cancer. It is impossible to prove the validity of all three MR assumptions. We found no evidence of any horizontal pleiotropic effects in several sensitivity analyses. Given the pleiotropic nature of genetic variants affecting educational attainment, we cannot unequivocally exclude residual pleiotropy that could have reduced the validity of the results. In addition, behavioural traits, such as educational attainment, were previously shown to be influenced by assortative mating with a non-zero spouse-correlation.40 Finally, the summary data we used for two-sample MR did not allow for stratified analyses by covariates of interest, such as age and smoking. Researchers should be cautious in interpreting our study. As the original GWAS authors, Okbay et al., noted, the identified education-associated SNPs are not ‘genes for education’.17 Education is primarily determined by social and other environmental factors, and genetic factors are estimated to account for about 20% of the variation. So it is a mistake to infer that genetic effects are independent of environmental factors.
In conclusion, our present Mendelian randomization study provided strong evidence to suggest that higher educational attainment plays a causal role in lowering the risk of lung cancer. Furthermore, more work is needed to elucidate the potential mechanisms that mediate the association between education and lung cancer.
Funding
This work was supported by: the National Key R&D Program of China (2016YFC0905500, 2016YFC0905503), the Science and Technology Program of Guangdong (2017B020227001) and the Science and Technology Program of Guangzhou (201704020072).
Acknowledgements
The authors acknowledge the efforts of the consortia in providing high-quality GWAS resources for researchers. Data and material are available from the corresponding GWAS consortia. The authors would like to thank the editors and the anonymous reviewer for their valuable comments and suggestions to improve the quality of the paper. The authors are also grateful to CDMG (SYSU) and Xiaodong Zhuang for assistance.
Author Contributions
L.Z., H.Z., Y.Z., J.L. and W.F. were responsible for the concept and design of the study, interpretation of data, drafting and writing of the article. The other authors were responsible for interpretation of data and revision of the intellectual content. All authors participated in final approval of the article and agreed to be accountable for all aspects of the work.
Conflict of interest: The authors have no conflicts of interest to declare.
References
Author notes
Huaqiang Zhou, Yaxiong Zhang, Jiaqing Liu and Yunpeng Yang contributed equally to this work.