Abstract

Exposure to environmental and lifestyle factors, such as cigarette smoking, affects the epigenome and might mediate risk for diseases and cancers. We have performed a genome-wide DNA methylation study to determine the effect of smoke and snuff (smokeless tobacco) on DNA methylation. A total of 95 sites were differentially methylated [false discovery rate (FDR) q-values < 0.05] in smokers, and a subset of the differentially methylated loci were also differentially expressed in smokers. We found no sites, biological functions or molecular processes enriched for smokeless tobacco-related differential DNA methylation. This suggests that methylation changes are not caused by the basic components of the tobacco but by its burnt products. In smokers, we instead see a clear enrichment (FDR q-value < 0.05) for genes, including CPOX, CDKN1A and PTK2, involved in response to arsenic-containing substance, which agrees with smoke containing small amounts of arsenic. A large number of biological functions and molecular processes with links to disease conditions are also enriched (FDR q-value < 0.05) for smoke-related DNA methylation changes. These include ‘insulin receptor binding’ and ‘negative regulation of glucose import’, which are associated with diabetes; ‘positive regulation of interleukin-6-mediated signaling pathway’, ‘regulation of T-helper 2 cell differentiation’ and ‘positive regulation of interleukin-13 production’, which are associated with the immune system; and ‘Sertoli cell fate commitment’, which is important for male fertility. Since type 2 diabetes, a repressed immune system and infertility have previously been associated with smoking, our results suggest that these associations might be mediated by DNA methylation changes.

INTRODUCTION

Environmental and lifestyle factors, such as smoking, have been shown to influence the epigenome, which might affect the response to diseases and medications (1). Smoking is a well-established risk factor for diseases and different forms of cancer affecting many different tissue types (2–4). Recent studies have shown that smoking can influence the transcriptome (5) and alter the DNA methylation pattern (6–8). Differential methylation is also seen in newborns due to maternal smoking during pregnancy (9). Dysregulated gene expression might mediate health consequences associated with smoking (5). DNA methylation influences gene expression by affecting transcription factor binding and chromatin structure (10). Therefore, understanding the methylome disparity between smokers and non-smokers can give us an insight into the molecular mechanisms behind the development of diseases where smoking is an etiologic agent.

The introduction of high-throughput methods for measuring DNA methylation (11) allows for large-scale epigenetic studies by being cost-effective and increasing quantification precision (12,13). The Illumina Infinium HumanMethylation450K BeadChip (11) interrogates more than 470 000 CpG sites, covering about 96% of CpG islands (CGIs) and 99% of the RefSeq genes, including promoter regions, 5′UTRs, gene bodies and 3′UTRs, as well as intergenic regions. Our study aims to investigate the effect of smoke on DNA methylation and subsequently gene expression at a genome-wide level. We also investigate the effect of snuff (a powdered tobacco available in small bags placed between the cheek and gum) on the DNA methylation pattern, aiming to determine whether the smoking-related methylation changes and associated pathologies are caused by the basic components of the tobacco or by the burnt products produced during the smoking process.

RESULTS

The DNA methylation status was measured in a total of 432 samples. The average methylation level per sample (excluding positive and negative controls) was 0.524 (ranging from 0.510 to 0.538). The average methylation levels for the two negative controls were 0.277 and 0.299 and for the positive controls 0.913 and 0.915. The average (across all sites) difference in methylation between two individuals was 0.0374 (ranging from 0.0137 to 0.6376). The lowest average pairwise difference between samples was seen for the positive controls (average difference = 0.0137), followed by the sample duplicates (average difference = 0.0146 and 0.0169) and then the pairwise individual duplicates sampled 3 years apart (0.0177 and 0.0181), indicating good reproducibility of the experiment. The negative controls were not as similar, with an average difference of 0.0352. After removing all controls and individuals with missing smoking and snuffing information, a total of 421 individuals remained for the analyses. Of these, 43 (10.2%) were current smokers and 74 (17.6%) current snuffers. The fraction of smokers was similar between sexes (9.9% for females and 10.6% for males, P-value = 0.07). However, snuffing was more common among males (32.8 compared with 4.0%, P-value = 5.2 × 10^−23). There was no difference in age distribution between males and females, or between smokers and non-smokers (Table 1). However, snuffers were younger than non-snuffers (P-value = 0.0079). Cell composition (first–third quartiles) in the samples was estimated at 7.9% CD8 T-cells (4.6–10.3), 14.2% CD4 T-cells (10.7–17.1), 8.5% NK-cells (4.1–12.0), 5.3% B-cells (3.1–7.0), 8.5% monocytes (6.9–10.0) and 56.0% granulocytes (50.6–62.1).
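The reproducibility comparison between controls and duplicates reported above can be illustrated with a minimal sketch. It assumes that the "average difference" is the mean absolute difference in β-values over all sites measured in both samples; the exact definition is not stated here, and the sample names and values are hypothetical toy data.

```python
from itertools import combinations


def mean_abs_difference(a, b):
    """Mean absolute difference in beta values over sites present in both samples."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("the two samples share no measured sites")
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


def pairwise_differences(samples):
    """Return {(name_i, name_j): mean absolute difference} for every sample pair."""
    return {
        (i, j): mean_abs_difference(samples[i], samples[j])
        for i, j in combinations(sorted(samples), 2)
    }


# Hypothetical toy data: four sites per sample (None would mark a failed probe).
samples = {
    "dup_a": [0.10, 0.50, 0.90, 0.30],
    "dup_b": [0.11, 0.49, 0.91, 0.31],
    "other": [0.20, 0.40, 0.80, 0.35],
}
diffs = pairwise_differences(samples)
for pair, value in sorted(diffs.items(), key=lambda kv: kv[1]):
    print(pair, round(value, 4))
```

In this toy example the duplicate pair has the smallest average difference, which is the pattern the text uses as evidence of reproducibility.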

Table 1.

Age distribution in males compared with females, smokers compared with non-smokers and snuffers compared with non-snuffers

Group         N    Min  1st quartile  Median  Mean   3rd quartile  Max  P-valuea
Females       223  14   22            43      43.53  61.75         94
Males         198  15   24            44      44.21  60.75         86   0.2484
Non-smokers   378  14   21            43      43.12  62            94
Smokers       43   20   43            49      50.23  59.5          80   0.1515
Non-snuffers  347  14   22            44      44.55  64            94
Snuffers      74   15   24            44      40.55  53            83   0.007851

aWilcoxon rank-sum test P-value.

DNA methylation in relation to smoking and snuffing

A total of 476 366 sites passed the quality control (QC), of which 95 sites (Table 2 and Supplementary Material, Table S1) in 66 chromosomal regions were associated with smoking status [false discovery rate (FDR) q-value < 0.05]. Out of these regions, 55 could be mapped directly to a RefSeq gene. A majority of the sites (60%) overlapped with top sites in the study by Zeilinger et al. (7), and the effect of smoking was in the same direction for all these sites (Supplementary Material, Table S1), with all but three sites being hypomethylated. While both significant sites in MYO1G are located close to the transcription start site (TSS) and are hypermethylated, one of the CNTNAP2 sites is located close to the TSS and is hypomethylated, whereas the other is located more than 1 Mb downstream and is hypermethylated. Most sites appear to have a dose-dependent effect (Fig. 1). The gene with the largest number of differentially methylated sites is AHRR. All nine significant sites cluster in the third and fourth introns of the AHRR gene (Supplementary Material, Fig. S1), and most of them are located in predicted enhancer regions in GM12878 (B-lymphocyte, lymphoblastoid) cells. The AHRR gene has previously been shown to be differentially methylated in smokers (7–9,14–16), and the most significant site (cg05575921, P-value = 7.19 × 10^−70) in our study was also found to be the most significant in previous studies (7,9,16). There was also a cluster of seven significant sites at locus 2q37.1 (Supplementary Material, Fig. S2) and a cluster of four significant sites at locus 6p21.33 (Supplementary Material, Fig. S3). Even though the markers at the 2q37.1 and 6p21.33 loci are not located directly adjacent to a gene, they are located in regions occupied by transcription factors and in regions that have been predicted to be regulatory (enhancer versus poised promoter).
Another site (cg04885881) was found to be located in a non-coding region of the genome, ∼3000 bp upstream of the SRM gene and ∼3500 bp downstream of the EXOSC10 gene. The region is predicted to be a strong enhancer region in GM12878 cells, which may indicate an enhancing function of this region for either the SRM gene or the EXOSC10 gene. For most other regions, the most significant sites also appeared to be located in putative regulatory regions. For example, the sites cg25013095, cg19859270, cg00297362, cg00310412 and cg19572487 were found to be located in the promoter regions of GPR55, GPR15, FAM98B, SEMA7A and RARA, respectively, while the site cg02657160 was found to be located in a predicted poised promoter region in CPOX.

Table 2.

The DNA methylation regions that were most differentially methylated in smokers (P-value < 1 × 10–6)

Chr Most significant site Position No. of sitesb Locus Effect of smoking P-value smokingc Effect of snuffing P-value snuffing Zeilinger et al. (7)a 
Effectd P-value 
cg04885881 11123118  −0.022 1.51E−09 −0.004 0.168 −7.41% 1.35E−28 
cg25189904 68299493 GNG12 −0.027 3.13E−08 0.001 0.871 −8.19% 1.71E−39 
cg09935388 92947588 GFI1 −0.038 6.9E−23 −0.005 0.130 −15.31% 3.27E−24 
cg11231349 162050656 NOS1AP −0.020 7.95E−07 −0.003 0.384 −1.92% 3.94E−09 
cg20295214 206226794 AVPR1B −0.016 1.39E−12 −0.001 0.763 −2.98% 1.13E−09 
cg11314684 244006288 AKT3 −0.012 6.08E−08 −0.001 0.590   
cg23079012 8343710  −0.007 9.81E−13 −0.001 0.286 −1.10% 8.29E−24 
cg21566642 233284661  2q37.1 (ALPPL2, APLP, ALPI) −0.068 2.57E−35 −0.005 0.260 −16.70% 6.90E−138 
cg26718213e 241976080 SNED1 0.033 7.76E−07 0.003 0.622   
cg19859270 98251294 GPR15 −0.010 1.19E−14 −0.001 0.262 −1.31% 9.00E−24 
cg02657160 98311063 CPOX −0.011 1.00E−09 −0.001 0.526 −1.48% 1.67E−09 
cg05575921 373378 AHRR −0.074 7.19E−70 −0.005 0.069 −24.40% 2.54E−182 
cg14580211 150161299 C5orf62 −0.020 1.54E−10 −0.002 0.484 −3.91% 2.19E−12 
cg06126421 30720080 6p21.33 −0.049 5.51E−32 −0.008 0.015 −17.05% 1.72E−75 
cg22132788 45002486 MYO1G 0.023 6.85E−08 0.002 0.602 6.68% 1.99E−34 
cg01692968 108005349  −0.020 2.60E−07 −0.002 0.586 −1.42% 7.30E−11 
10 cg03450842 80834947 ZMIZ1 −0.014 2.41E−09 −0.002 0.279   
11 cg08967584 3535099  −0.051 8.90E−07 0.008 0.383   
11 cg21611682 68138269 LRP5 −0.013 4.60E−08 0.001 0.758 −5.24% 1.09E−18 
11 cg01901332 75031054 ARRB1 −0.017 1.24E−07 0.000 0.898 −5.07% 1.70E−09 
11 cg11660018 86510915 PRSS23 −0.017 8.99E−09 −0.001 0.670 −3.87% 1.29E−22 
13 cg23848624 100622582 ZIC5 0.007 6.77E−07 0.000 0.671   
14 cg13976502 74227875 C14orf43 −0.012 9.28E−07 −0.003 0.096 −1.01% 4.67E−16 
14 cg05284742 93552128 ITPK1 −0.015 8.11E−09 −0.002 0.351 −3.76% 3.57E−09 
15 cg00297362 38750358 FAM98B −0.048 1.47E−10 −0.008 0.179   
15 cg25292882e 39431467  −0.010 1.54E−07 −0.002 0.276   
15 cg00310412 74724918 SEMA7A −0.013 5.77E−10 −0.003 0.148 −1.97% 2.34E−13 
17 cg19572487 38476024 RARA −0.024 2.92E−10 −0.005 0.114 −10.02% 9.37E−40 
19 cg03636183 17000585 F2RL3 −0.039 1.71E−37 0.000 0.892 −14.74% 2.42E−80 
22 cg23441595 37273987 NCF4 −0.013 3.15E−07 −0.004 0.074   
cg17294479 102470404 BEX4 −0.026 7.11E−10 0.000 0.990   

The table shows the effect of smoking on DNA methylation and associated P-value, and the P-value for sites being differentially methylated in individuals using snuff in our study. The table also shows the P-values and the effect for the sites that replicated in the study by Zeilinger et al. (7).

aOnly sites with P ≤ 1E−07 in the Zeilinger study are included.

bThe total number of significant (FDR q-value < 0.05) DNA methylation sites at each locus.

cThe effect represents the average increase in β-value between smokers and non-smokers, and between four groups of smokers with different numbers of cigarettes smoked per week.

dThis effect is the median β-value methylation difference of current versus never smokers in %.

eAmong the regions that did not overlap with Zeilinger et al. (7), two additional regions, represented by cg25292882 and cg26718213, overlap with Shenker et al. (8).

Figure 1.

Differences in DNA methylation (beta-values) dependent on smoke. The numbers on the x-axis represent (0) non-smokers, (1) low-exposure: 1–4 packages of cigarettes per week, (2) medium exposure: 5–14 packages per week, and (3) high-exposure: more than 14 packages a week. The beanplot shows the mean of each distribution (black horizontal line), the overall mean (black dashed line), the distribution (kernel density estimate) and the methylation values for each observation (white horizontal line).

Of the 476 366 sites tested for DNA methylation status in relation to snuff use, none met the threshold for significance (FDR q-value < 0.05); the site cg17757848 had the lowest nominal P-value, 9.92 × 10^−7. Nor were any of the smoke-associated differentially methylated sites significantly associated with snuffing (Table 2, Supplementary Material, Table S1). Adjusting the analyses for variation in cell composition (fraction of CD8 T-cells, CD4 T-cells, NK-cells, B-cells, monocytes and granulocytes) did not change the results markedly (Supplementary Material, Table S2). However, for 15 of the 95 smoke-associated sites, the cell-adjusted P-value did not meet the threshold for significance (FDR q-value < 0.05).

Gene expression analysis

Using an FDR q-value of 0.05 as the threshold resulted in 95 sites being differentially methylated between smokers and non-smokers. These 95 DNA methylation sites mapped to 55 genes. In the transcription data set, 56 transcription probes, of which 34 had passed the QC, mapped to the same set of genes (Table 3). A total of seven probes, representing six different genes (LRRN3, NCF4, GFI1, SPN, MTSS1 and LIPA), differed between smokers and non-smokers in the transcription data (Bonferroni-adjusted P-value < 0.05). The results show that the expression of LIPA, NCF4 and LRRN3 increased with smoking, while that of SPN, GFI1 and MTSS1 decreased with smoking.

Table 3.

The 34 expression probes that were detected in the SAFHS that correspond to the genes that were differentially methylated (FDR P-value < 0.05) in our study

Gene ProbeID Regression coefficient SE P-value 
ANPEP GI_4502094-S 0.16 0.067 0.017 
ARRB1 GI_10880133-A −0.10 0.068 0.122 
BCL6 GI_21040323-A 0.06 0.067 0.337 
C14orf4 GI_38327635-S −0.04 0.068 0.588 
C14orf43 GI_44890061-S 0.19 0.067 0.0054 
CAPN3 GI_27765077-A −0.11 0.068 0.114 
CNTNAP2 GI_21071040-S 0.00 0.068 0.967 
ETV6 GI_41872473-S 0.05 0.068 0.431 
F2RL3 GI_4503638-S 0.07 0.068 0.315 
GFI1 GI_4885266-S −0.31 0.064 1.03E−06 
HOXA7 GI_24497555-S −0.12 0.068 0.080 
ITPK1 GI_41393564-S −0.11 0.068 0.097 
LIPA GI_4557720-S 0.24 0.067 0.00034 
LRRN3 GI_37059785-S 0.52 0.055 1.37E−20 
MGAT4A GI_6912501-S 0.01 0.068 0.875 
MPP4 GI_14780901-S 0.12 0.068 0.065 
MTSS1 GI_30023852-S −0.29 0.065 1.41E−05 
MYO1G GI_37538413-S −0.13 0.067 0.050 
NCF4 GI_7382494-A 0.34 0.068 3.76E−07 
NCF4 GI_7382494-I 0.33 0.068 1.03E−06 
NCF4 GI_7382492-I 0.13 0.068 0.065 
NFE2L2 GI_20149575-S 0.17 0.068 0.014 
PLEC1 GI_4505876-A 0.03 0.068 0.671 
PLEC1 GI_41322918-I −0.01 0.068 0.880 
PLEC1 GI_41322907-I −0.01 0.068 0.915 
PPP1R15A GI_9790902-S 0.03 0.068 0.707 
PTPRN2 GI_19743913-A −0.02 0.067 0.774 
RARA GI_4506418-S 0.11 0.068 0.113 
SDHA GI_4759079-S −0.05 0.068 0.497 
SH3BP4 GI_7657561-S 0.06 0.067 0.356 
SPN GI_40254469-S −0.30 0.065 5.86E−06 
VPS52 GI_18379336-I 0.08 0.068 0.244 
VPS52 GI_18379339-A −0.01 0.067 0.829 
ZIC5 GI_22547202-S 0.12 0.068 0.068 

Names in bold indicate probes that were differentially expressed among smokers after adjusting for multiple testing (Bonferroni adjustment).
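The Bonferroni adjustment applied to Table 3 can be checked with a short sketch. The P-values below are copied from Table 3; the number of tests is assumed here to be the 34 probes that passed QC, which is not stated explicitly in the text.

```python
# (gene, nominal P-value) pairs copied from Table 3, one per expression probe.
table3 = [
    ("ANPEP", 0.017), ("ARRB1", 0.122), ("BCL6", 0.337), ("C14orf4", 0.588),
    ("C14orf43", 0.0054), ("CAPN3", 0.114), ("CNTNAP2", 0.967), ("ETV6", 0.431),
    ("F2RL3", 0.315), ("GFI1", 1.03e-06), ("HOXA7", 0.080), ("ITPK1", 0.097),
    ("LIPA", 0.00034), ("LRRN3", 1.37e-20), ("MGAT4A", 0.875), ("MPP4", 0.065),
    ("MTSS1", 1.41e-05), ("MYO1G", 0.050), ("NCF4", 3.76e-07), ("NCF4", 1.03e-06),
    ("NCF4", 0.065), ("NFE2L2", 0.014), ("PLEC1", 0.671), ("PLEC1", 0.880),
    ("PLEC1", 0.915), ("PPP1R15A", 0.707), ("PTPRN2", 0.774), ("RARA", 0.113),
    ("SDHA", 0.497), ("SH3BP4", 0.356), ("SPN", 5.86e-06), ("VPS52", 0.244),
    ("VPS52", 0.829), ("ZIC5", 0.068),
]

ALPHA = 0.05
n_tests = len(table3)  # assumption: correction over the 34 QC-passing probes

# Bonferroni: multiply each P-value by the number of tests, capped at 1.
adjusted = [(gene, min(1.0, p * n_tests)) for gene, p in table3]
significant = [(gene, p_adj) for gene, p_adj in adjusted if p_adj < ALPHA]
significant_genes = sorted({gene for gene, _ in significant})

print(len(significant), "probes in", len(significant_genes), "genes:", significant_genes)
```

Under this assumption the sketch recovers seven probes in six genes (GFI1, LIPA, LRRN3, MTSS1, NCF4 and SPN), matching the numbers reported in the text.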

Enrichment analysis

A total of 20 551 gene symbols were entered into the enrichment analyses, of which 20 379 were recognized by the software. After removing 19 duplicated genes, 17 242 genes were matched to a gene ontology (GO) term. A total of 20 enriched molecular functions (Supplementary Material, Table S3) met the threshold of significance (FDR q-value < 0.05). The most enriched molecular function was ‘insulin receptor binding’, with an enrichment of 5.68 (P-value = 1.15 × 10^−4); other protein-binding GO terms were also among the significant functions. As many as 161 enriched biological processes (Supplementary Material, Table S4) were identified (FDR q-value < 0.05). Among the most enriched processes, we found ‘positive regulation of interleukin-6-mediated signaling pathway’ with an enrichment of 5753.00 (one out of three genes in this GO term was differentially methylated), as well as other immune response-related processes, including ‘regulation of T-helper 2 cell differentiation’ and ‘positive regulation of interleukin-13 production’. Other processes of interest among the most enriched are ‘response to arsenic-containing substance’, ‘Sertoli cell fate commitment’, ‘cell aging’ and ‘negative regulation of glucose import’.
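To make the notion of enrichment concrete, the following sketch computes a fold enrichment and a hypergeometric upper-tail P-value for a single GO term. This is one common formulation and is an assumption here, since the software and its exact statistic are not described in this section. N is the background size from the text (17 242 genes) and B = 3, b = 1 follow the interleukin-6 example; the size n of the target gene list is not reported in this section, so the value used is hypothetical and the output is not expected to reproduce the reported enrichment values.

```python
from math import comb


def fold_enrichment(N, B, n, b):
    """(b/n) / (B/N): observed fraction of term genes in the target set
    relative to the fraction expected from the background."""
    return (b / n) / (B / N)


def hypergeom_upper_tail(N, B, n, b):
    """P(X >= b) when drawing n genes without replacement from N genes,
    of which B are annotated with the GO term."""
    total = comb(N, n)
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(B, n) + 1))
    return hits / total


N = 17242  # genes matched to a GO term (from the text)
B = 3      # genes annotated with the term (interleukin-6 example in the text)
b = 1      # annotated genes that are differentially methylated (from the text)
n = 100    # HYPOTHETICAL size of the differentially methylated gene list

print("fold enrichment:", round(fold_enrichment(N, B, n, b), 2))
print("P(X >= b):", hypergeom_upper_tail(N, B, n, b))
```

Small GO terms such as this one can reach a large fold enrichment from a single gene, so the enrichment value should be read together with its P-value.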

DISCUSSION

A better understanding of the methylation changes that arise from smoking and the subsequent gene expression changes could result in early therapeutic interventions and follow-up of related pathologies. Many studies have shown that smoking is associated with altered DNA methylation, with some studies being able to demonstrate that the altered methylation is accompanied by a change in gene expression (14). In this study, we identified 95 DNA methylation sites that were significantly differentially methylated due to smoking. We were able to replicate most of the significant sites using the study by Zeilinger et al. (7). These replicated sites showed the same direction of methylation change with smoking in both studies, strengthening the support for our findings. The study by Zeilinger et al. (7) is the largest study so far that has utilized the Illumina 450K BeadChip to investigate DNA methylation due to smoking. Their study had a larger sample size than ours, but the two sets of significant sites did not overlap completely. We therefore identified novel sites to be differentially methylated in smokers. The novel sites may either be false positives or have fallen just short of the threshold of significance in previous studies. The latter is probably the case for at least some of the sites. For example, in GPR55 and AHRR, which have been identified previously, we identified novel differentially methylated sites. In addition, three of the sites that did not replicate in the data by Zeilinger et al. (7) overlapped with the study by Shenker et al. (8). We also saw an overlap with studies using other tissues to estimate the effect of smoking on DNA methylation. Nine of our sites, located in four different genes (AHRR, GFI1, MYO1G and CNTNAP2), overlapped with sites that were differentially methylated in newborn cord blood due to maternal smoking (9).

Among our significant sites, seven are located at the 2q37.1 locus, which has previously been shown to be associated with breast cancer risk (8). This supports the hypothesis that cigarette smoking may be a breast cancer risk factor (17), possibly mediated by changes in DNA methylation. The sites at the 2q37.1 locus are located in a region predicted to be a poised promoter, which is occupied by a number of transcription factors. This region is located close to a cluster of alkaline phosphatase genes (ALPPL2, APLP and ALPI), of which at least ALPPL2 has been associated with cancer (pancreatic carcinoma) in a recent study (18). Similarly, CNTNAP2, one of the genes for which sites became hypermethylated in smokers, has also been shown to be hypermethylated in cancers (19). We also identified a cluster of nine sites located in the AHRR gene. The AHRR gene has been shown to contain differentially methylated CpG sites in many previous studies, and the methylation level of one of the sites (cg05575921) has been shown to be associated with the expression of the gene (14). AHRR encodes the aryl hydrocarbon receptor repressor protein, which acts on the aryl hydrocarbon receptor (AHR) pathway in a negative feedback mechanism. The AHR pathway has been suggested to play an important role in the metabolism of benzopyrene and dioxin-like compounds and their conversion into various carcinogenic metabolites (8), emphasizing the importance of this pathway in carcinogenesis through diverse mechanisms. We also identified two differentially methylated sites in GFI1, which encodes a nuclear zinc finger protein that functions as a transcriptional repressor. This protein is not directly involved in the metabolic pathway of the smoking components. However, it is involved in developmental processes such as hematopoiesis and in oncogenesis, in particular the protection of hematopoietic stem cells against stress-induced apoptosis (20).

In addition to investigating single DNA methylation sites, we performed enrichment analyses. One of the most enriched molecular functions was ‘insulin receptor binding’. The differentially methylated genes in this GO term included PTPN11, GRB10, ENPP1, IGF1R, SHC1, IRS1, DOK2, SORBS1 and PIK3R1. This molecular function is of special interest since smoking (active and passive) has been suggested as a risk factor for type 2 diabetes (21,22). It is possible that smoking, mediated by DNA methylation changes, alters the expression of genes involved in insulin receptor binding, which can result in insulin resistance and subsequently an increased risk of type 2 diabetes. This may be one of the underlying molecular mechanisms by which smoking increases the risk of type 2 diabetes, but further studies are needed to establish this. Similarly, one of the most enriched biological processes was ‘negative regulation of glucose import’, which included partly overlapping differentially methylated genes: GRB10, TNF, ENPP1, PRKCA and GSK3A. This might be another link between smoking and diabetes, mediated by DNA methylation.

It is also known that smoking impairs wound healing and increases the incidence of microbial infections (23), that smoking has a detrimental effect on the immune system and that there is a significant loss of antibody response and T cell proliferation in animals treated with nicotine (24). Therefore, finding immune response-related biological processes such as ‘positive regulation of interleukin-6-mediated signaling pathway’, ‘regulation of T-helper 2 cell differentiation’ and ‘positive regulation of interleukin-13 production’ to be enriched for differentially methylated genes is of special interest. Cigarette smoking is also known to have a harmful effect on male fertility by negatively influencing sperm morphology, motility and concentration (25). It has been shown that nicotine decreases testosterone levels in experimental rats, with high doses leading to a decrease in follicle-stimulating hormone (FSH) and an increase in luteinizing hormone levels (26). FSH acts on the seminiferous tubules to initiate and maintain spermatogenesis through the activation of the Sertoli cells. A decrease in FSH levels therefore implies decreased activation of the Sertoli cells, leading to a decrease in spermatogenesis. Finding ‘Sertoli cell fate commitment’ to be a significantly enriched GO term is consistent with smoking being implicated in male infertility, through FSH affecting Sertoli cell fate.

In contrast to the large number of sites that were differentially methylated in smokers, no sites were differentially methylated in individuals using snuff. Cigarette smoke is known to comprise over 5000 chemical components (27), with ∼98 hazardous smoke components (28). There are over 450 ingredients (flavorings and additives) added to cigarette tobacco (29), and burning the tobacco during the smoking process changes the chemical composition of cigarette smoke compared with snuff. This difference in chemical composition is the most likely explanation for the differential methylation seen with cigarette smoking but not with snuffing. It is known that smoke contains traces of arsenic, which agrees with the strong enrichment of genes known to respond to arsenic-containing substance. Out of 15 genes within this process, 3 were among our top candidates for being differentially methylated in smokers: CPOX (coproporphyrinogen oxidase), CDKN1A (cyclin-dependent kinase inhibitor 1a) and PTK2 (protein tyrosine kinase 2).

There are some possible limitations in our study. We have used genomic DNA from whole blood to perform genome-wide DNA methylation analysis. Whole blood shows inter-individual variability in the number of different kinds of DNA-containing cells. It has been demonstrated (9,14) that disparity in methylation can arise as a result of variability in cell composition when blood is used in DNA methylation studies. However, this variability in DNA methylation due to cell composition is probably small compared with the differential methylation pattern that can be observed between smokers and non-smokers. This agrees with our observation that the results did not differ markedly when adjusting for cell composition. DNA methylation may also differ in systemic samples such as blood compared with a target tissue sample. The components of cigarette smoke first reach the lungs, pass through the pulmonary alveoli into the blood and are then distributed by the blood to other body tissues. Therefore, the genomic DNA of peripheral blood components can serve as a useful surrogate to study the DNA methylation pattern in the lungs and subsequently other tissues, since there is a high chance of similar methylation dynamics between the genomic DNA of blood components and that of other tissues. However, it is important to consider that the use of a mixed population of white blood cells for DNA methylation analyses will result in decreased power to identify smoke- (or snuff-) associated differential methylation. False positives could be introduced only if smoking dramatically changed the number of different cell types.

Another limitation is the difference in sampling material and demographics between studies. The transcription data published by Charlesworth et al. (5) were produced from blood lymphocytes only, not from a mixed population of white blood cells as in the DNA methylation data sets. It is reasonable to believe that the overlap between genes with differential methylation and genes with differential expression in smokers would have been larger if the same cell types had been used. However, it is unlikely that the use of different sampling materials would introduce false positive results. Similarly, demographic differences between studies might have influenced the results. While the Zeilinger cohort and ours are of European descent with more similar lifestyles, the Charlesworth study cohort is of Mexican descent. Together with the fact that all studies differed in age and fraction of smokers, this might have lowered the overlap between studies. In agreement with most studies on gene expression, we have used FDR adjustment for multiple testing throughout the article. The fact that methylation status at nearby sites is not independent supports the idea of using FDR. However, compared with a more stringent Bonferroni adjustment, the number of false positive findings might be higher, and the borderline significant sites should be interpreted with care.
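The difference in stringency between FDR and Bonferroni adjustment discussed above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure for the FDR q-values, which is a common choice but is not named in this section, and the P-values are hypothetical toy data.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def bonferroni(pvalues):
    """Bonferroni-adjusted P-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical nominal P-values for five sites.
pvals = [0.001, 0.008, 0.012, 0.03, 0.2]
n_fdr = sum(q < 0.05 for q in bh_qvalues(pvals))
n_bonf = sum(p < 0.05 for p in bonferroni(pvals))
print("significant at FDR q < 0.05:", n_fdr)
print("significant after Bonferroni:", n_bonf)
```

In this toy example four sites pass the FDR threshold but only two survive Bonferroni adjustment, which is why borderline FDR-significant sites warrant cautious interpretation.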

In summary, this study showed that tobacco smoking, but not smokeless tobacco, is associated with a differential DNA methylation pattern. Even though most of our differentially methylated sites overlap with previous studies on DNA methylation, only a limited number of the corresponding genes were differentially expressed in lymphocytes due to smoking. By performing enrichment analyses, we pinpoint a number of molecular functions and biological processes that appear to be more prone to differential methylation in smokers. These results indicate that the association between smoking and pathologic effects, such as diabetes, cancer and infertility, might be mediated through DNA methylation.

MATERIALS AND METHODS

Study population

This study is part of the Northern Sweden Population Health Study (NSPHS), which was initiated in 2006 to provide a health survey of the population in the parishes of Karesuando and Soppero, county of Norrbotten, and to study the medical consequences of lifestyle and genetics. Out of the 3000 inhabitants in these parishes, 1069 participated in the study and met the eligibility criteria in terms of age (>15 years). Blood samples collected from individuals were immediately frozen and stored at −70°C, and genomic DNA for methylation analysis was extracted from the previously frozen peripheral blood leukocytes using the phenol:chloroform protocol. The NSPHS study was approved by the local ethics committee at the University of Uppsala (Regionala Etikprövningsnämnden, Uppsala, Dnr 2005:325) in compliance with the Declaration of Helsinki. All participants gave their written informed consent to the study, including the examination of environmental and genetic causes of disease. If a participant was not of legal age, a legal guardian also signed. The procedure used to obtain informed consent and the respective informed consent form have recently been discussed in the light of present ethical guidelines (30). More detailed information about the NSPHS has been published previously (31–33).

Determination of DNA methylation status and QC

Genomic DNA from 432 samples was bisulfite converted using the EZ-DNA methylation kit (ZYMO research) according to the manufacturer's recommendations. Genome-wide DNA methylation status of 476 366 CpG sites was assessed using the HumanMethylation450k BeadChip (Illumina, San Diego, USA) according to the standard protocol. The raw image data generated by the BeadArray Reader were analyzed using the Illumina GenomeStudio 2009, employing the recommended settings from Illumina and the HumanMethylation450_15017482_v.1.2.bpm manifest file provided by Illumina. Background subtraction and normalization were performed using the -normalization_controls and -Subtract background options in GenomeStudio. The methylation level at each site was calculated as the average beta value (β), where β = M/(U + M), M is the intensity of the methylated allele and U is the intensity of the unmethylated allele. The QC protocol used was in accordance with Illumina's recommendations (individual probe call rate >0.98, marker detection P-value ≤ 0.01). Surrogate variable analyses (SVA), implemented in the R package sva (34), were used to estimate possible batch or plate effects. A total of eight control samples were included: two negative controls, two positive controls, two duplicated samples that originate from the same blood samples and two duplicated samples that come from the same individuals but from blood samples taken 3 years apart.
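
The β calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GenomeStudio implementation: the function name and the optional offset parameter are hypothetical, and the default offset of 0 reproduces the formula exactly as given in the text.

```python
def beta_value(methylated, unmethylated, offset=0.0):
    """Return beta = M / (U + M + offset) for one CpG site.

    methylated   -- intensity of the methylated allele (M)
    unmethylated -- intensity of the unmethylated allele (U)
    offset       -- hypothetical stabilising constant; 0 matches the text
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    denominator = methylated + unmethylated + offset
    if denominator <= 0:
        raise ValueError("total intensity must be positive")
    return methylated / denominator


# Hypothetical intensities: three quarters of the signal is methylated.
example_beta = beta_value(3000, 1000)
print(example_beta)
```

By construction β lies between 0 (fully unmethylated) and 1 (fully methylated), which is why it is used directly as the outcome in the per-site models.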

Estimation of cell fractions from the methylation data

To ensure that the results are not influenced by variation in cell fraction between samples, we estimated the fraction of CD8T-, CD4T-, NK- and B-cells, monocytes and granulocytes in the samples. This was done using the R package minfi (35), which allows cell fractions to be estimated from Illumina 450K methylation data from whole blood. This method is based on the methylation data published for flow-sorted cells (36) and algorithms derived from the study by Houseman et al. (37).

Statistical analyses of DNA methylation data

Statistical analyses were performed using the stats package of R version 2.15.0 (38). Since the NSPHS is a population-based study including related individuals, special care was taken to avoid bias in the results. All statistical analyses were performed using the R package GenABEL (39), which was developed to enable statistical analyses of genetic data in related individuals. DNA samples were genotyped using Illumina Infinium HumanHap300v2 or Illumina Omni Express SNP bead microarrays as described previously (40). A pairwise kinship matrix was calculated from the genotyped SNPs (N = 180 212) that overlapped between the two microarrays, using the ibs function implemented in GenABEL. This function computes a matrix of average IBS (identity by state) for a group of people. The kinship matrix was used to adjust for pedigree structure when analyzing the association between smoking and methylation using a maximum likelihood approach implemented in GenABEL. Each DNA methylation site was analyzed by fitting the β values for the site using a polygenic model with age, sex, exposure to smoke and exposure to snuff as covariates. Exposure to smoke and snuff was coded as a numerical value, with 0 being no exposure, 1 low exposure, 2 medium exposure and 3 high exposure. For smoking, low, medium and high exposure were defined as 1–4, 5–14 and >14 packages of cigarettes per day, respectively. For snuff use, low, medium and high exposure were defined as 1–2, 3–4 and >4 packages of snuff per day, respectively. No significant surrogate variables were identified, and therefore no plate/batch effects were included in the analyses. FDR adjustment for multiple testing was used to pinpoint differentially methylated sites, and the FDR was estimated using the fdrtool function implemented in the fdrtool R library (41).
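
The exposure coding described above can be sketched as a small lookup. This is an illustration only: the names are hypothetical, and the sketch assumes that consumption is recorded as whole-number package counts, so that ">14" and ">4" correspond to lower bounds of 15 and 5.

```python
# Lower bounds of the low, medium and high categories, taken from the
# ranges stated in the text (assuming whole-number package counts).
SMOKE_LOWER_BOUNDS = (1, 5, 15)  # 1-4 low, 5-14 medium, >14 high
SNUFF_LOWER_BOUNDS = (1, 3, 5)   # 1-2 low, 3-4 medium, >4 high


def exposure_code(packages, lower_bounds):
    """Map a package count to 0 (none), 1 (low), 2 (medium) or 3 (high)."""
    if packages < 0:
        raise ValueError("package count must be non-negative")
    # Each lower bound that is reached raises the exposure code by one.
    return sum(packages >= bound for bound in lower_bounds)


smoke_codes = [exposure_code(n, SMOKE_LOWER_BOUNDS) for n in (0, 4, 5, 14, 15)]
snuff_codes = [exposure_code(n, SNUFF_LOWER_BOUNDS) for n in (0, 2, 3, 4, 5)]
print(smoke_codes, snuff_codes)
```

Coding exposure as an ordinal 0 to 3 value means that the model estimates a single per-category trend for each exposure rather than separate effects for each level.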

Replication analysis

We used the data published by Zeilinger et al. (7) to replicate our significant sites. That study also used the Illumina 450K BeadChip and genomic DNA from whole blood, with an even larger sample size (N = 1793). That study cohort is of European descent and ranges in age from 32 to 84. The replication was done through a site-by-site comparison of all significant sites in our study with the significant sites for current (N = 262) versus never (N = 749) smokers in the discovery panel of the Zeilinger et al. study.

Functional analyses of DNA methylation sites

Annotation of DNA methylation sites was provided by Illumina (www.illumina.com, HumanMethylation450_15017482_v.1.1.csv, accessed: 1 September 2012), as described previously (11). The significant sites were imported, as custom tracks, into the UCSC genome browser (Human Feb. 2009 GRCh37/hg19 Assembly, data retrieved 10 May 2013). The location of the significant sites was compared with the location of (i) known human protein-coding and non-protein-coding genes taken from the NCBI RNA reference sequences collection (RefSeq genes, last updated 25 April 2012), (ii) regions where transcription factors bind to DNA as assayed by ChIP-seq (Transcription Factor ChIP-seq from ENCODE (USF-1), last updated 25 February 2012) and (iii) chromatin state segmentation for nine human cell types (42,43) using the ENCODE Mar 2012 Freeze.

Enrichment analysis

Enrichment analysis was carried out using the web-based GO enrichment analysis and visualization tool GOrilla (http://cbl-gorilla.cs.technion.ac.il) (44). The analyses in GOrilla were performed using a ranked list of genes. A total of 20 551 genes were represented by at least one DNA methylation site. These genes were ranked according to the smallest P-value among the sites that mapped to each gene. This analysis helps to identify GO terms (molecular functions or biological processes) that are significantly enriched for genes that are differentially methylated in smokers.
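
The construction of the ranked gene list can be sketched as follows. This is a minimal illustration: the function name, site identifiers, gene labels and P-values are all hypothetical, and the alphabetical tie-break is an assumption that the text does not specify.

```python
def rank_genes_by_min_p(site_results):
    """Rank genes by the smallest P-value among their mapped sites.

    site_results -- iterable of (site_id, gene, p_value) tuples; a site that
                    maps to several genes appears once per gene.
    """
    best_p = {}
    for _site_id, gene, p_value in site_results:
        # Keep only the most significant site for each gene.
        if gene not in best_p or p_value < best_p[gene]:
            best_p[gene] = p_value
    # Most significant gene first; ties are broken alphabetically (assumption).
    return sorted(best_p, key=lambda gene: (best_p[gene], gene))


# Hypothetical per-site results, not taken from the study.
example_sites = [
    ("site_1", "GENE_A", 1e-8),
    ("site_2", "GENE_B", 0.20),
    ("site_3", "GENE_B", 3e-5),
    ("site_4", "GENE_C", 0.04),
]
ranked_genes = rank_genes_by_min_p(example_sites)
print(ranked_genes)
```

A list ordered in this way, with one gene symbol per line, is the kind of single ranked input on which a ranked-list enrichment analysis operates.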

Effect of smoking on gene expression

To understand the link between the effect of smoking on DNA methylation and, subsequently, on gene expression, we compared our results with the expression data published by Charlesworth et al. (5). These data were produced as part of the San Antonio Family Heart Study (SAFHS) (45), where transcriptional profiling was carried out on the serum lymphocytes of 1240 Mexican Americans (297 smokers), with a mean age of 39.3 years (standard deviation 16.7 years) (5). These analyses included 47 289 unique transcripts, of which 20 413 were detected as expressed in lymphocytes. Normalized expression values along with information on sex and age were downloaded from the ArrayExpress website http://www.ebi.ac.uk/arrayexpress/ (accession number E-TABM-305). We used FDR q-value < 0.05 as the threshold for selecting DNA methylation sites to compare with the gene expression in smokers and non-smokers from the SAFHS.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.

FUNDING

This work was supported by the Swedish Medical Research Council (2011-2354), the Swedish Society for Medical Research (SSMF) and Göran Gustafssons Stiftelse. Methylation profiling was performed by the SNP & SEQ Technology Platform in Uppsala, which is supported by Uppsala University, Uppsala University Hospital, Science for Life Laboratory (SciLifeLab)—Uppsala and the Swedish Research Council (Contracts 80576801 and 70374401).

ACKNOWLEDGEMENTS

We would like to thank Ulf Gyllensten, who was the lead scientist in the collection of biological samples and health survey, district nurse Svea Hennix for data collection and Inger Jonasson for logistics and coordination of the health survey. We would also like to thank all the community participants for their interest and willingness to contribute to the study. The computations were performed on resources provided by SNIC through Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) under Project b2012153. We would also like to thank the reviewers for valuable comments.

Conflict of Interest statement. None declared.

REFERENCES

1. Fraga M.F., Ballestar E., Paz M.F., Ropero S., Setien F., Ballestar M.L., Heine-Suner D., Cigudosa J.C., Urioste M., Benitez J., et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc. Natl. Acad. Sci. USA, 2005, vol. 102 (pg. 10604-10609)
2. Fagerstrom K. The epidemiology of smoking: health consequences and benefits of cessation. Drugs, 2002, vol. 62 Suppl. 2 (pg. 1-9)
3. Lu L.-M., Zavitz C.C.J., Chen B., Kianpour S., Wan Y., Stampfli M.R. Cigarette smoke impairs NK cell-dependent tumor immune surveillance. J. Immunol., 2007, vol. 178 (pg. 936-943)
4. Willemse B.W.M., ten Hacken N.H.T., Rutgers B., Lesman-Leegte I.G.A.T., Timens W., Postma D.S. Smoking cessation improves both direct and indirect airway hyperresponsiveness in COPD. Eur. Respir. J., 2004, vol. 24 (pg. 391-396)
5. Charlesworth J.C., Curran J.E., Johnson M.P., Goring H.H., Dyer T.D., Diego V.P., Kent J.W. Jr, Mahaney M.C., Almasy L., MacCluer J.W., et al. Transcriptomic epidemiology of smoking: the effect of smoking on gene expression in lymphocytes. BMC Med. Genomics, 2010, vol. 3 pg. 29
6. Ostrow K.L., Michailidi C., Guerrero-Preston R., Hoque M.O., Greenberg A., Rom W., Sidransky D. Cigarette smoke induces methylation of the tumor suppressor gene NISCH. Epigenetics, 2013, vol. 8 [Epub ahead of print]
7. Zeilinger S., Kuhnel B., Klopp N., Baurecht H., Kleinschmidt A., Gieger C., Weidinger S., Lattka E., Adamski J., Peters A., et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS ONE, 2013, vol. 8 pg. e63812
8. Shenker N.S., Polidoro S., van Veldhoven K., Sacerdote C., Ricceri F., Birrell M.A., Belvisi M.G., Brown R., Vineis P., Flanagan J.M. Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Hum. Mol. Genet., 2013, vol. 22 (pg. 843-851)
9. Joubert B.R., Haberg S.E., Nilsen R.M., Wang X., Vollset S.E., Murphy S.K., Huang Z., Hoyo C., Midttun O., Cupul-Uicab L.A., et al. 450 K epigenome-wide scan identifies differential DNA methylation in newborns related to maternal smoking during pregnancy. Environ. Health Perspect., 2012, vol. 120 (pg. 1425-1431)
10. Novik K.L., Nimmrich I., Genc B., Maier S., Piepenbrock C., Olek A., Beck S. Epigenomics: genome-wide study of methylation phenomena. Curr. Issues Mol. Biol., 2002, vol. 4 (pg. 111-128)
11. Bibikova M., Barnes B., Tsan C., Ho V., Klotzle B., Le J.M., Delano D., Zhang L., Schroth G.P., Gunderson K.L., et al. High density DNA methylation array with single CpG site resolution. Genomics, 2011, vol. 98 (pg. 288-295)
12. Dedeurwaerder S., Defrance M., Calonne E., Denis H., Sotiriou C., Fuks F. Evaluation of the infinium methylation 450 K technology. Epigenomics, 2011, vol. 3 (pg. 771-784)
13. Sandoval J., Heyn H., Moran S., Serra-Musach J., Pujana M.A., Bibikova M., Esteller M. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics, 2011, vol. 6 (pg. 692-702)
14. Monick M.M., Beach S.R.H., Plume J., Sears R., Gerrard M., Brody G.H., Philibert R.A. Coordinated changes in AHRR methylation in lymphoblasts and pulmonary macrophages from smokers. Am. J. Med. Genet. B Neuropsychiatr. Genet., 2012, vol. 159B (pg. 141-151)
15. Philibert R.A., Beach S.R.H., Brody G.H. Demethylation of the aryl hydrocarbon receptor repressor as a biomarker for nascent smokers. Epigenetics, 2012, vol. 7 (pg. 1331-1338)
16. Sun Y.V., Smith A.K., Conneely K.N., Chang Q., Li W., Lazarus A., Smith J.A., Almli L.M., Binder E.B., Klengel T., et al. Epigenomic association analysis identifies smoking-related DNA methylation sites in African Americans. Hum. Genet., 2013, vol. 132 (pg. 1027-1037)
17. Cox D.G., Dostal L., Hunter D.J., Le Marchand L., Hoover R., Ziegler R.G., Thun M.J. N-acetyltransferase 2 polymorphisms, tobacco smoking, and breast cancer risk in the breast and prostate cancer cohort consortium. Am. J. Epidemiol., 2011, vol. 174 (pg. 1316-1322)
18. Dua P., Kang H.S., Hong S.-M., Tsao M.-S., Kim S., Lee D.-k. Alkaline phosphatase ALPPL-2 is a novel pancreatic carcinoma-associated protein. Cancer Res., 2013, vol. 73 (pg. 1934-1945)
19. Omura N., Li C.P., Li A., Hong S.M., Walter K., Jimeno A., Hidalgo M., Goggins M. Genome-wide profiling of methylated promoters in pancreatic adenocarcinoma. Cancer Biol. Ther., 2008, vol. 7 (pg. 1146-1156)
20. Khandanpour C., Sharif-Askari E., Vassen L., Gaudreau M.-C., Zhu J., Paul W.E., Okayama T., Kosan C., Moroy T. Evidence that growth factor independence 1b regulates dormancy and peripheral blood mobilization of hematopoietic stem cells. Blood, 2010, vol. 116 (pg. 5149-5161)
21. Wang Y., Ji J., Liu Y.-J., Deng X., He Q.-Q. Passive smoking and risk of type 2 diabetes: a meta-analysis of prospective cohort studies. PLoS ONE, 2013, vol. 8 pg. e69915
22. Willi C., Bodenmann P., Ghali W.A., Faris P.D., Cornuz J. Active smoking and the risk of type 2 diabetes: a systematic review and meta-analysis. JAMA, 2007, vol. 298 (pg. 2654-2664)
23. Gold R. Epidemiology of bacterial meningitis. Infect. Dis. Clin. North. Am., 1999, vol. 13 (pg. 515-525)
24. Sopori M. Effects of cigarette smoke on the immune system. Nat. Rev. Immunol., 2002, vol. 2 (pg. 372-377)
25. Meri Z.B., Irshid I.B., Migdadi M., Irshid A.B., Mhanna S.A. Does cigarette smoking affect seminal fluid parameters? A comparative study. Oman Med. J., 2013, vol. 28 (pg. 12-15)
26. Oyeyipo I.P., Raji Y., Bolarinwa A.F. Nicotine alters male reproductive hormones in male albino rats: the role of cessation. J. Hum. Reprod. Sci., 2013, vol. 6 (pg. 40-44)
27. Borgerding M., Klus H. Analysis of complex mixtures – cigarette smoke. Exp. Toxicol. Pathol., 2005, vol. 57 (pg. 43-73)
28. Talhout R., Schulz T., Florek E., van Benthem J., Wester P., Opperhuizen A. Hazardous compounds in tobacco smoke. Int. J. Environ. Res. Public Health., 2011, vol. 8 (pg. 613-628)
29. Baker R.R., Pereira da Silva J.R., Smith G. The effect of tobacco ingredients on smoke chemistry. Part I: flavourings and additives. Food Chem. Toxicol., 2004, vol. 42 (pg. S3-37)
30. Mascalzoni D., Janssens A.C., Stewart A., Pramstaller P., Gyllensten U., Rudan I., van Duijn C.M., Wilson J.F., Campbell H., Quillan R.M. Comparison of participant information and informed consent forms of five European studies in genetic isolated populations. Eur. J. Hum. Genet., 2010, vol. 18 (pg. 296-302)
31. Johansson A., Enroth S., Gyllensten U. Continuous aging of the human DNA methylome throughout the human lifespan. PLoS ONE, 2013, vol. 8 pg. e67378
32. Johansson A., Marroni F., Hayward C., Franklin C.S., Kirichenko A.V., Jonasson I., Hicks A.A., Vitart V., Isaacs A., Axenovich T., et al. Common variants in the JAZF1 gene associated with height identified by linkage and genome-wide association analysis. Hum. Mol. Genet., 2009, vol. 18 (pg. 373-380)
33. Igl W., Johansson A., Gyllensten U. The Northern Swedish Population Health Study (NSPHS) – a paradigmatic study in a rural population combining community health and basic research. Rural Remote Health, 2010, vol. 10 pg. 1363
34. Leek J.T., Johnson E.Q., Parker H.S., Jaffe A.E., Storey J.D. sva: Surrogate Variable Analysis, 2013, R package version 3.8.0
35. Hansen K.D., Aryee M. minfi: Analyze Illumina's 450k methylation arrays, 2013, R package version 1.8.7
36. Reinius L.E., Acevedo N., Joerink M., Pershagen G., Dahlen S.E., Greco D., Soderhall C., Scheynius A., Kere J. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE, 2012, vol. 7 pg. e41361
37. Houseman E.A., Accomando W.P., Koestler D.C., Christensen B.C., Marsit C.J., Nelson H.H., Wiencke J.K., Kelsey K.T. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics, 2012, vol. 13 pg. 86
38. R Development Core Team. R: A Language and Environment for Statistical Computing, 2012, R Foundation for Statistical Computing, Vienna, Austria. ISBN: 3-900051-07-0
39. Aulchenko Y.S., Ripke S., Isaacs A., van Duijn C.M. GenABEL: an R library for genome-wide association analysis. Bioinformatics, 2007, vol. 23 (pg. 1294-1296)
40. Johansson A., Enroth S., Palmblad M., Deelder A.M., Bergquist J., Gyllensten U. Identification of genetic variants influencing the human plasma proteome. Proc. Natl. Acad. Sci. USA, 2013, vol. 110 (pg. 4673-4678)
41. Strimmer K. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics, 2008, vol. 24 (pg. 1461-1462)
42. Ernst J., Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol., 2010, vol. 28 (pg. 817-825)
43. Ernst J., Kheradpour P., Mikkelsen T.S., Shoresh N., Ward L.D., Epstein C.B., Zhang X., Wang L., Issner R., Coyne M., et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 2011, vol. 473 (pg. 43-49)
44. Eden E., Navon R., Steinfeld I., Lipson D., Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics, 2009, vol. 10 pg. 48
45. Mitchell B.D., Kammerer C.M., Blangero J., Mahaney M.C., Rainwater D.L., Dyke B., Hixson J.E., Henkel R.D., Sharp R.M., Comuzzie A.G., et al. Genetic and environmental contributions to cardiovascular risk factors in Mexican Americans. The San Antonio Family Heart Study. Circulation, 1996, vol. 94 (pg. 2159-2170)