Creating and Validating a DNA Methylation-Based Proxy for Interleukin-6

Abstract

Background: Studies evaluating the relationship between chronic inflammation and cognitive functioning have produced heterogeneous results. A potential reason for this is the variability of inflammatory mediators, which could lead to misclassification of individuals' persistent levels of inflammation. DNA methylation (DNAm) has shown utility in indexing environmental exposures and could be leveraged to provide proxy signatures of chronic inflammation.

Method: We conducted an elastic net regression of interleukin-6 (IL-6) in a cohort of 875 older adults (Lothian Birth Cohort 1936; mean age: 70 years) to develop a DNAm-based predictor. The predictor was tested in an independent cohort (Generation Scotland; N = 7028 [417 with measured IL-6], mean age: 51 years).

Results: A weighted score from 35 CpG sites optimally predicted IL-6 in the independent test set (Generation Scotland; R² = 4.4%, p = 2.1 × 10⁻⁵). In the independent test cohort, both measured IL-6 and the DNAm proxy increased with age (serum IL-6: n = 417, β = 0.02, SE = 0.004, p = 1.3 × 10⁻⁷; DNAm IL-6 score: N = 7028, β = 0.02, SE = 0.0009, p < 2 × 10⁻¹⁶). Serum IL-6 was not associated with cognitive ability (n = 417, β = −0.06, SE = 0.05, p = .19); however, an inverse association was identified between the DNAm score and cognitive functioning (N = 7028, β = −0.16, SE = 0.02, pFDR < 2 × 10⁻¹⁶).

Conclusions: These results suggest that methylation-based predictors can be used as proxies for inflammatory markers, potentially allowing further insight into the relationship between inflammation and pertinent health outcomes.

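The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. medRxiv preprint, this version posted July 25, 2020; doi: https://doi.org/10.1101/2020.07.20.20156935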

Acute inflammation is a necessary component of the biological response to harmful stimuli. However, there is increasing recognition that a chronic, sub-acute elevation of serum concentrations of pro-inflammatory mediators in older age may underpin the development of many age-related diseases, including cancer, atherosclerosis and Alzheimer's disease (1)(2)(3)(4). Despite this, studies of chronic inflammation and its therapeutics, particularly in cognitive decline and dementia, have produced markedly heterogeneous results (5,6).

Population studies investigating the relationship between chronic inflammation and incident morbidities typically rely on a single measurement of circulating inflammatory biomarker levels as a proxy for an individual's persistent inflammatory state. Interleukin-6 (IL-6), a pleiotropic pro-inflammatory cytokine, is a principal stimulator of an extensive range of acute-phase inflammatory proteins, including C-reactive protein (CRP), serum amyloid A and fibrinogen (7). It has been hypothesised that IL-6 is critical in the transition from the acute, beneficial inflammatory response to chronic, deleterious inflammation, marking it as a key target for research into this process (8). However, inflammatory cytokines are vulnerable to analytical measurement error, and circulating plasma IL-6 levels can also show transient variability due to many influences, including the obvious precipitant, infection, but also diet and activity levels (9,10). This is an important consideration when using IL-6 as a surrogate of chronic inflammation (11,12).

Epigenetic mechanisms afford an opportunity to further understand the molecular pathophysiology of inflammation. DNA methylation (DNAm), the most commonly studied epigenetic mechanism, is involved in the regulation of gene expression and chromosomal stability (13).
It has been proposed as a putative intermediary linking inflammation with subsequent disease outcomes, with differential DNAm profiles identified in inflammatory diseases (14)(15)(16). Studies are beginning to focus on resolving the link between inflammation and DNAm (17)(18)(19); however, there remains a relative dearth of understanding about the association between DNAm and pro-inflammatory cytokines, and of how this information could be used to inform disease progression or risk stratification. Recently, DNAm alterations have been leveraged to provide peripheral predictors (biomarkers) of disease status and to serve as archives of habitual traits. These indices have demonstrated potential utility in epidemiological research contexts (20).

In this study, we use DNAm data from the Lothian Birth Cohort 1936 to train an optimised predictor of IL-6 using a penalised regression model. This DNAm IL-6 score was subsequently applied to a large out-of-sample cohort (Generation Scotland) to (i) characterise its relationship with […]

An elastic net penalised regression was run to derive the DNA methylation-based predictor of IL-6. This method helps to control for collinearity among predictor variables and is particularly useful when dealing with a large number of predictors. The elastic net penalty is intermediate between LASSO and ridge regression: the LASSO approach yields a sparse solution with a minimal set of non-zero coefficients (from a set of correlated features it tends to include only one), whereas ridge regression shrinks the coefficients of correlated features towards each other (and so retains a large number of features).
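For reference, the elastic net objective minimised by glmnet for a continuous (Gaussian) outcome can be written as below, where y_i is the IL-6 value for individual i, x_i is that individual's vector of CpG methylation levels, N is the number of individuals, λ is the regularisation parameter and α is the mixing parameter. Setting α = 1 gives the LASSO penalty and α = 0 gives ridge regression, so intermediate values of α blend the sparsity of the former with the grouping behaviour of the latter.

```latex
\min_{\beta_0,\,\boldsymbol{\beta}}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)^2
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^2 + \alpha\,\lVert\boldsymbol{\beta}\rVert_1\right]
```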
The elastic net model was run using the glmnet library in R (25), with twelve-fold cross-validation. To reduce overfitting, folds were defined by methylation analysis batch, yielding between 63 and 83 observations per fold. IL-6 and DNA methylation levels at 428,489 CpG sites were entered as the dependent and independent variables, respectively. To apply the elastic net penalty, the mixing parameter (α) was set to 0.5, midway between ridge regression (α = 0) and LASSO (α = 1). Coefficients were extracted for the model with the lambda value (regularisation parameter) corresponding to the minimum mean cross-validated error. These weights were then applied to the corresponding CpGs in an out-of-sample prediction cohort to derive the DNAm-based predictor of IL-6.

Serum IL-6 was quantified at the University of Glasgow using a high-sensitivity commercial enzyme-linked immunosorbent assay (R&D Systems, Oxon, UK). IL-6 data were available for 417 individuals. These samples had previously been selected as father/offspring pairs for a telomere length study. The data in the current study were subset to exclude related individuals, which limited the number of samples from older females in the dataset. To reduce positive skew, IL-6 data were log-transformed (natural log) prior to analyses.
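Two steps in the elastic net procedure described above lend themselves to a short illustration: holding out whole methylation batches as cross-validation folds (in glmnet this is what a fold-identifier vector passed to cv.glmnet's foldid argument achieves), and applying the extracted weights to an out-of-sample cohort. The Python sketch below is a minimal, library-free version under stated assumptions: the function names, the dictionary-based inputs and the CpG identifiers in the example are hypothetical, and the score omits the model intercept, which shifts every individual's score by the same constant and so does not change associations estimated by linear regression.

```python
from collections import defaultdict


def folds_by_batch(sample_batches):
    """Group samples into cross-validation folds, one fold per methylation batch.

    sample_batches: dict mapping a sample ID to its batch label (hypothetical input).
    Returns a dict mapping each batch label to the sample IDs held out together.
    """
    folds = defaultdict(list)
    for sample_id, batch in sample_batches.items():
        folds[batch].append(sample_id)
    return dict(folds)


def dnam_score(weights, methylation):
    """Weighted sum of methylation values over the CpGs selected by the elastic net.

    weights: dict mapping CpG ID -> coefficient estimated in the training cohort.
    methylation: dict mapping CpG ID -> methylation value for one individual.
    A CpG missing from `methylation` raises KeyError instead of being skipped silently.
    """
    return sum(w * methylation[cpg] for cpg, w in weights.items())


# Toy example with hypothetical CpG IDs; cpg_3 has no weight and is ignored.
weights = {"cpg_1": 0.5, "cpg_2": -0.25}
person = {"cpg_1": 0.8, "cpg_2": 0.4, "cpg_3": 0.9}
print(round(dnam_score(weights, person), 3))  # 0.3
```

Raising an error for missing CpGs is a deliberate choice in this sketch: silently dropping probes that failed quality control in the test cohort would change the meaning of the score without warning.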

Correlates of

Several modifiable and non-modifiable factors have been shown to be associated with circulating IL-6 levels, including smoking (29,30), alcohol intake (31), body mass index (BMI) (32,33) and socioeconomic status (34,35). These were assessed in GS as follows. Smoking status categories included current smokers and ex-smokers, the latter split into those who had quit within the 12 months before their blood sample date and those who had quit earlier. Alcohol consumption was recorded as units consumed in a typical week. BMI was calculated as weight in kg divided by the square of height in m. Social deprivation was measured using the Scottish Index of Multiple Deprivation (SIMD; www.simd.scot), which ranks geographical areas in Scotland based on current income, employment, health, education, skills and training, geographic access to services, housing and crime, providing a standardised measure of relative deprivation throughout Scotland. To reduce positive skew, a log(units + 1) transformation was applied to the alcohol consumption data prior to analysis.

Because the DNAm IL-6 score includes CpG sites that have previously been related to smoking, we additionally generated a methylation-based smoking score using the EpiSmokEr R package (36). EpiSmokEr derives a smoking score from methylation levels at 187 CpG sites identified in an EWAS of current vs. never smokers (37,38).
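The two derived variables above are simple arithmetic; a minimal Python sketch (function names are hypothetical) makes the definitions explicit. The +1 offset in the alcohol transform is needed because log(0) is undefined for non-drinkers, and math.log1p computes log(1 + x) accurately for small values.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index: weight in kg divided by the square of height in m."""
    return weight_kg / height_m ** 2


def log_alcohol(units_per_week):
    """log(units + 1) transform; log1p keeps non-drinkers (0 units) at 0."""
    return math.log1p(units_per_week)


print(round(bmi(70, 1.75), 2))  # 22.86
print(log_alcohol(0))           # 0.0
```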

Cognitive tests
To derive a general cognitive ability phenotype, a principal component analysis was applied to three cognitive tests: verbal fluency (executive function), digit-symbol coding (processing speed) and logical memory. Scores on the first unrotated principal component were used as a general fluid-type cognitive ability score for each individual in GS. This component accounted for 50% of the variance, and individual test loadings ranged from 0.63 to 0.77. Full details of the cognitive assessments in the GS cohort have been described previously (39,40).

Statistical analysis

[…] Monocytes and Granulocytes). The relationship between IL-6 and the estimated immune cell proportions was further investigated in linear regression models. Here, the difference between the R² statistic from a basic model (IL-6 ~ age + sex) and that from the same model adjusted for the imputed immune cell proportions was calculated to estimate the proportion of variance accounted for by these predictors.
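The variance-partitioning step above can be sketched in plain Python: fit the basic and covariate-adjusted models by ordinary least squares and subtract their R² values. This is a minimal, library-free sketch under stated assumptions; the function names are hypothetical, and it solves the normal equations directly, which is adequate for a handful of covariates (R's lm uses a QR decomposition instead). If the estimated cell proportions sum exactly to one, including all of them alongside an intercept makes the design matrix singular, so one proportion would need to be dropped.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(y, X):
    """R^2 from an OLS fit of y on the columns of X plus an intercept (normal equations)."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # must be non-zero
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, base, extra):
    """R^2 gained by adding `extra` covariates (e.g. cell proportions) to `base` (e.g. age, sex)."""
    full = [list(b) + list(e) for b, e in zip(base, extra)]
    return ols_r2(y, full) - ols_r2(y, base)
```

Each row of `base` and `extra` holds one individual's covariate values, so a call such as incremental_r2(log_il6, [[age, sex], ...], [[cell_1, ..., cell_5], ...]) mirrors the comparison described above.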

To investigate the validity of the DNAm IL-6 score, linear regression models were run with phenotypes that have previously been associated with circulating IL-6 levels: BMI, smoking, alcohol intake and social deprivation. Measured IL-6 models were adjusted for age, sex and methylation set, with the DNAm IL-6 models further adjusted for cell proportions imputed from the methylation data. In a sensitivity analysis, the models were re-run adjusting for the methylation-predicted smoking score from EpiSmokEr.

Linear regression models were also used to assess the associations of measured IL-6 and the DNAm IL-6 score with age and with cognitive ability. Sex, methylation set and estimated cell proportions were included as covariates in both the age and the cognitive ability models. The cognitive ability model was further adjusted for age and, as above, was repeated adjusting for the methylation smoking score to test for confounding. All analyses were conducted in the R statistical environment, version 3.5.0 (41). Correction for multiple testing was applied using the false discovery rate (FDR-adjusted p < 0.05).

3. RESULTS

To test predictive performance, Pearson correlations were calculated between the DNAm IL-6 score and measured IL-6 in GS. The correlation coefficient between measured log(IL-6) and the elastic net-derived IL-6 score was 0.23 (variance explained = 5.1%). A plot of serum IL-6 versus the DNAm IL-6 score is presented in Figure 1.

Methylation at one of the probes in the score, cg05575921 (AHRR), has consistently been found to be strongly inversely associated with smoking status (42)(43)(44)(45), and four further probes (cg25250132, cg17412005, cg20059928 and cg04928129) show similar, though weaker, associations (46,47). We therefore re-ran the association between the DNAm IL-6 score and log(IL-6) as a regression analysis adjusting for a methylation-based smoking score. In the unadjusted model, the DNAm IL-6 score was significantly positively associated with log(IL-6) (β = 0.18, SE = 0.04, p = 2.9 × 10⁻⁶). Adjusting for the methylation smoking score had little effect on this relationship (β = 0.17, SE = 0.04, p = 1.4 × 10⁻⁵), suggesting that the association between measured IL-6 and the DNAm IL-6 score is largely independent of smoking. A correlation plot of log(IL-6), the DNAm IL-6 score and the methylation-predicted smoking score is presented in Supplementary Figure 2.
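The variance-explained figure for a single predictor is the square of the Pearson correlation coefficient. A minimal, library-free Python sketch (the function name is hypothetical) is below. Applied to the reported values, 0.23² ≈ 0.053; the reported 5.1% is consistent with an unrounded coefficient of about 0.226 (√0.051 ≈ 0.226), which rounds to 0.23.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Variance explained by a single predictor is the squared correlation.
r = pearson_r([1, 3, 2, 4], [1, 2, 3, 4])
print(round(r, 3), round(r ** 2, 3))  # 0.8 0.64
```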


The correlations between the DNAm IL-6 score and the estimated leukocyte proportions are presented in Supplementary Figure 1; they ranged from −0.69 (CD4⁺ T cells) to 0.49 (granulocytes). Given these high correlations, we additionally tested the association between the imputed cell proportions and measured IL-6 in both LBC1936 (n = 875) and GS (n = 417). The gain in multiple R² (the difference between a model adjusted for age and sex and a model adjusted for age, sex and the six cell proportions) was 0.042 in LBC1936 and 0.087 in GS. In GS, the corresponding value for the DNAm IL-6 score was 0.53, suggesting that the cell proportions explained a much greater proportion of the variance in the methylation score than in serum IL-6.

As chronic inflammation is a hallmark of older age, we examined the associations of measured IL-6 and the DNAm IL-6 score with chronological age. Plots of these associations are presented in Figure 2. Both serum IL-6 and the DNAm IL-6 score increased with age (serum IL-6: β = 0.022, SE = 0.004, p = 1.3 × 10⁻⁷; DNAm IL-6 score: β = 0.016, SE = 0.0009, p < 2 × 10⁻¹⁶). No significant effect of sex was identified for serum IL-6; however, males had a higher DNAm IL-6 score than females (β = 0.16, SE = 0.02, p = 3.4 × 10⁻¹²). Furthermore, inclusion of an age-by-sex interaction term indicated that the DNAm IL-6 score increased more steeply with age in males than in females (β = 0.015, SE = 0.002, p = 5.1 × 10⁻¹⁶).
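The age-by-sex analysis above corresponds to a linear model of the following form. This sketch assumes that sex is coded as an indicator equal to 1 for males, and it collects the covariates named in the statistical analysis section (methylation set and estimated cell proportions) into the vector c_i; the exact coding used in the analysis is an assumption here. Under this coding, β₁ is the age slope in females, β₁ + β₃ is the slope in males, and the reported interaction estimate corresponds to β₃. Note that β₂ is then the sex difference at age zero unless age is centred.

```latex
\mathrm{DNAm\ IL\text{-}6}_i = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{male}_i
+ \beta_3\,(\mathrm{age}_i \times \mathrm{male}_i) + \boldsymbol{\gamma}^{\top}\mathbf{c}_i + \varepsilon_i
```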

Given the composition of the DNAm IL-6 score, the associations with smoking status were unsurprising. As a sensitivity analysis, we re-ran the models assessing the relationships with BMI and social deprivation, adjusting for methylation-predicted smoking to test for potential confounding. The association with BMI was strengthened (β = 0.19, SE = 0.02, pFDR = 8 × 10⁻²¹; increase = 46%), whereas the association with social deprivation was attenuated but remained significant (β = −0.09, SE = 0.02, pFDR = 5 × 10⁻⁶; attenuation = 43%).
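The pFDR values above come from false discovery rate correction. The methods do not name the procedure; R's p.adjust treats method = "fdr" as an alias for Benjamini-Hochberg, so the sketch below assumes that procedure. It is a minimal, library-free Python version (the function name is hypothetical): each p-value is scaled by m/rank, and a running minimum taken from the largest p-value downwards keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 3) for p in adjusted])  # [0.02, 0.04, 0.04, 0.02]
```

An association is then declared significant when its adjusted value falls below 0.05.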
In the subset of the cohort with serum IL-6 data (n = 417), IL-6 itself was not significantly associated with cognitive ability (β = −0.06, SE = 0.05, p = 0.19). Although the effect size for the DNAm IL-6 score was larger than that for serum IL-6 in this subset, the association was not significant (n = 417, β = −0.11, SE = 0.07, p = 0.11).
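The general cognitive ability score used in these models was defined in the Cognitive tests subsection as scores on the first unrotated principal component of verbal fluency, digit-symbol coding and logical memory. The library-free Python sketch below (function names are hypothetical) extracts that component from the test correlation matrix by power iteration. With three tests, a component explaining 50% of the variance has an eigenvalue of 1.5, and the squared loadings sum to that eigenvalue. Component scores here are projections onto the unit eigenvector; other software may rescale them, which changes regression coefficients but not their significance.

```python
import math


def standardise(column):
    """Centre a test score column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def first_principal_component(tests, iterations=500):
    """First unrotated principal component of the correlation matrix of the tests.

    tests: one list of scores per cognitive test, all of equal length.
    Returns (loadings, proportion of variance explained, component scores).
    """
    z = [standardise(col) for col in tests]
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    scores = [sum(v[t] * z[t][i] for t in range(k)) for i in range(n)]
    return loadings, eigenvalue / k, scores
```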

DISCUSSION
In the current study, we created a poly-epigenetic score for the pro-inflammatory cytokine IL-6 using elastic net penalised regression. This DNAm IL-6 score increased with age and was associated with established correlates of IL-6 and with general fluid-type cognitive ability.
The DNAm IL-6 score highlights the potential utility of exploiting methylation data to predict biological outcomes. The association between the DNAm IL-6 score and serum IL-6 was modest; however, given its temporal variability, a single serum measurement of IL-6 may be an unreliable signature of chronic inflammation. If the composite epigenetic score captures chronic inflammation more reliably than the cytokine measurement, a modest correlation between the two would be expected. Integrating information from multiple CpG sites may provide a more stable estimate of chronic inflammation, overcoming the periodic instability of the cytokine itself. Work investigating how the DNAm score changes in acute and chronic inflammatory conditions is warranted to assess its accuracy in this regard.

We identified strong correlations between the estimated leukocyte proportions and the DNAm IL-6 score, suggesting that a relatively large proportion of the variance in the latter is explained by the former. This is perhaps unsurprising given the interplay between inflammation and the immune system. IL-6 is synthesised by multiple cell types, including neutrophils, B cells and macrophages, in response to infectious or immunological stimuli. Further, IL-6 exerts pleiotropic effects on the immune response, including inducing the differentiation of B cells into antibody-producing cells (48), of CD4⁺ T cells into subsets of effector T helper cells (49), and of CD8⁺ T cells into cytotoxic T cells (50). However, the gains in multiple R² from the models of measured IL-6 were small in comparison, suggesting a weaker relationship between the cytokine and the imputed immune cells. This indicates a contrast between serum IL-6 and the DNAm IL-6 score, with the methylation score more directly tracking alterations in cell proportions. Determining the causal pathway underlying this association is challenging, given that both the DNAm IL-6 score and the cell proportions were predicted from the methylation data. Assessing the relationship between the DNAm IL-6 score and measured cell counts would help establish whether the association identified here replicates.
We identified a rise in the DNAm IL-6 score with age, with males exhibiting a steeper trajectory than females. Although this sex effect was not identified in the serum IL-6 pseudotrajectories, it has been reported previously (51,52) and may have been missed here owing to the lack of IL-6 data from older females in the GS cohort. The DNAm IL-6 score was associated with BMI, social deprivation and smoking status, all established correlates of IL-6. The robust association with smoking status was unsurprising given the inclusion of smoking-associated probes in the score. In the sensitivity analysis, the association with social deprivation was partly attenuated when adjusting for the methylation-based smoking score but remained significant, and the association with BMI was strengthened rather than attenuated, indicating no confounding by smoking. No association was identified between the DNAm IL-6 score and alcohol intake, which has previously been associated with IL-6 levels; however, this effect may be more acute at the time of alcohol consumption, rather than arising as a function of average units consumed (31,53).
Although inflammation has previously been associated with cognitive ability and decline, the literature assessing this relationship remains heterogeneous (54). A potential reason for this is the aforementioned lability of inflammatory mediators, together with differences in the assays used to quantify them. We found that the DNAm IL-6 score was associated with general cognitive ability in the full GS cohort (n = 7,028). This echoes previous findings in the same cohort with a similarly constructed DNAm score for CRP, in which no association was identified between serum CRP and cognitive ability, but an inverse association was found between cognitive ability and the DNAm CRP score (55). This further demonstrates the potential value of methylation-based proxy scores for phenotypes that are either unavailable or unreliable in themselves. The association with cognitive ability appears to have been driven, at least in part, by smoking, which has previously been associated with poorer cognitive health (56). However, although attenuated, the association remained significant after adjustment for the methylation-based smoking score, suggesting an independent association with the DNAm IL-6 score. Further investigation of these associations in older cohorts and in those with longitudinal measures would help to clarify the relationship and establish whether the DNAm IL-6 score is associated with cognitive decline.

This is the first study to develop and characterise a DNAm-based predictor of the pro-inflammatory cytokine IL-6. The utility of this proxy phenotype is particularly evident for studies in which DNA methylation is quantified but IL-6 is unavailable. Additionally, a composite methylation score that integrates information from multiple sites could provide a more reliable estimate of chronic inflammation, allowing clearer insight when assessing its relation to health outcomes. This study is limited by the relatively small number of serum IL-6 samples available in the GS cohort, particularly from older females. Further, the DNAm IL-6 score was devised in a cohort with a restricted age range, all of whom had survived into their eighth decade. This may have limited the ability of the score to generalise to a cohort with a broader age range and, potentially, more variable health. To further validate the DNAm IL-6 score, it would be valuable to assess its relationship with serum IL-6 in cohorts with more measurements of the cytokine, including repeat measures from the same individuals.
In summary, we have shown that DNA methylation data can provide a proxy for the pro-inflammatory cytokine IL-6 that is associated with pertinent health, lifestyle and cognitive outcomes.
Future studies with DNA methylation data, inflammatory disease measures and longitudinal follow-up will allow further insight into the utility of this DNAm-based predictor and its ability to act as a measure of chronic inflammation compared with the labile serum phenotype.


Acknowledgements
The authors thank all LBC1936 study participants and research team members who have contributed, and continue to contribute, to ongoing studies. LBC1936 is supported by Age UK […]

Supplementary
Supplementary Figure 2. Pearson correlations between measured log(IL-6), the DNAm IL-6 score and the DNA methylation-based smoking score in Generation Scotland.