Maternal pre-pregnancy BMI and gestational weight gain, offspring DNA methylation and later offspring adiposity: findings from the Avon Longitudinal Study of Parents and Children

Background: Evidence suggests that in utero exposure to undernutrition and overnutrition might affect adiposity in later life. Epigenetic modification is suggested as a plausible mediating mechanism. Methods: We used multivariable linear regression and a negative control design to examine offspring epigenome-wide DNA methylation in relation to maternal and offspring adiposity in 1018 participants. Results: Compared with neonatal offspring of normal weight mothers, 28 and 1621 CpG sites were differentially methylated in offspring of obese and underweight mothers, respectively [false discovert rate (FDR)-corrected P-value < 0.05), with no overlap in the sites that maternal obesity and underweight relate to. A positive association, where higher methylation is associated with a body mass index (BMI) outside the normal range, was seen at 78.6% of the sites associated with obesity and 87.9% of the sites associated with underweight. Associations of maternal obesity with offspring methylation were stronger than associations of paternal obesity, supporting an intrauterine mechanism. There were no consistent associations of gestational weight gain with offspring DNA methylation. In general, sites that were hypermethylated in association with maternal obesity or hypomethylated in association with maternal underweight tended to be positively associated with offspring adiposity, and sites hypomethylated in association with maternal obesity or hypermethylated in association with maternal underweight tended to be inversely associated with offspring adiposity. Conclusions: Our data suggest that both maternal obesity and, to a larger degree, underweight affect the neonatal epigenome via an intrauterine mechanism, but weight gain during pregnancy has little effect. We found some evidence that associations of maternal underweight with lower offspring adiposity and maternal obesity with greater offspring adiposity may be mediated via increased DNA methylation.


Details of the collection and generation of DNA methylation data
Cord or peripheral blood (whole blood or buffy coats) were collected according to standard procedures, spun and frozen at -80˚C. DNA methylation analysis and data pre-processing were performed at the University of Bristol as part of the ARIES project (ariesepigenomics.org.uk). Following extraction, DNA was bisulfite converted using the Zymo EZ DNA MethylationTM kit (Zymo, Irvine, CA). Following conversion, the genome-wide methylation status of over 485,000 CpG sites was measured using the Illumina Infinium® HumanMethylation450k BeadChip assay according to the standard protocol. The arrays were scanned using an Illumina iScan and initial quality review was assessed using GenomeStudio (version 2011.1). The level of methylation is expressed as a "Beta" value (β-value), ranging from 0 (no cytosine methylation) to 1 (complete cytosine methylation). Samples from all timepoints in ARIES were distributed across slides using a semi-random approach (sampling criteria were in place to ensure that all time-points were represented on each array) to minimize the possibility of confounding by batch effects. In addition, during the data generation process a wide range of batch variables were recorded in a purpose-built laboratory information management system (LIMS). The LIMS also reported QC metrics from the standard control probes on the HumanMethylation450k BeadChip for each sample back to the laboratory. Of all measured batch variables, bisulfite conversion batch (96-well plate) was identified as by far the most influential on the ARIES HumanMethylation450k data ( Figure S1), Slide level batch adjustment is less useful as each slide will only contain a small number of samples for each time point, additionally allocation to bisulfite conversion batch is more likely to contain systematic bias because samples were added to the batch according to lab priorities and convenience. Samples failing quality control (average probe detection p-value ≥ 0.01) were repeated. As an additional quality control step genotype probes on the HumanMethylation450k were compared between samples from the same individual and against SNP-chip data to identify and remove any sample mismatches.
Data were pre-processed in R (version 3.0.1) with the WateRmelon package 1 according to the subset quantile normalization approach described by Touleimat & Tost 2 in an attempt to reduce the nonbiological differences between probes. Sites on sex chromosomes were excluded to reduce complexity due to sex-specific differences and X-chromosome inactivation by DNA methylation in females. We excluded probes identified by Naeem et al. 3 that map to multiple genomic locations, contain known repeat regions, contain known INDELs, contain SNPs, or are affected by other unknown/multiple factors. Finally, we also excluded probes showing a detection P-value >0.05 for >5% samples. This left 284 972, 285 929 and 285 656 probes for analysis in neonatal cord blood, peripheral blood in childhood and peripheral blood in adolescence, respectively.

EWAS regression model optimisation
The U-shaped association between maternal and offspring adiposity (where adiposity is greater in offspring of underweight or obese mothers) led us to hypothesise that any association between maternal adiposity and offspring adiposity might also be non-linear, so we tested the assumption of linear relationships between GWG/pre-pregnancy BMI and offspring methylation by performing linear regression using the top 1000 most variable probes, firstly with the exposure untransformed and secondly with a quadratic term for the exposure. We used a likelihood ratio test to interpret whether or not the model fit was improved by the inclusion of the quadratic term. However, likelihood ratio tests showed that the assumption of linear relationships between pre-pregnancy BMI/GWG and cord blood methylation is valid (for 93.6%-94.9% of probes the model fit was not improved (likelihood ratio test P-value >0.05) by the inclusion of a quadratic term for pre-pregnancy BMI/GWG).
We also tested the hypothesis that there is an interaction between continuous pre-pregnancy BMI and continuous stage-specific or total GWG that should be considered in model design. Again, linear regression was performed on the top 1000 most variable probes, firstly with no interaction and secondly with an interaction (stage-specific or total GWG x pre-pregnancy BMI). Likelihood ratio tests were used to assess whether or not the model fit was improved by the inclusion of the interaction. Likelihood ratio tests also showed that for models where GWG is the exposure, model fit was not improved by including pre-pregnancy BMI as an interaction with GWG rather than as a confounder (the P-value was >0.05 for 89.0 to 94.2% of 1000 probes tested).
It has been suggested that logit-transforming β-values to "M-values" gives a better approximation of a normal distribution and is therefore more statistically valid. However, interpretation of β-values (on a scale of 0 (completely unmethylated) to 1 (completely methylated)) is more intuitive. 4 We performed linear regression of methylation as β-values or M-values on pre-pregnancy BMI using the top 1000 most variable probes. A similar number of differentially methylated CpG sites (P-value <0.05) were identified using M-values (51) and β-values (56). Of these, 52 sites were identified using both methods (67.2% agreement). Therefore, we consider that β-values are appropriate for our analyses.

Longitudinal model
Longitudinal methylation data were extracted for each of these CpG sites. A multilevel model 1,2 including a random intercept and a linear regression spline term to allow for flexibility was fitted to each of these sites sequentially. For example, for sites found when comparing obese and normal weight mothers: where = 1, . . .1018 indexes the children in ARIES, = 1,2,3 indexes the measurement occasion and + = if > 0 or 0 otherwise. 1 gives the average difference in methylation of offspring of normal weight and obese mothers; 2 gives the average change in methylation from birth to adolescence; 3 tells us whether there is any change to this trend (i.e. 2 ) from childhood to adolescence; 4 tells us whether there is a difference in methylation change between obese motherand normal weight mother-offspring; and 5 tells us whether the offspring of obese and normal weight mothers have a different change to the trend (i.e. 2 ) of methylation change from birth to childhood. From these we can calculate the change in methylation from 0-7 for children of normal weight mothers ( 2 ), obese mothers ( 2 + 4 ) and the change from 7-17 for children of normal weight mothers ( 2 + 3 ) and obese mothers ( 2 + 3 + 4 + 5 ). To test whether there is a difference in methylation change between 7 and 17 we test whether 4 + 5 is different from zero, and present a p-value for this in our results.
For each CpG site, we used a multilevel model, adjusting for confounders (offspring sex, maternal age, parity, smoking status and occupation) and the first 20 independent surrogate variable components (which account for cellular heterogeneity between the cord blood and whole blood cells). To correct for multiple testing, across the CpG sites and two parameters of (difference in change during childhood/adolescence) interest we used a cut-off of 0.05/(2*number of CpG sites), which for the obese comparison was 8.9x10 -4 and for the underweight comparison was 1.5x10 -5 .      Table S6. Percentage of sites where the direction of association is the same for the "maternal adiposity-offspring methylation" relationship and the "offspring methylation-offspring adiposity" relationship.  Figure S1. A heatmap to show the effect estimates of associations between different batch variables (BCD_plate (bisulfite-conversion batch); Chip_ID Chip_row), cell type proportions (B cell, CD4-T cells, CD8-T cells, granulocytes, monocytes and natural killer (NK) cells, sex and principal components for cord blood DNA methylation. BCD_plate was identified as the major batch variable in ARIES. Slide level batch (Chip_ID) adjustment is less useful because samples from all time points in ARIES were distributed across slides using a semi-random approach (sampling criteria were in place to ensure that all time points were represented on each array), therefore, (for any time point) each slide (chip) will contain only a small number of samples. Allocation to BCD_plate is more likely to contain systematic bias, as samples were added to the batch according to laboratory priorities and convenience. Figure S2. Quantile-quantile (Q-Q) plots with genomic inflation values (Lambdas) for each EWAS.