We have identified a novel molecular phenotype that defines a subgroup of newborns who have highly disrupted epigenomes. We profiled DNA methylation in cord blood of 114 children selected from the lowest and highest quintiles of the birth weight distribution (irrespective of their mode of conception) at 96 CpG sites in genes we have found previously to be related to birth weight or growth and metabolism. We identified those individuals in each group who differed from the mean of the distribution by the greatest magnitude at each site and for the largest number of sites. Such ‘outlier’ individuals differ substantially from the rest of the group in having highly disrupted methylation levels at many CpG sites. We find that children from the lowest quintile of the birth weight distribution have a significantly greater number of disrupted CpGs than children from the highest quintile of the birth weight distribution. Among children from the lowest quintile of the birth weight distribution, ‘outlier’ individuals are significantly more common among children conceived in vitro than children conceived in vivo. These observations are novel and potentially important because they associate a molecular phenotype (multiple and large DNA methylation differences) in normal somatic tissues (cord blood) with both a prenatal exposure (conception in vitro) and a clinically important outcome (low birth weight). These observations suggest that some individuals are more susceptible to environmentally mediated epigenetic alterations than others.
Children conceived in vitro are at modestly increased risk for a number of undesirable outcomes. These include rare imprinted gene disorders, such as Angelman and Beckwith–Wiedemann syndromes (1–11), but also much more common conditions, such as low birth weight (LBW) and small for gestational age (SGA) (4,12–17). The molecular basis for these epidemiological associations is uncertain, but one hypothesis is that alterations in epigenetic regulatory mechanisms are caused by many of the clinical or laboratory procedures used in assisted reproductive technology (ART) (18–19). Animal models of ART support this hypothesis because multiple alterations in DNA methylation and gene expression have been demonstrated to be associated with procedures used in ART (20–29).
We (30–32) and others (33–40) have identified DNA methylation and gene expression differences between children conceived in vitro and children conceived in vivo. However, such differences have not been observed in all studies (41–50), and it is possible that the effect of ART procedures in humans differs from that observed in animal models. The most potent argument against a universally disruptive effect of ART procedures on epigenetic regulatory mechanisms is that the vast majority of children conceived in vitro have no phenotype that demands clinical attention. We hypothesize that the reason for this disparity is that individuals differ in the degree to which they are susceptible to environmentally mediated alteration of epigenetic marks. We hypothesize, further, that the fraction of children who are ‘susceptible’ is responsible for the majority of undesirable outcomes associated with ART. As an initial test of this hypothesis, we have measured DNA methylation in cord blood of 114 children selected from the lowest and highest quintiles of the birth weight distribution at 96 CpG sites in genes we have found previously to be related to birth weight or growth and metabolism (51). We identified those individuals in each group who differed from the mean of the distribution by the greatest magnitude at each site and for the largest number of sites. Such ‘outlier’ individuals differ substantially from the rest of the group in having highly disrupted methylation levels at many CpG sites. We find that LBW children have significantly greater numbers of disrupted CpGs than children from the highest quintile of the birth weight distribution, on average, and that ‘outlier’ individuals in the LBW group are much more common among children conceived by ART than among children conceived in vivo.
We propose that these individuals are not only responsible for the statistically significant differences in mean methylation level between ART children and controls but are also at greater risk of additional future undesirable outcomes for which LBW children are at increased risk.
Selection of CpG sites and cord blood samples
We profiled DNA methylation levels (see the Materials and Methods section) at 96 CpG sites (Supplementary Material, Table S1) in cord blood DNA from 114 children (Supplementary Material, Table S2). Fifty-seven children were selected from the lowest quintile of the birth weight distribution, beginning with the 1st percentile and proceeding in ascending order (the LBW group), and 57 children were selected from the highest quintile of the birth weight distribution, beginning with the 100th percentile and proceeding in descending order [the high birth weight (HBW) group]. Children were selected irrespective of their mode of conception (ART versus conception in vivo) but with equal numbers of males and females maintained in each group. All birth weights were corrected for gestational age. The CpG sites were selected from more than 27 000 sites profiled, genome-wide, in our previous studies and have been shown to have a consistent relationship to birth weight (51) or ART-associated differential methylation (30–32).
Identification of ‘outliers’
Because one hypothesis being tested in this study is that some individuals are more susceptible to epigenetic alterations than others, we wished to identify those individuals with the largest differences in DNA methylation at each CpG site within each birth weight group. We approached this problem by identifying those individuals who were statistical ‘outliers’ of the distribution of all individuals in each group at each of the 96 CpGs. This concept is illustrated for the LBW group in Figure 1. We used a common algorithm to designate ‘outliers’ as any individual whose beta-value (fraction of DNA molecules methylated) at the CpG under consideration exceeded the interquartile range of the distribution by >50% (see the Materials and Methods section). The eight individuals circled in Figure 1A are ‘outliers’ of the distribution of methylation fractions at GRB10 cg24274319. The seven individuals designated as symbols other than closed circles in Figure 1A are also ‘outliers’ at two or more of the birth weight-related CpGs (51) shown in Figure 1A–D. Note that most ‘outlier’ individuals remain ‘outliers’ regardless of whether this designation applies to hypo- or hyper-methylation of the CpG in question; i.e. six of the individuals exhibiting hypo-methylation at the GRB10 CpG in Figure 1A also exhibit hyper-methylation at SHMT2 cg15670177 (Fig. 1D).
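The per-CpG designation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the conventional box-plot rule (values lying more than 1.5 times the interquartile range beyond the first or third quartile), which may not match the exact rule or quartile interpolation applied in SPSS, and the beta-values and function names are hypothetical.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    Assumption: quartiles use Python's 'inclusive' interpolation, which
    may differ slightly from the method used by other statistics packages.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return one boolean per individual: True if the beta-value lies outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v < lower or v > upper for v in values]

# Hypothetical beta-values for eight individuals at a single CpG
betas = [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.45, 0.81]
flags = flag_outliers(betas)
print(flags)  # only the hypo-methylated individual (0.45) is flagged
```

Because the fences are symmetric around the quartiles, the same function flags both hypo- and hyper-methylated individuals.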
‘Outlier’ CpGs are not distributed randomly among individuals
Being designated an ‘outlier’ at any single CpG does not demonstrate that the individual in question is more susceptible to epigenetic alterations than any other. Any individual's epigenome may be disrupted in random fashion such that they appear as an ‘outlier’ for some CpGs. However, if an individual appears as an ‘outlier’ at multiple CpGs, in a statistically nonrandom fashion, it suggests that that individual has a more highly disrupted epigenome than average. In order to determine whether any individuals fulfilled these criteria, we tabulated the number of times each individual was designated an outlier for each of the 96 CpGs (Supplementary Material, Table S1). These distributions, stratified by birth weight group (LBW versus HBW) and mode of conception (ART versus Control), are shown in Figure 2. Two things are apparent, by inspection: (1) individuals in the LBW group (Fig. 2A) have, on average, many more ‘outlier’ CpGs (mean of 14 outlier CpGs/individual, ART and Controls, combined) than individuals in the HBW group (mean of 5 outlier CpGs/individual, ART and Controls, combined) (Fig. 2B) (Fisher's exact test, P = 0.05); (2) within the LBW group, ART individuals (mean of 21 outlier CpGs/individual) have, on average, many more ‘outlier’ CpGs than individuals conceived in vivo (mean of 10 outlier CpGs/individual) (Controls) (Fisher's exact test, P = 0.04) (Supplementary Material, Table S1). Restricting outlier identification to a single CpG per locus (82 loci rather than 96 CpGs) leads to similar conclusions. The seven most extreme LBW samples (6 ART and 1 Control; ranging from 36 outlier CpGs to 59 outlier CpGs) are still the 7 most extreme LBW samples (ranging from 33 outlier loci to 53 outlier loci). Similarly, the most extreme individuals in the HBW group based on CpGs remain the most extreme based on loci (last column, Supplementary Material, Table S1).
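The tabulation step described above, counting for each individual the number of CpGs at which that individual is an ‘outlier’, can be sketched as follows. The data layout, CpG labels and values are hypothetical, and the outlier rule is again the conventional 1.5 × interquartile range rule, assumed here for illustration only.

```python
from statistics import quantiles

def is_outlier_column(values, k=1.5):
    """Flag values more than k*IQR below Q1 or above Q3 (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = k * (q3 - q1)
    return [v < q1 - spread or v > q3 + spread for v in values]

def outlier_counts(beta):
    """Count outlier CpGs per individual.

    beta: dict mapping a CpG label to a list of beta-values, one per
    individual, with individuals in the same order for every CpG.
    """
    n_individuals = len(next(iter(beta.values())))
    counts = [0] * n_individuals
    for values in beta.values():
        for i, flagged in enumerate(is_outlier_column(values)):
            counts[i] += flagged
    return counts

# Hypothetical beta-values: three CpGs, eight individuals
beta = {
    "cpg_a": [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.45, 0.81],
    "cpg_b": [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.60, 0.11],
    "cpg_c": [0.95, 0.50, 0.52, 0.51, 0.49, 0.50, 0.51, 0.52],
}
counts = outlier_counts(beta)
print(counts)  # [1, 0, 0, 0, 0, 0, 2, 0]
```

In this toy example, the seventh individual is hypo-methylated at one CpG and hyper-methylated at another, and both events contribute to the same count.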
‘Outlier’ individuals are not distributed randomly among groups
If each birth weight distribution (all 57 LBW ART and Control, combined, and all 57 HBW ART and Control, combined) is used to define which individuals in each group are the most variant over all 96 CpGs (using the same algorithm to define ‘outliers’ of the overall distribution as used to define ‘outliers’ for each individual CpG, see the Materials and Methods section), the individuals depicted as open symbols in Figure 2 constitute the ‘outliers’ of these groups. In the LBW group, 6 of the 22 individuals conceived by ART but only 1 of the 35 individuals conceived in vivo (Fig. 2A) are designated as such. While the difference in distribution of ART and Control individuals who have extreme alterations in the epigenome in the LBW group is apparent (Fig. 2A), the difference in the number of outliers found in the ART group versus the Control group is also highly significant (Fisher's exact test, P = 0.01). In the HBW group (24 ART individuals and 33 Controls), there are no differences in either the average number of ‘outliers’ per individual or the number of ART versus Control individuals who are ‘outliers’ of the overall distribution. We confirmed ‘outlier’ status of the most extreme CpGs and individuals with the largest number of ‘outlier’ CpGs in the array data by bisulfite pyrosequencing (see the Materials and Methods section and Supplementary Material, Fig. S1).
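The comparison of ‘outlier’ frequencies in the LBW group (6 of 22 ART individuals versus 1 of 35 Controls) can be reproduced with a short two-sided Fisher's exact test. The sketch below is a plain-Python implementation written for illustration (the study itself used SPSS); the function name is hypothetical, and the two-sided P-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: ART, Control; columns: outlier, non-outlier (counts from the LBW group)
p = fisher_exact_two_sided(6, 16, 1, 34)
print(round(p, 4))  # approximately 0.0105
```

The result, approximately 0.01, agrees with the P-value reported for this comparison.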

We have assayed DNA methylation levels at 96 CpG sites in cord blood DNA from 114 children selected from the highest and lowest quintiles of the birth weight distribution, including 46 children conceived by ART and 68 children conceived in vivo. We stratified the population in this way because we were seeking to associate a molecular phenotype (a severely disrupted epigenome) with an environmental exposure (ART) and an undesirable clinical outcome (LBW). We used these data to test the hypothesis that some individuals are much more susceptible to disruption of epigenetic marks than others by identifying those individuals with ‘outlier’ DNA methylation levels at each of the 96 sites and determining which individuals had the largest number of highly disrupted sites.
We have made three novel observations: (1) LBW individuals have significantly more ‘outlier’ CpGs, on average, than HBW individuals; (2) among LBW individuals, children conceived by ART have significantly more ‘outlier’ CpGs than LBW individuals conceived in vivo; and (3) among the LBW group, ‘outlier’ individuals—those with the largest number of highly disrupted CpGs—are more prevalent among the ART group than the Control group. These observations are novel and potentially important because they associate a molecular phenotype (highly variant site-specific DNA methylation) with an environmental exposure (ART) and an undesirable clinical outcome (LBW). Moreover, children suffering the exposure (ART) are known to be at increased risk for the undesirable outcome (LBW) (12–17). Although we have not characterized the ‘outlier’ individuals in great clinical detail or followed them longitudinally with respect to future outcomes, it is our hypothesis that these individuals are responsible for the greatest share of the modest epidemiological risks associated with both LBW and ART. In this regard, it is of interest that the single individual who is characterized as an ‘outlier’ in the Control group in Figure 2A was born to a mother with preeclampsia.
In terms of the molecular phenotype being characterized (highly variant DNA methylation at multiple CpG sites), we note that the association of ART with greater variance in CpG site methylation is consistent with our previous observations at a smaller number of sites (31). It is also worth noting that ‘outlier’ individuals are responsible for almost all of the statistical differences in mean methylation level between ART and Control groups and LBW and HBW groups. For example, at almost all of the CpGs for which there is a difference in mean methylation level between the HBW and LBW groups (i.e. not simply the observed and validated correlation or inverse correlation between methylation level and birth weight (51) but a statistically significant difference in mean methylation level between the two groups, P < 0.05) or between the ART and Control groups, removal of the ‘outlier’ samples abolishes the statistically significant difference. In this regard, it is perhaps not surprising that of the more than 20 studies in which DNA methylation levels have been compared between ART and Control groups, approximately half find differences (30–40) and half do not (41–50). Unless the number of individuals and the number of CpGs examined in any particular study are large enough to include an appreciable number of ‘outlier’ individuals, and the observations from those individuals are not discarded, it is unlikely that statistically significant differences in mean methylation level will be observed.
As to the degree to which the ‘outlier’/highly disrupted epigenome phenotype defined here is associated with a clinical outcome, without any stratification of the data, we note that if all 114 individuals are combined and ‘outlier’ CpG methylation recalculated at each CpG for all 114 individuals, ‘outlier’ individuals are 2-fold more likely to be LBW than HBW (data not shown). This relative risk is similar to the relative risk of LBW for ART children, compared with conception in vivo (17). A major remaining question concerns the origin of the inter-individual differences in susceptibility to ‘outlier’ methylation levels. The inter-individual differences in susceptibility to methylation alterations could be genetic, epigenetic, environmental or a combination of all three. Future studies should be designed with these possibilities in mind.
Limitations to the present study
We note that most of the 96 CpGs assayed in the present study were selected primarily based on correlation between methylation level and birth weight (51) in genome-wide profiling using the Illumina 27 K array. This platform profiles <0.1% of the CpGs present in the human genome and, as such, the sites assayed in our study are biased toward promoters identified from approximately half of human genes. Larger screens using newer platforms are likely to identify additional sites and genes whose methylation level is correlated with birth weight. However, the point of interest in the present study is not whether the 96 CpGs selected represent an exhaustive collection of genes that are strongly correlated with birth weight but whether some individuals have highly variant methylation levels in a substantial and non-random fraction of all sites profiled.
We also wish to point out that the tissue examined, cord blood, represents only a single embryonic tissue. We chose this tissue because it is the only embryonic tissue available to us for which we had samples from all individuals. It is formally possible that ‘outliers’ (or, indeed, any other differences) are distinguished by differences in the cell make-up of cord blood samples. However, while the common assertion that each cell type has its own DNA methylation ‘signature’ is true, such signatures are inevitably derived from a small fraction of all CpGs profiled. Most CpGs (>80%) have very similar methylation levels between different blood cell types but still differ between individuals (e.g. 52,53). The fact that we have validated multiple CpGs as birth weight-correlated in independent populations (51 and the present study) suggests that the DNA methylation levels of the CpGs examined are not affected dramatically by cord blood cell make-up.
Materials and Methods
Samples and sample preparation
Cord blood samples were collected from 57 LBW and 57 HBW children as described (30). Both ART and in vivo-conceived children were included (Supplementary Material, Table S1). Genomic DNA and bisulfite-converted DNA were prepared as described previously (30,31). Genomic DNA was isolated from cord blood using the Purelink Genomic DNA Mini kit (Invitrogen, Life Technologies Corp., USA). Bisulfite conversion was carried out with the EZ DNA Methylation kit (Zymo Research Corp., USA). Bisulfite-converted DNA was stored at −20°C until further use.
Methylation analysis using Illumina VeraCode array
Methylation levels of the bisulfite-converted samples were assayed at 96 CpG sites linked to birth weight, growth and metabolism or ART based on previous studies (30,31,50) (Supplementary Material, Table S2) using high-throughput VeraCode technology (VeraCode array) according to the manufacturer's protocol (Illumina, Inc., USA) and processed at the Children's Hospital of Philadelphia array facility. We obtained the custom oligo pool [allele-specific (ASO) and locus-specific (LSO) oligonucleotides] designed by Illumina for our custom methylation assay. The bisulfite-converted DNA was mixed with this custom oligo pool and hybridized, followed by allele-specific extension (of the ASO) and ligation to the LSO based on the methylation status. The resulting product serves as a template for universal polymerase chain reaction (PCR) that uses labeled PCR primers. The PCR products are then hybridized to specific VeraCode beads. The signals emitted by the methylated and unmethylated alleles are read on a BeadXpress Reader. Methylation status is calculated as the ratio of signal from the methylated allele relative to the sum of the signals from both the methylated and unmethylated alleles. This methylation value, known as β, ranges from 0 (unmethylated) to 1 (fully methylated). The data were analyzed by GenomeStudio software (Methylation module, v1.0).
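The β calculation described above amounts to a single ratio. The following minimal sketch implements it exactly as stated in the text; the function name and signal intensities are hypothetical, and any background correction or offset terms applied by the vendor software are not modeled here.

```python
def beta_value(methylated, unmethylated):
    """Fraction methylated: methylated signal / (methylated + unmethylated)."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("total signal must be positive")
    return methylated / total

# Hypothetical signal intensities for one CpG in one sample
print(beta_value(3000.0, 1000.0))  # 0.75
```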
Statistical outlier methylation values were identified at individual CpG sites using SPSS ver. 15. Fisher's exact test was used to compare the equality of proportion of outlier/non-outlier status between ART- and in vivo-conceived children in the LBW and HBW groups. The correlation between methylation levels assayed by the VeraCode array and pyrosequencing assays was determined by calculation of Pearson's correlation coefficient using the CORREL function in Excel 2013.
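For readers who wish to reproduce the array-versus-pyrosequencing comparison outside a spreadsheet, Pearson's correlation coefficient can be computed as in the sketch below. The paired methylation values and the function name are hypothetical and serve only to illustrate the calculation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired measurements at one CpG (fraction methylated)
array_beta = [0.45, 0.62, 0.78, 0.80, 0.83]
pyro_beta = [0.40, 0.60, 0.75, 0.82, 0.85]
r = pearson_r(array_beta, pyro_beta)
print(round(r, 3))
```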
Selection of candidate CpGs for technical validation by bisulfite pyrosequencing
Thirteen CpGs (GRB10 cg01720588, SHMT2 cg15670177, RASSF5 cg17558126, NDN cg12532169, ATP10A cg17260954, EDA2R cg14372520, IGF2H19 cg19731870, MMP10 cg00347729, GABRR2 cg06445611, ANGPT4 cg26540515, DSG4 cg13445249, RLN3 cg00722300 and MEST cg09059945) were selected for validation based on the largest number and most extreme values of outlier samples for each CpG.
Selection of ‘outliers’ and ‘non-outliers’ for validation
Samples that exhibited extremes of methylation at 14 or more CpG sites were considered ‘outliers’ for the purpose of technical validation. This cutoff criterion was selected based on deflection from the expected logarithmic decay curve. A graph of the number of outlier samples versus the number of CpGs was drawn for both the hyper- and hypo-methylated outliers (not shown). If outlier individuals occurred randomly, the graph would be expected to show a logarithmic decay distribution. The number of CpGs at which the observed curve departed from this expected distribution (i.e. where a ‘bump’ was observed) was taken as the cutoff for selecting samples for validation. Thirteen samples fulfilled these criteria and were matched for birth weight with ‘non-outliers’.
Pyrosequencing assay for validation
Pyrosequencing assays and primers (Supplementary Material, Table S3) were designed for the CpG sites using PyroMark Assay Design Software 2.0. Bisulfite-converted DNA was amplified using the designed PCR primers, and the amplicons were pyrosequenced using the sequencing primer to estimate methylation levels.
Conflict of Interest statement. None declared.
This work was supported by the National Institutes of Health (Grant Number P50-HD-068157 to C.C. and C.S.).