Two-stage Genome-wide Methylation Profiling in Childhood-onset Crohn's Disease Implicates Epigenetic Alterations at the VMP1/MIR21 and HLA Loci

Article first published online 20 August 2014. Supplemental Digital Content is Available in the Text.

The last decade has seen tremendous success in identifying genetic loci associated with inherited susceptibility to Crohn's disease (CD), with 140 loci identified in the most recent GWAS meta-analysis. 1 However, these determinants collectively explain only an estimated 13.6% 1 of disease variability, and the biological variation each confers is unclear. The importance of noninherited factors in pathogenesis has been highlighted by studies on the increasing incidence of CD, especially in children, 2,3 and in the developing world, 4,5 and by a greater understanding of the effects of gut microbiota and diet on risk. 6,7 A critical objective for CD research is to characterize the interaction between genetic and environmental factors.
Epigenetic alteration has emerged as a potential mechanism through which these interactions may occur. 8 Developments allowing rapid assaying of cytosine methylation at nearly 5 × 10^5 positions, and insights into confounding effects in study design 9 have provided the impetus to build on promising pilot data from previous generation technology to allow epigenome-wide association studies (EWAS) to become a valuable complement to the more mature GWAS. 10 Epigenome mapping has been used to identify DNA regulatory elements, explore cancer biology, and provide a growing body of findings in complex diseases, such as rheumatoid arthritis, 11 multiple sclerosis, 12 type 2 diabetes, and obesity. 13,14 Although the relationship between methylation and gene expression and function is incompletely understood, relevant modifying influences include age, ethnicity, smoking, gut microbiota, and diet. 15-18 DNA-binding factors can directly influence methylation, and in turn, altering methylation can directly influence expression. 19,20 We hypothesized that identification of altered levels of methylation, which are significantly associated with disease state, whether predating or following disease, offers the potential for discovering new pathways integral to the disease process and for predicting disease status.

Study Design
Illumina 450k DNA methylation analysis was performed separately in 2 pediatric cohorts (see Table, Supplemental Digital Content 1, http://links.lww.com/IBD/A557, which contains demographic data for pediatric cohorts). Patients in the first cohort were treatment-naive, newly diagnosed CD cases. Patients in the second cohort were children with an established diagnosis of CD. Both cohorts used symptomatic controls who were investigated by colonoscopy but in whom no pathology or abnormality was discovered at initial investigation or subsequently. Linear discriminant analysis (LDA) was applied to data generated in the discovery cohort to identify biomarker candidates, which were then tested in the replication cohort.
As the 2 pediatric stages were of similar design and showed strong replication, they were amenable to a single joint analysis, an approach recommended for suitable data sets 9 because of the increase in power it gives in discovering CpG sites with disease-associated methylation changes. Seven of the most significant CpG sites implicated by the combined pediatric analysis were assessed by pyrosequencing in an adult cohort with established CD, and disease-associated expression changes were explored (Fig. 1A).

Pediatric Patient Selection and Ethics
Pediatric samples were collected from centers across Scotland. The Bacteria in Inflammatory bowel disease (IBD) in Scottish Children Undergoing Investigation before Treatment (BISCUIT) study provided peripheral blood leukocyte DNA for the discovery cohort from 18 treatment-naive newly diagnosed patients and 18 matched nondiseased controls from Aberdeen, Glasgow, and Dundee. Controls had been rigorously investigated for gastrointestinal symptoms but did not have or subsequently develop any organic gastrointestinal pathology, including IBD. 21 The replication cohort comprised DNA samples from 18 children with established CD supplied by the Paediatric-onset IBD Cohort and Treatment Study (PICTS), 22 analyzed against a second set of 18 controls from the BISCUIT study. Within both cohorts, patients and controls were matched for age and gender. The BISCUIT study was approved by the North of Scotland Research Ethics Committee (09/S0802/24) and PICTS by ethics committees at participating centers (Edinburgh, Glasgow, Aberdeen, and Dundee; LREC 2002/6/18). Written informed consent was obtained from the parents of all participating children. Informed assent was also obtained from older children capable of understanding the nature of the study.
Demographics for both cohorts are in Table, Supplemental Digital Content 1, http://links.lww.com/IBD/A557.

FIGURE 1. A, Schematic representation of study design; n = ratio of CD samples to control samples. B, Log fold-change (log2 [mean methylation in CD/mean methylation in controls]) for all probes with nominally significant (uncorrected P < 0.05, n = 3620) methylation changes in both pediatric cohorts. Data are binned; colors of shorter wavelength indicate higher frequency. C, 50-kb genomic regions are more likely to contain GWAS SNPs for IBD or CD if they also contain more significant disease-associated methylation changes (P = 3.66 × 10^-7). D, Manhattan plot of disease-associated methylation changes; horizontal line corresponds to significance after Bonferroni correction.

Genome-wide Methylation Profiling
Peripheral blood leukocyte DNA was bisulfite converted and analyzed using the Illumina Human Methylation 450k platform (Illumina, San Diego, CA) 23 with cases and controls distributed across chips. Probes were filtered to remove any with a detection P value of ≥0.01, those from sex chromosomes, and those that had single nucleotide polymorphisms (SNPs) with a minor allele frequency of ≥0.01 in the European population in the 1000 Genomes Project 24 within CpGs assayed by the array. Samples were removed if there was a gender mismatch or if more than 5% of probes failed.
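The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys (`detection_p`, `chrom`, `snp_maf`) and both function names are hypothetical, and a single summary detection P value per probe is assumed, whereas array software reports one per probe per sample.

```python
def keep_probe(probe, p_cutoff=0.01, maf_cutoff=0.01):
    """Return True if a probe passes the three filters described in the text.

    `probe` is a hypothetical dict with keys:
      detection_p : summary detection P value for the probe (assumed)
      chrom       : chromosome name, e.g. "17" or "X"
      snp_maf     : minor allele frequency of a SNP in the CpG, or None
    """
    if probe["detection_p"] >= p_cutoff:
        return False  # signal not distinguishable from background
    if probe["chrom"] in ("X", "Y"):
        return False  # sex chromosomes are excluded
    if probe["snp_maf"] is not None and probe["snp_maf"] >= maf_cutoff:
        return False  # common SNP within the assayed CpG
    return True


def keep_sample(failed_fraction, sex_mismatch, max_failed=0.05):
    """Return True if a sample has no sex mismatch and at most 5% failed probes."""
    return (not sex_mismatch) and failed_fraction <= max_failed


# Invented example probes, for illustration only.
probes = [
    {"id": "probe_a", "detection_p": 0.0001, "chrom": "17", "snp_maf": None},
    {"id": "probe_b", "detection_p": 0.02, "chrom": "6", "snp_maf": None},
    {"id": "probe_c", "detection_p": 0.0001, "chrom": "X", "snp_maf": None},
    {"id": "probe_d", "detection_p": 0.0001, "chrom": "1", "snp_maf": 0.12},
]
kept = [p["id"] for p in probes if keep_probe(p)]
```

In this invented example only `probe_a` survives; the other three each fail exactly one of the filters.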
Data were corrected using background removal and quantile normalization in the lumi R package 25,26 followed by beta mixture quantile dilation. 27 Batch effects were controlled for using ComBat. 28 Differential leukocyte counts from the same day that DNA samples were taken were available for 24 patients and 19 controls. Linear models were created for all Illumina 450k probes and disease status in these samples. Probes were selected that had F-test P values <10^-8 but a P value for disease association of >0.05. Combinations of 100 probes were tested, and the best probe set was used to predict the differential cell counts for samples without measured differential leukocyte counts. This model is similar to that described by Houseman et al. 29 Analysis of the methylation status of cases versus controls was performed using limma 30 in R using linear modeling of beta values with measured or predicted neutrophil, other granulocyte, lymphocyte, and monocyte numbers as covariates.
The Benjamini-Hochberg false discovery rate (FDR) 31 was calculated for each probe, with an FDR corrected P < 0.05 used to define significance in analysis of broader methylation patterns, such as identifying differentially methylated regions. For significance of individual probes, the more conservative Bonferroni correction was used.
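The difference between the two corrections can be illustrated with a minimal Python sketch. The functions below follow the standard definitions of the Bonferroni and Benjamini-Hochberg adjustments; the P values are invented for illustration and are not taken from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value is never larger than the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P values for five hypothetical probes.
p = [0.001, 0.008, 0.02, 0.03, 0.60]
bh = benjamini_hochberg(p)
bonf = bonferroni(p)
n_bh = sum(q < 0.05 for q in bh)      # probes significant by FDR
n_bonf = sum(q < 0.05 for q in bonf)  # probes significant by Bonferroni
```

With these invented values, four probes pass the FDR threshold but only two pass Bonferroni, which is the sense in which the latter is the more conservative criterion.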

Differentially Methylated Regions
Differentially methylated regions (DMRs) were defined using a reimplementation of the probe lasso DMR-calling technique used by the ChAMP pipeline 32 in R. This defined a DMR as 3 or more sequential probes with significant (FDR adjusted P < 0.05) unidirectional methylation changes falling within the lasso distance threshold. The distance threshold uses a base size of 2 kb, modified based on the methylation patterns of local genomic features. 33
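The core of this definition can be sketched as a single pass over position-sorted probes. The Python below is a simplified illustration, not the probe lasso method itself: it assumes a fixed 2-kb gap between neighboring probes, whereas the technique described above adjusts the threshold to local genomic features. The function names and example values are hypothetical.

```python
def _emit(run, dmrs, min_probes):
    """Record the current run as a DMR if it contains enough probes."""
    if len(run) >= min_probes:
        dmrs.append((run[0][0], run[-1][0], len(run)))


def call_dmrs(probes, max_gap=2000, alpha=0.05, min_probes=3):
    """Call DMRs on one chromosome.

    probes: list of (position, fdr_p, delta_beta) tuples.
    Returns a list of (start, end, n_probes) tuples.
    A fixed max_gap is a simplifying assumption made for this sketch.
    """
    dmrs = []
    run = []
    for probe in sorted(probes):
        pos, fdr_p, delta = probe
        significant = fdr_p < alpha and delta != 0
        extends = (
            significant
            and bool(run)
            and pos - run[-1][0] <= max_gap          # close enough to the last probe
            and (delta > 0) == (run[-1][2] > 0)      # same direction of change
        )
        if extends:
            run.append(probe)
        else:
            _emit(run, dmrs, min_probes)
            run = [probe] if significant else []
    _emit(run, dmrs, min_probes)
    return dmrs


# Invented probes: three nearby hypomethylated probes, then isolated ones.
example = [
    (1000, 0.01, -0.05),
    (1500, 0.02, -0.04),
    (2600, 0.03, -0.06),
    (9000, 0.01, -0.05),
    (9500, 0.20, -0.01),
    (9800, 0.01, 0.04),
]
regions = call_dmrs(example)
```

Here the first three probes form one region spanning positions 1000 to 2600; the remaining probes are too far apart, nonsignificant, or in the opposite direction, so they do not form a second region.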

Replication of Methylation Findings in Adults
Whole-blood DNA samples were obtained from 20 adults with CD recruited from the gastroenterology clinic while their disease was quiescent, with 20 healthy controls collected during the same time period (LREC 2000/4/192). In addition, methylation changes in VMP1/MIR21 were replicated in an extended adult replication cohort of 87 adults with CD and 85 healthy controls (see Table, Supplemental Digital Content 2, http://links.lww.com/IBD/A558, which contains demographic data), of which the smaller group was a subset.

VMP1 and MIR21 Expression
MIR21 primary transcript (pri-miR-21; primer details in Table, Supplemental Digital Content 3, http://links.lww.com/IBD/A559) was assayed by qPCR, with all patients and controls giving written, informed consent (LREC 06/S1101/16, LREC 2000/4/192). Suitable patients with CD were prospectively recruited from gastroenterology clinic and endoscopy lists, and healthy controls were recruited from volunteers. Blood samples were taken using a 21-gauge butterfly needle and 9 mL K3 EDTA vacuette (Greiner, Germany) and stored at 4°C for up to 2 hours. Total RNA was then extracted from 1.5 mL whole blood using QIAamp RNA blood mini kit (Qiagen) and stored at -80°C. RNA was converted to cDNA using SuperScript Vilo cDNA synthesis kits (Invitrogen, Carlsbad) and analyzed on a Corbett Rotor-Gene 6000 (Qiagen) with DyNAmo Flash SYBR green reagent (Thermo Scientific, Waltham). Expression of pri-miR-21 was normalized to the reference gene TBP (primer details in Table, Supplemental Digital Content 3, http://links.lww.com/IBD/A559) after initial optimization against 4 reference genes (GAPDH, TBP, SDHA, and ACTB) and analyzed by the ΔΔCt method in R. Expression in inflamed and uninflamed intestinal biopsies from patients with CD, ulcerative colitis, and healthy controls was assessed in previously reported microarray data. 34

Linear Discriminant Analysis
LDA of methylation beta values in the pediatric discovery cohort was used to create biomarkers for the presence of CD, using the lda function in the R package "MASS." 35 All probes with FDR adjusted P values <0.05 in the discovery cohort (see Table, Supplemental Digital Content 4, http://links.lww.com/IBD/A560, which lists pediatric discovery results with FDR adjusted P < 0.05) were used as covariates, regardless of performance in the replication cohort, with each model including 2 probes. Models were tested using the pediatric replication cohort methylation beta values.
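A two-probe discriminant of this kind can be written out directly. The sketch below is a plain Python illustration of two-class LDA with a pooled covariance matrix and equal class priors (an assumption made here; the groups in this study were balanced). It is not the MASS implementation, and the beta values are invented.

```python
def mean_vec(rows):
    """Column means of a list of (beta_probe1, beta_probe2) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def fit_lda_2d(cases, controls):
    """Fit a two-class, two-feature LDA; returns (weights, offset)."""
    mu1, mu0 = mean_vec(cases), mean_vec(controls)
    # Pooled within-class covariance matrix.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, mu in ((cases, mu1), (controls, mu0)):
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(cases) + len(controls) - 2
    s = [[v / dof for v in row] for row in s]
    # Invert the 2 x 2 matrix explicitly.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # With equal priors the decision boundary passes through the midpoint.
    mid = [(mu1[0] + mu0[0]) / 2, (mu1[1] + mu0[1]) / 2]
    offset = w[0] * mid[0] + w[1] * mid[1]
    return w, offset


def lda_score(model, x):
    """Positive scores favor the case class, negative the control class."""
    w, offset = model
    return w[0] * x[0] + w[1] * x[1] - offset


# Invented training betas for two probes (cases are hypomethylated).
train_cases = [(0.40, 0.55), (0.45, 0.50), (0.38, 0.60), (0.42, 0.52)]
train_controls = [(0.55, 0.70), (0.60, 0.66), (0.52, 0.72), (0.58, 0.68)]
model = fit_lda_2d(train_cases, train_controls)
score_case_like = lda_score(model, (0.41, 0.54))
score_control_like = lda_score(model, (0.57, 0.69))
```

Scores computed in this way for an independent cohort are the quantities that would then be ranked to assess how well a probe pair separates cases from controls.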

GWAS Colocalization
For range thresholds between 25 kb and 4 Mb, the lowest P value within range of each GWAS SNP for CD or IBD 1 was compared by Wilcoxon rank sum test to 1000 randomly selected bins of the same genomic size, matched for probe density.
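The comparison at a single distance threshold can be sketched with a rank sum test written in plain Python. This is an illustration only: it uses the normal approximation without continuity or tie correction, which may differ from the exact procedure used in the study, and the P values in the example are invented.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    Returns (u, z, p) where u is the U statistic for x and p is two-sided.
    No continuity or tie correction is applied (a simplification).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Give tied values the average of the ranks they span.
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Invented data: lowest methylation P value within range of each GWAS SNP,
# and lowest P value in each randomly selected bin of the same size.
near_gwas = [1e-9, 3e-7, 2e-6, 5e-5, 1e-4, 4e-4]
random_bins = [2e-4, 1e-3, 5e-3, 0.01, 0.02, 0.03, 0.05, 0.08]
u, z, p = rank_sum_test(near_gwas, random_bins)
```

A negative z here means the regions around GWAS SNPs tend to carry smaller methylation P values than the matched random bins.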

GO Term Enrichment Analysis
The R package GOseq 36 was used as described by Geeleher et al 37 to correct for bias introduced by the variation in Illumina 450k probes per gene and analyze gene ontology (GO) term enrichment. The number of probes sharing each gene symbol annotation and whether that annotation covers at least one differentially methylated probe (FDR corrected P < 0.05) was used to create a probability weighting function, used in the GO term enrichment analysis.

Pediatric Illumina 450k
Nine probes with FDR corrected P < 0.05 were identified in the discovery cohort, 8 of which achieved nominal significance in the replication cohort (see Table, Supplemental Digital Content 4, http://links.lww.com/IBD/A560, which lists pediatric discovery results with FDR adjusted P < 0.05). Correlation between cohorts was high, with 89% of probes reaching nominal significance in both cohorts showing the same direction of change (Fig. 1B).
Combined analysis of the 2 cohorts identified 1319 probes with significant FDR adjusted P values. Of these, 65 CpGs (Table 1, Fig. 1D) retained epigenome-wide significance after the more stringent Bonferroni correction for multiple testing. At these probes, there were absolute differences in mean methylation between CD and control groups of up to 16% (mean 6%), with 89% of probes showing hypomethylation in CD. Methylation variance in CD was greater than in controls at 65% of these probes; 53% of these differences were statistically significant (compared with 17% for probes where variance was greater in controls). The mean ratios of variances between CD and controls were 2.4 and 1.4 for probes where CD or controls, respectively, had the greater variance.
Nineteen DMRs were identified (Table 2) from the 1319 probes with significant FDR adjusted P values. These regions involve several genes in pathways relevant to CD including TNF within the HLA region, MIR21, Toll-like receptor signaling (TOLLIP), and apoptosis (VMP1, PRF1 and DIABLO).
Colocalization of significant Illumina 450k methylation changes with GWAS SNPs was found across distance thresholds between 25 kb and 4 Mb, with peak correlation between 50 kb (P = 3.66 × 10^-7) and 100 kb (P = 2.41 × 10^-7), in line with previously published work 38 (Fig. 1C). This relationship remained significant if VMP1/MIR21 was excluded from analysis.

Adult Replication with Pyrosequencing
Pyrosequencing assays were designed for a series of 7 regions corresponding to significant disease-associated methylation changes in the combined pediatric Illumina 450k data. Methylation changes were assayed in a group of 20 adults with CD and 20 controls with resultant P values between 0.004 and 2 × 10^-5 (Fig. 2B). As with

Linear Discriminant Analysis
Using the pediatric discovery cohort methylation beta values as learning set for LDA, models were created for each possible combination of 2 probes to predict the presence of CD, which were then tested using the beta values from the pediatric replication cohort. Area under the curve values for the performance of these models in the replication cohort ranged from 0.79 to 0.98 (median 0.93). Figure 2A shows the separation in 2-dimensional beta values by diagnosis in 10 two-probe combinations.
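The area under the curve reported here has a simple interpretation: it is the probability that a randomly chosen case receives a higher discriminant score than a randomly chosen control. A minimal Python sketch of that calculation follows; the scores are invented and the function name is hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve by pairwise comparison.

    Each case-control pair contributes 1 if the case scores higher,
    0.5 if the scores are tied, and 0 otherwise.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented discriminant scores for a hypothetical replication cohort.
replication_cases = [2.1, 1.4, 0.3, 1.9]
replication_controls = [-1.2, 0.5, -0.4, -2.0]
value = auc(replication_cases, replication_controls)
```

In this invented example 15 of the 16 case-control pairs are ordered correctly, giving an area under the curve of 0.9375.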

Interpretation and Selection of Genes for Further Study
To highlight genomic regions for further study, we considered 3 criteria: the significance of individual CpGs, clustering of CpGs into DMRs, and colocalization of methylation changes with risk loci identified by GWAS. Genes that scored highly in multiple categories were given the highest priority for further investigation (Fig. 3), with VMP1/MIR21 emerging as the strongest candidate. Similarly, the TNF locus within the HLA region was enriched for highly significant methylation changes within a DMR, in a region of established interest in CD. Other genes of interest include SBNO2 and IL18RAP, where highly significant CpGs are found within risk loci established by GWAS studies and ZBTB16 and RUNX3, where DMRs contain or neighbor highly significant individual CpGs.

VMP1/MIR21
Five probes within the VMP1/MIR21 locus, 4 of which lie within a DMR (Fig. 4), had disease-associated changes in methylation surviving Bonferroni correction. These probes are clustered at the 3′ end of VMP1, around the 11th exon, within 50 kb of a GWAS SNP (rs1292053). The DMR is directly adjacent to the transcription start site and promoter region for the primary transcript of MIR21 (pri-miR-21).
We confirmed CD-associated hypomethylation of this region in blood by pyrosequencing in 172 adults (P = 6.6 × 10^-5, Fig. 5A). The qPCR for pri-miR-21 in 43 adults with CD and 23 healthy controls demonstrated an increase of expression in CD (P < 0.005, Fig. 5B). Analysis of previously published data 34 demonstrated an increased expression of MIR21 (P = 1.4 × 10^-6) and VMP1 (P = 2.6 × 10^-3) in biopsies from inflamed versus uninflamed mucosa in CD, which was not observed in controls (Fig. 5C). MIR21 showed significantly increased expression in inflamed versus uninflamed UC biopsies (P = 5.1 × 10^-7), but there was no inflammation-related increase in VMP1 expression, unlike that seen in CD.

Principal Findings
This study establishes a significant and highly replicable pattern of DNA methylation associated with pediatric CD, with further replication in adults for many of the most significant pediatric results. We show a significant enrichment of methylation changes in proximity to GWAS risk loci, offering a novel approach in exploring the biological variation associated with common genetic variants and have derived biomarkers, which show remarkable accuracy in determining the presence of CD. As such, this study provides an important confirmation of the validity and feasibility of methylation screening in complex disease and complements the emerging evidence implicating epigenetic alterations in IBD and other immune-mediated diseases and complex traits, such as rheumatoid arthritis, 11 obesity, 14 and diabetes. 13

VMP1 and MIR21
The discovery of methylation alterations within the VMP1/MIR21 locus emerges as the strongest individual result. Further confirmation of altered methylation of this region in CD by pyrosequencing in adults is augmented with data showing increased expression of MIR21 in blood in CD and increased expression of MIR21 in inflamed intestinal biopsies in CD but not controls. VMP1 encodes a transmembrane protein located in the Golgi apparatus, endoplasmic reticulum, and vacuoles with high degrees of expression in the intestine, kidney, ovary, and placenta. 39 There is high transspecies conservation of VMP1, and it is noteworthy that expression induces autophagy through interactions with BECN1. 40,41 MIR21 was one of the earliest described microRNAs and has been implicated in numerous cancers, including IBD-associated colorectal cancer. 42 The mature sequence is produced from a precursor overlapping with the 3′ end of VMP1. This region is highly conserved, exhibits DNase I hypersensitivity, and is associated with the promoter-associated histone marks H3K4Me1 and H3K4Me3. 43 MIR21 has a known role in T-cell differentiation and development. 44-47 Increased expression of MIR21 in active IBD and IBD-associated dysplasia has been described elsewhere, 48,49 and MIR21 knockout mice have been shown to be protected from DSS-induced colitis. 50 There is a growing body of evidence for numerous microRNAs being involved in CD, such as the regulation of NOD2 by microRNAs 51,52 and NOD2 genotype influencing IL-23 production in dendritic cells by regulation of MIR29 production. 53 Recent work has shown that ATG16L1 can be regulated by multiple microRNAs, with resulting effects on autophagy, 54,55 which is particularly interesting in light of our data, as ATG16L1 contains a MIR21 target motif. 56 Other than the VMP1/MIR21 discovery, a number of the other loci implicated by our study are noteworthy in the context of disease pathogenesis and will bear further investigation.
The data implicating the HLA region, and in particular the TNF locus, complement the genetic data implicating this region in determining IBD susceptibility and phenotype and the body of evidence implicating TNF in disease.
Other regions that are of great interest to intestinal immune regulation showing highly significant replicable alterations in methylation in our study include SOCS3, a suppressor of cytokine signaling to the JAK/STAT pathway, 57,58 TOLLIP, 59 and RPS6KA2, 60 a ribosomal S6 kinase interacting with MAPkinase1/3.

Linear Discriminant Analysis
DNA methylation of specific loci has found use as a biomarker in diagnosis and prognosis of cancer, 61,62 such as methylation at a tumor suppressor CpG Island. The results of our LDA serve as a proof of concept for the development of methylation-based diagnostic biomarkers in complex diseases. Future work should also seek to establish prospective links with other clinical outcomes, such as response to treatment and disease course.
The use of children who required colonoscopy to rule out IBD as controls precisely models the clinical scenario in which a diagnostic biomarker would find use. CD-specific methylation patterns weakened with increased age and were absent in the elderly (data not shown), possibly due to the accumulation of confounding factors, such as environmental exposure, comorbidity and polypharmacy, or inherent effects of aging on methylation. 18 It remains to be determined if this approach is equally pertinent for conditions with a later age of onset.

Strengths and Limitations
This study provides an impetus for further analysis of alterations of leukocyte DNA methylation in IBD and other complex diseases, with many targets emerging for further study. In comparison with GWAS data, we show highly reproducible and significant disease-associated methylation changes using a modest number of samples. Indeed, the strength and reproducibility of our findings compare favorably with epigenetic data generated to date in IBD and other complex diseases 10,11,63 and also with the results of theoretical modeling based on predicted disease-associated methylation patterns. In particular, the magnitude and variance of observed methylation changes in whole blood contrast with models used to predict required group sizes (see Fig., Supplemental Digital Content 8, http://links.lww.com/IBD/A564, which shows power to detect methylation changes similar to VMP1/MIR21). 9 These data may inform future study design in IBD and other complex diseases.
In designing this study, we addressed the key confounding issues relevant to epigenome analysis, which are currently subject to intensive scientific debate. Our approach of basing the study initially in pediatric disease has been successful in generating data replicable in children and adults. Studies in children have the advantage of reducing the influence of age, comorbidity, polypharmacy, smoking, and environmental factors, which could confound epigenetic changes. The focus on circulating leukocytes in IBD rather than intestinal mucosa in this study is strongly supported by scientific evidence of immune dysregulation, the well-recognized clinical extraintestinal manifestations, and indeed the recent evidence of an encouraging response to autologous bone marrow transplant in refractory disease. 64 Methylation at numerous sites has also been shown to influence PBMC response to stimulation of toll-like receptors ex vivo with multiple ligands. 16 Ease of access to blood is clearly advantageous in biomarker discovery.
The heterogeneity of studied tissues is a commonly cited concern in epigenome-wide analysis. 65 We demonstrated the ability to use genome-wide methylation data with contemporaneous clinical full blood count data to correct for whole-blood heterogeneity. If such data are not available, comparison with reference methylation data sets 29,66 from separated cells and alternative techniques 67 has been demonstrated effective and accurate. These strategies are feasible for translational studies, especially high throughput clinical investigations, where cell separation adds considerable processing and expense.
Our data are highly significant even after applying Bonferroni correction for multiple testing. This correction, which is widely applied in GWAS, is likely to emerge as overly conservative in the context of EWAS because it ignores the correlation of methylation between neighboring probes. The establishment of a consensus on the limit of epigenome-wide significance for DNA methylation arrays remains a priority for future reporting of epigenetic findings in complex diseases.
Although the combined factors of moderate study size and conservative correction for multiple testing may well contribute to false negatives (type II error), the reproducibility in 3 independent cohorts and level of statistical significance provide a high degree of confidence in our positive findings. Parallels may be drawn with the early linkage and association studies in IBD, which allowed modeling of the genetic architecture and delivered "low-hanging fruit" in terms of NOD2 and HLA associations, 68-70 findings that have subsequently been unequivocally replicated in large scale experiments.
The emerging evidence of a role for MIR21 in IBD from other approaches enhances the biological plausibility of this finding and strengthens the case for using EWAS in CD and other complex diseases to discover novel biologically significant genes. Moreover, the enrichment of methylation differences near genetic risk loci seen in these data and previous work 38,71 raises the possibility that epigenetic modifications may help identify specific points within large genetic susceptibility loci where genetic and biological variation overlap.
There is at present intense interest in the application of epigenomic analyses in complex diseases, and technologies and analytic approaches are evolving rapidly. The strengths and limitations of the approach are becoming better understood, leading to the very real hope that EWAS will now evolve to complement GWAS in understanding pathogenesis. To date, methylation profiling has been limited by the application of analytic approaches developed for genetic rather than epigenomic analysis. In some cases, this has led to overestimating the strength of results, such as failing to appreciate the bias introduced by the wide variety of probe numbers per gene in pathway analysis. 37 However, failing to appreciate the difference between a SNP of limited possible states weakly correlated with disease risk and the bounded continuous variable of DNA methylation has led to underestimations of the power of moderate-scale epigenetic studies.

CONCLUSIONS
Overall, these observations serve to highlight the need to integrate methylation, genetic, and expression data in future studies of the pathogenesis of complex diseases and provide insight into potential mechanisms involved in gene-environment interaction. There are exciting and immediate implications for early clinical translation: the discovery of easily accessible biomarkers in peripheral blood to predict disease susceptibility, progression, or response to therapy, and the potential for new therapeutic targets.
Future studies should evaluate altered methylation and expression at these sites, including MIR21, both in whole blood and specific cell types, before initiation of disease and in association with environmental factors to better understand causality.