Discovery and Validation of Methylation Biomarkers for Ulcerative Colitis Associated Neoplasia

Abstract
Background and aims: Ulcerative colitis (UC) is associated with a higher background risk of dysplasia and/or neoplasia due to chronic inflammation. Few biomarkers exist for identifying patients with dysplasia, and targeted biopsies in this group of patients do not reliably identify dysplasia. We aimed to examine the epigenome of UC dysplasia and to identify and validate potential biomarkers.
Methods: Colonic samples from patients with UC-associated dysplasia or neoplasia underwent epigenome-wide analysis on the Illumina 450K methylation array. Markers were validated by bisulphite pyrosequencing on a secondary validation cohort, and accuracy was calculated using logistic regression and receiver operating characteristic (ROC) curves.
Results: Twelve samples from 4 patients underwent methylation array analysis and 6 markers (GNG7, VAV3, KIF5C, PIK3R5, TUBB6, and ZNF583) were taken forward for secondary validation on a cohort of 71 colonic biopsy samples consisting of normal uninflamed mucosa from control patients, acute and chronic colitis, "field" mucosa in patients with dysplasia/neoplasia, dysplasia, and neoplasia. Methylation in the beta-tubulin TUBB6 correlated with the presence of dysplasia (P < 0.0001) and accurately discriminated between dysplastic and nondysplastic tissue, even in the apparently normal field mucosa downstream from dysplastic lesions (AUC 0.84, 95% CI 0.81–0.87).
Conclusions: Methylation in TUBB6 is a potential biomarker for UC-associated dysplasia. Further validation is needed and is ongoing as part of the ENDCAP-C study.


INTRODUCTION
Ulcerative colitis (UC) is a systemic autoimmune inflammatory disorder, primarily targeted at the colon and rectum, leading to widespread mucosal inflammation. It is associated with an increased risk of cancer, 1 and a recent meta-analysis demonstrated that in population-based cohorts, cancer risk increases 2.4-fold, with male sex, young age at diagnosis, and extensive colitis further increasing the risk. 2 UC-associated cancers arise in a similar fashion to sporadic carcinomas; however, they arise through an inflammation-dysplasia-carcinoma sequence 3, 4 that leads to accelerated carcinogenesis. The risk of cancer increases exponentially with duration of disease, and this has led to screening guidelines that take account of duration of symptoms.
The British Society of Gastroenterology guidelines 5 recommend that a single surveillance colonoscopy should be performed within 8 years of onset of symptoms, followed by 5 yearly in the second decade, 3 yearly in the third decade, and yearly in the fourth decade. It also is recommended that as part of these colonoscopies approximately 2-4 serial biopsies be taken every 5-10 cm within the colon. This represents a considerable burden for the screening endoscopist, although technologies such as chromoendoscopy 6 and narrow band imaging 7 have been used, to varying degrees of success. The use of random biopsies, even where they are taken frequently, may also reduce the chance of successful detection of UC-associated dysplasia or cancer. 8 A need exists for biomarkers that can reliably detect UC-associated dysplasia and cancer within the colon as part of a screening program. 9 A number of different approaches and technologies have been used to find an effective biomarker. Bronner et al 10 utilised fluorescent in situ hybridisation (FISH) of fresh frozen colonic biopsy specimens, finding 100% sensitivity and 92% specificity in discriminating patients who went on to develop neoplasia from those who did not.
Leedham et al 11 investigated mutational spectrum in UC-associated dysplasia and found that TP53 mutation was most frequent, with KRAS driver mutations being found in a subset of tumours. They also demonstrated that the lesions seen in UC-associated neoplasia were monoclonal in origin, with a driver mutation leading to multiple clones possessing the same mutation. Risques et al found that shortening of telomeres, associated with cellular senescence 12 in a field effect, seemed to indicate which areas of UC-associated dysplasia were likely to progress to cancer.
Other genetic markers for high risk UC-associated dysplasia that have been studied include chromosomal instability and microsatellite instability. Rubin et al 13 studied the rates of aneuploidy in patients with UC-associated dysplasia, finding an association between aneuploidy in biopsy specimens and progression to neoplasia; however, this association has not been consistently reported by other studies. 14 Microsatellite instability (MSI) has also been studied as a potential marker; however, no strong correlation between MSI and progression to neoplasia has been found. 15,16 Current markers for high risk UC-associated neoplasia therefore lack accuracy. Analysis of methylation, an epigenetic modification to DNA, has several advantages. Firstly, DNA methylation is a stable change that is detectable in formalin-fixed, paraffin-embedded (FFPE) biopsy samples. 17 Secondly, disease-associated regions of methylation change tend to occur in longer stretches of DNA than point mutations 18 and in fields of change within affected tissues, allowing easier assay design and less need for targeted biopsies.
We therefore aimed to study the changes seen in DNA methylation in high risk UC-associated dysplasia to develop potential biomarkers for this disease.

METHODS

Patients and Ethical Approval
Patients were recruited from the University Hospital Birmingham inflammatory bowel disease (IBD) clinic using an in-house IBD database to identify patients. Patients with neoplasia were included in the study if they had UC of greater than 8 years in duration, had developed UC-associated neoplasia, had undergone a proctocolectomy for UC-associated neoplasia with frozen material taken, and had archival histological material (tumor and paired normal) available for analysis. Patients were excluded if they did not have histological material available for analysis or the duration of their disease was less than 8 years. Acute colitis was defined as a first presentation of colitis, and chronic colitis was defined as colitis of greater than 8 years in duration. All patients with both "acute" and "chronic" colitis were sampled when they were receiving (at the most) only 5-ASA therapy for their colitis, rather than immunosuppressive agents that may bias methylation measurements.
Patients with chronic inflammation caused by UC without the development of neoplasia were also recruited to serve as nonneoplastic but inflamed controls. This was to exclude biomarkers that may have been associated with inflammation rather than driving tumorigenesis.
The study was carried out with full ethical approval from the South Birmingham Research Ethics Committee (08/H1207/104).

Sample Collection, Extraction, and Quantification
Tissue blocks were retrieved from histology archives, and associated H&E sections were reviewed by a consultant histopathologist (Phillipe Taniere) to ensure accuracy of histological diagnosis. For the discovery set, frozen tissue was obtained at the time of proctocolectomy by opening the specimen and taking representative samples of tumor and normal mucosa. Normal mucosa was obtained at the maximal possible distance from the tumor. Histological type was confirmed by frozen sectioning, and all tumor material was confirmed to be adenocarcinoma. Dysplasia was categorized according to the methodology of Riddell et al 19 as either "high grade" or "low grade". Tissue was snap frozen in liquid nitrogen and then stored at -80 °C until needed. For archival specimens, blocks were cut into 10 μm sections and placed on slides. Needle macrodissection under white light microscopy, guided by a representative H&E section, was used to enrich for tumor content. For DNA extraction of both sets, samples were immersed in 300 μL of buffer ATL (Qiagen Ltd, Manchester, UK) and 20 μL of 20 mg/mL Proteinase K (VWR Jencons, Lutterworth, UK). Samples were incubated overnight in a tissue oven, spun down to form a wax plug that was then punctured (for FFPE samples), and the lysate retrieved. This was cleaned and purified using a Qiagen DNeasy Blood & Tissue kit (Qiagen Ltd, Manchester, UK). Extracted DNA was assessed for purity using a Nanodrop ND-2000 spectrophotometer and quantified using a Qubit fluorimeter. Samples that did not pass a quality threshold of A260/280 >1.8 were reextracted. DNA was stored at -80 °C until ready for use.

Methylation Microarray Discovery
To quantify methylation across the whole genome, the Illumina HumanMethylation450 array system was used on fresh tissue from the first part of the study. This is an oligonucleotide-based microarray platform that has over 458,000 probes targeted at CpG dinucleotides selected by an international consortium of epigenetics researchers to cover gene promoter regions, differentially methylated regions (DMRs), and other regions of interest.
One microgram of extracted DNA was bisulphite converted using the Zymo EZ-DNA Methylation kit with a modified protocol suitable for use on Illumina microarrays. A standard amplification, hybridization, labeling, and wash procedure was carried out by the Core Genomics Facility at the Wellcome Trust Centre for Human Genetics, University of Oxford. Microarrays were scanned on an Illumina iScan array scanner and detected intensities were converted to IDAT files and exported for further use.
Exported intensity data were analyzed using a combination of limma/Bioconductor and the ChAMP pipeline for methylation array analysis. 20 Data were imported into R 2.15.1 and filtered to remove all probes that had failed the detection threshold (P > 0.05). Quality control plots were also produced, and any samples failing Illumina standard QC were excluded. Probes were then normalized to adjust for Type 2 probe bias using BMIQ normalization, underwent singular value decomposition (SVD) to identify components of variation, and were batch corrected using ComBat. Top differentially methylated probes were called using a 3-level regression with eBayes shrinkage of moderated t-statistics in limma, and DMRs were called using DMRHunter. Copy number variation was called using the copy number function of the ChAMP package.
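The probe filtering step can be illustrated with a minimal sketch. This is not the ChAMP/limma code used in the study; it is a plain Python illustration that assumes a probe is discarded if it fails the detection threshold in any sample, and that methylation is summarized as the conventional Illumina beta value (methylated signal divided by total signal plus an offset of 100). The function names and the example intensities are hypothetical.

```python
# Illustrative sketch only: the study used ChAMP/limma in R, not this code.

def beta_value(methylated, unmethylated, offset=100.0):
    """Conventional Illumina beta value: M / (M + U + offset)."""
    return methylated / (methylated + unmethylated + offset)


def filter_probes(probes, p_threshold=0.05):
    """Keep probes whose detection P value is <= threshold in every sample.

    `probes` maps a probe ID to a list of
    (methylated, unmethylated, detection_p) tuples, one per sample.
    Returns a dict of probe ID -> list of beta values.
    Assumption: a single failing sample removes the probe.
    """
    kept = {}
    for probe_id, samples in probes.items():
        if all(p <= p_threshold for _, _, p in samples):
            kept[probe_id] = [beta_value(m, u) for m, u, _ in samples]
    return kept


# Hypothetical intensities for two probes across two samples (not study data).
example = {
    "probe_A": [(900.0, 0.0, 0.001), (450.0, 450.0, 0.01)],
    "probe_B": [(30.0, 40.0, 0.20), (800.0, 100.0, 0.001)],  # fails in sample 1
}
filtered = filter_probes(example)
```

In this toy input, probe_B is dropped because one of its detection P values exceeds 0.05, while probe_A is retained and converted to beta values.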

Validation
To provide validation samples, the IBD database was interrogated to identify a further cohort of samples for validation using bisulphite pyrosequencing. Targets for validation were identified from the top hits from the ChAMP analysis. The Illumina probe identifier for each hit was retrieved and genomic coordinates for the relevant CpG dinucleotide were identified from the HumanMethylation450 manifest file.
A primer set for bisulphite pyrosequencing was designed flanking the CpG dinucleotide of interest using Qiagen PyroMark 2.0 software under standard conditions. If a primer set could not be designed under standard conditions, the relevant conditions (Tm, amplicon length) were adjusted until a set could be designed. A maximum amplicon length of 250 bp was set, as the validation samples originated from FFPE samples and our previous experience in FFPE primer design had demonstrated that this was the maximum achievable amplicon length for this type of sample. Designed primers were biotinylated in either the forward or reverse direction, depending on design characteristics, and primers were obtained from Sigma-Aldrich (primer sequences available on request).
Forward and reverse primers were diluted to 20 μM and used in a gradient PCR reaction using the Qiagen Pyromark PCR kit under standard conditions in a reaction volume of 25 μL to obtain the optimum Tm. For each pyrosequencing reaction, 20 ng of bisulphite-treated DNA in a volume of 2 μL was used in a 25 μL reaction using the Qiagen Pyromark PCR kit under standard conditions with the observed Tm. Products were then cleaned and pyrosequenced on a Pyromark 96 ID machine using a 1:100 dilution of 20 μM sequencing primer. Percentage methylation values were calculated using standard default software settings on the Pyromark Q-CpG software package. All reaction plates were run with 100% methylated (generated by M.SssI treatment of genomic DNA) and unmethylated (generated by whole genome amplification using the Qiagen Repli-G kit) control DNA.
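As a rough illustration of how a percentage methylation value arises from a pyrosequencing read, the sketch below assumes the usual convention that, after bisulphite conversion, methylated cytosines are read as C and unmethylated cytosines as T, so that methylation at a CpG is C / (C + T). The Pyromark Q-CpG software performs this calculation internally; the function names and the control acceptance limits (90% and 10%) shown here are hypothetical and are not taken from the study.

```python
def percent_methylation(c_signal, t_signal):
    """Percentage methylation at one CpG: 100 * C / (C + T)."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this CpG position")
    return 100.0 * c_signal / total


def controls_ok(methylated_pct, unmethylated_pct, high=90.0, low=10.0):
    """Hypothetical plate check: the fully methylated control should read
    high and the unmethylated control should read low. The limits are
    illustrative assumptions, not values used by the authors."""
    return methylated_pct >= high and unmethylated_pct <= low


# Hypothetical peak heights (arbitrary units), not study data.
sample_pct = percent_methylation(37.0, 63.0)
plate_valid = controls_ok(percent_methylation(96.0, 4.0),
                          percent_methylation(3.0, 97.0))
```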

Statistical Analysis
To estimate the accuracy and other test metrics of the identified markers, methylation data were exported to Stata 12.1 (StataCorp, Texas, USA) as percentages. As methylation data were likely to be nonnormally distributed, a Wilcoxon rank sum test was performed on case-control data. To compare differences between tissue types, analysis of variance (ANOVA) testing was carried out. To estimate test accuracy, a logistic regression model was fitted with outcome (cancer/no cancer) as the dependent variable and percentage methylation for each marker as the independent variables. Markers identified as significant (P < 0.05) were then subjected to a sensitivity analysis using ROC curves to identify a methylation cutoff for neoplasia that optimized sensitivity and specificity. This threshold was then modeled using the diagt function of Stata, correcting for a population prevalence of UC-associated dysplasia of ~3%.
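The cutoff selection described above can be sketched as follows. This is an illustration in plain Python rather than the Stata analysis actually performed: it assumes that a sample is called positive when its methylation is at or above the cutoff, that the "optimal" cutoff is the one maximizing Youden's J (sensitivity + specificity - 1), and that the AUC is computed from pairwise case-control comparisons (equivalent to the scaled Mann-Whitney U statistic). The methylation values are invented for the example.

```python
def roc_points(values, labels):
    """Return (threshold, sensitivity, specificity) for each candidate cutoff.

    A sample is called positive when its value is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for threshold in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= threshold)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < threshold)
        points.append((threshold, tp / n_pos, tn / n_neg))
    return points


def auc(values, labels):
    """Probability that a random case scores above a random control
    (ties count one half)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def best_cutoff(values, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(values, labels), key=lambda p: p[1] + p[2] - 1)


# Hypothetical methylation percentages (not study data): 0 = control, 1 = case.
methylation = [4, 8, 10, 11, 12, 19, 15, 22, 37, 45, 60]
outcome = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
threshold, sensitivity, specificity = best_cutoff(methylation, outcome)
area = auc(methylation, outcome)
```

Maximizing Youden's J weights sensitivity and specificity equally; a screening application might instead favor sensitivity, which would shift the chosen cutoff downward.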

RESULTS

Patients
For the discovery cohort, 4 patients with tumor (adenocarcinoma) material and matched normal mucosa were obtained (Table 1), giving a total of 8 samples. A further 4 samples of normal mucosa from patients with chronic UC were obtained, giving 12 samples that were run successfully on the HumanMethylation450 array. Of the patients with UC-associated neoplasia, 75% (3/4) were male, with an average age of 48 years. Of the patients providing chronically inflamed normal mucosa, 50% (2/4) were male, with an average age of 41 years. The duration of UC in these patients is shown in Table 1.
For the validation cohort, a total of 71 samples from 71 patients were used. A single sample was obtained from each patient to reduce bias. These were biopsy samples consisting of acute colitis (n = 16), UC-associated cancer (n = 11), chronic (>8 years) colitis (n = 9), UC-associated dysplasia (n = 9, all high grade dysplasia), normal mucosa downstream of neoplasia (n = 19), and completely normal mucosa from unaffected controls (n = 7). The gender and age data of these samples are shown in Table 2. To expand and enhance the dataset, we used HumanMethylation450 array data from the TCGA (The Cancer Genome Atlas) project, with 245 cases of colorectal cancer and 38 normal mucosa controls.

Differentially Methylated Positions (DMP)
A multilevel analysis of change in methylation was carried out, assuming that methylation would change between chronically inflamed mucosa, tumor-associated mucosa, and tumor itself as part of a field effect. This analysis revealed 12,412 differentially methylated probes with a Mantel-Haenszel adjusted P value of < 0.05. The top 20 probes are shown in Table 3.
The top ranked CpG, cg08626004 (logFC = -4.32, B = 8.97), lies within a CG-rich region of exon 5 of GNG7 (G-protein subunit gamma 7), a membrane-bound GTPase linked to 7-TM receptors. The next ranked CpG, cg03507241 (logFC = -0.31, B = 8.86), lies within the promoter region of TUBB6 (Tubulin Beta 6 class V), which encodes one of the tubulin scaffold components of microtubules. The third ranked CpG, cg025848557, lies within intron 1 of VAV3-AS1 (VAV3-antisense 1), a noncoding RNA. The fourth ranked CpG, cg03280624, lies within the promoter region of ZNF583, a zinc finger-related transcription factor. The fifth ranked CpG, cg12035092, lies within exon 1 of KIF5C, a kinesin heavy chain subunit. The sixth ranked CpG, cg12863545, lies within the promoter of PIK3R5, a PI3 kinase-related gene.

Differentially Methylated Regions
A similar analysis was carried out for differentially methylated regions using the dmr.lasso function of ChAMP (Table 4), examining the differences between chronically inflamed mucosa, tumor-associated mucosa, and tumor itself. The top differentially methylated region was within SGCE (Sarcoglycan epsilon), a transmembrane protein that is a component of the dystrophin-glycoprotein complex. The second highest differentially methylated region was within SOX2OT (SOX2 overlapping transcript), a long noncoding RNA that overlaps the coding region of the SOX2 gene and has been shown to regulate SOX2 expression. 21,22

Copy Number Variation
Copy number was called successfully for all 12 samples using the ChAMP package. In the tumor group, a recurrent deletion of variable length was seen at the very end of the q-arm of chromosome 5 in 3 out of 4 tumors (Chr5:178017667-180876320). This CNA is frequently seen in colorectal cancer and has been reported at a high frequency in UC-related cancers. 23 A recurrent variable length gain of the p-arm of chromosome 5 was also seen, which has also been previously reported in UC-associated cancer. 23 In agreement with previous studies, a diverse pattern of copy number alteration was seen in both normal mucosa downstream of a tumor and in matched chronically inflamed mucosa. In the matched normal mucosa from downstream of the tumor, a recurrent copy number loss was seen in 3 out of 4 samples in chromosome 17q, localized to a 2 Mb region (Chr17:40169693-41993127). In the chronically inflamed normal mucosa, recurrent chromosomal loss was seen in chromosomes 1, 2, 6, 10, 11, 17, 19, and 20, where recurrent loss was defined as >75% of samples.

Validation of Observed Methylation Changes
To validate changes seen in methylation, bisulphite pyrosequencing was carried out on the validation set of samples as described in the methods. Based on the observed DMPs, the following assays were designed: ZNF583, GNG7, PIK3R5, TUBB6, and KIF5C. An attempt was made to design a primer set to amplify VAV3, but this region was found to be very GC rich (GC content >85%), making amplification very challenging, and further study of this region was therefore stopped. To simplify the initial analysis, sample groups were consolidated into 2 groups: control (which included normal colon, acute colitis, and chronic colitis) and case (which included field mucosa, dysplasia, and cancer). Validation on the remaining panel demonstrated no significant difference between cases and controls for any marker except TUBB6 (Table 5), which demonstrated a median methylation of 11% in controls (IQR 4) and 37% (IQR 42) in cases. Further study of TUBB6 by tissue group demonstrated a significant (ANOVA P = 0.0043, F = 3.81) progression in methylation from normal mucosa, where methylation was at its lowest, through acute inflammation to dysplasia and neoplasia, where methylation was at its highest (Fig. 1). Interestingly, the "field mucosa", ie, the mucosa downstream of a dysplastic or neoplastic lesion, also had increased methylation compared with control, suggesting a field effect in these patients. Due to limited sample material, we were unable to perform immunohistochemistry for TUBB6 expression to compare with methylation levels; however, an analysis of the TCGA dataset for colorectal cancer suggested no correlation between methylation and expression of TUBB6 (Supplementary Figure 1).
Modeling of the association between TUBB6 methylation levels and presence of invasive disease (taken as either "field" mucosa, dysplasia, or cancer) using logistic regression found that a threshold of 17% methylation was sufficient to discriminate invasive disease, correcting for a population prevalence of UC-associated dysplasia of 3% (AUC 0.84, 95% CI 0.81-0.87; Supplementary Figure 2). This gave a final post hoc test sensitivity of 70.1%, specificity of 98.6%, positive predictive value of 60.3%, negative predictive value of 99.1%, and a Youden index of 0.68.
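The relationship between sensitivity, specificity, prevalence, and predictive values can be made explicit with a short worked sketch. It applies Bayes' theorem to the rounded sensitivity (70.1%), specificity (98.6%), and prevalence (3%) quoted above; it is not the Stata diagt calculation itself, and the function names are illustrative. Because the inputs are rounded, the recomputed figures are expected to lie close to, but not exactly on, the reported values.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# Rounded figures quoted in the text; 3% is the assumed population prevalence.
ppv, npv = predictive_values(0.701, 0.986, 0.03)
j = youden_index(0.701, 0.986)
```

At a prevalence this low, the positive predictive value is driven mainly by specificity: even a false positive rate of 1.4% applied to the 97% of patients without dysplasia produces a number of false positives comparable to the number of true positives.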

CONCLUSIONS
In this study, we have carried out an exploratory analysis of the changes in methylation observed in tissue samples from patients with UC in the progression from normal mucosa, to disease-associated "normal" mucosa, to dysplasia/neoplasia. We have subsequently validated differential methylation of TUBB6 as a biomarker for progression in UC to invasive disease. To our knowledge, this is the first study that has utilized the power of epigenome-wide analysis to identify potential biomarkers of progression in UC. Abnormal methylation in cancer (both hyper- and hypomethylation) has been demonstrated to be adversely associated with survival across multiple cancer types. [24][25][26] Our study has a significant weakness in that the initial cohort consists of only 4 patients with paired tumor and associated normal "field" mucosa, together with normal mucosa obtained from control patients. This small sample size could lead to test inflation and a high false discovery rate, and this probably accounts for our observation that 4 out of the 5 markers did not pass statistical validation. However, 2 of them, KIF5C and ZNF583, only narrowly failed to reach significance and were therefore potential markers, but they were dropped because their observed methylation values included 0 values that would make test failures difficult to discriminate from true results.
However, our top ranked marker, TUBB6, validated in our independent cohort to a high significance level, supporting its use as a potential biomarker for UC-associated dysplasia. This shows that it is possible to identify statistically significant biomarkers from small genome-wide datasets given careful statistical analysis, making this a cost-effective way to perform these studies in other diseases.
The function of our identified markers remains obscure, and ideally a biomarker should have a functional relevance to the disease being studied. Two genes that almost reached significance, KIF5C and ZNF583, have functions not directly related to UC pathogenesis or progression. KIF5C (kinesin family member 5C) is a kinesin heavy chain subunit involved with protein trafficking within the central nervous system and has been linked to intellectual disability. 27 Little is known of the function of ZNF583 (Zinc Finger 583) beyond its description as a zinc finger protein, 28 making it presumably involved in transcriptional regulation. TUBB6 codes for Tubulin beta 6, a tubulin scaffold protein associated with the formation of microtubules that are part of the cytoskeleton. 29 A recent cellular genome-wide association study highlighted the importance of differential expression 30 of TUBB6 in the promotion of inflammatory cell death, known as pyroptosis. The study demonstrated that increased expression of TUBB6, in this case caused by an intragenic single nucleotide polymorphism, led to decreased pyroptosis. This has potential associations with the mechanisms of cell death in UC, as deficiencies in commensal-induced pyroptosis have been shown to increase the severity of UC in a murine model. 31 Variable TUBB6 expression has also been seen in other malignancies such as non-small cell lung cancer 32 and prostate, ovary, and breast cancer. 33 Some evidence has been demonstrated in cell lines by Mariani et al 34 that TUBB6 expression is partially controlled by androgen receptor status, with women showing higher expression of TUBB6 than men. However, our study is concerned with methylation of TUBB6, and the relationship between expression and methylation is complex; it is therefore difficult to determine whether gender affects methylation of TUBB6, as we have not shown any particular bias.
Potentially, deficient microtubules could be stabilized by an agent such as paclitaxel, 35 offering a potential target for "high risk" colitis.
In conclusion, we have identified and validated a potential biomarker of UC-associated dysplasia in the form of abnormal methylation of TUBB6, identified by whole epigenome analysis. Further validation is essential and this marker will form part of a marker panel in the prospective clinical trial module of epigenetic biomarkers in the NIHR funded ENDCAP-C study.

SUPPLEMENTARY DATA
Supplementary data is available at Inflammatory Bowel Diseases online.