Paediatric leukaemia DNA methylation profiling using MBD enrichment and SOLiD sequencing on archival bone marrow smears

Background
Acute Lymphoblastic Leukaemia (ALL) is the most common cancer in children. Over the past four decades, research has advanced the treatment of this cancer from a less than 60% chance of survival to over 85% today. The causal molecular mechanisms remain unclear. Here, we performed sequencing-based genomic DNA methylation profiling of eight paediatric ALL patients using archived bone marrow smear microscope slides.

Findings
SOLiD™ sequencing data were collected from Methyl-Binding Domain (MBD) enriched fractions of genomic DNA. The primary tumour and remission bone marrow samples were analysed from eight patients. Four patients relapsed, and their relapsed tumours were also analysed. Input and MBD-enriched DNA from each sample were sequenced, aligned to the hg19 reference genome and analysed for enrichment peaks using MACS (Model-based Analysis for ChIP-Seq) and HOMER (Hypergeometric Optimization of Motif EnRichment). In total, 3.67 gigabases (Gb) were sequenced, of which 2.74 Gb were aligned to the reference genome (average 74.66% alignment efficiency). This dataset enables the interrogation of differential DNA methylation associated with paediatric ALL. Preliminary results reveal concordant regions of enrichment indicative of a DNA methylation signature.

Conclusion
Our dataset represents one of the first SOLiD™MBD-Seq studies performed on paediatric ALL and is the first to utilise archival bone marrow smears. Differential DNA methylation between cancer and equivalent disease-free tissue can be identified and correlated with existing and published genomic studies. Given the rarity of paediatric haematopoietic malignancies relative to their adult counterparts, our demonstration that archived bone marrow smear samples are suitable for high-throughput methylation sequencing offers tremendous potential to explore the role of DNA methylation in the aetiology of cancer.

DNA from 40 individuals for sequencing on the Sequencing by Oligonucleotide Ligation and Detection (SOLiD™) sequencing platform (SOLiD™MBD-Seq, Life Technologies, Carlsbad, USA). MBD2 has been shown to bind to double-stranded methylated DNA molecules and has been used to interrogate the human methylome [1]. By sequencing both the enriched fraction and the "input" total genomic DNA fraction and comparing the two, genomic regions of DNA methylation can be inferred. The samples analysed comprise the following: three model cell lines, JWL (an in-house non-leukaemic cell line [2]), CEM-CCRF (childhood T-cell acute lymphoblastic leukaemia [ALL]   (Table 1). Genomic DNA was extracted from archived bone marrow smear microscope slides from ALL patients, from cells and from cell lines as previously described [3], and was used for the enrichment of CpG methylation with the MethylMiner™ Methylated DNA enrichment kit (Life Technologies) according to the manufacturer's protocols.
The fragmented input genomic DNA (I) and enriched E5 fraction (E) were isolated from each sample for library preparation and sequencing using SOLiD™ v3 and v4 chemistry according to the manufacturer's protocols (Life Technologies).
Single-end and paired-end SOLiD™ sequencing reads were aligned against the hg19 reference genome using LifeScope™ Genomic Analysis Suite (Life Technologies) with default parameters. Alignment efficiency (the ratio of uniquely aligned reads to total sequenced reads for each sample) ranged from 26.57% to 93.15% across all samples in this study (Table 1).
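The alignment efficiency defined above is a simple ratio, and a minimal sketch of the arithmetic may help readers reproduce it from their own read counts. The function name `alignment_efficiency` is hypothetical and is not part of LifeScope™. As an illustration only, the sketch is applied to the aggregate totals quoted in the abstract (2.74 Gb aligned of 3.67 Gb sequenced) rather than to per-sample read counts, which are given in Table 1.

```python
def alignment_efficiency(aligned: float, total: float) -> float:
    """Return aligned / total as a percentage.

    Hypothetical helper: both arguments must be in the same unit
    (read counts per sample, or bases for an aggregate figure).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if aligned < 0 or aligned > total:
        raise ValueError("aligned must lie between 0 and total")
    return 100.0 * aligned / total


# Aggregate totals from the abstract, in gigabases (illustration only).
overall = alignment_efficiency(aligned=2.74, total=3.67)
print(f"Overall alignment efficiency: {overall:.2f}%")
```

Applied to these totals, the ratio agrees with the 74.66% figure reported in the abstract to two decimal places.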
This study is unique in a number of ways. This is the first sequencing-based DNA methylation profiling study in childhood ALL using archived bone marrow samples of similar quality to formalin-fixed paraffin-embedded (FFPE) tissue samples [7]. We have selected samples that have been interrogated using an orthogonal platform, the Illumina Infinium Human Methylation 450K BeadArray [3,8], and included replicate samples to assess the reproducibility of SOLiD™MBD-Seq and to identify regions of differential DNA methylation of interest to childhood ALL.
We performed replicate DNA methylation enrichment analysis using the JWL cell line with 1 μg and 5 μg of starting genomic DNA to determine if 1 μg of starting material was sufficient for DNA methylation enrichment. This was less than the recommended quantity but a typical amount obtainable from our primary patient samples.
We isolated four haematopoietic cell populations (CD34, CD19, CD33, CD45) at major stages of development corresponding to the arrested stages of development in paediatric leukaemia. This was achieved by positive selection using fluorescent-labelled antibodies and Fluorescence-Activated Cell Sorting (FACS) from four individuals. This would enable us to track changes in DNA methylation between cell lineages and contrast them with leukaemic cells. After MACS enrichment peak analysis, a large proportion of peaks were common between the CD19 cells from three individuals, confirming the premise of tissue-specific DNA methylation profiles in haematopoietic cells (Figure 1A).
When DNA methylation enrichment peaks are compared between leukaemic and remission samples (tumour versus normal) from the same individual, distinct enrichment peaks are seen; these are likely to correlate with disease state (Figure 1B). The number of overlapping peaks between leukaemic and remission samples was lower than in the haematopoietic cell analyses (Figure 1C and 1D), which could reflect differences in sample quality.
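The notion of peaks shared between two samples can be illustrated with a minimal sketch. This is not the procedure used by MACS or HOMER; it is a plain interval-overlap count under the assumption that peaks are given as BED-style `(chromosome, start, end)` tuples with half-open coordinates. The function name `count_overlapping_peaks` and the example coordinates are hypothetical and do not come from this dataset.

```python
from collections import defaultdict


def count_overlapping_peaks(peaks_a, peaks_b):
    """Count peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open [start, end)
    coordinates, as in BED files. Any shared base counts as an overlap.
    """
    # Group the second peak set by chromosome for quicker lookup.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    shared = 0
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, ())
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            shared += 1
    return shared


# Invented coordinates, for illustration only.
tumour_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
remission_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]

n_shared = count_overlapping_peaks(tumour_peaks, remission_peaks)
print(f"{n_shared} of {len(tumour_peaks)} tumour peaks overlap a remission peak")
```

The linear scan per chromosome keeps the sketch short; for genome-scale peak sets, a sorted sweep or an interval tree would be the more usual choice.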
For each of the samples analysed in this study, we have generated track hubs that can be uploaded and visualised on the UCSC Genome Browser. This permits the immediate visualisation of regions of differential DNA methylation with potential biological significance. Moreover, we have performed Infinium analysis on these samples, and visualisation using the Genome Browser permits direct comparison to other publicly available data such as The Cancer Genome Atlas (TCGA) [9] and TARGET (Therapeutically Applicable Research to Generate Effective Treatments) [10]. This also permits further analysis and comparison to publicly available data using the Galaxy [11,12] and Cistrome [13] web servers.
In summary, our data represent one of the first DNA methylation enrichment analyses using SOLiD™MBD-Seq on archival bone marrow smears from children diagnosed with ALL. Such specimens are readily available in most pathology laboratories across the world and are amenable to genomic-scale analysis, as we have demonstrated here. These data should prove valuable for other DNA methylation studies in childhood ALL and in haematopoietic cell development.

Availability of supporting data
Supporting data are available from the GigaScience Database, GigaDB [14], and at NCBI under BioProject PRJNA272864.

Data file details
• SRA Files included BioProject PRJNA272864
• MACS and HOMER output files of peaks and peak locations
• Track Hubs for UCSC Genome Browser