Guillaume Velasco, Giacomo Grillo, Nizar Touleimat, Laure Ferry, Ivana Ivkovic, Florence Ribierre, Jean-François Deleuze, Sophie Chantalat, Capucine Picard, Claire Francastel, Comparative methylome analysis of ICF patients identifies heterochromatin loci that require ZBTB24, CDCA7 and HELLS for their methylated state, Human Molecular Genetics, Volume 27, Issue 14, 15 July 2018, Pages 2409–2424, https://doi.org/10.1093/hmg/ddy130
Abstract
Alterations of DNA methylation landscapes and machinery are a hallmark of many human diseases. A prominent case is the ICF syndrome, a rare autosomal recessive immunological/neurological disorder diagnosed by the loss of DNA methylation at (peri)centromeric repeats and its associated chromosomal instability. It is caused by mutations in the de novo DNA methyltransferase DNMT3B in about half of the patients (ICF1). In the remainder, the striking identification of mutations in factors devoid of DNA methyltransferase activity, ZBTB24 (ICF2), CDCA7 (ICF3) or HELLS (ICF4), raised key questions about common or distinguishing DNA methylation alterations downstream of these mutations and hence, about the functional link between the four factors. Here, we established the first comparative methylation profiling in ICF patients with all four genotypes and we provide evidence that, despite unifying hypomethylation of pericentromeric repeats and a few common loci, methylation profiling clearly distinguished ICF1 from ICF2, 3 and 4 patients. Using available genomic and epigenomic annotations to characterize regions prone to loss of DNA methylation downstream of ICF mutations, we found that ZBTB24, CDCA7 and HELLS mutations affect CpG-poor regions with heterochromatin features. Among these, we identified clusters of coding and non-coding genes mostly expressed in a monoallelic manner and implicated in neuronal development, consistent with the clinical spectrum of these patients’ subgroups. Hence, beyond providing blood-based biomarkers of dysfunction of ICF factors, our comparative study unveiled new players to consider at certain heterochromatin regions of the human genome.
Introduction
Cytosine DNA methylation is among the best-studied epigenetic modifications in vertebrates and is essential for normal embryonic development (1,2). After global erasure of DNA methylation patterns during early preimplantation development and germline formation, re-establishment of new patterns of DNA methylation requires the catalytic activity of the DNA methyltransferases (DNMTs) DNMT3A and DNMT3B to add methyl groups to unmodified DNA de novo, whereas DNMT1 is responsible for maintaining methylation patterns through cell divisions (3). Hence, genome-wide distribution of DNA methylation is highly dynamic during early development. In contrast, once established, methylation patterns are very stable in somatic cells. Importantly, DNA methylation is not randomly distributed throughout the genome: it is preferentially found at repetitive sequences and transposable elements, imprinted loci and the inactive X chromosome in female cells, where it is thought to control illicit transcription, recombination or transposition in somatic cells (4). Counterintuitively, CpG-dense regions known as CpG islands (CGIs), which are often located at gene promoters, are largely devoid of DNA methylation, with the exception of CGIs found at promoters of genes that are developmentally regulated, associated with germline programs, linked to the inactive X chromosome in females or located at DMRs (differentially methylated regions) of imprinted genes (5). Promoter methylation has traditionally been associated with gene repression, although the link might not be that straightforward (6).
Given the pivotal role of DNA methylation in the control of gene expression and key biological processes, it is not surprising that perturbed DNA methylation patterns are hallmarks of many human diseases (7,8). In this context, the DNA methylation machinery that classically comprises writers (DNMTs), erasers [the recently discovered ten-eleven-translocation (TET) enzymes involved in DNA demethylation (9)], and readers [methyl DNA-binding proteins (MBDs) (10)] is frequently perturbed. Importantly, the existence of monogenic diseases established the causal link between the perturbation of these factors, perturbed DNA methylation landscapes and the emergence of abnormal phenotypes. Search for the causes of such inherited diseases has recently fueled the field of DNA methylation with candidate players whose link with DNA methylation was not obvious.
The ICF syndrome (Immunodeficiency with Centromeric instability and Facial anomalies syndrome; OMIM no. 242860) is a very rare autosomal recessive disorder, with fewer than 70 cases reported worldwide, which is most notably characterized by primary immunodeficiency (11). Recurrent infections are the presenting symptom, usually in early childhood. More variable features include mild facial anomalies, intellectual disability, congenital malformations and developmental delay (12). The ICF syndrome was referred to as a heterochromatin disease because hypomethylation markedly affects facultative heterochromatin on the inactive X chromosome in females and constitutive heterochromatin found at pericentromeric satellite repeats: Satellites type II (Sat II) of chromosomes 1 and 16, and to a lesser extent Satellites type III (Sat III) of chromosome 9. Hypomethylation of satellite DNA is an invariant hallmark of the syndrome that leads to heterochromatin decondensation and chromosomal anomalies, and is used to establish diagnosis (13). Mutations in the DNMT3B gene were the first identified genetic cause of the syndrome (14). However, they accounted for only 55% of the cases, classified as ICF sub-type 1 (ICF1), suggesting a genetic heterogeneity consistent with a notable molecular heterogeneity whereby ICF patients without DNMT3B mutation exhibit additional hypomethylation at centromeric alpha-satellite (α-Sat) repeats (15,16). The clinical picture also highlighted a distinct genotype–phenotype relationship, with patients negative for DNMT3B mutations showing a greater incidence of intellectual disability whereas patients with mutations in DNMT3B exhibit a more severe immunodeficiency (17,18).
The search for the causes of the ICF syndrome in these patients identified three additional mutated genes: ZBTB24 (ICF2) (19), and more recently CDCA7 (ICF3) and HELLS (ICF4) (20). This led to the striking assumption that factors devoid of DNMT activity and with virtually unknown functions are required for DNA methylation, at least at (peri)centromeric repeats, and maintenance of genome stability.
Here, we performed a comparative analysis of perturbed DNA methylation landscapes in a cohort of ICF patients using an array-based assay to (i) evaluate the similarities and differences in methylation landscapes depending on patient genotypes as an index of a functional link between the four ICF factors, (ii) identify genomic regions that rely on DNMT3B, ZBTB24, CDCA7 or HELLS for their methylation status and (iii) assess genomic and epigenomic contexts of regions prone to loss of DNA methylation as a consequence of ICF mutations.
Results
Comparative analysis of DNA methylation landscapes discriminates ICF1 from ICF2, 3 and 4 patients
We compared genome-wide DNA methylation profiles from whole blood of ICF patients to avoid culture- and transformation-induced changes in DNA methylation profiles and as a good non-invasive cross-tissue proxy (21,22). We analyzed 15 ICF patients representing all four genotypes together with 10 age- and gender-matched non-affected (NA) donors as controls. Patient mutations and references to their clinical records are listed in Supplementary Material, Table S1. The reduced methylation at Sat II repeats in all ICF patients, and at α-Sat only in ICF2, 3 and 4, was previously verified by Southern blot (23) as well as by bisulfite conversion-based assays (Supplementary Material, Fig. S1). We then used the Illumina Infinium HumanMethylation450 BeadChip (HM450K) that assays >485 000 CpGs throughout the human genome, covering 99% of Refseq genes, although approximately 25% of the probes were designed in intergenic regions (24). After quality control and normalization (Supplementary Material), 479 259 probes were retained for further analysis. We calculated the median DNA methylation β-values and standard deviation (SD) for each probe within each group of samples (NA or ICF subtypes), except for ICF4 for which only one patient is available, to evaluate methylation levels at each CpG covered by the array and to assess their inter-individual variability. Probes were considered as differentially methylated (DMP, differentially methylated probe) between two genotypes when showing a difference in median β-value of at least 0.2 (Supplementary Material, Fig. S2A), a threshold recommended to detect differential methylation from HM450K with 99% confidence (25). For most DMPs, inter-individual variability remained low (median < 0.1; IQR values: NA = 0.06, ICF1 = 0.09, ICF2 = 0.05 and ICF3 = 0.1) (Supplementary Material, Fig. S2B), confirming the reliability of the differences in DNA methylation values reported in this study. 
DMPs between ICF and NA subjects represented 2% (ICF1) to 3% (ICF2, 3 and 4) of total probes, of which between 68 and 82% were less methylated in ICF samples compared with NA subjects (Supplementary Material, Fig. S2C). The comparison of DNA methylation values between ICF patients taken as a whole and NA subjects identified 27 469 DMPs on autosomes, the majority being hypomethylated (HypoMPs: 19 689 probes; 72% of total DMPs) compared with hypermethylated probes (HyperMPs: 7780 probes; 28% of total DMPs) (Supplementary Material, Fig. S3A), in agreement with the widespread DNA hypomethylation characteristic of the disease (13).
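The DMP criterion described above (a difference in group median β-value of at least 0.2) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the input layout (a dictionary of β-values per probe and per group) and the rounding step are assumptions made for illustration only.

```python
from statistics import median

# Threshold on the difference in median beta-value, as stated in the text.
DELTA_BETA_THRESHOLD = 0.2


def call_dmps(betas_patients, betas_controls, threshold=DELTA_BETA_THRESHOLD):
    """Return {probe_id: delta} for probes whose group medians differ by >= threshold.

    Both inputs map a probe ID to a list of beta-values (one per sample).
    delta = median(patients) - median(controls); negative means hypomethylated.
    """
    dmps = {}
    for probe_id, patient_values in betas_patients.items():
        control_values = betas_controls.get(probe_id)
        if not patient_values or not control_values:
            continue  # probe missing in one group (hypothetical handling)
        # Rounding avoids floating-point noise right at the threshold (assumption).
        delta = round(median(patient_values) - median(control_values), 6)
        if abs(delta) >= threshold:
            dmps[probe_id] = delta
    return dmps


def split_by_direction(dmps):
    """Split DMPs into hypomethylated (HypoMPs) and hypermethylated (HyperMPs) probes."""
    hypo = sorted(p for p, d in dmps.items() if d < 0)
    hyper = sorted(p for p, d in dmps.items() if d > 0)
    return hypo, hyper
```

With real data, the inputs would hold one β-value per sample for each of the 479 259 retained probes; the sketch ignores inter-individual variability, which the study assessed separately.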
Table 1. Distribution of autosomal DMPs in ICF patients relative to CpG context (Illumina annotation)
 | HypoMPs: ICF1 | HypoMPs: ICF2 | HypoMPs: ICF3 | HypoMPs: ICF4 | HyperMPs: ICF1 | HyperMPs: ICF2 | HyperMPs: ICF3 | HyperMPs: ICF4 |
---|---|---|---|---|---|---|---|---|
Total number | 6942 | 8414 | 9623 | 8708 | 1921 | 2661 | 2166 | 4120 |
% in CpG Island | 36% | 3% | 4% | 4% | 3% | 11% | 15% | 44% |
(nb probes) | (2523) | (232) | (348) | (351) | (67) | (294) | (328) | (1808) |
% in CG Shore | 28% | 9% | 10% | 13% | 18% | 29% | 29% | 24% |
(nb probes) | (1917) | (798) | (978) | (1092) | (340) | (775) | (632) | (985) |
% in CG Shelf | 6% | 11% | 11% | 12% | 16% | 16% | 13% | 5% |
(nb probes) | (429) | (896) | (1067) | (1075) | (300) | (424) | (282) | (225) |
% in Open sea | 30% | 77% | 75% | 71% | 63% | 63% | 43% | 27% |
(nb probes) | (2070) | (6488) | (7230) | (6190) | (1214) | (1214) | (924) | (1102) |
HypoMPs, hypomethylated probes; HyperMPs, hypermethylated probes.
Unsupervised hierarchical clustering on the basis of DNA methylation β-values clearly discriminated ICF from NA subjects, and also differentiated patients with mutations in DNMT3B (ICF1) from those with mutations in ZBTB24, CDCA7 or HELLS (ICF2, 3 or 4, respectively) (Fig. 1A). Heat map representation of the β-values of the 27 469 autosomal DMPs further illustrated that NA subjects, ICF1 patients, and ICF2, 3, 4 patients form distinct groups (Fig. 1A). We first verified that this clustering did not result from a confounding effect of blood cell composition. Indeed, unsupervised clustering on the basis of β-values of a set of 333 probes known to vary with blood cell type (26) did not particularly segregate subjects according to their genotype (Supplementary Material, Fig. S2D). More specifically, we found a higher number of HyperMPs than HypoMPs within this reference data set (ICF1: 50 HyperMPs, 1 HypoMP; ICF2: 38 HyperMPs, 2 HypoMPs; ICF3: 11 HyperMPs, 1 HypoMP; ICF4: 7 HyperMPs, 2 HypoMPs). These data strongly suggest that the identification of HypoMPs, unlike that of HyperMPs, is not influenced by blood cell composition. We then performed a pair-wise comparison of median β-values of individual DMPs. The relatively high correlation coefficient between ICF1 and NA subjects (r = 0.74) and, to a slightly lesser extent, between ICF2, 3 and 4 and NA subjects (r = 0.63 to 0.68; Supplementary Material, Fig. S3B), is consistent with only 2% (ICF1) to 3% (ICF2, 3 or 4) of the probes being affected by changes in DNA methylation levels. However, we found a high correlation between ICF2, 3 and 4 patients, the strongest correlation being between ICF2 and ICF3 (Pearson correlation coefficient r = 0.85) (Fig. 1B), with almost 65% of DMPs in common (Supplementary Material, Fig. S3A). A high correlation was also found between ICF2 and ICF4 as well as between ICF3 and ICF4 (r > 0.7; >50% common DMPs).
In clear contrast, the lowest correlation was found between ICF1 and any other ICF subtype (r < 0.5; Fig. 1B), with less than 20% of the DMPs being common to ICF1 and ICF2, 3 or 4 patients (Supplementary Material, Fig. S3A and B).
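The pair-wise comparison above rests on two quantities: the Pearson correlation between median β-values of two subtypes, and the share of DMPs they have in common. The sketch below shows one way to compute both. The function names are hypothetical, and the definition of "common DMPs" as intersection over union is an assumption, since the text does not state which denominator was used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def shared_dmp_fraction(dmps_a, dmps_b):
    """Fraction of DMPs common to two subtypes (intersection over union; assumption)."""
    set_a, set_b = set(dmps_a), set(dmps_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

In practice, `xs` and `ys` would be the median β-values of the same probes, in the same order, for the two subtypes being compared.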

Figure 1. Genome-wide DNA methylation profiling distinguishes two groups of ICF patients. (A) Unsupervised hierarchical clustering of ICF patients (ICF1, ICF2, ICF3 and ICF4) and NA subjects, on the basis of β-values of the DMPs mapping to autosomes after normalization (27 469 probes retained for analysis). Below is the heat map of β-values of the 27 469 probes located on autosomes that showed a difference in DNA methylation of at least 20% (delta β ≥ 0.2) between at least one ICF sub-type and NA subjects. High methylation levels are shown in red and low methylation levels in blue, according to the scale bar next to the figure. Each column represents a patient and each row represents a probe. (B) Correlation plot comparing the β-values of the 27 469 probes between two ICF sub-types. The plot shows the median β-values of one ICF subtype versus another one, and probes in red or blue represent DMPs between two genotypes. DMPs shown in red are significantly more methylated in the ICF subtype indicated on the Y axis and probes in blue are significantly more methylated in the ICF subtype indicated on the X axis (differences >20%). β-Values of probes that do not change between two genotypes are shown as grey dots. The black dashed line represents a linear regression on the basis of β-values of all probes analyzed and the solid black line represents a linear regression on the basis of DMPs. The Pearson correlation coefficient (r) is indicated above each scatter plot. (C and D) Distribution of DMPs relative to CpG context. Bar plots show the distribution of (C) HypoMPs and (D) HyperMPs relative to CpG islands, shores, shelves and open sea in the different ICF subtypes. The distribution of the HM450K probes relative to each CpG context is also indicated. P-values (Chi-square test) assessing significant enrichment in a given category relative to the HM450K array composition are indicated (***P < 0.001).
Altogether, the analysis of DNA methylation patterns in ICF patients clearly segregated two groups, i.e. patients carrying mutations in DNMT3B from those who do not. In support of the high correlation between methylation landscapes in patients with mutations in ZBTB24 (ICF2) and CDCA7 (ICF3), we found that, in both low-passage fibroblasts and lymphoblastoid cell lines (LCLs) from patients, CDCA7 expression was dependent on the integrity of ZBTB24 not only at the RNA level (Supplementary Material, Fig. S4A and B), consistent with data from a previous study (27), but also at the protein level (Supplementary Material, Fig. S4C). Together with the co-immunoprecipitation of endogenous ZBTB24 with CDCA7 and HELLS proteins (Supplementary Material, Fig. S4D), these data are suggestive of a substantial connection between these ICF factors that contributes to the observed strong correlation between methylation landscapes in ICF2, ICF3 and ICF4 patients (Fig. 1B).
Distinct genomic context of DMPs in ICF1 compared with ICF2, 3 and 4 patients
Because the clustering of DNA methylation data clearly suggested that the genomic regions affected in ICF are distinct depending on patient genotype, we first examined the genomic and CpG contexts of hypomethylated probes. On the basis of CpG density, CpGs on the HM450K array have been assigned to CpG islands (CGI), ‘shores’ (2-kb regions flanking CGIs), ‘shelves’ (2-kb regions extending from shores), and ‘open sea’ (isolated CpGs in the rest of the genome). The distribution of the median β-values of the 27 469 autosomal DMPs was represented on cumulative histograms for each CpG category (Supplementary Material, Fig. S5A). Compared with NA subjects, DNMT3B mutations in ICF1 patients were associated with a global decrease in CpG methylation, illustrated by a shift of the number of probes with high to low β-values, most prominently observed at CGIs and shores, whereas the distribution of probes was globally unchanged in shelves and open sea between NA and ICF1 (Supplementary Material, Fig. S5A). In addition, HypoMPs were significantly enriched in CGIs and shores compared with the representation of these contexts on the HM450K array (Fig. 1C and Table 1). This was in clear contrast to ICF2, 3 and 4 patients, where the overall distribution of β-values in CGIs remained similar to that in NA subjects, whereas a prominent decrease in CpG methylation within open sea was observed (Fig. 1C and Supplementary Material, Fig. S5A, Table 1). Increased DNA methylation, although not the main feature of ICF syndrome (Supplementary Material, Fig. S2A and C), mostly affected open sea in ICF1, 2 and 3, whereas CGIs seemed to be the main targets in the ICF4 patient (Fig. 1D and Table 1).
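The enrichment of HypoMPs in a given CpG context relative to the array composition can be illustrated with a chi-square goodness-of-fit statistic. The sketch below uses the ICF1 HypoMP counts from Table 1. The array fractions are approximate, commonly quoted values for the HM450K design and are an assumption here; they should be replaced with counts derived from the actual probe manifest. The exact test layout used in the study is not specified, so this is an illustration rather than a reproduction.

```python
# Approximate HM450K composition by CpG context (assumption; replace with
# fractions computed from the array manifest after quality control).
ARRAY_FRACTIONS = {"island": 0.31, "shore": 0.23, "shelf": 0.10, "open_sea": 0.36}

# ICF1 HypoMP counts per CpG context, taken from Table 1.
ICF1_HYPOMP_COUNTS = {"island": 2523, "shore": 1917, "shelf": 429, "open_sea": 2070}

# Chi-square critical value for 3 degrees of freedom at P = 0.001.
CRITICAL_VALUE_DF3_P001 = 16.27


def chi_square_gof(observed_counts, expected_fractions):
    """Chi-square goodness-of-fit statistic of observed counts against expected fractions."""
    total = sum(observed_counts.values())
    statistic = 0.0
    for category, observed in observed_counts.items():
        expected = expected_fractions[category] * total
        statistic += (observed - expected) ** 2 / expected
    return statistic


def is_distribution_shifted(observed_counts, expected_fractions,
                            critical_value=CRITICAL_VALUE_DF3_P001):
    """True if the observed distribution differs from the array at P < 0.001 (four categories)."""
    return chi_square_gof(observed_counts, expected_fractions) > critical_value
```

A statistic above the critical value only indicates that the overall distribution differs from the array; which contexts are over- or under-represented is read from the per-category differences between observed and expected counts.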
We then categorized DMPs into gene feature groups using Illumina annotations: we grouped probes annotated upstream of a transcription start site (TSS: TSS200, TSS1500), at gene starts, 5′ UTR and first exons into a ‘regulatory regions’ category, whereas annotations of the other probes defined gene body, 3′ UTR, and intergenic categories (25). The distribution of HypoMPs and HyperMPs in these categories was similar for all patient subtypes (Supplementary Material, Fig. S5B). HypoMPs were enriched in intergenic regions (≈ 40% of HypoMPs) compared with their representation on the array (25%), and were found to a lesser extent in gene regulatory and body categories (≈ 20% of HypoMPs each), although these categories are more represented on the array (35 and 30% of total probes, respectively). The distribution of HyperMPs did not show significant differences among ICF patients (Supplementary Material, Fig. S5C).
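The grouping of Illumina gene-feature annotations into categories can be sketched as a small lookup. The sketch assumes the annotation string follows the semicolon-separated format of the `UCSC_RefGene_Group` column of the HM450K manifest. The priority order applied when a probe carries several annotations (regulatory first, then gene body, then 3′ UTR) is an assumption, as the text does not say how such probes were resolved.

```python
# Annotations grouped into the 'regulatory regions' category, as described in the text.
REGULATORY_GROUPS = {"TSS200", "TSS1500", "5'UTR", "1stExon"}


def categorize_probe(refgene_group):
    """Map a semicolon-separated gene-feature annotation to one category.

    An empty annotation is treated as intergenic. When several annotations are
    present, regulatory takes precedence over gene body, then 3'UTR (assumption).
    """
    groups = {g.strip() for g in refgene_group.split(";") if g.strip()}
    if not groups:
        return "intergenic"
    if groups & REGULATORY_GROUPS:
        return "regulatory"
    if "Body" in groups:
        return "gene_body"
    if "3'UTR" in groups:
        return "3'UTR"
    return "other"


def category_counts(annotations):
    """Count probes per category for an iterable of annotation strings."""
    counts = {}
    for annotation in annotations:
        category = categorize_probe(annotation)
        counts[category] = counts.get(category, 0) + 1
    return counts
```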
As a whole, the distinctive CpG density context in which HypoMPs are located depending on whether patients have mutations in DNMT3B or not, strongly suggests that the three new factors incriminated in ICF syndrome are generally not functionally related to DNMT3B, and hence, supports the view that ZBTB24, CDCA7 and HELLS are not mere guides for targeting DNMT3B at DNA methylation sites.
Unifying features in ICF patients
Because hypomethylation of Sat II DNA repeats is a hallmark of all ICF patients (13), we analyzed the methylation status of the 76 683 probes (16% of total probes) designed in various repeated sequences, for the four ICF subtypes compared with healthy subjects. We intersected the dataset of probes on the HM450K array with the Repeat Masker track available from the UCSC genome browser to identify probes mapping to annotated repetitive elements of the human genome. When compared with HypoMPs datasets, we found that less than 4% of probes designed in DNA repeats were hypomethylated in ICF patients, suggesting that their global hypomethylation is not a general hallmark of the ICF syndrome. The breakdown of repeat-associated probes into major types of repeats showed that the distribution of HypoMPs within interspersed repeats (LINE, SINE, LTR, etc.) was barely statistically different from that on the array. In contrast, HypoMPs within satellite repeats gave the highest score with up to 15% of probes designed in satellite repeats being hypomethylated in ICF1 patients (Supplementary Material, Fig. S6A). In addition to the characteristic hypomethylation of Sat II and III repeats located in cytobands 1q12, 16q11.2 or 9q12, respectively, further breakdown into different classes of satellite repeats confirmed the already known hypomethylation of juxta-centromeric SST1 repeats (also known as NBL2) (28) on chromosomes 7, 16 and 17. We also identified alphoid α-Sat repeats (ALR/Alpha) in the centromere of chromosomes 7, 10, 16 and 19, centromeric or centromere-adjacent non-alphoid beta-satellites (BSR/Beta) on chromosomes 7, 8, 16 and 19, and blocks of gamma-satellites (GSAT and GSATII) in 8q11.1 and 12p11.1. We also identified non-centromeric SATR1 satellite repeats present on several chromosome arms. As shown on the heat map (Supplementary Material, Fig. S6B), hypomethylation of various classes of satellite repeats is more pronounced in ICF1 than in ICF2, 3 or 4 patients, with the exception of centromeric alpha- and beta-satellite repeats, the known distinguishing feature of ICF2, 3 and 4 subtypes.
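Intersecting probe positions with a repeat annotation track amounts to an interval lookup. The sketch below is a deliberately simple, unoptimized illustration: it assumes repeat intervals use 0-based, half-open coordinates (as in UCSC table downloads) and that probe positions are 1-based, and it scans all intervals on the probe's chromosome. The function name and data layout are hypothetical, and a real analysis would more likely rely on a dedicated interval tool.

```python
def repeats_overlapping_probe(chromosome, position, repeats_by_chromosome):
    """Return the classes of all annotated repeats that contain a probe position.

    repeats_by_chromosome maps a chromosome name to a list of
    (start, end, repeat_class) tuples with 0-based, half-open coordinates.
    position is the 1-based coordinate of the probed CpG, so the probe lies
    inside an interval when start < position <= end.
    """
    hits = []
    for start, end, repeat_class in repeats_by_chromosome.get(chromosome, []):
        if start < position <= end:
            hits.append(repeat_class)
    return hits


def annotate_probes(probes, repeats_by_chromosome):
    """Map each probe ID to the repeat classes it falls in (empty list if none).

    probes maps a probe ID to a (chromosome, position) tuple.
    """
    return {
        probe_id: repeats_overlapping_probe(chrom, pos, repeats_by_chromosome)
        for probe_id, (chrom, pos) in probes.items()
    }
```

Scanning every interval keeps the logic correct even when annotated repeats overlap or nest, at the cost of speed on genome-scale tracks.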
Because ICF patients share molecular and phenotypic features (17,18), we also evaluated the perturbed methylation landscapes common to all patients, outside of DNA repeats. We found that 433 probes were hypomethylated in all ICF patients regardless of their genotype (Supplementary Material, Fig. S6C). Gene Ontology (GO) analysis of genes linked to these common HypoMPs showed enrichment for biological processes associated with Olfactory Receptor (OR), the largest family of genes scattered throughout the genome, and PCDH genes (Supplementary Material, Table S2). Indeed, the most prominently affected locus in all ICF subtypes was a large region spanning around 750 kb on chromosome 5q31.3, containing three gene clusters encoding protocadherins alpha (PCDHA), beta (PCDHB) and gamma (PCDHG), although their hypomethylation was more pronounced in ICF1 patients (Supplementary Material, Fig. S7A and B). We chose PCDHB13 and PCDHB18 in this cluster as examples of genes with medium or high methylation values in NA subjects, respectively, to validate their CpG hypomethylation using COBRA assay (Supplementary Material, Validation of DNA methylation data) on genomic DNA from blood cells of ICF patients (Supplementary Material, Fig. S7C). We confirmed the loss of CpG methylation at PCDHB13 promoter in all ICF patients, and at PCDHB18 mostly in ICF1 cells. For each experimental validation, the median β-values (mβ) ± standard deviation (SD) of the corresponding HM450K probes are provided in Supplementary Material, Table S3 for comparison.
Table 2. Distribution of X-linked HypoMPs in female ICF patients with respect to CpG context (Illumina annotations)
X-linked HypoMPs in female ICF subtypes | ICF1 | ICF2 | ICF3 |
---|---|---|---|
Total number of HypoMPs | 3377 | 750 | 1071 |
% in CpG Island | **58%** | 4% | 7% |
(nb probes) | (1965) | (32) | (71) |
% in CG Shore | 23% | 21% | 20% |
(nb probes) | (762) | (154) | (209) |
% in CG Shelf | 4% | 19% | 21% |
(nb probes) | (139) | (140) | (222) |
% in Open sea | 15% | **57%** | **53%** |
(nb probes) | (511) | (424) | (569) |
In bold are the most represented CpG contexts for HypoMPs in each ICF subtype.
Table 3. Distribution of HypoMPs mapping to germline genes [database (29)] with respect to CpG context (Illumina annotation)
HypoMPs | Females: ICF1 | Females: ICF2 | Females: ICF3 | Males: ICF1 | Males: ICF2 | Males: ICF4 |
---|---|---|---|---|---|---|
Total number of HypoMPs | 747 | 392 | 384 | 454 | 272 | 275 |
% in CpG Island | 51% | 4% | 5% | 63% | 5% | 7% |
(nb probes) | (378) | (10) | (18) | (284) | (13) | (19) |
% in CG Shore | 24% | 16% | 15% | 24% | 11% | 19% |
(nb probes) | (181) | (61) | (57) | (108) | (31) | (51) |
% in CG Shelf | 2% | 6% | 3% | 3% | 5% | 5% |
(nb probes) | (15) | (22) | (12) | (14) | (13) | (13) |
% in Open sea | 23% | 76% | 77% | 11% | 79% | 70% |
(nb probes) | (174) | (299) | (298) | (48) | (215) | (192) |
The categorization of probes on the basis of CpG density indicated that nearly 60% of the HypoMPs common to all ICF subtypes mapped to open sea (Supplementary Material, Fig. S7D), the genomic context that is the most prominently affected in ICF2, 3 and 4 patients (Supplementary Material, Fig. S5A). In addition, these common HypoMPs were also enriched in intergenic regions (Supplementary Material, Fig. S7D). Altogether, we could infer from these data that the four ICF factors are functionally related at a few hundred common loci, where ZBTB24, CDCA7 and HELLS may serve as guides for the recruitment of DNMT3B at isolated CpGs in intergenic regions.
Preferential hypomethylation of CpG islands in ICF1 patients
In agreement with ICF1 patients forming a distinct group, 80% of HypoMPs found in ICF1 were specifically hypomethylated in this subtype and not in ICF2, 3 or 4 patients (5600 HypoMPs; mean β-value = 0.39 in ICF1; 0.75 in NA and 0.78 in ICF2, 3 and 4 subjects; Student’s t-test ICF1 versus NA or ICF2, 3 or 4 P < 2.2 × 10⁻¹⁶) (Supplementary Material, Fig. S6A). Among these HypoMPs, we found several classes of CGIs known to be heavily methylated, in germline gene promoters and in DMRs that control imprinted gene expression, as detailed later. Because CGI methylation is also a known feature of the inactive X chromosome in females, we extended our methylation analysis to the sex chromosomes of patients.
In male ICF patients, hypomethylation on the active X chromosome was barely detectable (Supplementary Material, Fig. S8A), suggesting that the hypomethylation detected in female ICF patients mainly comes from the inactive X chromosome (Xi). In ICF1 female patients, 30% of the probes designed on the X chromosome (3377 of 10 316 probes) were hypomethylated (Fig. 2A and B; Supplementary Material, Fig. S8B), mostly at CGIs (58% of HypoMPs; Table 2). The X chromosome was also affected in ICF2 and ICF3 female patients (Fig. 2A; Supplementary Material, Fig. S8B), although to a lesser extent than in ICF1 (Fig. 2B), and mainly in open sea (57% of HypoMPs in ICF2 and 53% in ICF3; Table 2). Around half of the few X-linked HypoMPs common to all female ICF patients (181 probes; Supplementary Material, Fig. S8B) corresponded to open sea and promoter-proximal contexts (Supplementary Material, Fig. S8C and D). On the Y chromosome, 7–10% of the probes were hypomethylated, mainly within CGIs in ICF1 or open sea in ICF2 and ICF4 male patients (Supplementary Material, Fig. S8E). We further validated the loss of CpG methylation at MAGEB10 and MAGEB3, whose promoters are heavily methylated in healthy controls, using COBRA (Supplementary Material, Fig. S9A) and pyrosequencing (Fig. 2C and Supplementary Material, Fig. S9B and Table S3). In addition, MSRE-qPCR validated that hypomethylation of isolated CpGs within the MAGEB3 promoter was common to all ICF female patients (Supplementary Material, Fig. S9C).

DNA methylation profiling of the X chromosome in female ICF patients. (A) Heat map of β values of the 10 316 probes analyzed after quality control and mapping to the X chromosome, in ICF female patients. ICF subtypes are indicated above and names of patients below the heat map. (B) Histogram showing the percentage of X-linked hypomethylated probes (HypoMPs) within each ICF subtype. (C) DNA methylation levels of CpGs within the MAGEB10 promoter, measured by pyrosequencing. The methylation levels at each CpG were measured in NA male and female subjects (CTL, M or F; grey lines) and ICF1 patients (P5 and P3; orange lines). Dashed orange line represents CpG methylation levels at the X-linked MAGEB10 promoter in ICF1 female patient. Error bars represent standard error. Significant P-values from Student’s t-test are indicated by asterisks (*P < 0.05; **P < 0.01).
GO analysis of genes linked to hypomethylated probes as a consequence of DNMT3B mutations in ICF1 cells revealed a highly significant enrichment for biological processes related to germ cells (Supplementary Material, Fig. S10A and Table S4). Hence, we compared our list of genes linked to hypomethylated regulatory regions to lists of genes whose expression has been identified as elevated compared with other tissues [The Human Protein Atlas (www.proteinatlas.org/humanproteome/testis)] or highly enriched in germ cells (29). Because many germline genes are linked to the X chromosome, we used datasets of HypoMPs according to the sex of the patients. Out of the ≈2000 genes identified to have elevated expression levels in germ cells compared with other tissues [The Human Protein Atlas and (29)], we identified 437 (1155 probes) and 347 (795 probes) genes with abnormally hypomethylated regulatory regions in blood of female and male ICF patients, respectively, which could be reduced to 178 and 140 genes when considering the category ‘tissue enriched’ or to 77 and 63 genes for the ‘Highly enriched’ category of these databases (Supplementary Material, Table S5). As suspected from previous studies (23,30), and as illustrated by the heatmap representation of CpG methylation levels within the promoters of these genes, hypomethylation of germline genes was a strong signature of ICF1 (Fig. 3A and Table 3). In contrast, the reduced CpG methylation at germline genes was less dramatic in ICF2, 3 and 4 than in ICF1 patients (Fig. 3B), and again, mainly affected open sea (Table 3). We confirmed that the loss of methylation at highly methylated CGI promoters of MAEL, SPO11 and TDRD6 was more prominent in ICF1 patients than in ICF2, 3 and 4 patients, using COBRA (Supplementary Material, Fig. S10B, Table S3) and pyrosequencing (Fig. 3C and Supplementary Material, Fig. S10C).
In open sea of ADAM2, shore of SPO11 and within CGI of C15ORF60, the more pronounced CpG hypomethylation found in ICF2, 3 and 4 patients compared with ICF1 (Supplementary Material, Table S3) was confirmed by COBRA assay (Supplementary Material, Fig. S10D).
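The comparison of HypoMP-linked genes with published germline gene lists amounts to a set intersection, counting both the genes and the probes that support them. The following Python sketch illustrates the idea only; the probe identifiers, the probe-to-gene mapping and the short germline list are hypothetical placeholders, not data from this study.

```python
def germline_overlap(probe_to_gene, germline_genes):
    """Return (genes, probes) for hypomethylated probes linked to germline genes.

    probe_to_gene  -- dict mapping a hypomethylated probe ID to its linked gene
    germline_genes -- iterable of gene symbols with germline-enriched expression
    """
    germline = set(germline_genes)
    # Keep only the probes whose linked gene belongs to the germline list.
    hits = {probe: gene for probe, gene in probe_to_gene.items() if gene in germline}
    return sorted(set(hits.values())), sorted(hits)


# Hypothetical input: four hypomethylated probes and a three-gene germline list.
hypo_probes = {
    "probe_1": "MAEL",
    "probe_2": "MAEL",
    "probe_3": "SPO11",
    "probe_4": "NDN",
}
germline_list = ["MAEL", "SPO11", "TDRD6"]

genes, probes = germline_overlap(hypo_probes, germline_list)
print(len(genes), "genes supported by", len(probes), "probes")
```

Reporting genes and probes separately mirrors the way the counts are given in the text, where one gene can be supported by several probes.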

DNA methylation profiling of germ line genes in ICF patients. (A) Heat maps of β values of 1155 and 795 HypoMPs within the promoter of germline genes (according to The Human Protein Atlas) identified respectively in female (top) and male (bottom) ICF patients. DNA methylation levels are indicated as in the legend to Figure 1. (B) Histograms showing the percentage of hypomethylated probes (HypoMPs) in germ line genes within each ICF subtype. (C) DNA methylation levels of CpGs within CGI of MAEL promoter using pyrosequencing. The methylation levels at each CpG were measured in NA male and female subjects (CTL, M or F; grey lines) and ICF1 patients (P5 and P3; orange lines). Error bars represent standard error. Significant P-values from Student’s t-test are indicated by asterisks (*P < 0.05; **P < 0.01).
Because a subset of CGIs overlap with DMRs of imprinted genes (31), we annotated hypomethylated CGIs in ICF patients relative to the position of imprinted DMRs identified in previous studies (32,33). We found that the 61 DMRs represented on the array remained unaffected in ICF patients with the exception of the DMR identified in the ZNF597 CGI-promoter, one of the three somatic DMRs described so far (33). This DMR was hypomethylated in ICF1 patients but not in ICF2, 3 and 4 patients, suggestive of a loss of somatic imprinting at this DMR in ICF1 patients (Supplementary Material, Table S3). We used COBRA, pyrosequencing and bisulphite sequencing to confirm CpG hypomethylation of the ZNF597 promoter in blood cells of ICF1 patients (Supplementary Material, Fig. S11A–C), whereas hypomethylation remained undetectable using COBRA in ICF2, ICF3 or ICF4 patients (Supplementary Material, Fig. S11A).
Alteration of DNA methylation at gene clusters in ICF2, ICF3 and ICF4
GO analysis performed on genes linked to HypoMPs common to ICF2, ICF3 and ICF4 first indicated that genes related to neuron biology and function seemed to be particularly affected in these patients (Supplementary Material, Fig. S12A, Table S6). We further confirmed their hypomethylated status in blood cells from ICF2, 3 and 4 patients, using COBRA for the promoters of Dopamine Receptor D3 (DRD3) and Necdin (NDN) (Supplementary Material, Fig. S12B, Table S3), and MSRE-qPCR for the promoter of the Neurexin 3 (NRXN3) gene (Supplementary Material, Fig. S12C, Table S3). As shown by GO analysis, we also observed that the 4569 HypoMPs (484 linked genes) common to ICF2, ICF3 and ICF4 subtypes (Supplementary Material, Fig. S13A) mapped to several clusters of functionally related genes (Supplementary Material, Fig. S13B) including, in addition to PCDH clusters mentioned earlier (Supplementary Material, Fig. S6B), Olfactory Receptors (OR) on several chromosomal locations, Late Cornified Envelope (LCE), small proline-rich protein (SPRR), NLR Family Pyrin Domain Containing (NLRP), Keratin Associated Protein (KRTAP), Zinc Finger proteins (ZNF), and Defensin β (DEFB) clusters (Supplementary Material, Fig. S13C, Table S7). In addition to these clusters of coding genes, our analysis also revealed CpG hypomethylation within several clusters of small regulatory non-coding RNAs (ncRNA) hosted in imprinted loci (34,35). Among these, we identified HypoMPs on chromosomes 15 and 19, respectively, linked to large clusters of small nucleolar RNAs (snoRNAs) embedded within the Prader-Willi locus (PWS), and to the largest microRNA gene cluster of the human genome, C19MC (Fig. 4A and B, Supplementary Material, Fig. S13C).

CpG hypomethylation within clusters of non-coding RNA genes located on chromosomes 15 and 19. (A) Heat map of β values of 479 probes on chromosome 15 including the SNORD cluster embedded in the Prader–Willi Syndrome locus and (B) heat map of β values of 21 DMPs within the C19MC cluster located on chromosome 19. Genomic coordinates are indicated on the right of the heat map. DNA methylation levels and category of subjects are indicated as in legend to Figure 1. (C–E) Validation of HM450K DNA methylation data using various techniques depending on CpG context (see Supplementary Methods section). CpG methylation (C) at the SNORD115-9 promoter was determined by MSRE-qPCR, (D) at the SNORD115-14 promoter by bisulfite sequencing and (E) at the MIR521–2 promoter by MSRE-qPCR. For MSRE-qPCR, the probe ID and the methylation-sensitive enzyme are indicated in parenthesis under the gene name. For bisulfite sequencing, white and black dots represent unmethylated and methylated CpGs, respectively, within the analyzed sequence (black line). (F) DNA methylation of CpGs within the promoter of MIR518D gene using pyrosequencing assay. The methylation level of this CpG (probe ID: cg10583119) was measured in non-affected (CTL, M or F, age; in grey) and ICF subjects with different genotypes. Error bars represent standard error. Significant P-values from Student’s t-test are indicated by asterisks (*P < 0.05; **P < 0.01).
We validated the significant drop in CpG methylation levels (Supplementary Material, Fig. S13C and Table S3), almost exclusively in ICF2, ICF3 and ICF4 patients, at SNORD115–9 and SNORD115–5 using MSRE-qPCR (Fig. 4C and Supplementary Material, Fig. S14A), at SNORD115–14 using bisulfite sequencing (Fig. 4D) and COBRA (Supplementary Material, Fig. S14B), and at the C19MC locus using MSRE-qPCR at MIR521–2 and MIR518D (Fig. 4E and F; Supplementary Material, Fig. S14C). Quantification of methylation levels at the MIR518D promoter also confirmed the loss of methylation to less than 20% in ICF2, 3 and 4 patients compared with the 50% found in NA and ICF1 subjects, as expected for an imprinted locus (Fig. 4F, Supplementary Material, Table S3).
Validation of potential methylation or transcriptional biomarkers in cell lines from patients
We first tested whether the HypoMPs identified as common to ICF2, ICF3 and ICF4 in whole blood were also found in other cell types. Using patient-derived fibroblasts, we confirmed by COBRA assays that Sat II and α-Sat were hypomethylated in all ICF patients or only in ICF2, 3 and 4 patients, respectively (Supplementary Material, Fig. S15A). We then assessed methylation levels at the same markers previously used for validation, using COBRA assays (Supplementary Material, Fig. S15B and Table S3). We verified that hypomethylation of MAEL and TDRD6 promoters in ICF1 patients was conserved in fibroblasts, although we could observe a slight hypomethylation at the MAEL promoter in fibroblasts from ICF2, 3 and 4 patients, consistent with our previous study showing that time in culture may impact methylation profiles (23). Likewise, we confirmed almost full hypomethylation at the imprinted ZNF597 gene in fibroblasts of ICF1 patients but not in those of ICF2, 3 and 4 patients. For hypomethylated loci common to all ICF subtypes, we verified that the PCDHB18 gene showed almost complete hypomethylation in fibroblasts. Analysis of genes affected in ICF2, 3 and 4 patients confirmed the hypomethylated status of SNORD115–14 in ICF2, 3 and 4, and of NDN, although to a lesser extent (Supplementary Material, Fig. S15B). Importantly, we extended this validation to low-passage skin fibroblasts derived from an ICF2 patient who was not initially included in the DNA methylation array analysis (pP patient; Supplementary Material, Fig. S15C).
Because patients’ LCLs are more commonly available than blood or fibroblasts, we performed the same validations in these cells (Supplementary Material, Fig. S15A and B). Data from COBRA analysis were consistent with methylation data from blood and fibroblasts only at Satellite repeats, MAEL, SNORD115–14 and NDN. Hypomethylation of SNORD115–14 and NDN was observed only in ICF2, 3 and 4 but not in ICF1 patients, establishing hypomethylation at these loci as powerful biomarkers for ICF2, 3 and 4 patients, regardless of the cell type used. We also tested the potential of transcriptional biomarkers in both LCLs and fibroblasts (Supplementary Material, Fig. S16). As we have already proposed, MAEL expression is confirmed as a powerful biomarker for DNMT3B dysfunction that can be measured in any cell type derived from ICF1 patients. In contrast, increased expression of TDRD6 was not detectable in LCLs, consistent with our previous data on a panel of germline genes (23). Likewise, increased expression of genes affected in ICF2, 3 and 4 patients was highly variable among patients and cell types. Hence, in contrast to DNA methylation analysis, expression analysis did not generally allow the identification of reliable biomarkers for this DNA methylation disease.
Because defects in DNA methylation were observed in at least two cell types of patients derived from different embryonic germ layers, they may well originate from defects in establishment at early stages of development rather than from defects in methylation maintenance, especially in the case of most of the imprinted genes, for which the methylation status is set at early stages. In favor of this hypothesis, we further showed that knockdown of ZBTB24, CDCA7 or HELLS expression (Supplementary Material, Fig. S17A and B) in somatic cells where DNA methylation patterns have already been established was not sufficient to perturb DNA methylation levels at unique genes, imprinted genes, or centromeric or pericentromeric DNA repeats (Supplementary Material, Fig. S17C and D).
Perturbations of DNA methylation at genomic loci with heterochromatin hallmarks in ICF2, 3 and 4 patients
To further question epigenetic contexts of genomic loci prone to loss of DNA methylation as a consequence of ICF mutations, we assigned the HypoMPs identified in ICF patients to different epigenomic contexts annotated in healthy cells. We used a recent segmentation of the human genome into fifteen chromatin states to predict functional elements (Supplementary Material), using the ChromHMM track from the B-LCL GM12878. We also interrogated genome-wide DNA replication timing data using the Repli-seq track (Supplementary Material) to extract replication timing values available for GM12878 cells sorted into six cell cycle fractions (G1, 4 fractions in S phase, G2), for the genomic regions of interest. These analyses showed that the majority of HypoMPs were significantly enriched in the ‘Heterochromatin/low signal’ (state 13) chromatin state (Fig. 5A) and in rather late-replicating regions (Fig. 5B). However, there was again a striking difference between ICF1 and ICF2, 3 and 4 patients. More than 90% of HypoMPs in ICF2, 3 or 4 subgroups, or common to ICF2, 3 and 4 (ICF2/3/4), mapped to the heterochromatin state compared with less than 70% in ICF1. In contrast, predicted functional promoters and enhancers represented 25% of HypoMPs in ICF1 but were virtually absent from HypoMPs common to ICF2, 3 and 4 cells (Fig. 5A). Along the same lines, more than 60% of HypoMPs in ICF2, 3 and 4 mapped to late-replicating regions (categories S3, S4 and G2) compared with the same proportion mapping to early-replicating regions (G1, S1, S2) in ICF1 patients (Fig. 5B). Interestingly, HypoMPs common to all ICF patients (ICF1/2/3/4) also mapped to predicted heterochromatin regions with a late replication timing. Visualization of data using the Integrative Genomics Viewer (IGV) illustrates the coverage of heterochromatin and intergenic regions by probes on the HM450K array.
We show as an example a high density of HypoMPs common to ICF2, 3 and 4 patients at clusters of OR genes on chromosome 11, in CpG-poor regions and in the heterochromatin and late-replicating categories, in contrast to transcribed and early-replicating nearby regions that are otherwise covered by a higher density of probes (Fig. 5C). The search for overrepresented motifs in HypoMPs common to ICF2, 3 and 4 compared with the bulk of the genome returned an [ATTTTA] motif which partially overlaps with consensus binding sites of known transcription factors (Supplementary Material, Fig. S18), whereas the [AAAAGAAA] motif does not significantly match known consensus sequences. Interestingly, both of these enriched motifs can also be found in AT-rich α-Sat repeats at which the AT-hook of ZBTB24 could direct binding in certain cellular contexts.

Epigenomic features of HypoMPs in ICF syndrome. (A and B) Distribution of HypoMPs for each ICF subtype or common to several subtypes (ICF2/3/4 and ICF1/2/3/4) relative to annotations of (A) 15 chromatin states (ChromHMM from ENCODE/Broad) and (B) replication timing (Repliseq from UW) patterns established in the GM12878 LCL. The distribution of HM450K probes within each chromatin and replication context is indicated. 450K: distribution of all HM450K probes. HypoMPs: differentially hypomethylated probes between ICF and NA subjects. P-values (Chi-square test) assess significant enrichment in a given category relative to the HM450K array composition (***P < 0.001). (C) Visualization of epigenomic contexts prone to hypomethylation in ICF subtypes, taking the cluster of OR genes on chromosome 11 as an example. The different lanes indicate the position of Refseq genes, HypoMPs in ICF1 to 4 patients, all probes designed on the array (HM450K) and CpG islands (CGi). Chromatin states (ChromHMM) and replication timing (G1, 4 fractions in S, and G2 phases) are represented with the same color code as in the histograms in (A) and (B).
Together, these data suggest that ZBTB24, CDCA7 and HELLS are required for the methylated state of a number of genomic loci with heterochromatin hallmarks, supplementing the list of common heterochromatin loci, in addition to satellite repeats, where these factors play key functions.
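Assigning probes to chromatin states, as done in this section, comes down to looking up each probe position in a sorted list of annotated segments and tallying the states. The sketch below shows one way to do this in Python for a single chromosome; the segment coordinates, state labels and probe positions are hypothetical, and the actual annotation procedure is the one described in the Supplementary Material.

```python
from bisect import bisect_right
from collections import Counter


def state_fractions(positions, segments):
    """Fraction of probe positions falling in each chromatin state.

    segments -- list of (start, end, state) tuples sorted by start,
                non-overlapping, with half-open coordinates [start, end)
    Positions outside every segment are counted as "unassigned".
    """
    starts = [start for start, _, _ in segments]
    counts = Counter()
    for pos in positions:
        # Index of the last segment whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < segments[i][1]:
            counts[segments[i][2]] += 1
        else:
            counts["unassigned"] += 1
    total = len(positions)
    return {state: n / total for state, n in counts.items()}


# Hypothetical segmentation and probe positions on one chromosome.
segments = [
    (0, 1000, "Heterochromatin/low signal"),
    (1000, 1500, "Active promoter"),
    (1500, 4000, "Heterochromatin/low signal"),
]
probe_positions = [10, 500, 1200, 2000, 3999, 5000]

fractions = state_fractions(probe_positions, segments)
for state, fraction in sorted(fractions.items()):
    print(f"{state}: {fraction:.2f}")
```

The same lookup applies to replication timing categories if the segments carry a timing label instead of a chromatin state.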
Discussion
Here we report the comparative methylome analysis in patients with ICF syndrome, a monogenic disease caused by DNA methylation defects, as the first attempt to functionally link the four factors identified so far in this disease—DNMT3B, ZBTB24, CDCA7 and HELLS—through the documentation of common and distinguishing methylation defects. Indeed, the striking discovery of mutations in factors devoid of DNMT activity, and with a virtually unknown connection with DNA methylation [except for Hells in the mouse (36,37)], suggested functional redundancy and implied that ZBTB24, CDCA7 and HELLS would function as a platform for DNMT3B recruitment at methylation sites. Instead, our methylome analysis clearly discriminated patients with DNMT3B mutations from those with mutations in ZBTB24, CDCA7 or HELLS, and identified genomic and epigenomic contexts where DNA methylation is differentially affected in ICF1 or in ICF2, 3 and 4 patients. More specifically, our data lend support to a key role of ZBTB24, CDCA7 and HELLS in the methylated status of common CpG-poor genomic regions that have hallmarks of heterochromatin in healthy cells, including the remarkable presence of gene clusters expressed in a randomly or imprinted monoallelic manner.
It has never been really clear how reduced DNA methylation in ICF patients may cause immunodeficiency. This question is even more pressing today with our demonstration that patients, grouped under the same disease name on the basis of common cytogenetic defects that serve to establish diagnosis, exhibit strikingly distinctive methylation landscapes. On the basis of DNA methylation data, our comparative analysis revealed the striking segregation of ICF patients into two groups, which actually reflects the molecular and phenotypic heterogeneity already described in the disease (12). Indeed, with respect to clinical manifestations, immunodeficiency is more severe in ICF1 patients whereas ICF2, 3 and 4 patients are more affected by intellectual disability (17,18). Yet, we identified a few hundred genomic regions whose methylated state relies on the integrity of any of the four ICF factors. The logical assumption would be that perturbations to the methylation status of these loci downstream of ICF mutations lead to their deregulated expression and account for common clinical manifestations in ICF patients. However, there are many instances showing that hypomethylation does not necessarily drive transcription [example in the ICF syndrome: (23)]. In addition, earlier transcriptomic studies failed to provide a clear explanation for the absence of memory B-cells and agammaglobulinemia that characterize all ICF patients (38–42). Of note, the mentioned studies were mainly done on LCLs from ICF1 patients at a time when mutations in other factors were not yet identified. Instead, these studies suggested that the observed transcriptional defects probably originated from, rather than being causally involved in, maturation defects that characterize patients’ B-cells (43). In fact, hypomethylation of pericentromeric Sat II repeats remains the main common defect among all ICF sub-groups.
It is also the hallmark of cancer cells (44) although the germline versus somatic origin and further consequences on cellular phenotypes are obviously different. In both cases, no causal link could be established, and the identification of proximal events that initiate the dysregulation of the lymphoid compartment in ICF patients and why lymphoid cells are most prominently affected compared with other cell types, are questions that are far from being solved. Our comparative study raised additional questions as to whether hypomethylation of Sat II repeats early during development is the major initiating event regardless of the ICF subtype, implying that perturbed methylation profiles assessed in somatic cells are mere consequences of the resulting developmental defects and genotype of patients. Hence, understanding the consequences of genetic mutations on immunodeficiency in general, but also on differences between subgroups, would require being able to analyse molecular defects early in development or in tissue-specific progenitor cells relevant for the disease.
Nonetheless, DNA methylation profiling can reliably help to establish biomarkers for the disease, as we have already described in the case of ICF1 (23). We have now identified hypomethylation of genes with suspected functions in the central nervous system (CNS) (45) (C19MC, SNORD clusters of the PWS, etc.) that are particularly relevant for neurodevelopmental defects that affect ICF2, ICF3 and ICF4 patients, but also provide blood-based biomarkers that were shown to reflect DNA methylation changes in various tissues including in the CNS (21,46).
In addition to the attempts to understand the molecular basis of the disease, studying the etiology of monogenic diseases affecting DNA methylation machinery or landscapes offers a unique opportunity to uncover actors and mechanisms and further infer genomic regions controlled by these factors. For example, the concomitant discovery of de novo DNMTs in the mouse and the implication of DNMT3B in the ICF syndrome (14,47) firmly established this enzyme in the control of DNA methylation at centromeric repeats in the mouse and pericentromeric repeats in humans, and implied a causal link between methylation of satellite repeats and maintenance of chromosomal stability. Transcriptomic analysis coupled to ChIP assays further established Dnmt3b at the promoter of a subset of germ line genes in primary cells from mouse models (47–51). As mentioned earlier, transcriptional studies may underestimate alterations of the epigenome. However, the present methylome profiling confirmed our previous notable observation that germ line genes are indeed the main genes affected downstream of DNMT3B mutations. In addition, and in line with previous data obtained in the mouse or inferred from studies in ICF1 patients (23,30,52–56), our methylome analysis underscores the prominent role played by DNMT3B in the methylated status of the inactive X chromosome in females and that of germ line genes, with a clear preference for CpG-rich (CGI) contexts at promoters, but also in intergenic regions. In addition, we provide an extensive list of germ line genes at which DNMT3B is required for their promoter methylated state in the soma. Because our data were not biased by cell culture or transformation effects, and were experimentally validated in two somatic tissues, they represent a strong signature for DNMT3B dysfunction that can be applied to other pathological contexts like cancer in which DNMT3B catalytically inactive splice variants have been identified (57–59).
A critical question is to understand how DNMT3B is targeted to discrete places on the genome. Indeed, DNA methylation is not randomly distributed, implying that de novo DNMTs must be guided to specific places. Several studies have reported that CpG content and chromatin context promote or prevent recruitment of de novo DNMTs (5,60–63). In addition, transcription factors, by virtue of their binding to specific sequences, may also shape DNA methylation patterns by recruiting DNMTs (64). All ICF patients have reduced methylation at pericentromeric Sat II repeats, which strongly suggests that DNMT3B, ZBTB24, CDCA7 and HELLS cooperate to establish or maintain the methylated state of these repeats (15,16). At non-repeated sequences, previous studies in the mouse allowed the identification of the transcriptional repressor E2f6 as required for DNA methylation (65) and Dnmt3b recruitment at CGI promoters of the above-mentioned germ line genes (48). Likewise, the member of the structural maintenance of chromosomes (SMCs) family Smchd1, through its hinge domain, engages with methylated promoters at clustered PCDH genes to maintain DNA methylation patterns and transcriptional repression (66) suggesting that they may be required for Dnmt3b recruitment at CGI promoters on the inactive X chromosome and inactive X-linked promoters, respectively (48,65,67). Our comparative methylome analysis led to the important observation that CGIs, which are almost completely covered by the array (96%), are barely affected in ICF patients with mutations in ZBTB24, CDCA7 or HELLS. Another important conclusion from these findings is that these factors are probably not implicated in the recruitment of DNMT3B at CGI promoters, in agreement with their preference for CpG-poor open sea that can be inferred from our study of methylation defects in ICF2, 3 or 4 patients. 
In contrast, at a few hundred hypomethylated CpGs common to all ICF patients, ZBTB24 or CDCA7 with DNA binding activities could function as recruiting factors for DNMT3B at these loci. Interestingly, these common loci are enriched in open sea and intergenic categories, a signature of ICF2, 3 and 4 cells, strongly suggesting that ZBTB24, CDCA7 and HELLS might preferentially target DNMT3B at some CpG-poor regions like in the PCDHB cluster of genes or at certain OR genes scattered throughout the genome. However, and in contrast to the hypothesis that a CDCA7–HELLS ICF-related nucleosome remodeling complex (CHIRRC) could facilitate DNMT3B access to DNA in Xenopus oocytes (68), our data in cells from ICF patients clearly imply that DNMT3B occupancy at methylation sites is mostly independent of the other ICF factors outside the PCDHB genes cluster and Sat II repeats.
Owing to the significant representation of intergenic and CpG-poor regions on the HM450K array (25 and 38%, respectively), we could analyze the impact of ICF mutations on the methylation status of these regions. Among the genomic loci affected by ZBTB24, CDCA7 and HELLS mutations, but not DNMT3B mutations, GO analysis identified several clusters of coding and non-coding genes. Most of these clusters are expressed in a monoallelic manner, resulting from either random or imprinted choices (69,70). Among those, we found several clusters of small non-coding RNA genes: Small Nucleolar RNA, C/D Box 115 and 116 Clusters (SNORD115–116 on chromosome 15), the primate-specific microRNA gene cluster C19MC on chromosome 19, and to a lesser extent, the SNORD114 and the miRNA cluster C14MC on chromosome 14 (data not shown). C19MC and C14MC are large microRNA imprinted clusters whose expression is restricted to the placenta and undifferentiated cells (35,71,72). The SNORD115 and 116 clusters of small ncRNAs are hosted in the imprinted locus on chromosome 15q11-q13 whose deletion is the genetic cause of the Prader–Willi syndrome (PWS, OMIM#176270), a neurobehavioral disorder (34). A protein complex containing the zinc finger protein ZNF274 and the H3K9 lysine methyltransferase SETDB1 was recently implicated in DNA methylation of the PWS imprinting center (PWS-IC) that controls the transcriptional repression of the SNORD116 cluster (73). Our data add a new zinc finger protein, ZBTB24, whose loss of function specifically alters DNA methylation at SNORD clusters without affecting the PWS-IC.
Compared with the obvious contribution of DNMTs to DNA methylation, virtually nothing was known about the functions of the newly identified factors in ICF syndrome, which are devoid of DNMT activity but whose loss of integrity has dramatic impact on methylation patterns and development. In fact, the role of Hells, also known as Lsh (Lymphoid-specific helicase), in DNA methylation has been well documented in the mouse, especially at tandem and interspersed DNA repeats (37,74,75), at certain gene clusters like Hox, Rhox and Pcdh (76), and at intergenic and CpG-poor promoters in mouse embryonic fibroblasts (77). In agreement with these data, methylome analysis of the ICF4 patient showed hypomethylation within the PCDH cluster and in CpG-poor regions, suggesting a role of HELLS at these regions that is conserved between mouse and humans.
Knowing the role of Dnmt3b in establishing DNA methylation profiles early in mouse development, it is likely that DNA methylation defects in all somatic cells tested derived from ICF1 patients are inherited and then perpetuated starting from the early steps of human embryogenesis. However, we cannot totally exclude that defects in ICF patients arise from a failure of DNA methylation maintenance, in line with the more minor role reported for DNMT3B in this pathway (50,78). We previously demonstrated that Zbtb24, Cdca7 and Hells are required for methylation maintenance in the very specific case of centromeric repeats in the mouse (20). In contrast, our loss of function experiments performed in human primary fibroblasts showed that this is not the case at human (peri)centromeric repeats and imprinted loci, suggesting that ZBTB24, CDCA7 and HELLS are involved in establishment of DNA methylation profiles at early steps during human development.
The remarkable location of hypomethylated probes within genomic domains with heterochromatin hallmarks in normal cells suggests that loss of DNA methylation in patients with mutations in ZBTB24, CDCA7 or HELLS could also indirectly result from the loss of repressive histone marks. This idea is somewhat reinforced by a two-hybrid screen that identified SUV39H1, the major enzyme that generates the H3K9me3 hallmark of heterochromatin, as a ZBTB24 partner (79). Likewise, HELLS was shown to cooperate with the G9A/GLP complex of histone methyltransferases at a number of gene promoters, although it was clearly implicated in de novo DNA methylation during the early steps of embryonic development (80). We cannot exclude that other indirect mechanisms could lead to large-scale alterations of methylation profiles in ICF patients like modifications of the replication timing as proposed for other chromatin modifiers (81). Likewise, perturbations to pericentromeric heterochromatin compartments as a consequence of the loss of methylation at the underlying DNA repeats may have long range impact on the epigenome and transcription of certain genomic regions (82).
Whether ZBTB24, CDCA7 and HELLS function as guides for the DNA methylation machinery or through more intricate mechanisms that involve histone modifications remains to be tested, although it will clearly depend on the stage of development, the type of tissue and the genomic context. Nonetheless, such comparative genome-wide characterization of DNA methylation profiles in an extremely rare disease is timely and establishes ZBTB24, CDCA7 and HELLS as serious candidates in future studies on the mechanisms of DNA methylation at repetitive elements and, more specifically, at heterochromatin regions of the human genome.
Materials and Methods
Full experimental procedures are provided in the Supplementary Material.
Ethics approval and informed consent
Approval for this study was obtained from the institutional review board of Necker Hospital and informed consent was obtained from all patients or their families (for minors), in agreement with the Helsinki Declaration (CNIL authorization: no. 908 256, October 14, 2008).
Healthy donors and patients
Blood samples were collected in EDTA-containing tubes to prevent coagulation. Peripheral blood samples from 10 healthy volunteers (aged 2, 4, 5, 11, 15, 24, 26, 27, 36 and 42 years) were collected and numbered from 1 to 10. Donors 1, 4, 5, 7 and 10 are women. Our cohort of patients included eight ICF1 patients with mutations in DNMT3B, four ICF2 patients with mutations in ZBTB24, two ICF3 patients with mutations in CDCA7 and one ICF4 patient with mutations in HELLS. These ICF patients have been previously described (23) and their clinical picture is summarized in Supplementary Material, Table S1.
Primary cells and cell lines
Primary skin fibroblasts from NA subjects and ICF patients have been previously described (23). Additional primary skin fibroblasts from ICF patient P10 were derived by expanding primary cultures of skin biopsies. The LCLs have also been described previously (16).
Sample preparation for genome-wide DNA methylation analysis
A sample of 1 μg of genomic DNA from cell lines or whole blood was converted by sodium bisulfite using the EpiTect® 96 Bisulfite Kit (Qiagen). Bisulfite-converted samples were then hybridized on the Illumina Infinium HumanMethylation450 BeadChip (Illumina) according to the manufacturer’s protocol at the genotyping facility of the Centre National de Génotypage (Evry, France) (83).
Genome-wide DNA methylation data processing
Illumina 450K microarray signals were extracted with GenomeStudio software (Illumina) and were processed and normalized by an updated version of the previously developed pipeline (83). The procedure is described in the Supplementary Material.
The data from this study have been submitted to the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE95040.
RNA extraction and analysis of gene expression
Total RNA was isolated as previously described (23). Primer sequences are provided in Supplementary Material, Table S8.
Statistical analysis
For the annotation of probes relative to the various categories defined in terms of CpG density, genomic and epigenomic contexts, the proportion of probes assigned to each category was calculated, and category distributions were compared through a Chi-square test.
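As an illustration of the comparison described above, the following is a minimal sketch of a Chi-square test on a contingency table of probe counts, written with the Python standard library only. The category names and all counts are made-up placeholders chosen for the example, not values from this study, and the helper names (`chi2_sf`, `chi_square_test`) are hypothetical. The exact procedure used by the authors is described in the Supplementary Material and may differ.

```python
import math

def chi2_sf(x, dof):
    """P(X >= x) for a chi-square distribution with integer degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if dof % 2 == 0:
        # Even dof: finite sum exp(-h) * sum_{i < dof/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd dof: erfc term plus a finite sum over half-integer powers of h.
    total = math.erfc(math.sqrt(h))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def chi_square_test(table):
    """Pearson chi-square test on a table of counts (rows: probe sets, columns: categories)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_tot) - 1)
    return stat, dof, chi2_sf(stat, dof)

# Hypothetical counts, for illustration only (not data from the study).
categories = ["CpG island", "shore", "shelf", "open sea"]
hypomethylated = [120, 300, 180, 900]        # placeholder counts
all_probes = [15000, 11000, 4500, 17000]     # placeholder counts

# Proportion of probes assigned to each category, per probe set.
for name, counts in (("hypomethylated", hypomethylated), ("all probes", all_probes)):
    total = sum(counts)
    print(name, {c: round(k / total, 3) for c, k in zip(categories, counts)})

stat, dof, p = chi_square_test([hypomethylated, all_probes])
print(f"chi2 = {stat:.1f}, dof = {dof}, p = {p:.3g}")
```

The survival function is written out in closed form for integer degrees of freedom so that the sketch has no external dependency; a statistics library would normally be used instead.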
Student’s t-tests were used to determine the statistical significance of differences between values from RT-qPCR, MSRE-qPCR and bisulfite pyrosequencing experiments, each obtained from three independent experiments.
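The sketch below shows how such a comparison between two groups of three independent measurements could be computed. It assumes an unpaired, two-tailed Student's t-test with pooled variance, which the text does not specify, and the input values are invented placeholders rather than data from this study. The function names are hypothetical, and the P-value is obtained by numerical integration of the t density so that only the standard library is needed.

```python
import math
import statistics

def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, dof)."""
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / dof
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, dof

def t_two_sided_p(t, dof, steps=2000):
    """Two-tailed P-value, integrating the t density on [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))

    def pdf(x):
        return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

    h = upper / steps  # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * pdf(k * h)
    central = s * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)

# Hypothetical relative expression values from three independent experiments.
control = [1.0, 1.1, 0.9]   # placeholder values
patient = [0.4, 0.5, 0.3]   # placeholder values

t, dof = student_t(control, patient)
print(f"t = {t:.3f}, dof = {dof}, two-tailed p = {t_two_sided_p(t, dof):.4f}")
```

With three values per group the test has only four degrees of freedom, so it is sensitive to the equal-variance assumption made here.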
Webservers and Web Resources
Coriell Cell Repositories (USA) (http://ccr.coriell.org/)
MethPrimer (http://www.urogene.org/methprimer/)
GenomeStudio Illumina (https://www.illumina.com/techniques/microarrays/array-data-analysis-experimental-design/genomestudio.html)
Galaxy tools (http://main.g2.bx.psu.edu)
IGV visualization tool (http://software.broadinstitute.org/software/igv/)
UCSC genome Browser (http://genome.ucsc.edu)
Enrichr Bioinformatics Resources (http://amp.pharm.mssm.edu/Enrichr)
Human Protein Atlas project (http://www.proteinatlas.org/humanproteome/testis)
HOmo sapiens COmprehensive MOdel COllection (HOCOMOCO) v11 (http://hocomoco11.autosome.ru/)
Regulatory Sequence Analysis Tools (RSAT) (http://rsat.sb-roscoff.fr/)
Repeat Masker (http://www.repeatmasker.org/)
Supplementary Material
Supplementary Material is available at HMG online.
Acknowledgements
The authors would like to thank Drs Déborah Bourc’his, Anne-Valérie Gendrel, Antoine Peters and Prs Reiner Veitia and Jonathan Weitzman for insightful discussions about this work, Florent Hubé for his advice on bioinformatics tools, Laure Ferry and the Epigenomic Core Facility of the UMR7216 (http://parisepigenetics.com/ecf-en/) for help with pyrosequencing, Dr Paola Ballerini (Hôpitaux Universitaires Est Parisien, Trousseau) for the gift of whole blood from control children, Dr Kamila Kebaili (Centre de Référence Déficits Immunitaires Héréditaires, CHU de Lyon) for the gift of whole blood from the ICF4 patient, Dr Bertrand Roquelaure (Hôpital de la Timone, CHU Marseille) for the gift of whole blood from the ICF3 pC patient, and all the patients and their family members for their participation in this study.
Conflict of Interest statement. None declared.
Funding
Agence Nationale pour la Recherche (ANR) (http://www.agence-nationale-recherche.fr/en/; ANR-14-CE14-0011 METHYL-MEMORY), Fondation Jérôme Lejeune (https://www.fondationlejeune.org/en/), Ligue Nationale contre le Cancer (https://www.ligue-cancer.net/) and GEFLUC (http://www.gefluc-paris.fr/). CF is supported by the Institut National de la Santé et de la Recherche Médicale (INSERM). GG was supported by the French Ministry of Research and Fondation ARC (http://www.fondation-arc.org/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
Author notes
Guillaume Velasco and Giacomo Grillo contributed equally.
Present address: Department of Medical Biophysics, Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON, Canada.