In vertebrates, methylation of cytosine at CpG sequences is implicated in stable and heritable patterns of gene expression. The classical model for inheritance, in which individual CpG sites are independent, provides no explanation for the observed non-random patterns of methylation. We first investigate the exact topology of CpG clustering in the human genome associated with CpG islands. Then, by pooling genomic CpG clusters on the basis of short distances between CpGs within and long distances outside clusters, we show a strong dependence of methylation on the number and density of CpGs in a cluster. CpG clusters with fewer, or less densely spaced, CpGs are predominantly hyper-methylated, while larger clusters are predominantly hypo-methylated. Intermediate clusters, however, are either hyper- or hypo-methylated but are rarely found in intermediate methylation states. We develop a model for spatially-dependent collaboration between CpGs, where methylated CpGs recruit methylation enzymes that can act on CpGs over an extended local region, while unmethylated CpGs recruit demethylation enzymes that act more strongly on nearby CpGs. This model can reproduce the effects of CpG clustering on methylation and produces stable and heritable alternative methylation states of CpG clusters, thus providing a coherent model for methylation inheritance and methylation patterning.
Cytosine methylation in vertebrates occurs predominantly at CG dinucleotide sequences (1), termed CpG sites. The intense experimental interest in this modification is due to its potential to provide epigenetic regulation of gene expression (2,3). To qualify as an epigenetic mark, the CpG methylation state needs to be stable and heritable through cell division. The symmetry of the CpG sequence has served as the basis for a simple model where the methylation state of a single CpG can be inherited without dependence on the state of neighboring DNA (4,5). During replication, DNA polymerase inserts non-methylated cytosines, copying an unmethylated CpG site to unmethylated sites on the two daughter strands, and copying a fully methylated CpG site into two hemimethylated sites. The fully methylated state is then re-established by efficient recognition of these hemimethylated sites by DNA methyltransferases (DNMTs).
However, a number of observations indicate that this ‘classical’ model is now untenable (6–8). First, the model requires a high fidelity of methylation of hemimethylated sites as well as non-methylation of unmethylated sites, features that are not matched by the activity of DNMTs in vitro (8) or in vivo (9) and are compromised by active removal of methyl groups by demethylation pathways (8,10–15). Indeed, the frequencies of hemimethylated CpG sites observed in vivo by hairpin bisulfite polymerase chain reaction (16) indicate high error rates for individual CpG sites. Second, CpG sites display group behavior that is not predicted from a model where CpG sites are independent. Measurements of methylation patterns among clusters of CpG sites in vivo reveal bimodality of methylation: different clusters tend to be either hyper-methylated or hypo-methylated, infrequently existing in intermediate methylation states (17–20). Bimodal methylation is often displayed by the same CpG cluster, with the cluster being in distinct methylation states in different cells, or even in different alleles in the same cell (17).
An alternative class of model is to assume that CpGs are not independent, rather that the methylation of a given CpG site is affected by the methylation of the surrounding CpG sites. We have proposed a model where methylated and hemimethylated CpG sites recruit DNMTs, and unmethylated CpGs recruit demethylases, with the recruited enzymes acting on CpG sites in the vicinity (7). Simulations show that this positive feedback could allow CpG sites to collaborate to dynamically maintain either an overall hyper- or hypo-methylated state of a cluster. This bimodal methylation arises naturally as a result of the inherent bistability of the system. Importantly, the hyper- or hypo-methylated state of a CpG cluster could each be robustly inherited over many cell generations, even in the presence of high error rates.
The availability of genome-wide methylation mapping, for example whole genome bisulfite sequencing (21,22), allows examination of how CpG collaboration could operate on a genomic scale. The 28 million CpG sites in the human genome are predominantly methylated and occur at low frequencies (on average 1/100 bp) across the genome (16,23). Strong interest has however been drawn by the methylation patterns of comparably dense regions of CpG sites. These regions, termed CpG-islands (CGIs) (24), have traditionally been considered to be largely unmethylated (24–26), but more recent evidence is supportive of a picture where CpG islands can also be in predominantly methylated states (1,17,27). Some of the interest in CGIs stems from the association of their methylation patterns with promoter activity (28–30). Common to a range of definitions and descriptions of CGIs, the density of CpG content (24,31) is the crucial parameter used to identify CGIs. Overall, the level of methylation has been considered to be anti-correlated with CpG density (17,32).
Effects of CpG topology on methylation are a natural corollary of collaborative models, since they propose that the methylation status of a CpG site is dependent on the methylation status of nearby CpGs. To understand how the topology of CpG sites affects their methylation, we systematically analyzed the clustering of CpG sites in the human genome, finding that a large fraction of the CpGs can be defined as existing in isolated ‘clusters’ of 1–60 sites with inter-CpG distances <25 bp and separated by at least 65 bp from surrounding CpG sites. Examining the methylation status of these and other clusters in four human methylomes, we find the expected bimodal methylation pattern, where clusters were either hypo- or hyper-methylated. We also saw a strong trend where the probability of hypo-methylation increases with increasing number and density of CpGs in the cluster. We show that these geometric effects on methylation can be reproduced by a modified collaborative model, in which the efficiencies of the recruitment-based methylation and demethylation reactions decay differently with increasing separation between CpGs. Our work suggests that ubiquitous collaborative interactions between CpGs could provide much of the patterning of genomic methylation and would allow clusters of moderate size to exist stably in heritable alternative methylation states to support epigenetic gene regulation.
MATERIALS AND METHODS
Distances and positions of CpGs were analyzed for the human genome (hg18, downloaded from http://genome.ucsc.edu/) (33). The distance d between two CpGs was defined such that d = 2 bp for adjacent CpGs. IMR90 methylome data were from http://neomorph.salk.edu/human_methylome/data.html (IMR90 C basecalls) (21), and brain tissue methylome data (22) were from http://www.ncbi.nlm.nih.gov/geo (GEO accessions: GSM1163695, fetal frontal cortex; GSM1164630 and GSM1164632, middle frontal gyrus from 12 and 25 year old males). Data for CpGs with coverage of at least 10 were used for methylation averages, except for Figure 3B.
We simulate a CpG cluster including its surroundings using a collaborative distance-dependent model (Figure 3A). In the limit of an infinite number of CpG sites and assuming that each CpG site interacts equally with any other CpG site (mean-field assumption) the equations describing the fraction of CpG sites in u (unmethylated), h (hemimethylated) and m (methylated), are:
Alterations in the distance parameters (a, b, d0 and α) affect how the inside and outside CpG densities and cluster sizes control the inside and outside methylation status. Increasing α to α ≈ 1000 bp makes the demethylation reactions stronger and smaller islands become unmethylated, i.e. no N* would be found. Decreasing α to α ≈ 100 bp makes larger islands more methylated. Increasing d0 to d0 ≈ 300 bp leads to unmethylated small islands and thereby no N* is found. Decreasing d0 to d0 ≈ 120 bp leads to methylated islands where the demethylation reactions are weaker than the methylation reactions. With low a (a ≈ 100 bp) the demethylation reactions are stronger and the islands are predominantly unmethylated independent of cluster size. The opposite is observed for higher a (a ≈ 750 bp). Methylated islands dominate when b is decreased to b ≈ 4 and unmethylated islands dominate when b is increased to b ≈ 11. Generally, the transition from methylated small islands to unmethylated large islands is lost when the parameters are perturbed and consequently no N* is found.
Clustering of CpG sites in the human genome
Systematic analyses of the distribution of CpG sites within vertebrate genomes have shown a highly non-random pattern, with the frequencies of short and long distances between CpG sites enhanced at the cost of intermediate distances (34,35). This is shown in Figure 1A and B, which compares the frequencies of observed CpG–CpG distances in the human genome (33) with that expected from a random arrangement of the same number of CpG sites. The distribution of the null model (Figure 1A) approaches an exponential distribution and there is a small peak at distances close to 10 bp, a distance that is observed in dense regions of CpG sites (36). However, such analyses only partially capture the clustering of CpGs because they do not address higher order clustering due to correlations between neighboring CpG–CpG distances.
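The null expectation for CpG–CpG distances can be illustrated with a small sketch. It assumes sites are placed uniformly at random as points on a toy sequence, so that gaps follow an approximately geometric (discrete exponential) distribution; the function names, the toy genome length and the site count are illustrative choices, not values from the analysis, and the sketch ignores the 2 bp footprint of a real CpG.

```python
import random
from collections import Counter

def random_gap_histogram(genome_length, n_sites, seed=0):
    """Place n_sites at distinct random positions and count the gaps between neighbors."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(genome_length), n_sites))
    gaps = [right - left for left, right in zip(positions, positions[1:])]
    return Counter(gaps)

def geometric_expectation(d, density):
    """Probability that the next site lies exactly d positions away at a given site density."""
    return density * (1.0 - density) ** (d - 1)

# Toy example: 10,000 sites on 1 Mb, i.e. roughly one site per 100 bp.
hist = random_gap_histogram(genome_length=1_000_000, n_sites=10_000)
n_gaps = sum(hist.values())
mean_gap = sum(d * count for d, count in hist.items()) / n_gaps
for d in (10, 100, 300):
    observed = hist[d] / n_gaps
    print(d, round(observed, 5), round(geometric_expectation(d, 0.01), 5))
```

Comparing such a histogram with the observed genomic distances is one way to see which distance ranges are enriched or depleted relative to random placement.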
We thus counted the occurrences of each possible combination of successive CpG distances (i.e. CpG-d1-CpG-d2-CpG, where d1 and d2 denote distances between the CpG sites) in the human genome and compared these to the case where all observed CpG–CpG distances are maintained but are randomly arranged (Figure 1C). This randomization leaves the observed frequencies of distances intact while removing correlations between neighboring distances. Plotting the ratio between the observed d1–d2 counts and those in the randomized genome (Figure 1D) shows that short-short and long-long distance combinations are strongly enhanced, while short-long and long-short combinations are under-represented. The enhanced regions in Figure 1D set natural scales for CpG clusters; considering the lines of unit ratio, clustering of the distances occurs for distances less than ∼25 bp and for distances greater than ∼65 bp.
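The comparison of successive distance pairs can be sketched as follows. This is a minimal illustration, assuming the distances are available as a simple list in genomic order; the function names are hypothetical, and a real analysis would likely bin distances and average over several shuffles rather than use a single one.

```python
import random
from collections import Counter

def pair_counts(distances):
    """Count each ordered combination (d1, d2) of successive CpG-CpG distances."""
    return Counter(zip(distances, distances[1:]))

def enrichment(distances, seed=0):
    """Ratio of observed (d1, d2) counts to counts after shuffling the same distances.

    Shuffling keeps the distribution of individual distances intact
    while removing correlations between neighboring distances.
    """
    observed = pair_counts(distances)
    shuffled = list(distances)
    random.Random(seed).shuffle(shuffled)
    randomized = pair_counts(shuffled)
    return {pair: observed[pair] / randomized[pair]
            for pair in observed if randomized[pair] > 0}

# Toy input: a run of short distances followed by a run of long distances.
toy_distances = [5] * 50 + [200] * 50
ratios = enrichment(toy_distances)
```

In this toy input the short-short and long-long combinations come out enriched and the short-long combination depleted, which is the qualitative signature described above.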
Accordingly, genomic CpGs can be captured by a definition of a CpG cluster that requires every pair of neighboring CpG sites in the cluster to be separated by a distance shorter than a threshold dmax = 25 bp, and the terminal CpG sites of the cluster to be separated by a distance larger than a threshold Dmin = 65 bp from both flanking CpG sites (Figure 1E). (Note that a single CpG that is >Dmin from both neighboring CpGs is scored as a ‘cluster’ of 1). This definition includes ∼30% of the CpG sites in the human genome as existing in cluster sizes NC ranging from 1 to 60 CpG sites, with the majority of the CpG clusters in the NC range 1–11 (Figure 1F).
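The cluster definition can be expressed as a short procedure. The sketch below assumes a sorted list of CpG coordinates on a single chromosome and treats a missing neighbor at either end of the list as satisfying the isolation criterion; that edge rule and the function name are assumptions made for illustration.

```python
def find_clusters(positions, d_max=25, d_min_flank=65):
    """Return isolated CpG clusters as lists of positions.

    positions: sorted CpG coordinates on one chromosome.
    A cluster is a maximal run with all internal gaps < d_max whose
    gaps to the flanking CpGs on both sides are > d_min_flank.
    """
    clusters = []
    n = len(positions)
    start = 0
    while start < n:
        end = start
        # Extend the run while the next CpG is closer than d_max.
        while end + 1 < n and positions[end + 1] - positions[end] < d_max:
            end += 1
        # A missing neighbor at a sequence end counts as isolated (assumption).
        left_ok = start == 0 or positions[start] - positions[start - 1] > d_min_flank
        right_ok = end == n - 1 or positions[end + 1] - positions[end] > d_min_flank
        if left_ok and right_ok:
            clusters.append(positions[start:end + 1])
        start = end + 1
    return clusters

toy_positions = [100, 110, 120, 300, 330, 500, 505, 540, 700]
toy_clusters = find_clusters(toy_positions)
```

In the toy input, the CpGs at 300, 330, 500, 505 and 540 are excluded because at least one of their flanking gaps lies between the two thresholds, while the lone CpG at 700 is scored as a cluster of 1.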
Plotting the average distance as a function of CpG position in and around the pooled dmax = 25 bp, Dmin = 65 bp clusters of size NC = 15 (Figure 1G) shows that a ‘boundary’ of 65 bp around the clusters causes them to be surrounded by typical CpG densities, since the average inter-CpG distances 〈d〉 around the cluster immediately return to close to the genomic average. Thus, on average, these clusters are not strongly associated with other clusters. In contrast, the larger set of clusters defined by use of a smaller boundary Dmin = 45 bp tend to be surrounded by regions of higher CpG density, indicating that this definition includes many clusters that are nearby other clusters (Figure 1H).
Effect of CpG clustering on methylation
We examined the methylation of the dmax = 25 bp, Dmin = 65 bp CpG clusters in four human methylomes obtained by whole genome bisulfite sequencing of a fetal lung fibroblast cell line (IMR90), and fetal, juvenile and adult brain cell samples (21,22). Thus each CpG cluster was represented four times. We used the average methylation values, ranging from 0 to 1, for individual CpGs that had been covered at least 10 times within each methylome dataset (∼22 million CpGs of the 28 million in hg18).
The mean methylation of each cluster, calculated as the average of the methylation fractions of each CpG in the cluster, was strongly dependent on the number of CpGs in the cluster, NC. The distributions of mean cluster methylation in Figure 2A display a strong bimodal pattern, with clusters either hyper- or hypo-methylated but rarely in intermediate methylation states. However, clusters containing few CpGs are almost invariably highly methylated, while clusters with increasing numbers of CpGs become increasingly likely to be hypo-methylated (Figure 2A). Thus, ‘lone’ CpGs, which occupy the largest fraction of the genome (Figure 1F), are predominantly hyper-methylated, while very large clusters are predominantly hypo-methylated. Importantly, there is no clear demarcation between high methylation-favoring and low methylation-favoring regimes, suggesting that current criteria for defining CpG islands are somewhat arbitrary.
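The per-cluster average can be sketched as below. The input layout (a mapping from CpG position to methylated and total read counts) and the function name are assumptions for illustration; the actual methylome files use their own formats. Only the coverage threshold of 10 is taken from the text.

```python
def cluster_mean_methylation(cluster, calls, min_coverage=10):
    """Average methylation fraction over sufficiently covered CpGs in a cluster.

    cluster: list of CpG positions.
    calls: dict mapping position -> (methylated_reads, total_reads).
    Returns None when no CpG in the cluster meets the coverage threshold.
    """
    fractions = []
    for pos in cluster:
        methylated, total = calls.get(pos, (0, 0))
        if total >= min_coverage:
            fractions.append(methylated / total)
    if not fractions:
        return None
    return sum(fractions) / len(fractions)

# Toy example: the CpG at 120 has coverage 5 and is therefore ignored.
toy_calls = {100: (9, 10), 110: (20, 20), 120: (1, 5)}
toy_mean = cluster_mean_methylation([100, 110, 120], toy_calls)
```

Whether clusters with only partial coverage should be kept or dropped is a design choice this sketch leaves open by returning None only when no CpG qualifies.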
To check that these effects are not particular to our choice of dmax = 25 bp, Dmin = 65 bp, we tested clusters with various dmax and Dmin combinations. We kept Dmin > dmax so that all clusters are set within lower density regions. However, we note that low Dmin values mean that it becomes more likely that the cluster is nearby other clusters (Figure 1H). The effect of NC was measured for each Dmin/dmax combination by determining N*, the NC at which the average methylation of the clusters crosses 0.5 (e.g. for the dmax = 25 bp, Dmin = 65 bp clusters, N* = 29, Figure 2A). In all cases, the methylation versus NC trend was the same, with methylation favored when NC < N* and unmethylation favored when NC > N*, as shown for dmax = 25 bp, Dmin = 45 bp (Figure 2B). Plotting the N* values against average d, 〈d〉, for each Dmin/dmax combination shows a CpG density effect; decreasing average distances between CpGs give lower N* values i.e. clusters of fewer CpGs are able to exist in an unmethylated state if they are more dense (Figure 2C). Thus, the points in Figure 2C define a transition between a lower CpG number/lower CpG density regime where hyper-methylation is favored (lower right), and a higher CpG number/higher CpG density regime where hypo-methylation is favored (upper left). We note that the actual change in methylation preference across this transition region is gradual. Interestingly, N* only weakly increases with Dmin.
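One way to locate N* from binned data is sketched here. It assumes the average methylation has already been computed for each cluster size NC and uses linear interpolation between the two sizes that bracket 0.5; the interpolation and the function name are illustrative choices, not necessarily the procedure used for Figure 2.

```python
def estimate_n_star(mean_by_size, threshold=0.5):
    """Estimate the cluster size at which average methylation falls through a threshold.

    mean_by_size: dict mapping cluster size N_C -> average methylation (0 to 1).
    Returns the interpolated size at the first downward crossing, or None if none exists.
    """
    sizes = sorted(mean_by_size)
    for lo, hi in zip(sizes, sizes[1:]):
        m_lo, m_hi = mean_by_size[lo], mean_by_size[hi]
        if m_lo >= threshold > m_hi:
            # Linear interpolation between the two bracketing sizes.
            return lo + (m_lo - threshold) * (hi - lo) / (m_lo - m_hi)
    return None

toy_means = {10: 0.9, 20: 0.7, 30: 0.3, 40: 0.1}
toy_n_star = estimate_n_star(toy_means)
```

For the toy values the crossing falls midway between sizes 20 and 30; a parameter set with no crossing returns None, matching the situation where no N* is found.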
Dynamical model for spatial collaboration
The observed strong bimodality of cluster methylation is a natural feature of the bistability that can result when collaboration involves positive feedback, that is, when methylated CpGs foster methylation of nearby CpGs and unmethylated CpGs foster demethylation of nearby CpGs. Some effect of CpG number and density on cluster methylation is also expected because of the interactions between nearby CpGs. However, we wanted to test whether the asymmetry of the effect of CpG number and density, with hypo-methylation favored in larger, denser clusters, could also be explained by a collaborative model.
Our previous model (7) invoked a number of methylation and demethylation reactions that interconvert fully methylated (m), hemimethylated (h) and unmethylated (u) CpG sites (Figure 3A). Interconversions can be non-collaborative, that is occur independently of other CpGs (black and gray arrows, Figure 3A) or collaborative, where the particular reaction at a target CpG involves a nearby mediator CpG in a particular methylation state (curved arrows, Figure 3A). For example, the methylation of a hemimethylated CpG could depend on the presence of a nearby fully methylated CpG (dark red arrow, Figure 3A). The most robust heritable bistability was obtained with the positive feedback collaborative reactions shown in Figure 3A, where m and h sites act to foster methylation of nearby u and h sites (maintaining the hyper-methylated state), and u sites act to foster demethylation of nearby h and m sites (maintaining the hypo-methylated state) (7). However, this basic model does not predict that CpG number or density should affect which state is favored.
A simple and plausible way to allow an effect of CpG site topology in this model is to introduce a distance scaling of the collaboration reactions, where the probability of interaction between a target CpG and an enzyme recruited to a mediator CpG is dependent on the DNA distance between the two CpG sites. The relationship between contact probability and distance on chromatin in vivo is poorly understood. Hi-C experiments show that at long distances (>1000 bp), relative contact probability between two sites in human DNA in vivo generally falls with increasing distance, roughly as 1/d (37). However, at shorter distances, contact can be sub-optimal because of the stiffness of DNA and the nature of its packaging. A study of FLP recombination in mouse cells found recombination frequency increased as d was increased from 74 to 200 bp, followed by a steady decrease in recombination as d was increased to 15 kb (38). This effect of short distances on reaction probability is likely to be different for different enzymes, as it depends on the flexibility of the protein and the steric requirements for the reaction. Thus, different collaborative reactions may have quite different sensitivities to the distance between the mediator and target CpGs.
The bias toward hyper-methylation for less dense CpG clusters (Figure 2A) suggests that collaborative methylation reactions generally act more efficiently than collaborative demethylation reactions over longer CpG–CpG distances. Conversely, the bias toward hypo-methylation for more dense CpG clusters suggests that collaborative demethylation reactions are favored at shorter CpG–CpG distances. Different ranges for these reactions are supported by analysis of CpG clusters surrounded by at least 2.4 kb of low CpG density on both sides (Figure 3B). Hyper-methylated clusters are associated with a large zone of increased methylation, while hypo-methylated clusters seem to have effects over only a small region. To implement these different distance sensitivities in the model, we chose two mathematically convenient probability density functions (Figure 3C). For the collaborative methylation reactions, the probability that a DNMT recruited by a mediator CpG converts a target CpG that is d bp away from the mediator is scaled as 1/(d + α). Here, α is an offset that produces a less steep decrease of probability over distances of d < α but approaches a 1/d power law as d ≫ α. For the collaborative demethylation reactions, we used a simple exponential function, exp(−d/d0), which provides a steeper decay of probability with distance that favors short-range collaboration (Figure 3C). Here, d0 is used to scale this distance sensitivity. We stress that it is unlikely that these functions accurately describe the d versus probability relationships for each of the collaborative reactions; they are used here simply to test the idea that the cluster size/density bias could be explained by a difference in distance sensitivity in the competing methylation/demethylation reactions.
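The difference in range between the two functions can be made concrete with a short sketch. Each function is normalized to 1 at d = 0 so that only the shape is compared. The values α = 500 bp and d0 = 200 bp are placeholders chosen for illustration, not the fitted parameters, and the function names are hypothetical.

```python
import math

def relative_methylation(d, alpha=500.0):
    """Shape of the 1/(d + alpha) weight, normalized to 1 at d = 0."""
    return alpha / (d + alpha)

def relative_demethylation(d, d0=200.0):
    """Shape of the exp(-d/d0) weight, which equals 1 at d = 0."""
    return math.exp(-d / d0)

# Compare the two shapes at a within-cluster distance and at longer ranges.
for d in (25, 100, 500, 1000):
    print(d, round(relative_methylation(d), 3), round(relative_demethylation(d), 3))
```

With these placeholder values both weights stay close to 1 at within-cluster spacings, whereas at 1 kb the exponential has almost vanished while the power-law weight retains about a third of its maximum.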
We tested the behavior of the new model by simulating 20 kb DNA regions containing a single CpG cluster (a portion of such a region is shown in Figure 4A). In different simulations we varied the number of CpG sites in the cluster NC and the distance d bp between the sites. The DNA surrounding the cluster contained CpG sites spaced 100 bp apart (the genomic average), except that the first CpG on each side of the cluster was a distance D bp from the cluster. As with our previous modeling, simulations involved iterating the five collaborative and four non-collaborative methylation and demethylation reactions (Figure 3A), randomly chosen according to defined reaction probabilities. For each reaction attempt, a target CpG and, for the collaborative reactions, also a mediator CpG are randomly chosen. If the methylation status of these CpGs (u, h or m) is correct for the chosen reaction, then the target CpG is converted, otherwise the target is unchanged. However, in the new model, the collaborative reactions were also subjected to a distance test where the probability for the reaction to occur is determined from the distance between the two CpG sites concerned. The probability for a methylation reaction is determined from a probability density function of a power law (a/(d + α)), while that for a demethylation reaction is determined from an exponential (b · exp(−d/d0)), where d is the distance between mediator and target CpGs, and a and b are scaling factors. Each generation comprised on average 100 reaction attempts per CpG, after which a replication event was simulated by making the replacements m → h, u → u (unchanged) and h → u or h → h with equal probability. Simulations were carried out for 1000 generations. Parameters were adjusted to test if the systems could replicate the response of real clusters to CpG number and density (Figure 2).
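The simulation loop can be sketched as follows. Several parts are assumptions made for illustration: the specific set of five collaborative and four non-collaborative reactions is a guess at Figure 3A, which is not reproduced in the text; the parameter values, the `noise` fraction of non-collaborative attempts and the capping of weights at 1 are placeholders; and replication is taken to convert each fully methylated site to a hemimethylated one, as in the description of DNA replication in the introduction. No claim is made that these settings reproduce Figure 4.

```python
import math
import random

U, H, M = 0, 1, 2
# Assumed reactions: (target_before, target_after, mediator_state, is_methylation)
COLLABORATIVE = [(U, H, M, True), (H, M, M, True), (H, M, H, True),
                 (M, H, U, False), (H, U, U, False)]
NON_COLLABORATIVE = [(U, H), (H, M), (M, H), (H, U)]

def build_positions(n_cluster, d, flank_gap, n_flank=50, spacing=100):
    """Cluster of n_cluster CpGs spaced d apart, flanked by sparse CpGs."""
    cluster = [i * d for i in range(n_cluster)]
    left = [-flank_gap - i * spacing for i in range(n_flank)][::-1]
    right = [cluster[-1] + flank_gap + i * spacing for i in range(n_flank)]
    return left + cluster + right

def replicate(states, rng):
    """m -> h, u -> u, and h -> h or u with equal probability."""
    out = []
    for s in states:
        if s == M:
            out.append(H)
        elif s == H:
            out.append(H if rng.random() < 0.5 else U)
        else:
            out.append(U)
    return out

def simulate(positions, generations=10, attempts_per_site=100, noise=0.05,
             a=300.0, alpha=500.0, b=7.0, d0=200.0, seed=0):
    """Return the CpG states just before the final replication event."""
    rng = random.Random(seed)
    n = len(positions)
    states = [rng.choice((U, H, M)) for _ in range(n)]
    snapshot = list(states)
    for _ in range(generations):
        for _ in range(attempts_per_site * n):
            target = rng.randrange(n)
            if rng.random() < noise:
                # Non-collaborative attempt: no mediator and no distance test.
                before, after = rng.choice(NON_COLLABORATIVE)
                if states[target] == before:
                    states[target] = after
                continue
            before, after, mediator_state, is_methylation = rng.choice(COLLABORATIVE)
            mediator = rng.randrange(n)
            if mediator == target:
                continue
            if states[target] != before or states[mediator] != mediator_state:
                continue
            dist = abs(positions[target] - positions[mediator])
            # Distance test: power law for methylation, exponential for demethylation.
            if is_methylation:
                p = min(1.0, a / (dist + alpha))
            else:
                p = min(1.0, b * math.exp(-dist / d0))
            if rng.random() < p:
                states[target] = after
        snapshot = list(states)
        states = replicate(states, rng)
    return snapshot

# Small illustrative run: N_C = 23, d = 25 bp, D = 65 bp, short flanks.
positions = build_positions(23, 25, 65, n_flank=20)
final_states = simulate(positions, generations=5)
cluster_states = final_states[20:43]
cluster_methylation = sum(cluster_states) / (2 * len(cluster_states))
```

The run above is far shorter than the 1000 generations described in the text and is meant only to show how the reaction attempts, distance test and replication step fit together.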
Spatial collaboration can recapitulate genomic patterns
Figure 4A shows a system with NC = 23, d = 25 bp and D = 65 bp where the cluster is bistable, able to exist stably and heritably in either a hyper- or hypo-methylated state, while the surrounding low density CpG region remains predominantly hyper-methylated. This overall pattern was attained whatever the initial state of the system; the simulation was begun with all CpGs in random states. Thus, in the model a single cluster can display the bimodality characteristic of real CpG clusters. Note that in the hyper-methylated state, a zone of mixed and rapidly varying methylation occurs in the regions adjacent to the cluster, reminiscent of CpG island shores (39).
Varying the number of CpGs in the cluster, while keeping all other parameters fixed, produced the trend seen for the methylome data, where smaller clusters were predominantly hyper-methylated and the probability of the hypo-methylated state increased with increasing NC (Figure 4B). In the model this comes about because the long-range property of the collaborative methylation reactions allows the sparsely distributed CpGs outside the cluster to collaborate with each other to sustain their own hyper-methylation, but also to act within the cluster. A few clustered CpGs cannot overcome this pervasive methylating ‘force’. However, increasing the number of CpGs in a cluster allows the short-range collaborative demethylation reactions to build up an interaction field that is able to resist methylation. In large clusters the demethylation reactions can dominate to the extent that only the hypo-methylated state is possible.
We also systematically tested the effect of cluster density 1/d and the separation between the cluster and the surroundings D, on N*, the NC at which the cluster was equally likely to be in the high or low methylation states (Figure 4C). We saw the same trend as seen in the methylomes, where N* was smaller for more dense clusters (low d). That is, increasing cluster density allowed clusters with fewer CpGs to access the unmethylated state. As in the methylomes, the effect of D was small. These effects are understandable, as increased density of the cluster favors CpG interactions via short-range collaborative demethylation, while having little effect on collaborative methylation. The long-range activity of collaborative methylation reactions means that the effect of the outside CpGs on the cluster is not sensitive to changes in D that are relatively small compared to α.
For clusters that display bistable behavior, i.e. where NC ∼ N*, the stability of each of the states is an important factor when comparing the clusters in the model with clusters in methylomes. If the state of a particular cluster were to flip back and forth rapidly, then in a sample of DNA from many cells, that cluster would be hyper-methylated in some DNAs and hypo-methylated in others, giving intermediate average methylation levels. In order to produce the bimodal pattern seen in the methylomes (Figure 2), each specific cluster must be in just one of the two possible states within most of the cells sampled, implying high stabilities. The stabilities of the hyper- and hypo-methylated states in the model vary depending on cluster size and density, but the stability of the unfavored state ranges from 0 to 500 generations while the stability of the favored state ranges from 100 to >1000 generations. The average number of consecutive generations spent in each state for NC = 28 is ∼100 generations in the methylated state and similarly 100 generations in the unmethylated state (out of 3000 simulated generations). However, both states are stable for at least 300 generations. Thus for many clusters, the stability of methylation states seen in the modeling may be insufficient by itself to explain the bimodality seen in the methylome data. We imagine two possible explanations. First, our model may for some reason underestimate the stabilities of each methylation state. For example, we know that reducing the rate of the non-collaborative reactions in the model can increase stability (7). Decreasing the non-collaborative reactions by 5% increases the average consecutive generations spent in the unmethylated state to 150 generations and 280 generations for the methylated state. Second, many or most of the clusters may not be bistable in the cells studied. 
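The average number of consecutive generations spent in each state can be computed from a per-generation trace of cluster methylation, as in this sketch. The 0.5 threshold for calling a generation hyper- or hypo-methylated and the function name are assumptions for illustration.

```python
from itertools import groupby

def mean_dwell_times(cluster_methylation, threshold=0.5):
    """Average run length, in generations, spent above and below a threshold.

    cluster_methylation: sequence of mean cluster methylation values, one per generation.
    """
    labels = ["hyper" if m > threshold else "hypo" for m in cluster_methylation]
    runs = {"hyper": [], "hypo": []}
    for label, group in groupby(labels):
        runs[label].append(sum(1 for _ in group))
    return {state: (sum(lengths) / len(lengths) if lengths else 0.0)
            for state, lengths in runs.items()}

# Toy trace: hyper runs of length 2 and 1, hypo runs of length 3 and 1.
toy_trace = [0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.2]
toy_dwell = mean_dwell_times(toy_trace)
```

Runs cut off by the start or end of the trace are counted at their observed length here, which underestimates dwell times when the states are long-lived relative to the simulated period.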
CpG number and density cannot be the only determinants of methylation state, and each individual cluster is likely to be subject to sequence-specific factors that affect the rates of the methylation or demethylation reactions and bias the cluster toward one of the states. In some clusters this bias could favor the hypo-methylated state, in others the hyper-methylated state, so that many clusters which might be bistable in other cell types remain stably in one state.
We proposed the collaborative model of CpG methylation as a mechanism to provide the robust maintenance and inheritance of alternative methylation states required for a true epigenetic mark (7). We have shown here that a simple extension of this model, in which the methylation and demethylation reactions are differentially sensitive to the distance between interacting CpGs, is able to reproduce the general relationship between CpG clustering and CpG methylation in the human genome.
Mechanisms of distance-dependent CpG collaboration
Although there is some evidence for collaborative methylation and demethylation reactions, little is known about their distance-dependence. However, we expect that the required distance-dependent collaboration would not be difficult to achieve mechanistically. For example, the UHRF1 protein binds a hemi-methylated CpG site via its SRA domain and recruits DNMT1 (40). This recruited DNMT1 is thought to methylate other hemi-methylated CpGs (41), providing one of the required collaborative h → m reactions ((7); Figure 3). It is possible that this DNA-tethered UHRF1-DNMT1 complex is not flexible enough to allow equal access of the DNMT1 catalytic domain to all CpGs in nearby chromatin, possibly giving a bias against short-range interactions.
Ten-eleven translocation methylcytosine dioxygenase (TET) proteins are the prime candidates for CpG demethylases, catalyzing oxidation of 5mC to 5-hydroxymethyl-C and initiating a complex pathway for removal of the methylated cytosine (15). Consistent with the collaborative model, TET1 preferentially associates with CGIs, which are largely unmethylated (12), but this recruitment is poorly understood. Recruitment of TET2 by IDAX, which contains a CXXC domain that recognizes DNA containing unmethylated CpG and is enriched at sites with high CpG content (42), could in theory provide collaborative demethylation. A DNA-tethered IDAX–TET2 complex may be sufficiently flexible to oxidize CpGs close by on the DNA, providing the short-range collaboration required by the model. In theory, methylation or demethylation collaboration may be achieved by more complex recruitment reactions, potentially involving other chromatin marks such as histone modifications or other DNA modifications (8), each with their own characteristic distance dependencies.
An alternative mechanism to the short-range collaborative demethylation reactions in our model is suggested by the study of Thomson et al. (43). They proposed that recruitment of the CXXC protein Cfp1 to unmethylated CpG clusters could inhibit DNMT action on the cluster and maintain the unmethylated state. We have tested this type of mechanism by simulations and have shown that it is indeed able to substitute for the collaborative demethylation reactions in our model (44). Bistability is possible if recruitment of the inhibitor protein by unmethylated CpGs is cooperative, and if the inhibition of methylation extends to the neighbors of the unmethylated CpGs to which the protein is bound. Thus, the principle of short-range collaboration between unmethylated CpGs is shared by both mechanisms. However, when more long-ranged reactions are required, there may be limitations to such a short-ranged cooperative protection mechanism.
The relationship between CpG topology and methylation could be tested experimentally by inserting large DNA fragments containing synthetic CpG clusters set within a low CpG density sequence, into gene-free genomic regions in a suitable cell line, followed by assessment of their methylation states. Use of clusters of different sizes, densities and initial methylation status, would allow systematic determination of the general geometric rules for DNA methylation.
Implications of the model
The model provides a different way of thinking about CpG islands, one that is more strongly tied to the bistability that underpins epigenetic memory. Our analysis suggests that clusters ranging from ∼10 CpG sites within a region of ∼80 bp to ∼40 CpG sites within ∼500 bp are intrinsically bistable. Thus, even a small cluster may be capable of carrying epigenetic memory, being able to be set in either a hyper- or hypo-methylated state by transient signals and retaining that state once the signals disappear. Our results argue for a stronger focus on small CpG clusters.
In contrast, individual, isolated CpGs and small, sparse clusters are predicted to be unable to maintain hypo-methylation in the absence of a sequence-specific external factor. Similarly, very large, dense clusters, or clusters of clusters, may not be able to stably maintain hyper-methylation. This intrinsic property of large clusters may explain the failure of maintenance of targeted CpG methylation within a large CGI (a cluster of clusters with 198 CpGs within 2220 bp) at the human VEGF-A promoter (45). Even if methylation of all of this cluster could be achieved by targeting, the intrinsic bias toward demethylation may be too strong for methylation to persist after targeting. Our modeling suggests that targeting methylation at small, isolated CpG clusters is more likely to induce stable changes.
The collaborative model also has important implications for the origin of clustering in vertebrate genomes. CpG clustering is proposed to be a by-product of a high mutation rate for 5mC residues causing CpG sites that are more often methylated in the germ line to be lost faster than those that are more often unmethylated (46). In the collaborative model, the feedback between CpG density and methylation state should tend to make this mutation rate-driven evolution of clustering more rapid, since loss of a CpG site will enhance methylation and thus loss of nearby sites, while gain of a CpG site will help nearby CpG sites be unmethylated and thus survive. In addition, the functionality of CpG clustering in collaboration means that there would likely be significant selective pressure for gain or loss of CpG sites in order to optimize methylation states (43).
The generation of CpG methylation patterns and epigenetic memory
The classical model does not by itself predict any effects of CpG clustering on methylation state. Variants of the classical model invoke locus-specific individual CpG methylation and demethylation reaction rates (8,47), which can in theory explain, but not predict, clustering effects. In contrast, our collaborative model is generic, invoking relatively few global parameters that apply equally to all CpG sites and allows some prediction of methylation status from CpG number and density alone. However, additional sequence-specific factors are clearly needed to generate the full temporal and spatial patterning seen in methylomes.
In the non-collaborative models, there is a single equilibrium methylation level for any CpG under any given set of conditions. Sequence-specific factors can change the position of this equilibrium but do not automatically generate bimodal methylation patterns. In contrast, the positive feedback in the collaborative model provides an intrinsic force that pushes a cluster away from intermediate methylation levels toward either hyper- or hypo-methylation. Sequence-specific factors act to change the probability of occupation of these alternative states.
The lack of bistability in the non-collaborative models means that if a methylation state of a cluster is set by sequence-specific signals, it will inexorably revert to its default methylation level once the signals disappear. In contrast, the collaborative model predicts that some CpG clusters, once set into the hyper- or hypo-methylated state, can remain in that state stably and heritably in the absence of the signal, providing epigenetic memory.
C.L., J.O.H., I.B.D. and K.S. acknowledge financial support from the Danish National Research Foundation through the Center for Models of Life.
Australian NHMRC [GNT1025549]. Funding for open access charge: Danish National Research Foundation.
Conflict of interest statement. None declared.