Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains

Genomes of eukaryotes are partitioned into domains of functionally distinct chromatin states. These domains are stably inherited across many cell generations and can be remodeled in response to developmental and external cues, hence contributing to the robustness and plasticity of expression patterns and cell phenotypes. Remarkably, recent studies indicate that these 1D epigenomic domains tend to fold into 3D topologically associated domains forming specialized nuclear chromatin compartments. However, the general mechanisms behind such compartmentalization including the contribution of epigenetic regulation remain unclear. Here, we address the question of the coupling between chromatin folding and epigenome. Using polymer physics, we analyze the properties of a block copolymer model that accounts for local epigenomic information. Considering copolymers build from the epigenomic landscape of Drosophila, we observe a very good agreement with the folding patterns observed in chromosome conformation capture experiments. Moreover, this model provides a physical basis for the existence of multistability in epigenome folding at sub-chromosomal scale. We show how experiments are fully consistent with multistable conformations where topologically associated domains of the same epigenomic state interact dynamically with each other. Our approach provides a general framework to improve our understanding of chromatin folding during cell cycle and differentiation and its relation to epigenetics.

, we considered here virial-type expansion contact interactions to account for hard-core repulsion and attraction between monomers: H = (3k B T /2l 2 ) n (X n − X n−1 ) 2 + n<m (U ns + U s δ n,m )δ(X n − X m )+ n =m =k U hc δ(X n − X m )δ(X m − X k ) where δ is the Dirac Delta function. by zooming to the corresponding area) for the same set of parameters. A priori, the contact map of a region should depend on the size of the chain and on the primary sequence of the neighboring chromatin. We remark that the pattern of interactions itself is not affected by the size of the investigated region. We observe only weak alterations in the long-range intensities with domains located at the extremities of the chain. In our example, this is due to the absence during the simulations (in B) of a significant part of the black chromatin that contribute in stabilizing the black chromatin globule mentioned in Fig.3D of the main text.  Mbp of chromosome 3R predicted by the Gaussian self-consistent approximation as a function of the strength of specific and non-specific interactions, starting from a coil, a MPS, a globular or an experimental-like conformations (initial contact maps are drawn at the top right corner). In the multistability region, depending on the initial conditions, many different steady-state solutions can be found by the Gaussian self-consistent method, the true thermodynamic averages being a weighted sum of these different solutions.

Block copolymer model
In this section, we describe in greater details the block copolymer model.
We model the chromatin fiber as an interacting self-avoiding bead-and-spring chain containing N monomers. Each monomer represents 10 kbp of DNA and is characterized by an epigenetic state. The Hamiltonian of the system H = H chain + H inter is made of two contributions: • H chain the Hamiltonian of the self-avoiding chain, given by with X n the position of monomer n, k = 3k B T /l 2 (l the segment length of the pure Gaussian chain), U hc the pair-wise repulsive hard-core potential and r n,m the relative distance between monomer n and m (r 2 n,m = (X n − X m ) 2 ). U hc is modeled by a truncated Lennard-Jones-like potential: The motivation behind the exponents 5/2 and 5/4 is to insure integrability of the hard-core potential needed by the self-consistent approximation (see below). The cut-off occurs at the minimum of the potential to insure continuity. We choose U 0 hc = 20k B T and 2 4/5 σ = l. • H inter accounts for attractive short-range interactions between monomers and is made of nonspecific interactions modeling compaction effect due to confinement, and of specific interactions modeling attraction between monomers having identical epigenetic states. H inter is given by with U ns and U s the strength of non-specific and specific interactions, δ en,em = 1 (0) if the epigenetic states e n and e m of monomers n and m are equal (or not), and r 0 the typical lengthscale of the short-range interaction. The motivation behind this Gaussian-like potential is that the number of contacts between two Gaussian chains (the 10kbp-long subchains contained in each monomer) scales as exp[−d 2 /(2r 2 0 )] with d the distance between the centers of mass and r 0 the typical radius of gyration of the subchain. We choose r 0 = l/ √ 6.

Gaussian self-consistent approximation
In this section, we detail the self-consistent approximation used to derive Eq.2 of the main text.

Gaussian approximation and self-consistency
The dynamics of the chain is described by a set of Langevin equations with ξ the friction coefficient and η n delta-correlated white noise ( η n = 0 and η α n (t)η β m (t ) = 2ξk B T δ(t − t )δ n,m δ α,β with α, β ∈ {x, y, z}).
At each time point, we approximate P by a multivariate Gaussian distribution withȲ (t) (Ȳ i,α = X α i ) andC(t) (C i,α;j,β = X α i X β j ) the first and second moment of the Gaussian. Isotropy of the system imposes alreadyȲ = 0 and thatC i,α;j,β = C i,j δ α,β with C i,j = X i · X j /3.
In the next, we aim to derive an equation that describes the dynamics of C using the approach developped by Ramalho et al. for approximating p.d.f dynamics but in the context of biochemical reaction networks {Ramalho et al, Phys. Rev. E, 87: 022719 (2013)}. The general idea is to assume an initial Gaussian distribution for Y , then to evolve it according to the Fokker-Planck equation, and then find the Gaussian distribution that best fits it. Let's first focus on the deterministic part of this evolution (by putting k B T = 0) before adding the stochastic noise. Consider that we have a Gaussian distribution P (Y ) at time t (with a covariance C(t)). After an infinitesimal time step dt, the evolved (deterministic) p.d.f will be with Y = Y +dt(−∂H/∂X n )/ξ and J = −(∂ 2 H)/(∂X n ∂X m ) the Jacobian of the (deterministic part of the) Langevin equations (the opposite Hessian of the Hamiltonian H). Due to the non-linearity of (−∂H/∂X n ), P e is no longer a Gaussian. However, we aim to determine the closest Gaussian distribution P (characterized by a covariance C ) to P e in term of information content using the maximum entropy principle. One requirement of this principle is to minimize the Kullback-Leibler divergence of P to P e defined as Minimization of d KL (P ||P e ) with respect to C leads to {Ramalho et al, Phys. Rev. E, 87: 022719 with J the average value of J over the Gaussian distrution P (Y ). We now consider the effect of intrinsic fluctuations. Over dt, the evolution Y of Y is given by Y augmented by the random variable ηdt which has a Gaussian distribution with covariance N dt/ξ (Y = Y + ηdt/ξ). Therefore the evolved p.d.f with noise is given by the convolution Therefore after dt, the closest Gaussian distribution of the evolved p.d.f is characterized by the covariance matrix C = C + N dt/ξ = C + dt( J C + C J † + N )/ξ. Taking the limit dt → 0 leads to ξ dC dt = J C + C J † + N (10) This equation is formally very similar to the linear noise approximation (LNA) {van Kampen. Stochastic Processes in Physics and Chemistry, North-Holland (2001)} but with the significant difference that, in Eq.10, J is average over the current Gaussian distribution while in the LNA J is evaluated at the average value of Y .

Application to the block copolymer model
By definition of J, we find with r n,m and U n,m respectively the distance and the interaction potential between monomers n and m. J n,m is then given by averaging over the current Gaussian distribution of X n − X m , ie, for n = m J n,m = 2πr 2 n,m sin θdr n,m dθ (r n,m cos θ) 2 1 r n,m with D n,m = (X n − X m ) 2 = (X n − X m ) 2 /3, the third of the mean squared distance between n and m. Finally, we find for n = m J n,m = −k (2δ n,m − δ n−1,m − δ n+1,m ) + r 3 0 (U ns + U s δ en,em ) (D n,m + r 2 0 ) 5/2 + 5σ 5/4 U 0 hc 2 1/4 12 √ π with Γ inc [a, z] = ∞ z t a−1 e −t dt the incomplete gamma function. J n,n = − k =n J k,n .
From Eq.10, we derive a corresponding equation for D (Eq.2 of the maint text). Remarking that D n,m = C n,n + C m,m − 2C n,m , we find for n = m Since J is a fonction of D, the last equation is self-consistent and allows to compute the dynamics of the mean squared distance matrix.

Numerical integration
We choose k B T as the unit of energy, l as the unit of length, and ξl 2 /(k B T ) as the unit of time. The set of non-linear equations defined in Eq.14 is solved in the steady-state limit by numerical integration. For a given epigenetic pattern, starting from different initial conditions, we implement a fifth order adaptative Runge-Kutta algorithm {Press et al. Numerical Recipes, Cambridge University Press (2007)} to find the fixed points. For parameter sets located in the coil, globule or microphase regions (see Fig. 2 of the main text), the algorithm converges, independently of the initial condition, to a unique fixed point. In the multistability region, it exists several fixed points that corresponds to the stable and metastable thermodynamic states. The algorithm cannot give the relative weight of each state in the thermodynamic ensemble. From the steady-state matrix D, in the Gaussian approximation, the probability P m,n of contact between monomers m and n is given by with a the maximal contact distance and A a numerical factor.
Typically, for each parameter set, we run the integration algorithm starting from 4 different initial conditions (Fig. S3): 3 from the "monophasic" regions (coil, globule and microphase) and one that mimics the experimental HiC-maps. The experimental-like matrices for a given epigenetic pattern were obtained from the experimental map by: (1) constructing an "average" mapP by assigning to every couple of monomers the average contact frequency between the epigenomic domains where they respectively belong to; (2) computing the corresponding matrix D using D m,n = A P −2/3 m,n , A being chosen such that typical intra-domain distances were of order 1.

Equivalence with the self-consistent method of Timoshenko, Kuznetsov and Dawson
The From this set of equations, one can derive easily the dynamics of matrix C with N = 2k B T /ξI. The self-consistency is given by solving Timoshenko et al find that, for n = m, with It is easy to verify that −V n,m = J n,m . This means that Eqs.10 and 17 are identical and proove that the two approaches are equivalent.

Numerical simulations
In addition to the Gaussian self-consistent approximation, we also perform full numerical simulations of the model for some parameter values.
The dynamics of the chain is described by the general equations with m the mass of one bead. The two last terms of Eq.21 represents the coupling with the heat bath. Note that Eq.3 is derived from Eq.21 by neglecting the inertia of the system.
We choose k B T as the unit of energy, l as the unit of length, m as the unit of mass and ml 2 /(k B T ) as the unit of time. Simulation of trajectories are performed using the standard velocity-Verlet algorithm coupled to the Andersen thermostat (Frenkel and Smit, Understanding molecular simulations: from algorithm to applications, Academic Press). The velocity-Verlet algorithm allows to integrate the first -thermostat independent -part of Eq.21 (md 2 X n /dt 2 = −∂H/∂X n ). The Andersen thermostat accounts for the coupling with the heat bath (−ξdX n /dt + η n (t)) : at a given frequency, stochastic collisions are applied to every particles of the system resampling the velocities among the canonical ensemble. High frequencies are associated to strong friction coefficients. In Figs.2 and 3 of the main text and in Fig. S2 of the Supplemental Material, we perform our simulation with a frequency of 5.