Genome-wide profiling of nucleosome sensitivity and chromatin accessibility in Drosophila melanogaster

Nucleosomal DNA is thought to be generally inaccessible to DNA-binding factors, such as micrococcal nuclease (MNase). Here, we digest Drosophila chromatin with high and low concentrations of MNase to reveal two distinct nucleosome types: MNase-sensitive and MNase-resistant. MNase-resistant nucleosomes assemble on sequences depleted of A/T and enriched in G/C-containing dinucleotides, whereas MNase-sensitive nucleosomes form on A/T-rich sequences found at transcription start and termination sites, enhancers and DNase I hypersensitive sites. Estimates of nucleosome formation energies indicate that MNase-sensitive nucleosomes tend to be less stable than MNase-resistant ones. Strikingly, a decrease in cell growth temperature of about 10°C makes MNase-sensitive nucleosomes less accessible, suggesting that observed variations in MNase sensitivity are related to either thermal fluctuations of chromatin fibers or the activity of enzymatic machinery. In the vicinity of active genes and DNase I hypersensitive sites nucleosomes are organized into periodic arrays, likely due to ‘phasing’ off potential barriers formed by DNA-bound factors or by nucleosomes anchored to their positions through external interactions. The latter idea is substantiated by our biophysical model of nucleosome positioning and energetics, which predicts that nucleosomes immediately downstream of transcription start sites are anchored and recapitulates nucleosome phasing at active genes significantly better than sequence-dependent models.

Following previous work by us (Locke et al., 2010;Chereji and Morozov, 2014) and other groups (Chevereau et al., 2009;Teif and Rippe, 2009;Vaillant et al., 2010;Rippe, 2010, 2012;Mobius et al., 2013;Rube and Song, 2014), we have developed a biophysical approach to predicting nucleosome positioning and energetics from high-throughput sequencing datasets. We assume that a nucleosome consists of 147 base pairs (bp) of DNA wrapped around a histone octamer composed of two copies of four core histones: H2A, H2B, H3 and H4. We model nucleosome arrays as interacting hard rods of length a = 147 bp that can be reversibly adsorbed to a one-dimensional lattice with L bps, e.g. representing a chromosome.
There are two sets of related quantities to consider: the nucleosome distribution (probabilities of nding the system in every possible conguration), and nucleosome formation energies that dictate this distribution.

Forward Problem
If all interactions within the system are given but the nucleosome distribution is unknown, we have to solve a forward problem. We denote by u(i) the formation energy of a nucleosome with its center (dyad) at position i. This energy may have contributions from DNA bending, electrostatic interactions, hydrogen bond formation, van der Waals forces, etc. Particles are prevented from forming outside of the lattice ends, so that all nucleosomes must be inside the [0, L] range: Nucleosomes cannot overlap, and we assume that the neighboring particles interact through a hardcore interaction: Here, j and k indicate the positions of neighboring nucleosome dyads, and Φ(j, k) represents the two-body interaction between the two particles. Two-body interactions may also be used to account for the eects of higher-order chromatin structure .
For a xed number of particles bound to DNA, N , the canonical partition function (Reichl, 2009) is where β is the inverse temperature: β = 1/(k B T ) (k B is the Boltzmann constant, and T is the temperature). Our denition of the two-body interaction guarantees that only valid congurations of non-overlapping nucleosomes contribute to Eq. (1).
If the system is allowed to have a variable number of particles (e.g. dierent cells may have a dierent total number of nucleosomes on the same chromosome), the grand-canonical ensemble formalism (Reichl, 2009) is more appropriate, and the partition function is where µ is the chemical potential, and N max is the maximum number of non-overlapping rods that can t in L bp. Note that any conguration with a number of particles N > N max does not contribute to Z, because in this case at least two particles overlap. This allows us to extend the upper limit from N max to ∞. From the partition function we can compute s-particle distribution functions, In particular, the one-particle distribution is given by which represents the probability of nding a nucleosome with its dyad located at position k. From this we obtain the probability of nding bp i covered by any nucleosome, or nucleosome occupancy: represents the position of the outermost bp covered by a nucleosome, relative to its dyad. For a particle covering an odd number of bp, where the dyad position is well-dened, a/2 = a−1 2 , which in the case of nucleosomes is a/2 = 73 bp. Note that 1 − Occ(i) is the probability that bp i is not covered by any nucleosome, i.e. the probability of nding linker DNA at bp i.
As we showed before (Chereji and Morozov, 2014), in the case of hard-core interactions alone, one can compute the single-particle distribution in an ecient way. Consider where Z − (i) and Z + (i) represent the partition functions for the domains to the left and to the right of the nucleosome with the dyad at position i. These two quantities satisfy the recursion relations: Introducing new variables in log space 4 These relations can be iterated starting from the boundary conditions: From Eq.

Inverse Problem
In experiments such as MNase-seq, ChIP-seq, and ChIP-exo, genome-wide distributions of various DNA-binding proteins are determined, but the energetics of DNA binding are unknown. In order to predict protein-DNA binding and protein-protein interaction energies we need to solve the inverse problem, in which these energies are inferred starting from observed distributions of DNA-bound particles. In general, the inverse problem is much harder to solve compared to the corresponding forward problem.
Fortunately, it is straightforward to obtain nucleosome formation energies from the known distribution of nucleosomes, n 1 (Locke et al., 2010;Chereji and Morozov, 2014).
From Eqs.
(2), (3) and (4) we obtain and Z + (i) Z + (i + 1) = 1 + Zn 1 (i + a) Z − (i + a)Z + (i + 1) , where Note that a particle with the center at position i and a particle with the center at position i + a − 1 overlap by exactly one bp. The overlapping bp is i + a 2 = i + a − 1 − a 2 . Z − (i + a − 1) represents the partition function for the domain [1, i + a 2 ), while Z + (i) represents the partition function for the domain (i + a 2 , L]. Thus Z − (i + a − 1)Z + (i) represent all congurations which have bp i + a 2 unoccupied by a particle, so that Using this, Eqs. (8) and (9) become .
For i ≤ a+ a 2 , there is not enough space for a particle to completely t in the domain [1, i− a 2 −1], so that For i > a + a 2 we have .
Similarly, for i > L − a − a 2 , there is not enough space for a particle to t in the domain Using Eqs. (14), (15), (16), and (17) we can compute the partition functions, and from Eq. (2) we . The chemical potential µ can be determined from Eq.
(2) if we know the average nucleosome density < n 1 (i) > i∈[1,L] . In practice, the real average nucleosome density is hard to estimate with accuracy, and we will use µ as a reference for all energies, and as a tting parameter.

Energy models
We consider three models of nucleosome formation energies. The rst model completely neglects DNA-sequence dependence of nucleosome formation and assumes that the only major energy contributions are given by external factors such as chromatin remodelers, which are responsible e.g.

Sequence-independent minimal model
For the sequence-independent model, we use the average nucleosome distribution at the TSS of active genes in order to infer the corresponding formation energy, u SI (i), which can generate this distribution. We use Eq. (18) to obtain the eective energy potential generated by the action of all external factors. We assume that these factors contribute mostly near TSS, and their contribution vanishes away from TSS, so that we can eliminate the constant chemical potential from Eq. (18) by using the asymptotic value of the external potential This way, we obtain the external potential shown in Figure 6A (black line), u SI (i). This energy prole contains a potential barrier over the NDR and a potential well at the +1 nucleosome position. This suggests an anchoring mechanism that generates a strong bias for the location of +1 nucleosomes, which are known to be well-positioned in vivo.
To generate the genome-wide nucleosome distribution predicted by this model, we create an energy landscape which is at except for 2 kb regions around the TSS of active genes, where the anchoring potential (black line in Figure 6A) is added.

Sequence-dependent model
The N = 2 position-independent model (Locke et al., 2010; assumes that the sequence-dependent part of the nucleosome formation energy, u SD (i), corresponding to a nucleosome that occupies the DNA sequence between nucleotides i − a 2 and i + a 2 , depends only 7 on the mono-and dinucleotide counts in the nucleosomal DNA: is the energy contribution of the mononucleotide pair S k / S k , and S k S k+1 / S k+1 S k is the energy contribution of the dinucleotide pair S k S k+1 / S k+1 S k ( S k is a nucleotide complementary to S k ). Because for each DNA sequence and its reverse complement the nucleosome formation energies are the same, u SD (i) is a function of only 12 unique parameters: The energies in Eq. (18) can then be written as where µ is the chemical potential and m i are the counts of mono-and dinucleotide pairs X/ X and XY / Y X in the DNA sequence containing nucleotides between i − a 2 and i + a 2 , respectively.
For all dyad positions, i, we obtain a system of P equations where E − µ is a column vector of dimension P , each row containing one element u SD (i) − µ from µ is the column vector of tting parameters from Eq. (19), and M is a P × 13 matrix with mono-and dinucleotide counts, and -1's in the last column. Using Eqs. (20), we obtain the energy parameters and µ by a least-squares t.
Because in every DNA sequence the number of mononucleotides is equal to the length of the sequence and the number of dinucleotides is equal to the length of the sequence minus one, the columns of the matrix M are not linearly independent: two columns can be expressed as a linear combination of the others. To make the set of tted parameters unique, we impose the following constraints as explained in .

Full model
We also construct a third model in which the full nucleosome formation energy is the sum of the sequence-independent energy, u SI , and the sequence-dependent energy, u SD . After tting the N = 2 position-independent model described above, we add u SI and u SD . The results of all three models considered above are shown in Supplemental Figure S8. 10 ChIPs were performed as previously described (Moshkin et al., 2012 Figure Figure S1 (lighter grey bars). In addition, we considered shorter fragments of 95 ± 20 bp (MNase HIGH -ChIP-seq) and 105 ± 30 bp (MNase LOW -ChIP-seq) in H3 experiments (darker grey bars).     Figure S7. Caption on the next page. proles. Predictions were made using a sequence-dependent model, a sequence-independent model (black line in Figure 6A), and a full model in which the sequence-dependent and sequence-independent contributions are combined (Supplemental Methods). Box plots indicate median correlation (green lines), the 25 th and 75 th percentiles (boxes), the range of the correlation coecients not considered outliers (black dashed lines), and outliers (red crosses). Data points are considered outliers if they are larger than q 3 + 1.5(q 3 − q 1 ) or smaller than q 1 − 1.5(q 3 − q 1 ), where q 1 and q 3 are the 25 th and 75 th percentiles, respectively.
Supplemental Tables   Table S1. Summary of paired-end sequencing data. Occupancy correlation refers to the genome-wide linear correlation coecient between Replicate 1 and Replicate 2 nucleosome occupancy proles.