Spatial confinement is a major determinant of the folding landscape of human chromosomes

The global architecture of the cell nucleus and the spatial organization of chromatin play important roles in gene expression and nuclear function. Single-cell imaging and chromosome conformation capture-based techniques provide a wealth of information on the spatial organization of chromosomes. However, a mechanistic model that can account for all observed scaling behaviors governing long-range chromatin interactions is missing. Here we describe a model called constrained self-avoiding chromatin (C-SAC) for studying spatial structures of chromosomes, as the available space is a key determinant of chromosome folding. We studied large ensembles of model chromatin chains with appropriate fiber diameter, persistence length and excluded volume under spatial confinement. We show that the equilibrium ensemble of randomly folded chromosomes in the confined nuclear volume gives rise to the experimentally observed higher-order architecture of human chromosomes, including average scaling properties of mean-square spatial distance, end-to-end distance, contact probability and their chromosome-to-chromosome variabilities. Our results indicate that the overall structure of a human chromosome is dictated by the spatial confinement of the nuclear space, which may undergo significant tissue- and developmental stage-specific size changes.

diameter D is small. At each step of the chain growth process, we calculated the effective sample size (ESS) [1], where $M$ is the total number of chains. If $\mathrm{ESS} < 0.3M$, we assign a probability $p(i)$ to each partial chain $i$ as $p(i) = \exp(w_i - \max_{1 \le i \le M} w_i)$, sample $M$ chains with replacement according to $p(i)$, and adjust the weight of each selected chain $k$ as $w^{*}_k = w_k / p(k)$. We then continue to grow the chains of this new population. This is repeated until all chains reach full length.
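The resampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ESS estimator used here, $(\sum_i w_i)^2 / \sum_i w_i^2$, is a common choice and may differ from the expression in [1]; the stored weights are assumed to be log-weights, so that $p(i) = \exp(w_i - \max_i w_i)$ is proportional to the linear weight; and the function names are hypothetical.

```python
import math
import random

def effective_sample_size(log_weights):
    """ESS = (sum w)^2 / sum w^2, computed stably from log-weights (assumed form)."""
    top = max(log_weights)
    scaled = [math.exp(lw - top) for lw in log_weights]
    return sum(scaled) ** 2 / sum(v * v for v in scaled)

def resample_if_needed(chains, log_weights, threshold=0.3, rng=random):
    """Resample M chains with replacement when ESS < threshold * M."""
    m = len(chains)
    if effective_sample_size(log_weights) >= threshold * m:
        return chains, log_weights
    top = max(log_weights)
    # p(i) = exp(w_i - max w), as in the text
    p = [math.exp(lw - top) for lw in log_weights]
    picks = rng.choices(range(m), weights=p, k=m)
    new_chains = [chains[k] for k in picks]
    # Divide each selected weight by p(k); in log space this is w_k - log p(k).
    new_log_weights = [log_weights[k] - math.log(p[k]) for k in picks]
    return new_chains, new_log_weights
```

Under these assumptions every resampled chain ends up with the same weight, so the population is reset to an evenly weighted one before growth continues.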
Chromatin properties. With $m$ successfully generated chromatin chains, we can calculate the physical properties of the population of chromatin fibers. Denote the configuration of the $j$-th successfully generated chromatin chain as $x^{(j)} = (x^{(j)}_1, \cdots, x^{(j)}_n)$, and its associated weight as $w^{(j)}$. To calculate the mean value of a physical property $h(x)$, such as the mean end-to-end distance of a chromatin chain, we have:

$$E[h(x)] = \frac{\sum_{j=1}^{m} w^{(j)}\, h(x^{(j)})}{\sum_{j=1}^{m} w^{(j)}}.$$

Mean end-to-end distance. The mean end-to-end distance $R(N)$ is the mean Euclidean distance between the beginning and the end of a chain of length $N$. For the $j$-th chromatin chain, we have:

$$R^{(j)}(N) = \lVert x^{(j)}_N - x^{(j)}_1 \rVert.$$

The mean end-to-end distance is then calculated for the set of $m$ chromatin chains as:

$$R(N) = \frac{\sum_{j=1}^{m} w^{(j)}\, R^{(j)}(N)}{\sum_{j=1}^{m} w^{(j)}}.$$

Mean-square spatial distance. The mean-square spatial distance $R^2(s)$ is the mean-square Euclidean distance between genomic regions with a genomic separation $s$, here in units of persistence length. For the $j$-th chromatin chain, we have:

$$R^{2\,(j)}(s) = \frac{\sum_{i=1}^{N-s} \lVert x^{(j)}_{i+s} - x^{(j)}_i \rVert^2}{N - s},$$

where the denominator $N - s$ is the total number of all possible such interactions with separation $s$. The mean-square spatial distance is then calculated for the set of $m$ chromatin chains as:

$$R^2(s) = \frac{\sum_{j=1}^{m} w^{(j)}\, R^{2\,(j)}(s)}{\sum_{j=1}^{m} w^{(j)}}.$$

Contact probability. The contact probability $P_c(s)$ is the probability that two genomic regions separated by genomic distance $s$ are in spatial proximity of each other, for a chain of length $N$. Following Lieberman-Aiden et al. [8], it is calculated by counting the number of times that the Euclidean distance between two regions separated by genomic distance $s$ is smaller than a distance threshold $d_\theta$, divided by the number of all such candidate contacts. Let $I(\cdot)$ be the indicator function, equal to 1 if its argument holds and 0 otherwise. For the $j$-th chromatin chain, we have:

$$P^{(j)}_c(s) = \frac{\sum_{i=1}^{N-s} I\left(\lVert x^{(j)}_{i+s} - x^{(j)}_i \rVert < d_\theta\right)}{N - s}.$$

The mean value from the weighted ensemble average is then calculated as:

$$P_c(s) = \frac{\sum_{j=1}^{m} w^{(j)}\, P^{(j)}_c(s)}{\sum_{j=1}^{m} w^{(j)}}.$$

Reweighting. As chromatin chains are generated following the uniform distribution $\pi(x)$ of all geometrically realizable chains, these samples need to be reweighted in order to calculate ensemble properties of chromatin chains following a different distribution $\pi'(x)$.
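The per-chain quantities and weighted ensemble averages defined in this section (end-to-end distance, mean-square spatial distance, and contact probability) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: chains are assumed to be lists of 3D coordinate tuples, one per persistence unit, and all function names are hypothetical.

```python
import math

def weighted_mean(values, weights):
    """Weighted ensemble average: sum(w * h) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def end_to_end(chain):
    """Euclidean distance between the first and last node of one chain."""
    return math.dist(chain[0], chain[-1])

def mean_square_distance(chain, s):
    """Average squared distance over all N - s pairs (i, i + s)."""
    pairs = len(chain) - s
    return sum(math.dist(chain[i], chain[i + s]) ** 2 for i in range(pairs)) / pairs

def contact_fraction(chain, s, d_theta):
    """Fraction of the N - s pairs (i, i + s) closer than d_theta."""
    pairs = len(chain) - s
    hits = sum(math.dist(chain[i], chain[i + s]) < d_theta for i in range(pairs))
    return hits / pairs

def ensemble_property(chains, weights, prop):
    """Apply a per-chain property to every chain and take the weighted mean."""
    return weighted_mean([prop(c) for c in chains], weights)
```

For example, `ensemble_property(chains, weights, lambda c: contact_fraction(c, s, d_theta))` gives the weighted contact probability at separation `s` for a chosen threshold `d_theta`.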
To assess the effect of specific binding on the population of chromatin chains, we recalculate the weight of each chain under the new distribution $\pi'(x)$, which is the Boltzmann distribution after incorporating the energies of binding interactions. For a chromatin chain with interactions mediated through a protein binder, each interaction between a pair of sites $(i, j)$ contributes to the weight of the chain through the Boltzmann factor $\exp(-E^{(k)}(i, j)/k_B T)$, where $E^{(k)}(i, j)$ is the binding energy if both $i$ and $j$ contain binding sites and are mediated by the binder protein; otherwise $E^{(k)}(i, j) = 0$. The total weight of the $k$-th chain previously sampled from the uniform distribution is then recalculated as:

$$w'^{(k)} = w^{(k)} \prod_{(i,j)} \exp\left(-\frac{E^{(k)}(i, j)}{k_B T}\right).$$

Clustering. We clustered the generated chromatin chain conformations according to the pairwise distances between their persistence units using a k-means clustering algorithm [9]. For k-means clustering, we need to calculate the Euclidean distances between persistence units. As we have a population of m = 10,000 chains,
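Returning to the reweighting step described above, the update of the chain weights by Boltzmann factors can be sketched as follows. This is a minimal illustration under assumptions: energies are expressed in the same units as `kT`, the list of binding energies per chain is supplied by the caller, and the function name is hypothetical.

```python
import math

def boltzmann_reweight(weights, chain_energies, kT=1.0):
    """Multiply each chain's weight by exp(-E_total / kT).

    chain_energies[k] is the list of binding energies E(i, j) for the
    bound site pairs present in chain k (an empty list means none).
    """
    return [w * math.exp(-sum(energies) / kT)
            for w, energies in zip(weights, chain_energies)]
```

Chains without any binder-mediated interaction keep their original weight, while chains with favorable (negative) binding energies gain weight in the reweighted ensemble.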

Additional Details of Results
Scaling of C-SAC chains without confinement. We first used our geometric sequential importance sampling technique to generate free-space self-avoiding C-SAC chains without confinement. The scaling relationships $R(N) \sim N^{\nu}$ and $P_c \sim N^{\alpha}$ are shown in SI Fig. 1. The exponent $\alpha$ is fitted over the region before the contact probability becomes noisy (from 3 to 40 $L_p$).
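A scaling exponent of this kind is the slope of a straight line in log-log space. The sketch below assumes an ordinary least-squares fit restricted to a chosen window; the text does not specify the fitting procedure, and the function name is hypothetical.

```python
import math

def fit_exponent(xs, ys, lo, hi):
    """Least-squares slope of log(y) versus log(x) for lo <= x <= hi."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if lo <= x <= hi]
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    return sxy / sxx

# Synthetic power-law data (not results from the study): y = 2 * x^0.6
xs = list(range(1, 101))
ys = [2.0 * x ** 0.6 for x in xs]
exponent = fit_exponent(xs, ys, 3, 40)  # fit only within the window 3..40
```

Restricting the window matters because the fitted slope depends on the range over which the curve follows a power law.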

$R(s) \sim s^{\nu}$ scaling relationship from FISH studies.
In the FISH study of ref. [11], spatial distances [...]. In the FISH study of human Chr 11 and Chr 1 [12], $\nu$ was reported to be ∼ 0.33 in both Chr 11 and Chr 1 for 0.4 < $s$ < 2 Mb. The leveling-off effect was reported to take place at $s \ge 10$ Mb in Chr 11 and $s \ge 3$ Mb in Chr 1 [12].
It was also reported in ref. [13] that the FISH study of mouse Chr 14 in ref. [14] exhibits $\nu \sim 0.5$ when $s < 3.5$ Mb, beyond which the leveling-off effect may take place.

$R(s) \sim s^{\nu}$ scaling of confined C-SAC chains and comparison with FISH studies.
In C-SAC chains of length $N = 1{,}000$ under a confinement of $D = 1.5$ µm, the leveling-off effect is found to take place at around $s = 125 L_p$. We calculated the scaling exponent $\nu$ of $R(s) \sim s^{\nu}$ between $s = 5 L_p$ and $s = 25 L_p$.
This choice of $25 L_p$ is based on the ratio 25/125, which is the same as the ratio 2 Mb/10 Mb between the distance threshold up to which $\nu$ was fitted and the distance threshold beyond which the leveling-off effect occurred in human Chr 11 [12], which was also used in the studies of refs. [8,15]. We found $\nu \sim 0.34$ when $5 L_p \le s \le 25 L_p$.
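The ratio argument can be written out explicitly. Here $s_{\text{fit}}$ and $s_{\text{level}}$ are shorthand introduced only for this illustration, denoting the upper end of the fitting window and the onset of leveling-off, respectively:

```latex
\frac{s_{\text{fit}}}{s_{\text{level}}}
  = \frac{25\,L_p}{125\,L_p}
  = \frac{2\ \text{Mb}}{10\ \text{Mb}}
  = 0.2
```

The fitting window in the model therefore ends at the same fraction of the leveling-off distance as in the FISH data for human Chr 11.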
As discussed above, there are some variations in the reported values of the scaling exponent $\nu$ from existing FISH studies. Similarly, we found that $\nu$ also varies depending on the regime over which the exponent is fitted. If $s \le 60 L_p$, $\nu$ is found to be ∼ 0.25, and $\nu \sim 0.5$ if $s \le 15 L_p$.
Details of the scaling exponents $\alpha$ and $\nu$ for each of the $k = 20$ clusters. The average scaling exponents $\alpha$ and $\nu$ of each cluster, along with the size of each cluster, are listed in Table S1.