Shaping the genome via lengthwise compaction, phase separation, and lamina adhesion

Abstract The link between genomic structure and biological function is yet to be consolidated, it is, however, clear that physical manipulation of the genome, driven by the activity of a variety of proteins, is a crucial step. To understand the consequences of the physical forces underlying genome organization, we build a coarse-grained polymer model of the genome, featuring three fundamentally distinct classes of interactions: lengthwise compaction, i.e., compaction of chromosomes along its contour, self-adhesion among epigenetically similar genomic segments, and adhesion of chromosome segments to the nuclear envelope or lamina. We postulate that these three types of interactions sufficiently represent the concerted action of the different proteins organizing the genome architecture and show that an interplay among these interactions can recapitulate the architectural variants observed across the tree of life. The model elucidates how an interplay of forces arising from the three classes of genomic interactions can drive drastic, yet predictable, changes in the global genome architecture, and makes testable predictions. We posit that precise control over these interactions in vivo is key to the regulation of genome architecture.

The contact function is a sigmoidal function that cuts off interactions between distant monomers, and this is used to calibrate the contact probability between genomic segments [1]. The contact function f (r i,j ) is defined as: where µ = 3.22 ad r c = 1.78 are used following previous calibration with experimental Hi-C maps [1]. Note, the qualitative results discussed in the main text are not sensitive to small changes in these parameters.
Interaction potential

Homopolymer
The homopolymer potential (U HP ) models a generic bead-spring polymer in which each bead represents a genomic segment containing 20-50 Kb of DNA, where chromosome topology fluctuations are controlled by using an energy barrier. This potential consists of the following five terms, U FENE , U Angle , U hc and, U sc : where, U hc (r i,j ) is the hard-core repulsive potential, include to avoid overlap between the bonded nearest neighbor monomers.
a three-body potential applied to all connected three consecutive monomers of a chromosome, where Θ i is the angle defined by two vectors r i,i+1 and r i,i−1 , and θ 0 = 0 is the equilibrium angle.
The non-bonded pairs is defined by a soft-core repulsive interaction in the following form: The expression U LJ correspond to the Lennard-Jones potential: capped off at a finite distance, thus allowing for chain crossing at finite energetic cost. The parameter r 0 is chosen as the distance at which U LJ (r i,j ) = 1 2 E cut . Note that this potential is applied across all non-neighboring monomers of the system.

Lengthwise Compaction
We implement lengthwise compaction of the polymer as a sum of two terms: where = 10 is the characteristic length of short-range compaction. The potential is applied between intra-chain monomers that are more than 3 monomers apart, and underlies a looping tendency. The first term with amplitude A L > 0 controls the long range compaction, while the second term with amplitude A S > 0 controls the short-range compaction. Note that the intensity of lengthwise compaction depends on the genomic distance between the two loci, and that this potential does not act across chromosomes. Different values of A L and A S

Phase Separation
The potential associated with phase separation is self-adhesion among monomers. There is a generic adhesion between any two monomers of intensity χ = −0.2. This implies whenever two monomers come within interaction distance of one another the energy of the system lowers by χ (where the units are in simulation energy scale = k B T ). The centromere monomers adhere to other centromere monomers with an enhanced adhesive interaction χ C .
The interaction potential is represented as follows:

Lamina Adhesion
We place static monomers at a distance R 0 from the center of mass of the genome, such as to form a rigid spherical shell of radius R 0 encapsulating the genome. The numerical value of R 0 is decided from the requirement of physiological volume fraction (φ ≈ 0.1) of the genome inside the nucleus: R 0 = σ(N/(8φ)) 1/3 , where N is the total number of monomers in the genome, and σ = 1 is the monomer diameter.
While all the lamina beads and genome beads experience a soft-core repulsion, a randomly selected subgroup of 30% of the surface beads interact favorably with the centromere with an interaction strength χ L ≤ 0. The nuclear envelope contains many elements like the nuclear pores that cover a significant portion of the surface area that are not adhesive to the chromosome segments. Given the genome volume fraction is about 10%, using a sub population of surface beads made the competition between phase separation and lamina adhesion occur for similar values of χ L and χ C . The lamina interaction potential may be expressed as follows: where 'adh-lamina' refers to the subgroup of lamina monomers that adhere to the centromere.

II. TRAJECTORY ANALYSIS
The analysis were done on an ensemble of simulated trajectories. We simulated each parameter set for 2 × 10 7 time-steps (dt = 10 −3 ), and generated 10 replicas of each trajectory with randomized initial configurations. We use high temperature annealing of the homopolymer model to generate many random structures that are used to initialize the simulations. We then neglect the initial 10 6 time steps from our analysis, to ensure the steady-state nature of our trajectories (Fig. S13).

Contact probability matrices
Contact probability matrices, the analogue of HiC-maps, are calculated using the contact function. For every frame, we compute the pairwise distance between monomers and then use the contact function to convert the distance into contact probability, following our previous approach [1]. All the snapshots corresponding to a parameter set are then averaged to generate the contact maps shown in the main and supplementary figures.

Principal component eigenvector
The outer product of the contact probability matrices were used to generate the cor- Using this scheme, we identify neighbors of a monomer, and then classify the ratio of number of intra-chain contacts to the total number of contacts per chain as the strength of the chromosome territory. Similarly, the proportion of trans-centromere contacts is computed from the ratio of number of inter-chromosome centromere contacts to the total number of centromere-centromere contacts.

Shape analysis: Gyration tensor
Attributes of chromosome shape, like the radius of gyration and the shape anisotropy, are calculated using the gyration tensor G, defined as follows for a set of coordinates: Here the i-sum is over all the monomers of the polymer whose shape we are interested in, and (x cm , y cm , z cm ) is the center of mass of the polymer. Once this matrix is computed for a given snapshot, we compute the eigenvalues of the gyration tensor λ 1 , λ 2 , λ 3 (note, λ 1 ≥ λ 2 ≥ λ 3 > 0) and then calculate the shape descriptors as follows [4]: where R g is the radius of gyration. c is the acylindricity, which is lower if there is cylindrical symmetry in the conformation. And, κ 2 is the relative shape anisotropy that is bound between 0 and 1, and is higher for anisotropic structures.

Hierarchical clustering: Number of centromere clusters
Clusters of centromeres and telomeres were defined using the hierarchical clustering algorithm via constructing a dendrogram. At the first step, each centromere monomer is considered a cluster, this is the largest possible number of clusters in the genome. Iteratively, clusters are merged, following a condition that the shift in the centroid of the cluster due to the merge is smaller than a cut-off value. We choose this cut-off to be twice the radius of gyration of the cluster (note, using a slightly different value, like three times the radius of gyration, does not change the qualitative nature of our results). When merging two clusters is shifts the centroid to larger than the cut-off, those two clusters are identified as two individual clusters. We use the python module Scipy.cluster [3] to implement hierarchical clustering. The number of clusters obtained from every snapshot is then plotted using a histogram.

Radial density distribution
Radial distribution of monomers is calculated from the snapshots. The volume occupied by the genome is divided into concentric shells centered at the centroid of the genome, then the number of monomers is each cell is counted. To obtain the density, the number is divided by the volume of the shell.

Gauss linking number: inter chromosome entanglements
The Gauss linking number between two chromosomes, counts the number of signed crossings between the them, and measures their entanglement. We use the "method 1a", as prescribed in Ref. [5], to compute the linking number. Since linking number is defined only for closed curves, we simply connect the two ends of each chromosome to close the curve during our computation. We calculate entanglement for each snapshot of an ensemble and then plot the histogram of linking number values, showing the distribution of linking numbers.

Simulation snapshots
The simulation snapshot images are made using the VMD software [6].
Graphics 14 33-38 (1996). FIG. S13. Steady state trajectories. The territory signal is shown for a single replica as a function of simulation time. The system was initialized as a SAW (hence the low territory at zero time), but simulated under the potential for G. The system reaches its steady state corresponding to G chromosomes in less than 10 6 time steps. We exclude the initial one million time steps from our analysis (dashed line), to ensure that the memory of the initial configuration is completely lost, and we are capturing the steady state.