Protein disorder-to-order transition enhances the nucleosome-binding affinity of H1.

Abstract Intrinsically disordered proteins are crucial elements of chromatin heterogenous organization. While disorder in the histone tails enables a large variation of inter-nucleosome arrangements, disorder within the chromatin-binding proteins facilitates promiscuous binding to a wide range of different molecular targets, consistent with structural heterogeneity. Among the partially disordered chromatin-binding proteins, the H1 linker histone influences a myriad of chromatin characteristics including compaction, nucleosome spacing, transcription regulation, and the recruitment of other chromatin regulating proteins. Although it is now established that the long C-terminal domain (CTD) of H1 remains disordered upon nucleosome binding and that such disorder favours chromatin fluidity, the structural behaviour and thereby the role/function of the N-terminal domain (NTD) within chromatin is yet unresolved. On the basis of microsecond-long parallel-tempering metadynamics and temperature-replica exchange atomistic molecular dynamics simulations of different H1 NTD subtypes, we demonstrate that the NTD is completely unstructured in solution but undergoes an important disorder-to-order transition upon nucleosome binding: it forms a helix that enhances its DNA binding ability. Further, we show that the helical propensity of the H1 NTD is subtype-dependent and correlates with the experimentally observed binding affinity of H1 subtypes, suggesting an important functional implication of this disorder-to-order transition.


INTRODUCTION
In Eukaryotic cells, genomic DNA is packaged into chromatin, a nucleoprotein complex whose fundamental repeat-ing unit is the nucleosome (1,2). Nucleosomes are formed by ∼147 bp of DNA wrapped around a histone protein octamer core (two copies each of H3, H4, H2A and H2B) and are joined together by free DNA linker segments. Most organisms contain an additional type of histone, the H1 linker histone protein. H1 is a chromatin architectural protein that binds to nucleosomes near the entry/exit site of the linker DNA (3,4), interacts with ∼20 bp of the DNA linkers (5), and critically influences chromatin organization (6)(7)(8).
From the protein structure point of view, the H1 proteins contain a ∼75 residue-long winged-helix globular domain, an unstructured N-terminal domain (NTD) of up to 45 residues, and a long mostly unstructured C-terminal domain (CTD) of ∼100 residues (19,20) ( Figure 1A). The globular domain is highly conserved within the H1 family (21,22) but the NTD and CTD vary significantly in sequence and length. This has led to the hypothesis that the differential behaviour among H1 subtypes stems from the heterogeneity in the diverging unstructured terminal domains (14). Because the CTD is believed to account for most of the subtype heterogeneity (23)(24)(25), significant efforts have been made to understand the conformational landscape of the CTD in isolation (26), when bound to DNA (27) or to a negatively charged IDP with high affinity (28), and when bound to a full chromatosome (4,29).
H1's shortest domain, the NTD, plays a negligible role in H1's chromatin compaction capability (30). However, its deletion reduces the binding affinity of H1 to the nucleosome significantly (30,31). Across H1 subtypes, an analysis of the NTD role has shown contrasting results with swaps of the H1.0/H1.2 and H1.0/H1x NTDs resulting in an exchange of their binding characteristics (32,33), but swaps of H1.1/H1.5 NTDs displaying no effects (23). The NTDs are rich in basic residues crucial for DNA interactions and otherwise predominantly composed of G/E/P/S/T/A amino acids -an archetypical composition of IDPs (34). Indeed, the NTD has been shown to be unstructured in aqueous solution and to have a propensity to fold in the presence of short DNA segments (35). However, whether the NTD remains disordered or undergoes a disorder-to-order transition when H1 binds to the nucleosome is not known, which limits our molecular mechanistic understanding of its functional role (36,37).
Given the unstructured and flexible nature of the NTD, characterizing its conformational landscape experimentally at atomistic resolution is challenging. Complementary to the experimental efforts, advanced simulation methods, such as enhanced sampling molecular dynamics (MD) simulations, have been successfully used in recent years to investigate the configurational ensembles of core histone tails (38,39) and the CTD of H1 (29), and to understand the role of post-translational modifications within them (40,41). Here, we take advantage of recent advances in enhancedsampling methods and the development of state-of-the-art biomolecular force fields (42,43) for the description of IDPs. In particular, we use the Parallel-Tempered Metadynamics in the Well-Tempered Ensemble (PTMetaD-WTE) method (44)--a combination of the Temperature Replica-Exchange (T-REMD) and metadynamics techniques developed to facilitate sampling of phase space. While T-REMD assists in the crossing of high energy barriers via high temperature replicas, MetaD-WTE accelerates sampling through the use of a history-dependent biasing potential acting along specific collective variables (CV) (see Materials and Methods). The combination of both helps overcome two well-known limitations of using T-REMD or metadynamics independently: (1) thorough sampling along slow diffusing CVs not considered explicitly (44), and (2) increase of the energetic overlap among temperature replicas and, hence, significant reduction of the total number of replicas needed and the associated computational costs (45) (see Materials and Methods).
By means of a large set of microsecond-long PTMetaD-WTE, T-REMD, and Biased-Exchange Metadynamics (BEMD) simulations (Supplementary Table S1), we fully characterize the difference in structure across various NTD subtypes. We compare how the structural behaviour of the NTD changes when it is free in solution, versus when it is bound to DNA or to a full 211-bp nucleosome (a 147-bp nucleosome with two symmetric 32-bp linker DNA arms). Our results show that the NTD of H1 is completely unstructured in solution, but upon DNA or nucleosome binding transitions into an amphiphilic helix with charged and uncharged residues on opposing faces. Further, we demonstrate that formation of this amphiphilic helix depends on the charge neutralization of the Lys sidechains by DNA. Crucially, our work demonstrates that the amphiphilic helical propensity of the NTD of H1 varies proportionally to the experimentally observed binding affinity of H1 subtypes, suggesting a potential implication of the NTD disorder-to-order transition in the function of linker histones.

System and simulation setup
The sequences of the human H1 subtypes were obtained from the Uniprot Database (46) and their N-terminal domains were built in an extended conformation using Avogadro (47) without capping of the N-or C-terminus. For computational expedience associated with reducing the solvent molecules required, we bent the initial linear configurations through a short 5 ns simulation in GBSA implicit solvent (39,48). The resulting structures were then used as initial configurations for all further simulations. The basic sub-region of each NTD was defined as commencing three residues (one-turn) before the first Lys of the NTD.
For the 20-bp B-DNA strands, we used two different force fields: Charmm DNA and parmbsc0 (60,61). The Settle algorithm (62) was used to constrain bond lengths and angles of water molecules and P-LINCS was used for all other bond lengths. Temperatures were maintained using the v-rescale thermostat (63) and the pressure at 1 bar using the Parrinello-Rahman barostat (64). Long range electrostatic interactions were calculated using the Particle Mesh Ewald (PME) (65) algorithm with a cut-off of 1.0 nm.

Temperature replica-exchange simulations
Temperature Replica-Exchange (T-REMD) simulations were performed starting from representative NTD configurations obtained through implicit-solvent equilibration. A temperature range of 300-450 K was used and the number/temperature of replicas were calculated using the predictor of Patriksson and van der Spoel (66) and an acceptance probability of 20%. This resulted in 56, 96 and 72 replicas for the H1.0, H1.1 and H1.2 subtypes respectively. The replicas were simulated for 250 ns each and exchanges between adjacent temperatures were attempted every 10 ps. The trajectory frames from the lowest temperature (300 K) were then used for analysis. Convergence of the T-REMD simulations were assessed by monitoring changes in the helical and beta content of the residues in 40 ns intervals after discarding the initial 50 ns for equilibration.

Parallel-tempered metadynamics in the well-tempered ensemble
We begin by applying a time-dependent biasing potential, V(S,t), defined as a sum of Gaussians deposited along the collective variables (CV) space (S) where G is the time interval at which the Gaussians are added with height W, width and mean S t' . The biasing potential discourages the trajectory from visiting states already sampled. Choosing an accurate set of CVs to adequately promote sampling is often non-trivial (45) with energy barriers along unaccounted CVs affecting sampling efficiency (67). The Parallel-Tempered Metadynamics (PT-MetaD) (68) method combines the above described T-REMD with metadynamics to deal with unaccounted slow degrees of freedom. The metadynamics biases of individual replicas at different temperatures are evolved independently and the adjacent replicas can exchange with a probability of where U is the potential energy, V is the bias potential and ␤ is the inverse of the thermodynamic temperature (1/k B T). However, energy distributions of the replicas 'i' and 'j' need to sufficiently overlap for exchanges to occur (Supplementary Figure S1). This in-turn requires the use of a large number of replicas as predicted for the T-REMD simulations. Here, we use the Well-Tempered Ensemble (WTE) (44) to improve efficiency. The method enhances energetic fluctuations of each replica and thereby its overlap with adjacent replicas through the use of the potential energy as an independent CV. For the current work, we used the procedure of Bernetti et al. (69) with eight temperatures geometrically distributed between 298 and 400 K as where N is the number of replicas. The configurations obtained from implicit solvent equilibration were equilibrated in the eight temperatures before a preliminary 10 ns PTMetaD-WTE run was carried out. This initial run used only the potential energy (U) as the CV and deposited Gaussians of height 2.5 kJ/mol, width 500 kJ/mol and bias factor of 50. The Gaussians were deposited every 250 steps and sufficient overlap of the replica potential energies was ensured through a histogram analysis at the end of this run (see Supplementary Figure S1).
Nucleic Acids Research, 2020, Vol. 48, No. 10 5321 The bias from this preliminary run was then kept fixed in the production PTMetaD-WTE simulation and ensured exchanges between adjacent replicas. This production run used two CVs: (1) the alpha helical content (S ␣ ) and (2) the radius of gyration (S rg ) of C␣ atoms. The alpha helical content CV (S ␣ ) has been used extensively as a CV within metadynamics simulations to sample the random coil-helix transformation of proteins, with back-calculated NMR shifts of the resulting conformational ensembles being in good agreement with experimental values (69)(70)(71). It is defined following the work of Pietrucci and Laio (72) as where R 0 , n and m are 0.08 nm, 8 and 12 respectively.
RMSD is the root mean square difference of six residue segments between the configuration and ideal alpha conformations. The Backbone (C␣, C, N, O) and C␤ atoms were included in the RMSD calculation.
The radius of gyration CV (S rg ) of the C␣ atoms is defined as where r i and m i are the positions of and mass of C␣ atom i. r com is the centre of mass of the NTD C␣ atoms.

DNA-protein metadynamics simulations
The H1.0 subtype was used to examine the effects of DNA binding on the conformational preferences of the NTD. The nucleosomal on-dyad structure of the H1.0 globular domain (PDB ID: 5NL0 (4)) was obtained and Modeller (73) was used to add the NTD basic subregion in an extended random coil configuration. The relative orientations of this built basic subregion and one of the nucleosomal linker arms (20 bp) was used as the initial configuration ( Figure  3A) for metadynamics simulation.
The metadynamics simulations utilized two CVs: (1) the alpha helical content (S ␣ ) and (2) the number of NTD-DNA electrostatic contacts (S cont ). The CVs thus simultaneously allow exploring NTD-DNA binding/unbinding and changes in helical propensity. The S cont CV was defined as where R 0 , n and m are 0.2 nm, 8 and 10 respectively. H + is the set of all positively charged sidechain hydrogen atoms within Lys/Arg residues and O − is the set of backbone oxygen atoms within the central 11-bp of DNA. Metadynamics gaussians were deposited every 10 ps and the simulation was run for 600 ns and the initial 50 ns was disregarded for analysis. To assist convergence, the lowest terminal base-pair (BP 20) and the backbone of the NTD's carboxy-terminal residue were both restrained with a force constant of 750 kJ/mol/nm 2 .

Trajectory analysis
The simulation trajectories were analysed using a combination of Plumed/Gromacs tools together with Python scripts using the MDAnalysis and MDTraj libraries (74,75). The secondary structures were assigned using DSSP (76) with residues being part of either alpha-or 3 10 -helix motifs counted towards the total helical propensity. The theoretical compaction (radius-of-gyration) of proteins with globular (77) and random-coil (78) structures were calculated as Clustering was performed using the single-linkage clustering method where the structure 'j' is assigned to cluster 'i' if the RMSD of C␣ atoms between them is < 2.5Å.
In the metadynamics simulations, the addition of history dependent bias potentials precludes a direct ensemble averaging of the system's characteristics as simulation time is without physical meaning. The methodology of Tiwary and Parrinello (79) was thus used to reweight trajectory frames and subsequently calculate the unbiased equilibrium averages. Briefly, the Probability Distribution of the biased system P(R,t) can be expressed as is the bias potential as given by Equation (1). The introduction of the delta function δ(s -S(R)) and the unbiased probability density function P 0 (R,t) allows the expression of equation (7) as where c(t) is the time-dependent bias offset (80) defined as The c(t) offset is calculated using Plumed following the protocol of Tiwary and Parrinello (79) and used to assign weights For the calculation of per-residue helical content, perframe O i of the residue was set as 1 if DSSP (76) predicted an alpha/3 10 motif and 0 otherwise. Similarly, the clusters were reweighted using w(t) of the trajectory frames assigned to them.

Docking
Docking of the clustered conformations from the PTMetaD-WTE simulations were performed using the HADDOCK webserver (81). The B-DNA for docking was built as a 20-bp long strand using the Nucleic Acids Builder within Amber16 (82). To allow for all possible orientations of the NTD basic sub-regions when interacting with this strand, the central 12-bp (one-turn) were considered the 'active' residues for docking. For the NTD basic subregions, the Lys residues within the basic 'face' were considered the 'active' residues. The His residues were considered neutrally charged and no segment within the DNA or NTD was considered flexible.

Potential of mean force calculations
The Potential of Mean Force (PMF) calculations were performed using a protocol similar to that of Wieczor and Czub (83). Firstly, the helical axis of the 20-bp strand was aligned along the z-axis and the radial-distance between the Central BP and the Helix C␣ atoms along the XY-plane was used as the CV. Initial frames for the umbrella windows were generated from the docked configuration by increasing this radial distance CV using 20 windows at intervals of 0.1 nm. Each window was then simulated for 100 ns using an umbrella biasing potential of 750 kJ/mol/nm 2 along the CV. To reduce diffusion of the DNA strand and thereby assist convergence (84), the two terminal base-pairs at each end (BP 1,2,19,20) were restrained with a force constant of 500 kJ/mol/nm 2 . The initial 25 ns were discarded, and the rest of the trajectory was used for analysis. PMF profiles were generated using the GROMACS implementation (g wham) (85) of the weighted histogram analysis method (WHAM). Bayesian bootstrapping with 100 bootstraps were used to estimate the errors for each profile.

The isolated H1 NTD peptide is disordered regardless of subtype
The amino acid sequences of the various H1 NTD subtypes display several distinguishable features of IDPs: low hydrophobicity, high net charge, and low sequence complexity (86). Thus, we first determine if the varying amino acid sequences of the different subtypes of the H1 NTD are sufficient to induce different structural behaviours, such as, random coil conformations, disordered structures with flickering secondary structural elements, or stable secondary structural folds. To answer this, we assess the conformational landscape of the isolated NTD peptides of three different H1 human subtypes--H1.0, H1.1 and H1.2--in solution through T-REMD simulations. We consider temperature replicas ranging from 300 to 450 K and run each replica for 250 ns for an accumulated sampling of up to 24 s per system (see Materials and Methods).
We find that for all subtypes, the H1 NTD emerges as a highly unstructured domain displaying only transient secondary structural elements (Figure 1), irrespective of the force field used (Amberff99sb*-ildn-q (51-53) and Charmm36M (42); Supplementary Figure S2). Indeed, for both force fields, the conformational ensembles include only a small fraction (<15%) of residues with ␣-helical or ␤strand structural elements ( Figure 1B). These observations are consistent with circular dichroism experiments (35,87) showing that the spectra for the NTDs of H1.0 and mH1.4 (homologous to hH1.2, Supplementary Figure S3) in aqueous solution are dominated by contributions of the random coil.
To further explore the structural heterogeneity of the NTDs, their size distributions were calculated and compared to the mean radius of gyration (R g ) of a globular (77) and thermally denatured random-coil proteins (78). The structural heterogeneity of the NTDs is evident from the wide distributions of radius of gyration ( Figure 1C). However, despite the large structural variation, most of the conformations adopted by the NTDs are compact, as shown by radii of gyration that are closer to the value predicted for a globular protein of similar length ( Figure 1C and Materials and Methods). In contrast, Replica-Exchange with Solute Tempering (REST2) (88) simulations of the CTD of H1.0 (29) reveal extended conformations ( Supplementary Figures S4 and S5). The differences in compaction between the two unstructured domains might plausibly be attributed to the uneven charge density within them. The 97-residue H1.0 CTD is predominantly composed of basic residues (42.2%, 41 Lys) while the 26-residue H1.0 NTD only has six basic residues (23.1%). To test this hypothesis, we split the H1.0 NTD into hydrophobic and basic subregions ( Figure 2D) and recalculated the radius of gyration of each region from the T-REMD simulations (Supplementary Figure S6). Consistently, we observe that the basic subregion, with higher charge density, adopts extended conformations that yield the typical distribution of the radius of gyration of IDPs (Supplementary Figure S6). In contrast, the hydrophobic subregion adopts predominantly collapsed conformations that contribute to the comparatively compact NTD radius of gyration observed in Figure 1B.
This distinction between the two unstructured N-and Cterminal domains might provide a molecular explanation for their experimentally observed contrasting roles in mediating H1-nucleosome interactions (23,89). Extended states of the CTD might allow it to initiate the process of nucleosome recognition through long-range electrostatic interactions (89) while the collapsed conformations of the NTD might preclude this (23).

H1 NTD undergoes a subtype-specific disorder-to-order transition upon charge neutralization
The NTD can be divided into two distinct subregions (90): the N-terminal subregion that lacks charged residues and is enriched in Ala, Pro and other non-aromatic hydrophobic residues and the basic subregion (close to the globular domain), which contains a high proportion of positive Lys/Arg residues crucial for electrostatic interactions with the DNA (Figure 2D) (90). Binding of H1 NTD to DNA is, hence, expected to significantly transform the structural behaviour of the second half of the NTD due to screening of the electrostatic repulsion among its numerous positively charged residues enabled by the interactions with the phosphate backbone (91).
We perform PTMetaD-WTE (68) simulations of the basic subregions of the three NTDs explored above in two conditions: (1) with standard charged Lys (WT), and (2) with neutralized Lys sidechains (neutralized; as an approximation to NTD charge neutralization by DNA). We use two collective variables (CV) that show complete diffusivity along the simulation (Supplementary Figure S7)--the radius of gyration (S Rg ) and a function of the alpha-helical content (S ␣ ) (72)--and eight geometrically spaced temperature replicas from 300 to 450 K (see Materials and Methods) together with enhanced potential energy fluctuations (WTE -Supplementary Figure S1) (44). To measure alpha-helical content, S ␣ is defined as a function of the RMSD between the NTD and a six-residue ideal ␣-helix (72) (see Methods). We chose these parallel-tempering metadynamics method as it allows us to characterize the effects of charge neutralization in the statistical ensembles of the H1 NTDs and on their free energy surfaces as a function of their secondary structural content. In all cases, we run the simulations for 3.6 s as we achieve convergence after 2.8 s of accumulated sampling (350 ns per replica; see Supplementary Figure S8).
For the WT NTD systems, independently of the subtype, the free-energy profile along the S ␣ collective variable shows a single minimum at values of low helical content S ␣ ∼0.1, which evidences that the most energetically favourable conformations for all subtypes are those lacking secondary structure (Figure 2A). This is in agreement with the predominantly unstructured ensembles observed in our T-REMD simulations for the same systems ( Figure  1). Upon charge neutralization, all three variants preserve a global minimum at S ␣ values of ∼0.1, signalling favourable unstructured conformations in all cases. However, interestingly, the neutralized NTDs of the H1.0 and H1.2 subtypes exhibit a second low energy minima at a higher value of S ␣ (Figure 2A), corresponding to conformations with helical secondary structural elements. Transient short-lived ␤motifs were also observed, but with propensities that remain unchanged upon charge neutralization (Supplementary Figure S10). To discard the existence of other energy minima, the two-dimensional free energy profiles along the two CVs were plotted and compared between the two Lys charge states (see free energy surface (FES) plots in Supplementary Figure S9).
The minima of the H1.0 NTD at large S ␣ values (S␣=4.4) is nearly as stable as that of the unstructured conformations ( E = 3.6 kJ/mol). The helical propensity of the neutralized H1.0 NTD ( Figure 2B), estimated using the reweighted probability distributions stemming from our PTMetaD-WTE simulations (see Materials and Methods) (79), shows that this is a single seven residue-long helix spanning the region 16 Table S3). While the extent of helicity varied, the span of the motif remained invariant throughout (Supplementary Figure S11).
Further inspection of this motif ( Figure 2C), shows that forming a helix would concentrate the positively charged residues K14, K17, K20 and K21 on the same face of the helix, creating a charged face on the NTD that should favour interactions with DNA. Indeed, the projection of sidechains down the axis of an ideal helix (3.6 residues per turn, 100 • Table 1. Characteristic differences in the sequences of the NTD basic subregion of the H1.0, H1.1 and H1.2 subtypes. The basic subregion was considered as the segment between the first and last Arg/Lys residues within the NTD between residues), known as helical wheel analysis, reveals that the sequence of amino acids in the H1.0 NTD is ideal for clustering charge along one face ( Figure 2E), as occurs in many proteins that undergo a disordered-to-ordered transition upon DNA binding (92,93). Hence, once electrostatic repulsion is screened, the sequence of the H1.0 NTD leads to focussing of positive charge on one face of an ␣-helix favouring then the stability of H1.0 NTD-DNA contacts. Although the H1.2 NTD with uncharged Lys sidechains also exhibits a second minimum corresponding to a partial helical conformation, this occurs at a much lower S ␣ value of 1.3 and with a free energy slightly higher ( E = 5.8 kJ/mol) (in-comparison to the unstructured conformation). In line with the lower S ␣ value, the per-residue helicity for this subtype shows two short and more transient helical motifs involving residues 19 PVKKKA 24 and 33 RKAS 36 ( Figure 2B). Inspection of the H1.2 NTD sequence explains why this subtype is unable to form a single long helix: each of the two short helices starts with a 'helix-initiating' Pro and are separated by a helix-breaking double Gly (G29/G30) motif (94). This is consistent with a helix-Gly-Gly-helix motif observed in the mouse H1.4 NTD (>95% sequence similarity to hH1.2 NTD, Supplementary Figure S3) in the helix-stabilizing solvent TFE (87). Despite helicity in the H1.2 NTD case being shorter-lived than in the H1.0 subtype, transient partial folding and rigidification of the H1.2 NTD upon DNA binding would imply enabling a more compact configuration that concentrates positive charge (in this case three and two basic residues per helical motif) better than a random coil and favours stronger interactions with DNA.
In contrast to the two other H1 NTDs, the H1.1 subtype exhibits a one-dimensional free-energy surface along S ␣ that is invariant with the Lys charge state. The perresidue helicity plot in Figure 2B confirms only two small segments with low propensity (<10%).
The differences in folding propensities that we observe correlate with the percentage of positively charged residues and the lack of helix-breaking residues that each system has (see Table 1). That is, the H1.0 NTD subtype that possesses the highest folding propensity, has the highest concentration of positive residues and only one 'helix-destabilizing' residues. Consistently, the H1.1 NTD subtype that exhibits the weakest folding propensity, has the lowest positive charge concentration in the set and is interspersed with four 'helix-destabilizing' residues (95) (see Table 1 and Figure 2D). The helical-wheel analysis in Figure 2E  in amphiphilicity and charge concentration within the basic 'face' (green curve) in contrast to the unstructured H1.1.

H1 NTD also undergoes a disorder-to-order transition upon DNA binding
We now focus on determining if the charge-neutralizationinduced helical conformations observed above are indeed reproducible when charge repulsion among Lys in the H1 NTD is instead screened via DNA binding. For this, we start by using HADDOCK (81) to rigid-body dock a NTD configuration representative of either the second minima (for the H1.0 and H1.2 subtypes) or the first minimum (for the H1.1 subtype) from our PTMetaD-WTE simulations (on the neutralized peptides) to a 20-bp ds-DNA segment (see Materials and Methods). Before the docking, we restore the full charges on the NTD Lys. In the case of H1.0 and H1.2, we consider the Lys residues within the amphipathic face ( Figure 2C, E) as active residues for the docking interaction, in the case of H1.1 use K22, K23, K25, K26 and K27, as they are within a region with the highest charge density. Figure 3A shows the lowest energy HAD-DOCK cluster for the three subtypes. For the H1.0 and H1.2 subtypes, docking predicts the helix regions to fit snuggly within the DNA major groove (the four lowest energy clusters were similar for all three cases), facilitating interactions between the four Lys within the basic faces and DNA. Binding of helices to the major groove is amongst the most common DNA-protein interaction motifs with such interfaces present within helix-turn-helix proteins, zinc-binding proteins and leucine zipper proteins (96). In the case of H1.1 NTD docked configuration instead, there are only three basic residues (all within the unstructured region) that interact with DNA around the minor grove.
To assess the viability of the docked conformations and to investigate the implications of NTD differential folding across subtypes on DNA interactions, we run unbiased MD simulations of the DNA-docked NTD basic subregion systems with fully charged Lys sidechains. Our results confirm that charge-screening by DNA binding favours disorder-toorder transitions in all the H1 NTD subtypes studied. When interacting with the DNA strand, the induced helical configuration in the H1.0 NTD is stable across the 400 ns MD simulation (Supplementary Figure S12), with residues in the basic face (K17, K20, K21) of the helix accounting for the majority of interactions with the DNA (Figure 3B).
Simulations of the DNA-bound H1.2 NTD also show a prominent disorder-to-order transition upon DNA binding, with region 19 PVKKKAAKKAGG 30 retaining the stable Lys-rich helix throughout the course of the simulation (see Supplementary Figure S12). Here, the four Lys K22, K23, K26 and K27 residues fall on the same side of the helix, making two contiguous helical turns that interact strongly with the DNA (Figure 3B), mainly at the major groove. A secondary small set of residues ( 33 RKA 35 ) with helical properties can also be observed. However, this secondary motif is too short to form a full helix and correlates with less prominent NTD-DNA interactions.
Contrary to the neutral-Lys PTMetaD-WTE simulations, the H1.1 NTD also forms two short helices ( 17 KPLA 20 and 29 KAAAA 33 ) that are joined together by a Lys-rich unstructured loop ( 21 GKKAKKPA 28 ). Because the helices formed here are mainly devoid of charged residues, DNA interactions are instead mediated by the unstructured loop and the single charged residues at the edge of each helix. Enhanced fluidity of the binding region is consistent with the H1.1 NTD showing a reduced number of DNA-bound states (∼83% of the simulation frames) than the H1.0 and H1.2 cases (100%), where rather than focusing on the DNA major groove, the NTD promiscuously binds/unbinds the entire DNA strand ( Supplementary Figure S13). Although the specific DNA interaction patterns of the disordered H1.1 segment depend on the initial configurations generated by HADDOCK's rigid body docking algorithm, our results ( Figure 3B, Supplementary Figure S13) demonstrate a distinction between the promiscuous interactions of disordered segments and the stable interactions of structured segments.
To quantify the impact of this observed subtype-specific helical propensity on the protein's DNA binding ability, we took the two partially structured H1 NTD subtypes (i.e., H1.0 and H1.2) and, for each of them, independently simulated the binding process to a 20-bp ds-DNA strand by means of umbrella sampling simulations (see Methods). From the umbrella sampling simulations, we calculate the potential of mean force (PMF) as a function of the radial protein-DNA distance normal to the DNA's helical axis ( Figure 3C). The PMFs reveal that while the subtype with highest helical propensity, the H1.0 NTD, is stabilized by ∼20 kJ/mol (∼8k B T) upon DNA binding, the longer H1.2 NTD--with a decreased helical propensity--is stabilized by only ∼15 kJ/mol (∼6k B T) upon DNA binding. However, it should be noted that within the force fields, interactions between the oppositely charged amine and phosphate groups have not been explicitly calibrated (97). Thus, we repeated the PMF calculations with the Amberff14SB (98) and parmbsc1 (99) force fields together with the complete set of non-bonded fix (cufix) corrections (100). While the inclusion of the non-bonded amine-phosphate interaction corrections modified the free energy profile at short distances, the depth of the resulting well is equivalent (∼20 kJ/mol) (Supplementary Figure S14). These results suggest that a higher tendency of the H1 NTD subtypes to undergo a disorder-to-order transition (favoured by a shorter length and higher charge density) enhances their DNA binding propensity.
Finally, to further investigate the stability of the disorderto-order NTD transition, we performed metadynamics simulations of the basic subregion of the H1.0 NTD (with normally charged Lys sidechains) in contact with a 20-bp ds-DNA strand, starting from an unstructured NTD configuration. We used two CVs: the alpha helicity (S ␣ ) and the number of electrostatic contacts (S cont ) between the phosphate backbone and Lys/Arg sidechains (see Methods). Convergence was assessed using the time-evolution of the free-energy profiles (Supplementary Figure S15).
The simulations predict that upon DNA binding the NTD transitions from the unstructured configuration (S ␣ = 0, Figure 4) to a helical state; that is, the global minimum appears at S ␣ ∼5.8 and S cont ∼12.5, which corresponds to the helical peptide bound to DNA ( Figure 4B, Supplementary Figure S16). The resulting helix spans a larger region ( 13 PKRAKASKKST 23 in Figure 4C) than that predicted by our simulations with neutralized Lys sidechains and no DNA ( 17 KASKKS 22 in Figure 2B), suggesting that DNA binding effects are stronger than simple charge neutralization in triggering a disorder-to-order transition of the NTD. In addition, the interaction between the NTD and DNA emerges as non-specific with homogeneous contacts across the DNA strand (Supplementary Figure S17).

Implications of NTD disorder-to-order transition on H1 nucleosome binding
After having determined that the NTDs of H1 undergo a subtype-specific disorder-to-order transition upon DNA binding, we explore the implications of this structural transition for H1-nucleosome binding. We first verify if a similar transition is observed when the H1 NTD forms part of a full H1 protein (with both globular and CTD domain) that is bound to a nucleosome. For this, we compare the behaviour of an isolated H1.0 NTD peptide to that of the NTD of H1 in our recent 5-s long Biased-Exchange Metadynamics simulation of the full H1.0 bound to a 211-bp chromatosome (29) (SI Discussion, Supplementary Table S4). For a direct comparison to the nucleosomal results, we additionally repeated the 'neutralized' PTMetaD-WTE simulations with the full length H1.0 NTD (in-contrast to only the basic subregion discussed in the preceding sections). Figure 5B shows the per-residue structural propensity of the NTD within the nucleosome using multiple forcefields (Amber99SBildn, Amber ff03ws, Charmm36M (Supplementary Table S5)) and compares the results to those obtained from neutralized H1.0 segments. For all systems, the NTD displays a significant tendency to transition into a helical state. In comparison to the modelling of the shorter basic subregion (cyan, Figure 5B), modelling the entire NTD (dotted line, Figure 5B) allows the formation of a larger helical motif.
Within the nucleosome-bound systems, regardless of the force field, the helical motif is formed by the residues 12 KPKRAK 17 and mediates interactions with DNA (Supplementary Figure S18). Upon nucleosome binding, the exact region where the helix forms is shifted slightly away from the point of attachment of the NTD to the globular domain, when compared to the helix in isolated NTD peptide. This difference may be plausibly attributed to the NTD within the nucleosome-bound system being subjected to additional configurational restraints imposed by its attachment to the globular domain of H1.
What is the meaning of these results for the binding of H1 to the nucleosome? Several different classifications for the relative nucleosomal binding affinities of the various H1 subtypes have been proposed (see Table 2). Despite variance across experimental methods, most studies have classified H1.0 as possessing the highest nucleosome binding affinity, followed by H1.2 and finally H1.1 (see Table 2). This trend is also consistent with H1.0 subtype overexpression being associated with quiescent chromatin/repressed gene expres- Table 2. Summary of the experimental nucleosomal binding affinities of the linker histone subtypes. Experiment 5 only ranked the subtypes into two sion (101), and the H1.2 and H1.1 subtypes localizing in transcriptionally active euchromatic regions (16). Remarkably, the observed binding affinities vary congruously with the ability of the different H1 NTD subtypes to form helical conformations upon charge-neutralization. Hence, it is plausible to propose that the NTD contributes to the differential nucleosomal binding of the subtypes through differences in its conformational preferences when within the nucleosome. That is, the higher nucleosome binding affinity of the H1.0 subtype might be in part due to the H1.0 NTD being able to form a stable amphipathic helix that increases the stability of the DNA-NTD complex, helping 'glue' the H1 to the nucleosome. Conversely, the longer and less-positive H1.1 NTD is disordered, which in-turn might result in weaker, fluctuating, and promiscuous interactions with the nucleosomal DNA ( Figure 3B). Our simulations thus suggest a molecular mechanism contributing to explain the differential nucleosome binding affinities of the H1 subtypes.

CONCLUSIONS
We have used a combination of microsecond-long T-REMD, PTMetaD-WTE and BEMD simulations to sample the conformational landscape of the NTD of H1 subtypes H1.0, H1.1 and H1.2. When free in solution, our simulations predict that all three NTDs would predominantly adopt unstructured but partially collapsed conformations. Interestingly, when the Lys sidechains are neutralized either artificially by increasing the pH or physiologically by binding to DNA, the simulations show that reduced inter-sidechain repulsion is expected to be sufficient to induce a subtype-specific disorder-to-order transition. However, DNA-binding has a stronger effect than simple chargereduction at triggering such transition. The ordered NTD structures in subtypes H1.0 and H1.2 are amphipathic helical conformations that orient the basic residues of the NTD along one 'face' of the helix, facilitating electrostatic interactions with the DNA major groove along that face which drives the peptide to a folded conformation acting as an 'anchor' between the DNA and H1 as previously hypothesized (102). By combining docking, unbiased MD simulations, and metadynamics simulations we show that the NTD conformational preferences translate into different patterns of DNA/NTD interactions and, hence, a differential ability of the different subtypes to mediate interactions of the H1 and the nucleosome.
Our simulations further reveal that the different charge density per unit length, the total length of the NTD, and interspersing of 'helix-breaking' Pro/Gly residues across subtypes, give rise to different degrees of helicity of the NTD; thereby, shorter charge-dense positive peptides devoid of Pro/Gly residues (H1.0 and H1.2) emerge as optimal for helical folding. Importantly, the varying degree of helicity upon DNA binding correlates with the experimentally observed binding affinities of the specific H1 subtypes, and the differences in the free energy stabilization of the various DNA-NTD complexes, with higher affinity subtypes exhibiting more prominent helical conformations. Specifically, upon charge neutralisation, the strongest binding H1.0 subtype forms a single contiguous helix spanning four turns, while the weakest binding H1.1 displays no helicity. This computationally predicted structural mechanism correlates with the experimentally observed effects of the H1.0/H1.2 NTD domain-swap that resulted in an exchange of their binding characteristics (32).
Our simulations postulate how subtle sequence differences in chromatin binding proteins can have a significant impact in their structural behaviour, and subsequently, their nucleosome binding affinity. Our results suggest a mechanism by which the structural behaviour of short chargedrich disordered domains of chromatin binding proteins could be crucially transformed after nucleosome binding. Prospectively, our observations can be exploited in the design of novel experiments to assess the structural preferences of the NTD of H1 in vitro and enhance our molecular understanding of the link between H1 NTD structure and the different roles of the various H1 subtypes in vivo (9). Our work also suggests that charge-reducing post-translational modifications like Lys acetylation (38,41) and Arg citrullination (103) can strongly impact the structural behaviour of the histone's terminal domains, especially those that are short and highly enriched in positively charged amino acids.

DATA AVAILABILITY
The initial configurations, temperature distributions,docking results, PLUMED input files and GRO-MACS .mdp are available at the Cambridge Research repository (https://doi.org/10.17863/CAM.46225).

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.