Quantifying the two-state facilitated diffusion model of protein–DNA interactions

Abstract The current report extends the facilitated diffusion model to account for conflict between the search and recognition binding modes adopted by DNA-binding proteins (DBPs) as they search DNA and subsequently recognize and bind to their specific binding site. The speed of the search dynamics is governed by the energetic ruggedness of the protein–DNA landscape, whereas the rate for the recognition process is mostly dictated by the free energy barrier for the transition between the DBP’s search and recognition binding modes. We show that these two modes are negatively coupled, such that fast 1D sliding and rapid target site recognition are unlikely to coexist. Thus, a tradeoff occurs between optimizing the timescales for finding and binding the target site. We find that these two kinetic properties can be balanced to produce a fast timescale for the total target search and recognition process by optimizing frustration. Quantification of the facilitated diffusion model by including a frustration term enables it to explain several experimental observations concerning search and recognition speeds. The extended model captures experimental estimates of the energetic ruggedness of the protein–DNA landscape and predicts how various molecular properties of protein–DNA binding affect recognition kinetics. In particular, point mutations may change the frustration and so affect protein association with DNA, thus providing a means to modulate protein–DNA affinity by manipulating the protein’s association or dissociation reactions.


INTRODUCTION
DNA-binding proteins (DBPs) possess a remarkably efficient ability to search and recognize their specific binding sites embedded within the genomic DNA. Early experiments showed that DBPs can find their target site at a rate two orders of magnitude faster than the 3D diffusion limit (1). Following these results, it was suggested that DBPs accelerate the search for their target site via a mechanism of facilitated diffusion in which 1D diffusion alternates with 3D diffusion during the search (2,3). 1D diffusion itself comprises two distinct search modes, sliding and hopping, which differ in the degree to which translocation and rotation along the DNA are coupled as well as in the dependency of their corresponding diffusion coefficients on salt concentration (4–6). The understanding that DBP search of DNA proceeds through a combination of diffusion in spaces of different dimensionality is referred to as the facilitated diffusion model. This model is well supported by numerous experimental and theoretical studies and is able to explain the high target association rates observed in vitro and in the cell (7–13).
The biophysical characteristics of DNA search may depend on the molecular properties of the searching proteins. For example, the dimensions of the protein, its oligomeric state, electrostatic potential and degree of flexibility may affect the relative usage of the different search modes (14–19). The DNA sequence may also affect search speed by producing different DNA geometries and consequently affecting the protein's ability to interact with the DNA major groove and thus its ability to perform coupled rotation-translation diffusion (20,21). The energy landscape for 1D diffusion along DNA may also be affected by the DNA sequence, which can affect the ruggedness of the potential energy landscape and, consequently, friction in protein-DNA interactions (9,22,23).
The 1D diffusion coefficient of proteins on DNA has been expressed as D = G(T,R,η)·F(σ), where G is a term that depends on the temperature, T, the size of the protein, R, and the viscosity of the solution, η (10). F is a term that represents the ruggedness of the potential energy landscape. Assuming that the ruggedness of the potential energy follows a Gaussian distribution, F(σ) = exp[−(σ/k_BT)²], where σ denotes the variance in the protein-DNA sliding potential and is related to the average energetic barrier for sliding (9,10,23). It was estimated that the ruggedness of the protein-DNA landscape must be low (σ < 2 k_BT) to achieve reasonable association rates (9).
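To give a feel for how strongly the ruggedness term suppresses sliding, the short sketch below evaluates F(σ) = exp[−(σ/k_BT)²] for a few roughness values. The function name and the chosen σ values are illustrative only; σ is expressed in units of k_BT.

```python
import math

def ruggedness_factor(sigma_kbt: float) -> float:
    """Return F(sigma) = exp[-(sigma / k_B T)^2], with sigma given in units of k_B T."""
    return math.exp(-sigma_kbt ** 2)

# Illustrative roughness values (in k_B T); 2 and 5 are the thresholds quoted in the text.
for sigma in (0.5, 1.0, 2.0, 5.0):
    factor = ruggedness_factor(sigma)
    print(f"sigma = {sigma:.1f} k_BT -> F = {factor:.3e} (slowdown x{1.0 / factor:.3g})")
```

Under this expression, σ = 2 k_BT slows sliding by a factor of e⁴ (roughly 55), whereas σ = 5 k_BT slows it by e²⁵, which illustrates why a landscape rough enough for stable binding would be impractical for search.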
Although many studies aimed to biophysically characterize the mechanism of facilitated diffusion that proteins adopt when searching for their target sites (2,7,24–31), it is clear that the kinetics of many protein-DNA recognition interactions does not depend solely on the search speed. Recognition requires not only finding the target site, but also specifically binding to it. The time-scale for specific binding may affect the kinetics of protein-DNA recognition because finding the site does not guarantee immediate binding. The discrepancy between finding and binding the target site is related to the different types of interactions used for the search and recognition modes. In the search mode (designated here as the S state), a DBP interacts with DNA non-specifically through mostly electrostatic interactions between positively charged protein residues and negatively charged phosphates on the DNA backbone (32,33). Specific binding upon reaching the target site requires the protein to switch to its recognition mode (designated here as the R state) by forming sequence-specific contacts with the DNA, supported by hydrogen bonding between the protein residues and the DNA bases (34). Thus, the full search kinetics involves not only searching in the non-specific binding mode, S, but also switching to the specific binding mode, R, upon recognition of the cognate binding site (11,23,35–39). It was argued that the existence of the S and R binding modes is necessary to solve an apparent conflict between speed and stability whereby the conditions for fast search (which requires σ < 2 k_BT) are incompatible with stable protein-DNA interactions (which requires σ > 5 k_BT) (9,23).
This two-state model is supported by various experimental approaches (including X-ray crystallography, NMR and single-molecule techniques) indicating that numerous DBPs adopt different conformations for their specific versus non-specific interactions with DNA (8,14,23,40–44). The existence of two states can also be inferred from the low roughness for sliding that was found for various proteins (10,45), which is unlikely for the R states.
The existence of the S and R binding modes suggests the presence of an energetic barrier governing the transition between them and therefore a separation of time scales between finding and binding the target site. This implication is supported by single-molecule experiments conducted in living cells on the lac repressor, which displays facilitated diffusion characterized by a low sliding energy barrier of ε ∼ 1.0 k_BT and a scanning length of 45 ± 10 base pairs for each 1D search round (8). Surprisingly, the lac repressor was found to slide numerous times over its promoter before target recognition is achieved, which reflects the existence of a barrier that needs to be overcome for specific recognition to occur.
Despite being insightful, the two-state model for protein-DNA recognition is still not quantitative. In particular, the interplay between the speed of sliding (via the S mode) and the recognition rate (transition from the S to R mode) is unclear. While some crystal structures of protein-DNA complexes reveal conformational changes associated with the transition from non-specific to specific binding (32,33,46), other structures indicate high similarity between the non-specific and the specific binding modes (47,48). The latter scenario describes reactions with much smaller barriers for recognition in comparison with the former scenario. One may therefore ask: what are the kinetic consequences for sliding dynamics and what is the overall kinetics when the barriers between the S and R binding modes are small? Furthermore, is the rate limiting step in protein-DNA recognition the search kinetics or the transition barrier from S to R?

Figure 1. The degree of similarity between the search and recognition binding modes (designated as S and R, respectively) is governed by the extent of overlap between the surface patches the DNA-binding protein uses to interact with non-specific and specific DNA residues. Non-specific binding is dictated by electrostatic interactions (indicated by the blue ellipse), whereas specific binding is dictated by a set of hydrogen bonds that can be identified from the X-ray structures (indicated by the green ellipse). The overlap between the S and R binding modes is quantified by the similarity index, χ. A smaller overlap between the non-specific and specific patches (i.e. χ << 1) corresponds to high frustration and suggests that the protein needs to undergo a conformational change (which potentially involves a change in the DNA conformation as well) to make the S → R transition and so bind its DNA target site (left). A high value of χ (∼ 1) reflects less frustration between the S and R states and suggests that these two states are very similar (right).
In recent studies, we introduced the concept of molecular frustration between the non-specific and specific protein-DNA binding modes (40,41,49). Frustration is quantified as the degree of overlap between the protein surface patches that are used for the S and R binding modes. For a given protein, the S mode is represented by the largest positively charged patch on the protein surface and the R mode is given by its X-ray structure. Greater frustration corresponds to a smaller overlap between the S and R binding modes (Figure 1). Frustration between the protein residues forming specific versus non-specific contacts with the DNA means that the two binding modes represent different energetic and conformational states, which gives rise to the two-state model (40,41). It was shown that numerous DBPs, and particularly enzymes, have a medium to high degree of frustration between their S and R binding modes. Furthermore, coarse-grained molecular dynamics simulations qualitatively showed that frustration strongly influences the kinetics of the protein-DNA search process, such that high (low) frustration is associated with fast (slow) sliding but a poor (high) target recognition probability (41).
In this paper, we show how frustration links together two of the most prominent aspects of the protein-DNA recognition process, namely, the kinetics of finding the target site via the facilitated diffusion mechanism and the kinetics of binding the target, which depend on the target recognition probability. We extend Slutsky and Mirny's (9) facilitated diffusion theory by introducing the concept of frustration between the non-specific and specific binding modes and its effect on DNA recognition kinetics. Utilizing the new theoretical model, we elucidate and enable quantification of various factors influencing protein-DNA recognition kinetics.

Theoretical model
We study the kinetics of protein-DNA recognition by building on Slutsky and Mirny's facilitated diffusion theory (9,50) and introducing explicit terms for the S to R transition. In the framework of this theory, a single DBP searches for its target site on a long DNA of M bps through rounds of 1D and 3D diffusion. The mean durations of the 1D and 3D diffusion are τ_1D and τ_3D, respectively. The number of 1D and 3D rounds needed to find the target site can be estimated as M/<n>, where <n> is the mean number of sites scanned in each 1D round (i.e. n = 2√(D_1D τ_1D), where D_1D is the diffusion coefficient of sliding). The mean total search time is then:

$$\tau_S = \frac{M}{\langle n \rangle}\left(\tau_{1D} + \tau_{3D}\right) \qquad (1)$$

The expression in Equation (1) assumes that the target is recognized (i.e. the S to R transition occurs) in the first search round that visits the cognate site. However, it is possible for the DBP to miss the target site, in which case additional 1D and 3D rounds are needed. Accordingly, the total search time, τ_S, must be multiplied by 1/P_f, where P_f is the probability of recognizing the target site and, thus, the efficiency of the search. Given that, in a single round of 1D diffusion, the protein covers ∼n sites and makes n² steps, each site is revisited ∼n times. Thus, the overall probability of the DBP locating the target site, once the protein associates inside a region of size ∼n that contains the site, is P_loc = min[1, <n>·P_f]. The total recognition time is therefore estimated by:

$$\tau_R = \frac{\tau_S}{P_{loc}} = \frac{M}{\langle n \rangle\, P_{loc}}\left(\tau_{1D} + \tau_{3D}\right) \qquad (2)$$

The value of P_f depends on the rate, k_res, to move a single step while searching using the S state (i.e. a high k_res implies a small residence time and reduces the probability of the S → R transition) and the transition rate, k_S→R, for switching from the S to the R state. P_f thus can be expressed by:

$$P_f = \frac{k_{S\to R}}{k_{S\to R} + k_{res}} \qquad (3)$$

The transition rate, k_S→R, is dictated by the energy barrier ΔG‡_S→R between the two binding modes (Figure 2):

$$k_{S\to R} = \frac{1}{\tau_0}\exp\left(-\frac{\Delta G^{\ddagger}_{S\to R}}{k_B T}\right) \qquad (4)$$

where τ_0 represents the characteristic time to move between non-specific sites.
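The bookkeeping described above (M/<n> search rounds, the local probability P_loc = min[1, <n>·P_f], and the competition between stepping away and switching to the R state) can be sketched in a few lines. The competition form of P_f and the Arrhenius form of the transition rate are taken here as working assumptions, and all numerical values (M, n, the time constants and the stepping rates) are hypothetical and chosen only for illustration.

```python
import math

def transition_rate(barrier_kbt: float, tau0: float) -> float:
    """Assumed Arrhenius-type S -> R rate; barrier in units of k_B T, tau0 in seconds."""
    return math.exp(-barrier_kbt) / tau0

def recognition_probability(k_sr: float, k_res: float) -> float:
    """Assumed competition form: chance of switching to R before stepping off the site."""
    return k_sr / (k_sr + k_res)

def search_time(M: float, n: float, tau_1d: float, tau_3d: float) -> float:
    """Mean time to reach the target: M/n rounds, each lasting tau_1d + tau_3d."""
    return (M / n) * (tau_1d + tau_3d)

def recognition_time(M: float, n: float, tau_1d: float, tau_3d: float, p_f: float) -> float:
    """Search time divided by P_loc = min(1, n * P_f), as described in the text."""
    p_loc = min(1.0, n * p_f)
    return search_time(M, n, tau_1d, tau_3d) / p_loc

# Hypothetical parameters: genome length, sites per round, round durations (s).
M, n = 1e6, 100.0
tau_1d = tau_3d = 1e-3
k_sr = transition_rate(barrier_kbt=5.0, tau0=1e-6)

for k_res in (1e4, 1e6, 1e8):  # hypothetical stepping rates (1/s)
    p_f = recognition_probability(k_sr, k_res)
    t_r = recognition_time(M, n, tau_1d, tau_3d, p_f)
    print(f"k_res = {k_res:.0e} /s -> P_f = {p_f:.2e}, tau_R = {t_r:.3g} s")
```

The sketch makes the role of revisits explicit: as long as <n>·P_f ≥ 1, missing the target on a single pass carries no penalty, and the recognition time equals the bare search time.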
It is suggested that for a very small energy barrier (i.e. ΔG‡_S→R ∼ 0), the kinetics of target recognition will be dictated by the diffusion speed. We can define k_res as the inverse of the average time the protein spends on a given DNA site during 1D sliding (i.e. k_res = τ_res⁻¹, the inverse of the residence time in the S state, which is obtained by setting n = 1 in the relation n = 2√(D_1D τ_1D); Figure 2B):

$$k_{res} = \tau_{res}^{-1} = 4 D_{1D} \qquad (5)$$

The diffusion coefficient for protein sliding on a rugged DNA potential energy surface is expressed by:

$$D_{1D} = \frac{1}{2\tau_0}\exp\left[-\left(\frac{\sigma}{k_B T}\right)^2\right] \qquad (6)$$

where τ_0 expresses the typical time it takes the protein to hop to a neighboring site. For a protein undergoing spin-coupled diffusion, the corresponding expression (Equation 7) is taken from (51), in which R is the protein's radius, R_OC is the distance between the center of mass of the protein and the DNA, BP is the distance between two base pairs along the DNA axis and η is the solution viscosity. In our model, we assumed R_OC = R and a protein radius of R = 3 nm. The average amount of time the protein stays bound to DNA in each 1D search round depends on the non-specific protein-DNA binding strength and is given by the inverse of the dissociation rate, k_off, of the protein from a non-specific DNA site:

$$\tau_{1D} = \frac{1}{k_{off}} \qquad (8)$$

Here k_off is determined by the non-specific binding energy, E_ns, which is mostly governed by the electrostatic interactions between the positively charged residues and the negatively charged backbone of the DNA. The energetic contribution of state S may include a contribution from the formation of semi-specific interactions between the protein and sequences that share some similarity to the target site. The formation of such occasional hydrogen bonds that depend on the DNA sequence is captured by the roughness of the protein-DNA energy landscape, σ (Figure 2A). Combining the above equations with the diffusion law, n = 2√(D_1D τ_1D), the τ_R term can be expressed in terms of σ, E_ns, ΔG‡_S→R, τ_0 and T.
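The chain from roughness to stepping rate to scanned sites can be sketched as follows. Two ingredients are assumptions made only for this illustration: the prefactor 1/(2τ_0) of a unit-step 1D random walk for the sliding diffusion coefficient (in bp²/s), and an Arrhenius-type residence time τ_1D = τ_0·exp(E_ns/k_BT). The function names and the value of τ_0 are hypothetical.

```python
import math

def d1d(sigma_kbt: float, tau0: float) -> float:
    """Sliding diffusion coefficient in bp^2/s (assumed unit-step random-walk prefactor)."""
    return math.exp(-sigma_kbt ** 2) / (2.0 * tau0)

def k_res(sigma_kbt: float, tau0: float) -> float:
    """Stepping rate: setting n = 1 in n = 2*sqrt(D*tau) gives tau_res = 1/(4 D)."""
    return 4.0 * d1d(sigma_kbt, tau0)

def tau_1d(e_ns_kbt: float, tau0: float) -> float:
    """ASSUMPTION: Arrhenius-type time bound to non-specific DNA, 1/k_off = tau0 * exp(E_ns)."""
    return tau0 * math.exp(e_ns_kbt)

def sites_per_round(sigma_kbt: float, e_ns_kbt: float, tau0: float) -> float:
    """Mean number of sites scanned per 1D round, n = 2*sqrt(D_1D * tau_1D)."""
    return 2.0 * math.sqrt(d1d(sigma_kbt, tau0) * tau_1d(e_ns_kbt, tau0))

tau0 = 1e-6  # hypothetical hop time (s)
for sigma in (0.5, 1.0, 2.0):
    n = sites_per_round(sigma, e_ns_kbt=10.0, tau0=tau0)
    print(f"sigma = {sigma} k_BT: k_res = {k_res(sigma, tau0):.3g} /s, n = {n:.1f} bp")
```

With these assumed forms τ_0 cancels out of n, which then depends only on the difference between E_ns and σ² (both in units of k_BT).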
We note that in our model, the transition S → R can take place at any DNA site although it is likely that some unknown DNA features may support the transition state of the S → R transition.
In this study, we argue that the roughness of the protein-DNA energy landscape for sliding, σ, and the energetic barrier for the transition between the search and recognition modes, ΔG‡_S→R, are linked. For example, a protein having a high probability of interacting with semi-specific sites while in the S state is expected to transition faster from state S to R, simply because the S state already shares some similarity to the R state even prior to the transition. Accordingly, σ and ΔG‡_S→R are expected to be anti-correlated. Namely, the S and R states are linked and the relationship between them can be quantified by the degree of frustration. The molecular frustration between the S and R binding modes can be quantified through the degree of overlap between the non-specific and specific binding patches on the DBP (Figures 1 and 2) (40,41). As the degree of overlap between the S and R binding modes (also known as the similarity index, χ) increases, additional residues may interact with DNA; thus σ increases. To model the relationship between ΔG‡_S→R and χ, we use a coarse-grained model to simulate the rate of binding a target site for a set of six DBPs having varying χ values. The kinetics of recognizing the binding site can be estimated to depend exponentially on χ (see Supplementary Data). We therefore define linear relationships for σ and ΔG‡_S→R with the overlap or similarity index χ and obtain:

$$\sigma = \chi\,\sigma_{max} \qquad (9)$$

$$\Delta G^{\ddagger}_{S\to R} = (1-\chi)\,\Delta G_{max} \qquad (10)$$

where σ_max and ΔG_max are the maximal ruggedness and transition energy barrier, respectively. A schematic representation of the proposed model is shown in Figure 2. The maximal sliding roughness, σ_max, corresponds to the roughness in the recognition mode, R, in which all the protein residues participate in specific contacts with the DNA. Sufficient stability at the target site can be achieved for σ_max ∼ 5 k_BT and therefore this value is used throughout this study (9).
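The anti-correlation between roughness and barrier can be made concrete with a small sketch. It assumes the linear forms σ = χ·σ_max and ΔG‡_S→R = (1 − χ)·ΔG_max, and, for the recognition probability, a competition between an Arrhenius-type switching rate and a stepping rate of 2·exp(−σ²)/τ_0, in which case τ_0 cancels. These forms and the chosen ΔG_max are assumptions for illustration.

```python
import math

SIGMA_MAX = 5.0  # k_B T, the value quoted in the text for stable target binding

def sigma_of_chi(chi: float, sigma_max: float = SIGMA_MAX) -> float:
    """Sliding roughness (k_B T), assumed linear in the similarity index."""
    return chi * sigma_max

def barrier_of_chi(chi: float, dg_max: float) -> float:
    """S -> R barrier (k_B T), assumed to fall linearly as similarity grows."""
    return (1.0 - chi) * dg_max

def p_f_of_chi(chi: float, dg_max: float, sigma_max: float = SIGMA_MAX) -> float:
    """Single-visit recognition probability, 1 / (1 + 2*exp(barrier - sigma^2))."""
    s = sigma_of_chi(chi, sigma_max)
    g = barrier_of_chi(chi, dg_max)
    return 1.0 / (1.0 + 2.0 * math.exp(g - s * s))

for chi in (0.1, 0.3, 0.5, 0.7):
    print(f"chi = {chi}: sigma = {sigma_of_chi(chi):.2f} k_BT, "
          f"barrier = {barrier_of_chi(chi, 10.0):.1f} k_BT, "
          f"P_f = {p_f_of_chi(chi, 10.0):.3g}")
```

As χ grows, the barrier shrinks while the roughness rises, so each visit to the target becomes more likely to end in recognition, at the cost of slower sliding.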
An estimate of the value of ΔG_max is, however, less straightforward as it can be influenced by various factors, including loss of non-specific contacts and strain, such as bending and deformation upon formation of the DNA-protein complex. We therefore evaluate the influence of various ΔG_max values on the target recognition rate. We note that the search time of the target site by DBPs may be influenced by other factors not included in the current study such as roadblocks and crowders (12,13,52), the geometric properties of the searched DNA and the locations of the target genes (25,53–55), the DBP's concentration and sequence effects of the DNA (20,39,56). Furthermore, the search kinetics might be characterized not only by the mean search time but also by the full distribution of reaction times (57,58). Despite the simplicity of the model, it quantitatively highlights the tradeoff between the search and recognition kinetics.

Calculation of the frustration between the two-state binding modes
The similarity between the S and R states is estimated as the overlap between the binding modes. The residues that interact with DNA in the R mode can be obtained from crystal structures and those that interact in the S mode are linked to a positively charged patch on the protein surface. To estimate the overlap between S and R, we first define χ_i for each protein residue i forming specific contacts with DNA in the complex according to Equation (11), in which j denotes any protein residue closer than a cutoff distance of r_c = 8 Å to residue i; q_j is the point charge of residue j and takes values of −1, 0 or 1; r_ij is the distance between residues i and j; and a = 5 is an exponential decay constant. We then average all χ_i values of the protein to obtain its similarity index, χ. The values of χ lie between −1 and +1 and depend on the parameters a and r_c. The larger the value of χ (i.e. the lower the frustration), the greater the similarity between the S and R states. The similarity indices can be evaluated by considering all the positively charged residues in the structure or by selecting those that support the S state. The latter subset can be elucidated from coarse-grained simulations that were applied to study the sliding of proteins along DNA (4,5). We find a strong correlation between the values of χ calculated using these two approaches.
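The exact expression for χ_i is not shown here, so the following sketch should not be read as the authors' definition. It illustrates one functional form that is consistent with the stated ingredients (neighbors within r_c, point charges q_j, exponential decay with constant a, and values bounded between −1 and +1): a charge average weighted by exp(−r_ij/a). The normalization, the units of a (taken as Å) and all function names are assumptions.

```python
import math

def chi_residue(charges, distances, r_c=8.0, a=5.0):
    """Hypothetical per-residue index: exponentially weighted mean charge of neighbours.

    charges   -- point charges q_j (-1, 0 or +1) of candidate neighbour residues
    distances -- distances r_ij (assumed in angstroms) from residue i to each residue j
    """
    num = 0.0
    den = 0.0
    for q, r in zip(charges, distances):
        if r < r_c:                      # only residues inside the cutoff contribute
            w = math.exp(-r / a)         # assumed exponential distance weighting
            num += q * w
            den += w
    return num / den if den > 0.0 else 0.0

def similarity_index(per_residue_values):
    """Average chi_i over all residues forming specific contacts with the DNA."""
    return sum(per_residue_values) / len(per_residue_values)

# Toy example: two specific-contact residues with made-up neighbourhoods.
chi_1 = chi_residue(charges=[+1, +1, 0], distances=[4.0, 6.0, 5.0])
chi_2 = chi_residue(charges=[+1, -1, 0, +1], distances=[3.0, 3.0, 7.0, 12.0])
print(f"chi_1 = {chi_1:.2f}, chi_2 = {chi_2:.2f}, chi = {similarity_index([chi_1, chi_2]):.2f}")
```

In this toy form, a specific-contact residue surrounded by positive charges scores close to +1 (high overlap with the electrostatic patch), whereas neutral or negatively charged surroundings pull the index toward 0 or −1.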

Trade-off between sliding rate and recognition rate
We first address how frustration (which impedes a protein's ability to switch from its searching mode to its recognition and binding mode) influences the different components of target recognition kinetics; namely, 1D diffusion and the S → R transition. Such frustration is associated with a positive effect on the search kinetics by reducing the energetic roughness for sliding, σ, and maximizing the number of BPs scanned in each 1D search round. Figure 3A plots the elapsed time for a single 1D search round as a function of the similarity index (being negatively correlated with frustration) for three different non-specific binding energies. Overall, increased frustration (i.e. decreased χ; Figure 3A) reduces the amount of time spent in each 1D search round by reducing the number of energetic traps for the searching protein on the DNA (i.e. σ is smaller for more frustrated interfaces, Equations 6 and 9). In addition, the 1D search time is highly sensitive to the magnitude of the non-specific binding energy, E_ns.

Frustration also has a negative effect on the search kinetics by increasing the transition time from the S to R binding mode and thereby decreasing the probability of the protein recognizing the target DNA sequence (Equation 10). The overall transition rate for a given protein also depends on the value of ΔG_max (Figure 3B), which accounts for interaction variability in different protein-DNA complexes. For a moderate similarity index value of χ = 0.3, an increase of two orders of magnitude in transition time is observed upon increasing the free energy barrier for recognition, ΔG_max, from 5 k_BT (red line) to 10 k_BT (green line). Figure 3C plots the total mean search time as a function of similarity index under three conditions: the two extreme search kinetics settings, namely a smooth sliding landscape (i.e. σ_max = 0) and an immediate S → R transition (i.e. ΔG_max = 0), and the general case in which both sliding roughness and a transition barrier are present.
The figure shows that, for cases with low transition barriers (Figure 3C, red line), the DBP favors high frustration and exhibits fast 1D diffusion. At the other extreme, where the protein lacks sliding roughness but the transition barrier is high, the protein favors low frustration (Figure 3C, green line). In the presence of both an energy barrier and sliding roughness (Figure 3C, black line), there exists an optimal frustration value at which both sliding and target recognition proceed at an adequate speed. The optimal frustration value between the S and R states may vary for different proteins depending on their electrostatic energy to bind DNA (i.e. E_ns) and the free energy barrier for recognition (i.e. ΔG_max).
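The existence of an optimal frustration can be reproduced qualitatively with a self-contained sweep over χ. The sketch assumes linear dependences of σ and of the S → R barrier on χ, an Arrhenius-type switching rate, a unit-step random-walk prefactor for D_1D, and an Arrhenius-type residence time τ_1D = τ_0·exp(E_ns/k_BT). All parameter values (M, τ_0, τ_3D, E_ns, ΔG_max) are hypothetical, and n is clamped to at least one site per round.

```python
import math

def recognition_time(chi, e_ns=10.0, dg_max=10.0, sigma_max=5.0,
                     M=1e6, tau0=1e-6, tau_3d=1e-3):
    """Total recognition time (s) under the assumed functional forms; energies in k_B T."""
    sigma = chi * sigma_max                        # assumed linear roughness
    barrier = (1.0 - chi) * dg_max                 # assumed linear S -> R barrier
    d1d = math.exp(-sigma ** 2) / (2.0 * tau0)     # sliding diffusion coefficient (bp^2/s)
    tau_1d = tau0 * math.exp(e_ns)                 # ASSUMED residence time per 1D round
    n = max(1.0, 2.0 * math.sqrt(d1d * tau_1d))    # sites scanned per round
    k_sr = math.exp(-barrier) / tau0               # switching rate S -> R
    p_f = k_sr / (k_sr + 4.0 * d1d)                # recognition probability per visit
    p_loc = min(1.0, n * p_f)                      # accounts for ~n revisits per round
    return (M / (n * p_loc)) * (tau_1d + tau_3d)

chis = [i / 100 for i in range(101)]

def best_chi(**kwargs):
    """Similarity index on the grid that minimises the recognition time."""
    return min(chis, key=lambda c: recognition_time(c, **kwargs))

print("roughness and barrier present: optimal chi =", best_chi())
print("no transition barrier (dG_max = 0): optimal chi =", best_chi(dg_max=0.0))
print("smooth landscape (sigma_max = 0): tau_R(chi=0) / tau_R(chi=1) =",
      recognition_time(0.0, sigma_max=0.0) / recognition_time(1.0, sigma_max=0.0))
```

With both terms present the minimum falls at an intermediate χ; removing the barrier pushes the optimum to χ = 0 (maximal frustration), and removing the roughness makes high χ (low frustration) faster, in line with the three regimes described above.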
To understand better how the molecular properties of the recognition process affect its kinetics, we study the relationship between the non-specific (electrostatic) binding energy (E_ns), the magnitude of the transition energy barrier (ΔG_max) and the total recognition time (τ_R) for two different similarity index values (χ) as a protein shifts between the S and R states. Figure 4A plots τ_R as a function of E_ns. Changing the value of E_ns can be viewed as equivalent to changing the salt concentration, which is well known to tune the strength of non-specific protein-DNA binding. At high non-specific binding energies, the total recognition time is long (Figure 4A). This is due to the sluggish 1D dynamics at high values of E_ns (Equation 8). At low values of E_ns, 1D scanning in each round of 1D and 3D search is less efficient, as the time spent in the sliding mode is shorter and consequently the number of scanned sites in each round is smaller, since n = 2√(D_1D τ_1D). As <n> decreases, the total recognition time, τ_R, becomes longer (Equation 1). The dependency of the total recognition time on E_ns, as obtained from our theory, therefore supports one of the hallmarks of the facilitated diffusion model regarding the existence of an optimal combination of 1D and 3D search modes (2,4,9,24). It can be seen that proteins with a high S to R transition barrier (Figure 4A, red curves) prefer greater frustration, that is, a lower similarity index (Figure 4A, solid versus dashed lines) in order to reduce the sliding barrier and maximize the number of target binding attempts required in each 1D search round. However, proteins with lower transition barriers (black line) do not require multiple target recognition attempts and therefore prefer lower frustration, that is, a higher similarity index (Figure 4A, dashed versus solid lines).

Nucleic Acids Research, 2019, Vol. 47, No. 11 5535
The effect of E_ns on the recognition rate has been shown experimentally; increasing the salt concentration markedly affects the association rate while the dissociation rate is hardly affected (59). Figure 4B plots the total recognition time as a function of the maximal transition energy barrier, ΔG_max. The figure shows that, for a given non-specific binding energy, E_ns, the recognition timescale reduces as ΔG_max decreases until the value of ΔG_max is low and the probability for the S → R transition is very high (i.e. P_f = 1). Figure 4B also shows that faster recognition is achieved for more frustrated proteins (lower similarity index; solid compared to dashed lines).
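The non-monotonic dependence of the recognition time on the non-specific binding energy can be illustrated with a sweep over E_ns at fixed χ. As a sketch, it assumes an Arrhenius-type residence time τ_1D = τ_0·exp(E_ns/k_BT), linear dependences of σ and of the barrier on χ, and hypothetical values for M, τ_0, τ_3D, χ and ΔG_max.

```python
import math

def recognition_time(e_ns, chi=0.2, dg_max=5.0, sigma_max=5.0,
                     M=1e6, tau0=1e-6, tau_3d=1e-3):
    """Total recognition time (s) as a function of E_ns (k_B T) under assumed forms."""
    sigma = chi * sigma_max
    barrier = (1.0 - chi) * dg_max
    d1d = math.exp(-sigma ** 2) / (2.0 * tau0)     # sliding diffusion coefficient (bp^2/s)
    tau_1d = tau0 * math.exp(e_ns)                 # ASSUMED time bound per 1D round
    n = max(1.0, 2.0 * math.sqrt(d1d * tau_1d))    # sites scanned per round
    k_sr = math.exp(-barrier) / tau0
    p_f = k_sr / (k_sr + 4.0 * d1d)
    p_loc = min(1.0, n * p_f)
    return (M / (n * p_loc)) * (tau_1d + tau_3d)

e_grid = [i / 10 for i in range(201)]              # E_ns from 0 to 20 k_B T
e_opt = min(e_grid, key=recognition_time)

print(f"optimal E_ns on the grid: {e_opt:.1f} k_BT")
for e in (2.0, e_opt, 15.0):
    print(f"E_ns = {e:4.1f} k_BT -> tau_R = {recognition_time(e):.3g} s")
```

Weak binding gives short rounds that scan few sites, strong binding gives long rounds dominated by slow 1D scanning, and the minimum lies in between, which mirrors the salt dependence discussed above.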

Frustration is linked with the diffusion coefficient for sliding
Our group has previously calculated molecular frustration for a dataset of 125 DBPs on the basis of the crystal structure of their complexes with DNA and with the goal of estimating the degree of frustration for various protein structures and functions (40,41). Here, we quantify the frustration indices for nine DBPs (Equation 11) whose linear diffusion coefficient, D_1D, was measured experimentally and whose crystal structures with DNA are known. Figure 5A shows pictorially the overlap between the R state (represented by the green spheres) and the S state (represented by the blue surface and corresponding to an electrostatically positive patch). As can be seen in Figure 5A, proteins with low frustration (top row) obtain high overlap (i.e. high χ values) and vice versa for proteins with high frustration (bottom row). The similarity index can explain the linear diffusion coefficient of different DBPs, because χ is linked to the roughness of the protein-DNA energy landscape, σ. Recently, experimental data have shown architectural binding proteins to be associated with relatively high energetic roughness compared with other DBPs (60). According to our model, we expect these proteins to be characterized by low frustration (i.e. high χ values). Figure 5B plots the frustration indices of these nine DBPs and the experimental σ values, which were derived from their corresponding D_1D values. The correlation between χ and σ, shown in Figure 5B, strongly supports our model regarding the linkage between frustration and sliding (Equation 9). It is shown that, indeed, architectural binding proteins are associated with high χ values. We therefore hypothesize that, while transcription factors are optimized for fast search and therefore high frustration (low χ value), architectural binding proteins are less frustrated and diffuse more slowly, as required for their function.

Figure 5. Relationship between the ruggedness of the protein-DNA energy landscape and the frustration between the S and R states. (A) Schematic illustration of the electrostatic potential surface (color bar on the right) and residues that are found to participate in specific contacts with the DNA (green beads) for several proteins whose linear diffusion coefficients were measured experimentally. The calculated similarity index, χ, is shown below each structure. (B) Correlation between the calculated similarity index and the experimentally estimated roughness of the energy landscape (σ) obtained from the linear diffusion coefficient (60). The three DNA-binding proteins (DBPs) that are marked in red were classified as architectural binding proteins. The correlation between χ and σ suggests that greater energetic barriers to sliding are related to low frustration (high χ) and also points to the molecular difference between architectural and non-architectural DNA-binding proteins.
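The experimental σ values are described as being derived from measured D_1D values. One way such a conversion could be carried out, assuming D_1D = D_smooth·exp[−(σ/k_BT)²], is to invert the exponential. Here D_smooth is a hypothetical smooth-landscape limit of the diffusion coefficient, and the numbers are placeholders rather than measured data.

```python
import math

def sigma_from_d1d(d1d: float, d_smooth: float) -> float:
    """Invert D_1D = D_smooth * exp[-(sigma/k_B T)^2]; returns sigma in units of k_B T."""
    if not 0.0 < d1d <= d_smooth:
        raise ValueError("expected 0 < d1d <= d_smooth")
    return math.sqrt(math.log(d_smooth / d1d))

# Placeholder ratios D_1D / D_smooth (not experimental values).
for ratio in (0.5, 0.1, 0.01):
    print(f"D_1D/D_smooth = {ratio}: sigma = {sigma_from_d1d(ratio, 1.0):.2f} k_BT")
```

Because σ enters through a squared exponent, a tenfold change in the measured diffusion coefficient shifts the inferred roughness only modestly, so the estimate is sensitive to the choice of the smooth-landscape reference.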

Frustration can explain the effect of mutations on recognition kinetics
The effect of point mutations on the kinetics of the S → R transition can be predicted by the frustration between these two states. Experiments measuring the association and dissociation rates of mutated proteins are routinely performed to shed light on the mechanism of protein-DNA kinetics (59,61,62). Generally, these experiments measure the binding kinetics of DBPs with short DNA oligonucleotides containing the target site and therefore they do not involve significant 1D sliding on DNA. Accordingly, we postulate that mutations that change the level of frustration will also modify the protein-DNA association rates by shifting the transition energy barrier, ΔG‡_S→R. The transition barrier is expected to increase for larger frustration (i.e. for lower χ; Equation 10).
To test our hypothesis, we calculate the change in the similarity index for six different mutants of the p53, TUS and λ repressor proteins and for the corresponding wild-type proteins. Then, the experimentally measured association rates are compared to the change in χ. The results are summarized in Figure 6. For the p53 protein, we studied frustration in the R248Q mutant, which has been shown to be inactive (63,64), and in the p53FG mutant, which contains two substitutions, S121F and V122G, in the Loop L1 and is associated with high activation rates (65,66). The similarity index of the wild-type p53 is χ = 0.26 whereas that of R248Q is lower (indicating increased frustration) at χ = 0.21 and that of the p53FG mutant is higher at χ = 0.32. These results support our hypothesis by suggesting that inactivation of the R248Q protein results from an increased energetic barrier for the S → R transition, which is manifested in a 19% decrease in the similarity index. The increased activation rates of the p53FG mutant can be attributed to a decreased barrier, which is characterized by a 23% increase in the similarity index.

Figure 6. The effect of mutations on the kinetics of DNA recognition can be explained by a change in the frustration. The kinetics of binding to a specific site on DNA was analyzed for six mutants of three different proteins (P53, TUS and λ repressor proteins). The electrostatic potential of each of the three proteins is shown. In each protein, the residues that participate in specific contacts with DNA (green beads) as well as the mutated sites (yellow and magenta beads) are shown. The mutants are analyzed in terms of the change in the similarity index, χ, in comparison with the wild-type protein and with respect to the association rate relative to the wild-type protein, k_a^Mut/k_a^WT. Association rate values are taken from (59,66,68). The PDB codes used to calculate the similarity index: P53RF (4MZRa), P53wt (3Q05), TUS (1ECR) and λ repressor (1LMB).
Similar results are obtained for the TUS and λ repressor proteins. In the case of the TUS protein, both the K89A and the R198A mutants are characterized by decreased association rates (59) and accordingly show similarity indices decreased by 20% and 43%, respectively. The λ repressor K4Q mutant is characterized by a significant reduction in activity (67,68) and, accordingly, a decrease in the similarity index of 28%. In contrast, the E34K mutant, which is a secondary mutant that also contains the K4Q mutation, is characterized by a substantial increase in the similarity index from 0.13 for the K4Q mutant to 0.22 for the E34K mutant, which corresponds to a 22% increase with respect to the wild-type and is accordingly associated with an increased association rate. In all of the above cases, apart from the p53FG mutant, the mutations are associated with substitutions of charged residues that alter frustration by influencing the degree of overlap between specific protein-DNA contacts and the positively charged residues. In the case of the p53FG mutant, the variation in frustration is a result of modification to specific protein-DNA contacts.
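The percentage changes quoted for the mutants follow directly from the similarity indices given in the text, as the short check below shows. The wild-type λ repressor index is not quoted explicitly; the sketch infers it from the stated 28% decrease for K4Q, which is an assumption of this example. The helper name is illustrative.

```python
def percent_change(chi_wt: float, chi_mut: float) -> float:
    """Relative change of the similarity index with respect to the wild type, in percent."""
    return 100.0 * (chi_mut - chi_wt) / chi_wt

# p53: values quoted in the text.
p53_wt, p53_r248q, p53_fg = 0.26, 0.21, 0.32
print(f"p53 R248Q: {percent_change(p53_wt, p53_r248q):+.0f}%")
print(f"p53FG:     {percent_change(p53_wt, p53_fg):+.0f}%")

# Lambda repressor: wild-type index inferred from the quoted 28% decrease for K4Q (0.13).
lam_k4q, lam_e34k = 0.13, 0.22
lam_wt_inferred = lam_k4q / (1.0 - 0.28)
print(f"lambda E34K vs inferred wild type ({lam_wt_inferred:.2f}): "
      f"{percent_change(lam_wt_inferred, lam_e34k):+.0f}%")
```

The inferred wild-type value reproduces the quoted 22% increase for the E34K mutant, so the three λ repressor numbers in the text are mutually consistent.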

CONCLUSIONS
In this study, the facilitated diffusion model of DNA search by proteins for their target sites is revisited in order to consider explicitly not only the timescales for finding the target sites, but also the timescales for recognizing a site after it has been found. We show that these two processes of finding and binding the target sites are coupled via protein frustration between the non-specific and specific binding modes (i.e. the search, S, and recognition, R, states). Low frustration means that the non-specific and specific binding modes of the protein to the DNA are very similar, so recognition can occur relatively easily after the target is identified. Nonetheless, low frustration suggests hindered 1D diffusion because semi-specific protein-DNA interactions can be formed with higher probabilities. Accordingly, low frustration defines a low barrier for recognition but a more rugged protein-DNA energy landscape. High frustration, on the other hand, implies that a conformational change will be required during the transition from non-specific association to the specific complex. Indeed, in many protein-DNA complexes, either or both the protein and the DNA change conformation to effect specific binding (32–34). Our theory predicts that the magnitude of the conformational change depends on the similarity index. Consistent with this perspective, an NMR study of homeodomain HoxD9 that reported high similarity between the non-specific and specific binding modes (47,69) can be explained by the high similarity measure (i.e. low frustration) of this system (χ = 0.37). The experimentally estimated D_1D of HoxD9 (70) corresponds to an energetic ruggedness of ∼2 k_BT, again in agreement with low frustration. Lac repressor, on the other hand, is characterized by a lower similarity index (χ = 0.08) that explains the lower energetic ruggedness of <1 k_BT for its 1D diffusion as was deduced experimentally (10).
This high frustration may result not only in faster 1D diffusion but also in high transition barrier that may cause to unsuccessful recognition after finding the target site, as was indeed concluded experimentally for the lac repressor (8).
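To give a feel for how strongly the quoted ruggedness values can affect sliding, the following minimal sketch evaluates the classical Zwanzig relation for diffusion on a Gaussian-rough landscape, D1D/D0 = exp(-(σ/kBT)^2). Using this relation here is an assumption made for illustration; it is not necessarily the mapping used to obtain the ruggedness estimates cited above, and the function name is hypothetical.

```python
import math

def relative_diffusion(ruggedness_kbt: float) -> float:
    """Return D1D / D0 for a Gaussian-rough landscape (Zwanzig relation).

    ruggedness_kbt is the landscape roughness in units of kBT.
    ASSUMPTION: the Zwanzig form is used for illustration only.
    """
    return math.exp(-ruggedness_kbt ** 2)

# Ruggedness values quoted in the text.
hoxd9 = relative_diffusion(2.0)      # HoxD9, roughly 2 kBT
lac_bound = relative_diffusion(1.0)  # lac repressor, upper bound of 1 kBT

print(f"HoxD9 D1D/D0: {hoxd9:.3f}")
print(f"lac repressor D1D/D0 (at 1 kBT): {lac_bound:.3f}")
print(f"ratio lac/HoxD9: {lac_bound / hoxd9:.1f}")
```

Because the roughness enters as a square in the exponent, a difference of about 1 kBT in ruggedness already translates into a slowdown of more than an order of magnitude under this assumed relation.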
The concept of frustration between the S and R binding modes is utilized to successfully predict several experimentally measured phenomena. We show that the degree of frustration explains differences in the linear diffusion coefficients for sliding along DNA and thus provides a molecular interpretation for the experimentally reported barrier to sliding (10,60). In particular, we argue that architectural proteins slide along DNA with a smaller diffusion coefficient due to lower frustration. Furthermore, the concept of frustration has practical implications in predicting variations in recognition kinetics, particularly the effect of point mutations on recognition rates. Although mutations that change specific hydrogen bonds with the target DNA sites often affect the dissociation rates (i.e. koff), some mutations that involve positively charged residues are found to affect the association rate (i.e. kon). The latter can be rationalized as a change in the molecular frustration between the S and R states upon mutation. This also implies that the protein-DNA binding affinity can be modulated via kon and not only via koff, as is more commonly found. Our study explains the origin of the experimentally observed changes in association rate with salt concentration while the dissociation rates are hardly changed (59). Changing the energy of the specific complex (for example, by mutating the DNA target site) is expected to have a much greater effect on the dissociation rates than on the association rates, as was reported earlier for several proteins using comprehensive kinetic measurements (61).
Our study illustrates that there is a trade-off between the speed of sliding diffusion during the search process and the kinetics of the transition from the search mode to the recognition mode. Accordingly, our model suggests that the speed of sliding dynamics and the speed of recognition are tightly coupled. Most importantly, this trade-off is modulated by frustration between the non-specific and specific protein-DNA interactions, and therefore there is an optimal degree of frustration that minimizes the total time for recognition (i.e. search plus recognition). The concept of frustration suggests that the overall recognition time is dictated not only by the search process but also by the energetic barrier to forming the specific complex. The magnitude of the trade-off depends on various parameters of the protein-DNA system, such as the nature of the conformational change and the protein structure (e.g. the existence of multiple domains (14)). Finally, we show that our model is supported by experimental results, which show a strong correlation between the similarity index and the kinetic properties of protein-DNA recognition. Our model quantifies the relationship between protein characteristics and the facilitated diffusion mechanism and can thus be used to assist in the design of mutations that engineer and control the kinetic search process of DBPs.
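The trade-off summarized above can be illustrated with a deliberately simple toy calculation. The sketch below is not the model developed in this work: the linear dependence of ruggedness and of the recognition barrier on a frustration parameter, the Zwanzig-type slowdown of sliding, the Arrhenius-type recognition time, and the values of `SIGMA_MAX` and `BARRIER_MAX` are all assumptions chosen only to show how an interior optimum can arise.

```python
import math

# ASSUMPTIONS (illustrative only, not parameters from the paper):
SIGMA_MAX = 2.0    # ruggedness (in kBT) at zero frustration
BARRIER_MAX = 5.0  # S-to-R transition barrier (in kBT) at full frustration

def search_time(frustration: float) -> float:
    """Relative 1D search time; ruggedness falls linearly with frustration."""
    sigma = SIGMA_MAX * (1.0 - frustration)
    return math.exp(sigma ** 2)  # inverse of a Zwanzig-type D1D/D0

def recognition_time(frustration: float) -> float:
    """Relative recognition time; barrier grows linearly with frustration."""
    return math.exp(BARRIER_MAX * frustration)

def total_time(frustration: float) -> float:
    """Search plus recognition time, in arbitrary units."""
    return search_time(frustration) + recognition_time(frustration)

# Scan frustration from 0 (identical S and R modes) to 1 (maximally different).
grid = [i / 100.0 for i in range(101)]
best = min(grid, key=total_time)

print(f"total time at frustration 0: {total_time(0.0):.1f}")
print(f"total time at frustration 1: {total_time(1.0):.1f}")
print(f"optimal frustration on grid: {best:.2f}, total time {total_time(best):.1f}")
```

With these assumed parameters, sliding dominates the total time at low frustration and the recognition barrier dominates at high frustration, so the minimum falls at an intermediate value. The position of that minimum shifts with the two assumed scale parameters, in line with the statement that the magnitude of the trade-off depends on the details of the protein-DNA system.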