What matters for lac repressor search in vivo—sliding, hopping, intersegment transfer, crowding on DNA or recognition?

We have investigated which aspects of transcription factor DNA interactions are most important to account for the recent in vivo search time measurements for the dimeric lac repressor. We find the best agreement for a sliding model where non-specific binding to DNA is improbable at first contact and the sliding LacI protein binds at high probability when reaching the specific Osym operator. We also find that the contribution of hopping to the overall search speed is negligible although physically unavoidable. The parameters that give the best fit reveal sliding distances, including hopping, close to what has been proposed in the past, i.e. ∼40 bp, but with an unexpectedly high 1D diffusion constant on non-specific DNA sequences. Including a mechanism of inter-segment transfer between distant DNA segments does not bring down the 1D diffusion to the expected fraction of the in vitro value. This suggests a mechanism where transcription factors can slide less hindered in vivo than what is given by a simple viscosity scaling argument or that a modification of the model is needed. For example, the estimated diffusion rate constant would be consistent with the expectation if parts of the chromosome, away from the operator site, were inaccessible for searching.


INTRODUCTION
The DNA sliding model for target search was originally formulated to explain the observed rapid rate of binding of lacrepressor from Escherichia coli to its operator site on DNA, which was estimated to be ca 100-fold faster than the expected maximal diffusion-limited value (1)(2)(3)(4) in vitro at low salt concentrations. In the model, the protein finds its target site on DNA through a combination of 3D diffusion and 1D diffusion (sliding) along the DNA contour. The main effect is that the target size is effectively extended from one base pair to the distance that the protein can slide during a non-specific binding event. The model was successful in explaining the high association rate as well as its dependence on the salt concentration, non-specific binding constant and DNA-chain length (5)(6)(7)(8). Since its original formulation, the sliding model has been revisited and extended  and also complemented with informative simulations at levels of detail ranging from atomistic (36,37) to coarse grained (13,17,21,(38)(39). This has not only generated new insight into e.g. the role of water molecules, condensed ions and DNA conformation during the sliding process but has also stimulated new hypotheses related to search kinetics such as the role of disordered tails in the search (21,38) or the role of multiple DNA binding domains (13,40).
When confronted with in vivo data (41), the basic model, which was developed to describe an in vitro situation, may need to be extended with features found in the in vivo situation. The search problem involves finding a single (or a few) specific site(s) among a vast excess of non-specific ones. The search efficiency will depend on the speed with which the repressor can transfer from one potential site to another and several different mechanisms have been suggested. There are in principle four different ones, correlated or uncorrelated with or without dissociation from DNA. In the correlated search the repressor will search sites nearby on the contour and in an uncorrelated one it will search between random locations. While the repressor remains nonspecifically bound it can slide in a 1D diffusion along the DNA contour and test correlated nearby sites, or it can be transferred directly to an uncorrelated DNA segment (intersegment transfer) (5,19,42,43) that is close by in 3D space. Similarly, after a dissociation event, the repressor can rebind at a correlated site nearby (hopping) (5,8) or it can rebind to a different segment (intersegmental jump) (22,31). In addition, there is the possibility that the repressor does not recognize its specific site once it has been found. A specific recognition step is needed, for example, to explain the ca 20% difference in rate between the natural operator O 1 and a synthetic operator sequence O sym (41). This recognition step is equivalent to introducing a two-step model where LacI is in equilibrium between a non-specific and a specific binding mode conformation (6,29). involved in our model. In the MC simulation scheme, following a dissociation the searching protein can associate with rate k a to a non-specific stretch of DNA given that no other proteins (roadblocks) are in the way. Given a successful non-specific association the protein may diffuse in 1D along the length of the DNA or perform intersegment transfer with rate k IST . After a number of non-specific association-slidingdissociation iterations during the total time , the searching protein finds its way to the specific binding site and binds with rate constant k sp .
Here, we evaluate how these different factors interplay and, in particular, which of the many parameter combinations are more likely to explain the in vivo search time measurements for the lac repressor dimer binding its symmetric operator O sym . The measurements that we seek to account for are the total search time as well as the reduced association rate per operator when two operators are close or when a single operator is flanked by other proteins at specific distances (41). Specific questions that we address are whether there are any physically reasonable sets of parameters for which the model can explain the observed in vivo measurements and whether hopping and intersegmental transfers contribute to the search efficiency in this parameter range.

The model
We build on the classical sliding model (2)(3)(4)(5)8) (Figure 1), where the DNA is considered as a smooth cylinder and the protein is a fully reactive sphere. In the Supplementary material the classical model is extended to include roadblocks, i.e. non-specifically bound proteins to DNA, and a specific recognition step at the specific operator site. The reaction radius ρ is the sum of the radii of the cylinder and the sphere. Non-specific protein-DNA binding is described by the reaction radius ρ, a relative diffusion rate D 3 and the extent of diffusion control, α (Equation (3)). All steric effects in the binding are assumed to be incorporated in the parameters α or ρ. Beyond the distance R c from the DNA axis, where it is equally far to another DNA segment, the protein is assumed to have lost its correlations with a particular DNA segment and will be equally likely to bind anywhere on the DNA. If the number of accessible non-specific binding sites on DNA is M acc with each base pair having a length and where the genome is confined to a volume V c , R c can be determined from This corresponds to a regular distribution where competing DNA segments are considered as lined up in parallel at a mutual distance of ca 2R c . In the Supplementary material we show that very similar results follow from the assumption that the DNA segments are homogeneously distributed throughout the volume V c .
A non-specifically bound protein will dissociate with the (microscopic) rate constant λ, which leaves the protein in solution at the reaction radius r = ρ. The motion of the protein after making a microscopic dissociation is described by diffusion in cylindrical coordinates, with a radiation boundary condition at the reaction radius (44) describing a rebinding event, and an adsorbing condition at r = R c . κ is a measure for the reactivity by which the protein will adsorb or bind to DNA at contact. Solving the diffusion equation for the concentration c(r) with these boundary conditions the total return probability to the same DNA segment is given by (3)(4)(5)8) The parameter α serves as a measure for the extent of diffusion control of the binding reaction and is defined as Here k = has been introduced such that k is a proper bimolecular association rate constant per base pair; it is the macroscopic rate constant the protein would have if it was diffusing infinitely fast. For α >> 1, the non-specific association is significantly limited by the diffusion rate, i.e. it binds non-specifically at the first contact.
In Equation (3), the fraction 1−φ 0 of microscopic dissociations goes beyond r > R c and will eventually rebind at a random position on the DNA. Thus the macroscopic nonspecific dissociation rate can be defined as To give the same equilibrium constant as the microscopic rates, the macroscopic association rate constant, k a , must satisfy k a /k d = k/λ. Thus, This is the average association rate constant for a protein starting anywhere in the solution.
At equilibrium the fraction F B of proteins that will be bound is determined by the non-specific binding constant K RD such that During the time 1/k d that the protein remains nonspecifically bound macroscopically, it will make λ/k d −1 = φ 0 /(1−φ 0 ) returning micro-dissociations. For each of these it can diffuse and rebind at a position z some distance away, as given by the probability density F(z) in ((4,5); Supplementary Equation (3)). This process, which is an integral part of the diffusion geometry, has been called hopping.
While non-specifically bound, the protein can move along DNA in a 1D diffusion with rate D 1 (bp 2 s −1 ). If we, for a moment, neglect the displacements to near-by sites during the short-lived hopping events, the protein remains nonspecifically bound during the macroscopic residence time 1/k d . As a consequence, the protein will scan a distance along the DNA corresponding to the sliding length during each macroscopic non-specific binding event Thus a protein landing within a distance s from the specific site will also find it with high probability, giving an effective target size of 2s bp. Actually, under these conditions one finds the total search time (Supplementary Equation This is the time to locate the specific site. However, it is possible that the protein does not recognize it and binds but instead slides off. If the protein binds specifically with a finite rate constant k sp , the total time to bind will be given by the search time (Equation (9)) increased by the factor (41) (Supplementary Equation (31)) Another complication is that sliding could be impeded due to crowding by other proteins that are bound to the DNA. On the assumption that a fraction v (vacancy) of the non-specific DNA is free of bound proteins, Li et al. (45) showed that the fraction of sites available for binding by a particular protein is ve 1− 1 v . Thus, the total number of accessible non-specific binding sites in the genome would be where M tot (bp) is the total genome size. Furthermore, crowding impedes the sliding and the effective target size becomes (45) L(v) as given by Supplementary Equation (40) rather than 2s. Thus the search time in Equation (9) is increased due to crowding by the factor Here, d is the footprint of the DNA-binding protein. In the parameter regime where the recognition step and the effects of crowding contribute independently to the search time, both correction factors (Equations (9) and (12)) can simply be multiplied to the result (Equation (9)). In this case the search time is given by This simple approximation works well in the parameter space investigated as shown both by the simulations (Supplementary Figure S7) and by theory (Supplementary Equation (44)). However, with two operator sites within sliding distance of each other, it is more difficult to find analytical solutions for the combined effects of crowding on DNA, hopping and finite binding rates and we have therefore resorted to computer simulations to test the corresponding parameter combinations. The simulations recover the analytical results in the appropriate limits as shown in Supplementary Figure S7.
The effect of intersegment transfer is to modify the exchange rate between DNA segments following a macroscopic dissociation, k d , by adding to it another exchange rate k IST (5). The nature of k IST is to allow for an exchange between DNA segments without prior dissociation. We therefore make the substitution k d → k d +k IST in Equations (8) -(13) without changing the fraction bound F B .

The Monte Carlo simulation scheme
The search process for one or two operator sites was realized using a 1D 5000-element (base pairs) lattice MC simulation with periodic boundary conditions representing the DNA. A protein corresponds to a position on the lattice with all proteins having a footprint d, where d was set to 21 bp. Except for the searching protein, the other proteins involved are nucleoid associating (NAPs) such as H-NS, Fis or IHF in E. coli (46). It is assumed that the NAPs are stationary during the time microscopic processes (hopping and sliding) govern the target search on the ms time scale. It has previously been shown that the effect of allowing NAPs to move during the microscopic sliding events is small (45). When the searching protein dissociates macroscopically the NAPs are uniformly redistributed on the lattice. The redistribution frequency is justified by the fact that the differences between the dissociation constants for LacI and NAPs under comparable in vitro conditions (6,42,(47)(48)(49) are mostly within an order of magnitude of each other. We tested this by redistributing NAPs every 10th LacI macroscopic dissociation which resulted in a difference in search times within a few percent depending on the vacancy. The output of the simulation is the average number of microscopic or macroscopic binding-dissociation cycles before one of the two operators is found in the 5000-bp DNA. The number of cycles is then linearly scaled to the 4.5 × 10 6 -bp E. coli chromosome.
The algorithm starts with randomly positioning the searching protein on DNA followed by placements of roadblocks, one on each side of LacI such that the end-to-end distance (ζ ) distribution between them is p(ζ ) (45) (Supplementary Equation (45)). A new end-to-end distance distribution is created by multiplying p(ζ ) with (ζ −d+1) where ζ Nucleic Acids Research, 2015, Vol. 43, No. 7 3457 = {d,d+1. . . ,2500 −d} since the simulations condition on LacI being placed on DNA before the roadblocks. There is exactly one way to place LacI between the roadblocks if the distance between them is d, two ways if the distance is d+1 and so on. The contribution to the search time from the binding attempts hindered by obstructing proteins is taken care of in the dissociation rate in Equation (16) through the definition of the equilibrium constant (Equation (7)), which contains the number of accessible sites only. This approach where two roadblocks are distributed at either side of the protein with the end-to-end distance distribution as given in (45) agrees very well with simulations where all roadblocks are included.
Next step in the simulation is to sample whether the searching protein should (i) dissociate microscopically with rate , (ii) slide 1 bp in a direction which is not blocked by another protein with rate D 1 , (iii) bind specifically with rate k sp if the protein is on the lattice corresponding to the specific site or (iv) intersegment transfer with rate k IST . The probability for each event corresponds to the fractional rate of the event as for any time continuous Markov process; e.g. when the protein can slide in two directions the probability that the next event is a microscopic dissociation is or a specific binding event Using the experimentally determined values for M tot , V c , D 3 , ρ, F B , v and as constants and the largely unknown D 1 , α and p bind as variable input parameters, λ and k sp can be solved for and p dissoc calculated; see Table 1 for a summary of the parameter names and their definitions in the model.
Using Equations (14) and (15) we determine whether to dissociate microscopically or bind specifically. Given a microscopic dissociation the searching protein has a chance of macroscopically dissociating as given by Equation (3). A macroscopic dissociation implies resampling the position of the searching protein and all the crowding proteins on the DNA.
If the protein dissociates microscopically, but not macroscopically, and hopping is included in the description, then the protein is displaced along the length of the DNA according to the distribution in Supplementary Equation (3) with reflective boundary conditions at the edge of a roadblock; if hopping is not included, the protein is simply returned to the site it came from. The number of microscopic dissociation events until specific binding is recorded, and the whole process is repeated 150 000 times to get a good estimate of the average search time for one set of parameters.
To arrive at the total search time, the number of microscopic association-dissociation cycles (N micro ) is multiplied by the time it takes for each cycle, i.e. (λF B ) −1 . Thus,

Approach
Our approach to estimate the in vivo parameters governing the target search is to acquire the time it takes for LacI to find its specific binding site using a combination of Monte Carlo simulations and analytical calculations as a function of three unknown variable input parameters D 1 , α and p bind . The calculated search time is then used to compare with two sets of single molecule in vivo data which are the search times to one operator site and the ratio of search times for one and two operator sites at various distances. The parameter regions where there is agreement between the calculated search times and the measured search times are illustrated in Figure 2. Here the semi-transparent cyan region corresponds to parameters where the overall search time to one site is acceptable and the level curves correspond to chisquare values for the agreement for the two-operator data set. The calculated search times should conform to both sets of in vivo data, which means that the acceptable parameter space is the region where the cyan region overlaps with the contour map. The constraint to small chi-squared values is because they quantify the difference between the estimated search times and the experimental search times for the data involving two operator sites as exemplified in Figure 3. This approach is expanded upon and clarified in the sections that follow.
Parameters. We treat the parameters that are not known in vivo as variable input parameters (see Table 1), i.e. the probability (p bind ) that the repressor recognizes and binds to its specific site after finding it, the 1D diffusion coefficient for sliding along DNA in vivo (D 1 given in bp 2 s −1 in the equations, otherwise given in m 2 s −1 ) and the degree of diffusion control in binding non-specific DNA sequences (α). The parameters, for which we have better estimates, are treated as constants, i.e. the 3D diffusion coefficient for LacI D 3 = 3 m 2 s −1 (50), the E. coli genome size set to M tot = 4.5 × 10 6 bp, the base pair length = 0.34 nm, the cell volume Cell volume per genome equivalent The reaction radius between LacI and a non-specific DNA segment (0.34 nm) The width of one base pair F B (0.9) The fraction of time a searching protein stays non-specifically bound to DNA v {0.70,0.85} The fraction of the genome unoccupied by DNA binding proteins The degree of diffusion control Probability of binding the specific operator site k IST {0-500}s −1 Intersegment transfer rate The variable parameters are the 1D diffusion coefficient D 1 [m 2 s −1 ], the degree of diffusion control, α, the probability of binding the specific site given that the searching protein is on it, p bind , are indicated by the curly brackets. The other parameters are considered well known and are fixed except for the vacancy on DNA, v, where we test two different values. The y-axis is the ratio between the rate with which LacI finds one of the two operator sites at a certain distance from each other and the rate with which it finds one operator site; the x-axis is the distance between operator sites in base pairs. The s-values in the legend are the unobstructed sliding length for LacI given by Equation (8).
(per genome equivalent) V c = 1 m 3 , the reaction radius ρ = 5.5·10 −3 m and the fraction of time the searching protein stays non-specifically bound F B = 0.9 (41). For the fraction of the total DNA length not covered by other proteins (i.e. the DNA vacancy v) there are alternative literature values and we therefore independently tested both v = 0.7 (45) and v = 0.85 (46). Furthermore, the average number of repressors per genome equivalent in the experimental system was estimated (41) to be between 3 and 5. Two parameters, α and D 1 , were varied systematically given high and low values of p bind . p bind is given the extreme value p bind = 1, corresponding to always binding when reaching the site and p bind = 0.2, below which we get no solutions with an acceptable total search time. The range of D 1 is confined from above by the corresponding in vitro value of 0.046 m 2 s −1 (50,51) and from below by 0.01 m 2 s −1 where the total association is too slow to give any acceptable solutions. Based on viscosity effects alone, Tabaka et al. (52) calculate D 1 = 0.025 m 2 s −1 ; this neglects friction with DNA and corresponds to half the in vitro value. Thus, we expect the actual value to be lower than this. α can, in principle, take any real positive value and we evaluate the model in the range where we have acceptable agreement with both sets of data.
Correlated binding to two different operator sites. The most informative experimental constraints on the allowed parameter space are those that compare the ratios of the search times for one of the two specific sites at distances 25, 45, 65, 115 and 203 base pairs apart and one specific site alone (41). When the specific sites are far apart, the approaching protein will see them as two independent sites, but when they are within sliding distances from each other, the effective target will look more like a single site. The level curves correspond to chi-squared values less than 3. Figure  3 shows examples of the fits afforded for some of these chisquare values justifying our choice of the threshold at 3 as a good-fit region.
The positive correlation for low chi-square values (red) in the D 1 -α plane of Figure 2 is due to the opposite effects these parameters have on the sliding length (Equation (8)). Increasing the degree of diffusion control means increasing both association and dissociation rates (Equation (4)) since the equilibrium constant is held constant. The sliding length, being inversely proportional to the square root of the dissociation rate, will thus decrease, which means that D 1 increases to maintain the sliding length and the dependence between two operators at different distances.
The total binding time. In the Materials and Methods section we derive the equation for the total search time for one repressor and one operator when crowding and finite binding probability at the operator site is introduced (Equation (13)). As the derivation includes some approximations, the validity of this equation was tested and found to agree well with simulations in the parameter ranges of interest (Supplementary Figure S7).
The total search time from this analytical expression is used as an additional constraint for the acceptable parameter space. The average number of searching proteins was estimated to be between 3 and 5 per genome equivalent (41) with an experimental search time for O sym of 81 ± 2 s. This corresponds to a search time for a single protein to fall between 236 and 416 s. In Figure 2 this is translated to an acceptable parameter space with respect to the absolute search time shown in cyan. An agreement between experiments and the model would be achieved for parameter values where the cyan region overlaps with the 'good-fit' red chi-square values. From the figure it is evident that the model is not consistent with the calculated value v = 0.7, since it would require D 1 > 0.04 m 2 s −1 for a reasonable fit. This lends more credibility to the alternatively reported value 0.85.

Constraints from one or two stationary roadblocks
Another experimental result that can be used to constrain the solution space is the ratio of the search times with and without a stationary protein, a roadblock, positioned next to the operator site (see Table 2). The roadblock should increase the binding time by a factor of two if the DNA is naked except for the roadblock and p bind = 1.The ratio between the times has been found to be less than 2 in vivo, which was one of the arguments suggesting that p bind < 1 (41). In the presence of other semi-dynamic roadblocks however, this argument fails. A stationary roadblock next to the operator reduces the probability that a random roadblock will overlap the operator and thereby hinder a specific binding event. The increased influx to the operator site counteracts the doubling of the search time one would expect when only considering that half of the pathways into the operator site is cut off by the stationary roadblock. This can explain a factor less than 2 also when p bind equals 1. Further, the values in Table 2 show that the magnitude of the influx effect increases (the ratios get lower) as the vacancy decreases. This is due to a relative increase in the number of events where roadblocks are excluded from the specific operator site. When p bind is 1, the increase in search time is simply a factor 2v (Supplementary Equation (47)) and the results ( Table 2) exclude the case with 70% vacancies in the operator region.
Hammar et al. (41) also have results for two cases where the operator is boxed in between two stationary roadblocks with different gap length. The simulation results for these cases are consistent with the parameter estimates above but do not contribute any new constraints on their values.

The sliding length
An important parameter for characterizing the target search is the sliding length (Equation (8)) for which there are both theoretical and experimental estimates (8,41,52). The best fits to the model result in sliding length between 30 and 40 bp for the acceptable solution space i.e. where cyan regions overlap with chi-squared level curves. These sliding lengths are in accord with recent in vivo estimates (41,52).

The effect of hopping
Hopping is the correlated motion of the searching protein along the length of DNA at the microscopic dissociation distance where the protein loses all electrostatic and van der Waals interactions with DNA. The hopping effect is analytically tractable and the theory is developed in the Supplementary material in the case of no crowding proteins (v = 1) and unity probability of specific binding (p bind = 1). The results are summarized in Figure 4 where we have used  (27) and (28), respectively. In the case without hopping a microscopic dissociation results in rebinding to the same position it left. These results only show divergence between the search times with and without hopping when the degree of diffusion control (α) is increased beyond 1 or as the sliding length is decreased below 10 for LacI. Thus, we begin to see an increase in the relative contribution of hopping which effectively extends the specific operator site a few base pairs as the sliding length decreases. In the absence of sliding, hopping could decrease the search time by an order of magnitude (if α > 10; dashed-dot curves in Figure 4 Left Panel). However, the parameter region where hopping influences the search time falls outside the bounds of the experimental constraints; low D 1 and high α means high chi-square values (see Figure 2). Furthermore, the search time remains invariant with respect to hopping also when both crowding proteins and a specific binding probability less than 1 are introduced in the MC simulations (data shown in Supplementary Figure  S5) for combinations of parameters that are consistent with experimental data.

Intersegment transfer
The process by which two DNA segments diffuse within binding distance of a protein capable of transiently binding both segments and therefore transfer from one segment to the other without dissociation is called intersegment transfer. We take this mode into account using the approach of (5). For the combination of parameters having no analytic counterpart we used simulations to get the chi-squared values. We find that intersegment transfer rates below ca 100 s −1 have a negligible effect on the minimal acceptable D 1 value ( Figure 5). The reason for the small effect is that intersegment transfer disrupts the sliding process which weakens the correlations in the two-operator data effectively increasing the required D 1 value. At the same time, intersegment transfer increases the specific association rate, effectively decreasing D 1 . For higher values of k IST , the fit becomes increasingly bad, pushing the minimal acceptable D 1 higher (Supplementary Figure S8). The table shows the ratio between the time for binding one specific operator site with and without one stationary roadblock adjacent to the operator site for different values of p bind , and v with D 1 = 0.03 m 2 s −1 and α = 0.10-0.80. The experimental value is 1.75 ± 0.18 (41). Results are from simulations and from theory (Supplementary Equation (46)).

DISCUSSION
In this work we have tested if the classical sliding model for target location, with a few extensions that previously have been discussed and analyzed independently (1D crowding, hopping, finite binding probability and intersegment transfer), is sufficient to explain recent single-molecule in vivo data with reasonable parameter values. The free parameters in the simulations are the 1D diffusion coefficient D 1 and the degree of diffusion control α which have been varied for high and low p bind (the probability of binding specifically at the operator site). The rest of the parameters are taken from experimental literature reports. When there are conflicting literature values we have tested which set of parameters results in the best overall fit. There are also obviously uncertainties for other literature values such as the number of searching proteins N or the total accessible genome size M acc . Given the limitations, the parameter sweep generates a small but reasonable solution space where the model explains the experimental data, and without additional experimental observations we find no need to discard the model. The solution space does, however, result in relatively specific predictions of the unknown parameters; the most unexpected of these is a relatively high value of D 1 and a low value of α.

Parameter space--1D diffusion and sliding
The upper bound on D 1 was set to 0.05 m 2 s −1 based on in vitro single-molecule tracking experiments (50). The pa-rameter sweeps suggest that D 1 cannot decrease below 0.01 m 2 s −1 (Figure 2) to give both the experimental association time and reasonable chi-square values. However, a simultaneous fit with the two-operator dependence suggests that D 1 ≥ 0.02 m 2 /s. This seems high, considering that viscosity effects alone one gets D 1 close to 0.025 m 2 s −1 in vivo (52) which becomes 0.009 m 2 s −1 when multiplied with a retardation factor e −(ε/k B T) 2 (37) which takes into account that LacI diffuses in a rugged-free energy landscape (53) with roughness ⑀ ∼ 1 k B T (37,53). Thus a D 1 value approaching 0.009 m 2 s −1 is more reasonable than the D 1 ≥ 0.02 m 2 /s constraint given by the model and the simulations. There are two possible reasons for this apparent discrepancy, either the model or some of its parameters are wrong, which is addressed in the next paragraph, or 1D sliding in vivo is faster than what can be expected by an extrapolation from in vitro data. This would in particular suggest that the magnitude of the ruggedness of the free-energy landscape (37,53) along DNA could be higher in vitro than in vivo. A smoother free-energy landscape in vivo might be due to a stabilization of the DNA phosphate backbone by nucleoid-associated proteins, metabolites or ions (4,6) missing in vitro. Other effects that are neglected in the simple lattice MC simulation method and which could potentially explain the discrepancy between the expected and the fitted D 1 values are hydrodynamic interactions (HIs) and conformational fluctuations (CFs) of DNA (54). These effects enter in the 1D diffusion coefficient (D 1 ) and the propensity of LacI to rebind (α). As these parameters were varied in the simulations and then constrained by fitting to experiments both HIs and CFs of DNA are included to the extent that they contribute to the in vivo search kinetics, although we cannot separate each effect and gauge its respective magnitude. However, D 1 calculated from viscosity effects on the protein motion and multiplied with the fluctuations in the interaction freeenergy between LacI and DNA gives ∼0.045 m 2 s −1 (51) which is the value measured in vitro. Therefore, we expect that the same kind of estimate, without explicitly including the effects of CFs and HIs, will also be valid in vivo.

Parameter space--uncertainties
When looking at which of the experimental input parameters are most uncertain we find that there are considerable uncertainty in both the DNA vacancy v and in the estimated number (N ≈ 3-5) of repressors participating in the search. The number of proteins does not influence the chi-square fits in Figure 2, but the total search time is influenced in direct proportion. Thus, if N > 5, the overlap region would be pushed toward lower α and D 1 (Figure 6). The DNA va- cancy has been assumed to be uniform over the chromosome; however, it could be very different around the operator regions as compared to the bulk of the DNA, for instance if parts of the genome are excluded due to local folds or if DNA binding proteins themselves are clustered in regions of the chromosome other than the operator regions (55). This would have the effect of lowering the occupancy around the operator site while allowing for more DNA binding molecules overall. Taking this effect into account, the overlap between the cyan region and the red chisquare values would increase, which would also allow for lower D 1 values ( Figure 6).

Parameter space--crowding and the propensity to bind nonspecifically
There are conflicting reports concerning the occupancy of the DNA where Li et al. (56) calculate 30% while Tabaka et al. (45) calculate 15%. Crowding on DNA has at least three distinct effects on the association process: (i) binding of other proteins on DNA beyond the immediate operator region reduces the amount of competitive DNA leaving less DNA to be searched. Alternatively, if, as in the calculations above, the fraction F B of non-specifically bound repressor is considered as a given experimental quantity, the reduction of competitive DNA would lead to a larger non-specific binding constant (K RD ). (ii) Crowding molecules can directly obstruct the operator site and thereby reduce the rate of binding. (iii) Finally, crowding molecules would interfere with the sliding process, effectively reducing the distance the repressor can slide along the DNA. Thus when crowding is introduced, the unobstructed sliding distance will have to be increased to explain the dependence in the two-operator data ( Figure 2). The solution spaces exclude the previously proposed 30% occupancy case simply because the search becomes too slow. The 30% occupancy cases also conflict with the results from placing a stationary roadblock adjacent to the operator (see Table 2).
As a solution, there could be two levels of chromosomal DNA coverage: for example, perhaps half of the crowding proteins have specific binding sites outside the operator region and would therefore only contribute to the decrease of the competitive DNA. Moreover, this competition could also decrease if some DNAs were inaccessible due to packing in the nucleoid. Thus, the kinetic data suggest that v = 0.85 at least in the operator region.
In the calculations above we have used F B = 0.9, i.e. the repressor spends 90% of its search time on non-specific DNA (41). We have also tried F B = 0.7 for which we find no acceptable solution space. For proteins with very strong non-specific binding (F B → 1), intersegment transfer would be needed to avoid having them trapped in DNA regions far from the operator or between roadblocks.

Parameter space--the recognition step
Once the repressor has reached the operator site it might pass by without recognition and specific binding. Hammar et al. (41) introduced an extra recognition step that was given by the probability of binding, p bind , for a repressor non-specifically bound at the operator site. This parameter was introduced to explain the difference in association rates to operators that differ only a few bases in sequence, in particular the 20% difference in rate between the natural operator O 1 and a synthetic operator sequence O sym at 298 K; Without this extra step, the sliding model cannot explain why the association rate depends on the operator sequence. A finite recognition step could also explain why blocking off half of the access paths for the repressor by placing a stationary roadblock on one side of the operator does not lead to an increase in the association time by a factor of two but only by 1.75. However, as we have shown in Table 2, such a stationary-roadblock effect also follows from introducing crowding with v = 0.85. This is because the stationary roadblock blocks not only the access path for the repressor but also random roadblocks from directly obstructing the operator site. The data used in the present analysis all refer to O sym , which consequently could have p bind close to 1, while O 1 could have a significantly smaller value. Indeed, p bind = 1 affords a slightly better fit to the data than does p bind = 0.2 ( Figure 2).
It should be noted that sliding allows the protein to test a site many times before moving away, so that even a small value for p bind leads to a modest decrease in the overall association rate (41). For the relative small values of α and large values of s found here, the effect of p bind < 1 is to increase the association time by a factor (Equations (10) and (15)) which can be approximated as 1 + 1− p bind p bind 1 s . This gives roughly a 10% increase for p bind = 0.2, but only 2.5% for p bind = 0.5. Thus, it would be difficult to distinguish p bindvalues above ∼0.5 from this effect, why we can only say that p bind is so high that the overall binding rate to O sym is not significantly reduced.

Diffusion control and steric effects
The best fit to the in vivo kinetic data suggests that the parameter α is small (α ≈ 0.1-0.15; Figure 2) (52). This would imply that the non-specific association rate constant (k a ) is smaller than its maximal diffusion-limited value, which may be interpreted as a reaction limited non-specific binding. However, the reduction in k a could also be due to steric constraints, rather than an activation step in the non-specific binding (37). Thus, the non-specific association could well be diffusion limited when the steric effects are accounted for.
The effective reaction radius for non-specific binding, ρ, is difficult to estimate. At a maximum ρ would correspond to the distance between the mass centers of the DNA cylinder and the protein when they form a non-specific complex (the Smoluchowski limit), i.e. ρ max ∼5.5 nm. This would be the case if all molecular collisions would have a high probability of finding the productive orientational configuration required for a bound complex to form before diffusing apart again. This, in turn, would require that the molecules are held together for some time while they explore their mutual orientational space, akin to a local sliding process. Although electrostatic attraction could help pull the reactive regions together into a productive orientation, these forces are screened and will help primarily when the reactive regions are already close. If α ∼ 0.1 is interpreted as being due to steric effects (37), then the corresponding effective interaction radius would be ρ eff = ρ max e −1/α , which could be on the order of ∼10 −4 nm as expected from some protein-protein association rates. While the diffusion-limited protein-protein association rate is directly proportional to ρ eff , for the protein-DNA association, being largely a 2D process, a small value for ρ eff decreases the macroscopic association rate only logarithmically.

Hopping
Hopping is a process where a non-specifically bound protein dissociates and quickly rebinds mostly to the same DNA site but occasionally to one nearby along the DNA contour. Since hopping is a consequence of the diffusive motion in the geometry of the protein leaving DNA in the form of an extended chain, we expect it to exist for all DNA binding proteins. It can be formally distinguished from sliding in that it involves diffusion paths where there are no interactions between DNA and protein. The task at hand is to quantify the influence of hopping on the rate with which the LacI dimer finds its specific binding site. The modeling (5,8) shows that hopping would have little effect on the LacI association process in the presence of sliding ( Figure  4). However, hopping would be very important for nonsliding proteins by allowing one that is 'almost there' to try neighboring binding sites without having to start the search all over. Although, it would be possible to model the observed specific association time in vivo in terms of hopping alone (Figure 4), it cannot explain the observed dependence between different binding sites; since sliding is needed to bridge the distances between them. When combining hopping with sliding and crowding in the simulations, we find no effect of hopping in the required parameter ranges. It should be noted that there is ambiguity in the definition of hopping in the literature, where it is sometimes defined as a non-helical displacement of the protein along the DNA while still bound to it (17,54). This would in our work cor-respond to a non-helical sliding mode. Such a sliding mode (defined as hopping in some other work) could have an impact on the estimated value of D 1 . However, we find this unlikely for two reasons: first, fully atomistic MD simulations (37) show that the repressor has a high propensity to follow the helical path. Furthermore, the agreement between the experimental value for D 1 and that calculated for a helical sliding mode (51) shows that non-helical sliding does not contribute significantly, at least not in vitro.

Intersegment transfer
Intersegment transfer is the process by which a protein transfers to an uncorrelated distant site as measured along the contour length of DNA via a doubly bound intermediate. This means that the transfer circumvents macroscopic dissociation where the protein would have to diffuse in 3D before finding and binding to an uncorrelated (beyond a few persistence lengths) site. The need for a doubly bound intermediate demands that the protein have two binding sites with a geometry such that a doubly bound complex is realizable. This geometrical requirement is satisfied for the LacI tetramer for which the intersegment transfer mode has been indirectly shown to occur between specific binding sites in vitro based on the dependence of the specific dissociation rate on the DNA concentration (57). However, to our knowledge there is no experimental evidence of intersegment transfer between non-specific binding sites neither for the LacI tetramer nor for the LacI dimer. In fact, an intersegment transfer mode has not been observed between specific binding sites for the LacI dimer either. A priori, based on visual inspection of the geometry, it would seem improbable that intersegment transfer would have any significant effect on the LacI dimer search kinetics. The partial unbinding of one of the two LacI dimer binding sites simply does not allow for LacI conformations which could bind to two DNA segments at the same time. Another possibility is that the LacI dimer could bind two DNA segments but that the non-specific binding strength is much stronger for the native binding site which would make intersegment transfer a very infrequent event. Although the introduction of an intersegment transfer mode does improve the search efficiency, as seen by an increased specific association rate, the results cannot be fitted well to the in vivo data if the intersegment transfer rate is much larger than ∼100 s −1 . Thus, the 1D diffusion coefficient would become impermissibly high for higher intersegment transfer rates.

CONCLUSION
We have extended the sliding model to include crowding, hopping, intersegment transfer and the possibility of traversing the specific binding site to calculate the time it takes for one LacI molecule to find its specific binding site. To account for recent in vivo data on the dependence between two operator sites as a function of the distance between them we also employed Monte Carlo simulations. Using the simulations and the analytical solutions as a constraint we generate solution spaces from a parameter sweep where the 1D diffusion coefficient and the degree of diffusion control were systematically varied given diverse values Nucleic Acids Research, 2015, Vol. 43, No. 7 3463 for the DNA occupancy, and the probability of LacI binding the specific operator site. We find that there exists a small parameter space where the model is compatible with the experimental measurements. This space is not significantly extended by allowing for hopping or intersegment transfer of the LacI dimer, which does not imply that these mechanisms do not exist, only that they do not contribute significantly in the allowed parameter space. The allowed space suggests that LacI binds the specific O sym -operator site with a probability larger than 0.5. We also establish that the lac repressor dimer binds non-specific DNA with a low probability at the first contact. This does however not necessarily imply an energetic barrier for binding, but rather that not all patches on the repressor or DNA will bind at contact and that steric effects may need to be considered. Finally, based on that the DNA occupancy by other proteins in the operator region seems to be ∼15%, while the overall occupancy may be higher, we propose that chromosomal DNA may have two levels of coverage by Nucleoid Associated Proteins, one fraction that are randomly associated with DNA and thus are found in the operator region where they influence binding and sliding by the repressor and one fraction of more specifically binding proteins that do not reside in the operator region and only contribute to the search time by hiding part of the chromosome from searching.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.