Biologistics—Diffusion coefficients for complete proteome of Escherichia coli

Motivation: Biologistics provides data for the quantitative analysis of transport (diffusion) processes and their spatio-temporal correlations in cells. Protein mobility is one of the few parameters needed to describe the rates of gene-regulation reactions. Understanding diffusion-limited biochemical reactions in vivo requires mobility data for as many proteins as possible in their native forms, yet no database currently contains complete information on the diffusion coefficients (DCs) of proteins in a given cell type. Results: We demonstrate a method for determining the in vivo DCs of any molecule, regardless of its molecular weight, size and structure, in any type of cell. We exemplify the method with a database of in vivo DCs for all proteins (4302 records) of the proteome of the K12 strain of Escherichia coli, together with examples of DCs of amino acids, sugars, RNA and DNA. The database follows from the scale-dependent viscosity reference curve (sdVRC). Constructing the sdVRC for a prokaryotic or eukaryotic cell requires ~20 in vivo measurements using techniques such as fluorescence correlation spectroscopy (FCS), fluorescence recovery after photobleaching (FRAP), nuclear magnetic resonance (NMR) or particle tracking. The shape of the sdVRC is expected to differ between organisms, but the mathematical form of the curve remains the same. The method has high predictive power: measuring the DCs of several properly chosen inert probes in a single cell type makes it possible to determine the DCs of thousands of proteins. The resulting mobility data also allow quantitative study of biochemical interactions in vivo. Contact: rholyst@ichf.edu.pl Supplementary information: Supplementary data are available at Bioinformatics Online.


INTRODUCTION
Biologistics and biochemistry in a crowded environment are two emerging interdisciplinary fields of science. They provide quantitative analysis of the transport of proteins, and of its spatio-temporal correlations, in gene expression and regulation. According to the current state-of-the-art theory of gene expression (activation or repression) in bacteria (Elf et al., 2007;Li et al., 2009), protein mobility is one of the few parameters needed to describe the rates of gene-regulation reactions. Mobility here means three-dimensional diffusion or one-dimensional sliding along DNA (in prokaryotes and eukaryotes), or transport by molecular motors, characterized by their velocity (in eukaryotic cells). Understanding diffusion-limited biochemical reactions requires accurate in vivo mobility data for as many proteins as possible in their native forms. The three-dimensional diffusion of different types of macromolecules in the cytoplasm of Escherichia coli has been studied experimentally in several cases (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Derman et al., 2008;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007), but measuring the mobility of every protein is technically infeasible because of their large number in a given cell. For example, the proteome of the K12 strain of E. coli (Blattner et al., 1997) contains more than 4300 proteins. Moreover, most recent measurements were performed with green fluorescent protein (GFP) (Elowitz et al., 1999;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007) or with GFP fusion proteins (Jennifer et al., 2001).
Attempts to study the diffusion of many proteins simultaneously, under conditions resembling the interior of the cell, were made in silico by McGuffee and Elcock (2010). Computational methods, however, are limited by the speed and capacity of computing hardware and by the small number of interacting protein types in the simulated system (~50 different types of proteins) (McGuffee and Elcock, 2010). An alternative approach is the quantitative analysis of available literature data. Mika and Poolman (2011) gathered literature data on the diffusion coefficients (DCs) of ~20 different types of proteins in E. coli and proposed a power-law dependence of the DC on the molecular weight of proteins. This power law (Mika and Poolman, 2011), however, applies only to proteins in a narrow range of molecular weights, i.e. between 20 and 30 kDa.
In this work, we present a method for predicting the DCs of proteins for the proteome of any cell. We collected all available literature data (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Derman et al., 2008;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007) on the diffusion of various probes, including small molecules (water, glucose), proteins and plasmids, in the cytoplasm of E. coli. We combined these data with the scaling function of viscosity (Holyst et al., 2009;Kalwarczyk et al., 2011;Szymański et al., 2006a, b) to predict the mobility of macromolecules in the bacterial cytoplasm. We also predicted the DCs of amino acids, sugars, proteins and DNA. We created a unique database of the DCs of all proteins of E. coli strain K12 (4302 proteins), their oligomers and their potential complexes with translocation proteins: 6600 records in total.

A brief description of the method
Our predictions of DCs of proteins in the bacterial cytoplasm are based on experimental data on diffusion in the cytoplasm of E. coli available in the literature (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Derman et al., 2008;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007). The method relies on the relation $D_0/D_{\mathrm{cyto}} = \eta/\eta_0$, where $D_0$ is the DC of the macromolecule in water of viscosity $\eta_0$, $D_{\mathrm{cyto}}$ is its DC in the cytoplasm, and $\eta$ is the effective viscosity experienced by the macromolecule as it diffuses in the cytoplasm. The protocol for determining DCs is shown graphically in Figure 1.

Calculation of hydrodynamic radii and DCs in water
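The hydrodynamic radius of proteins was determined using the following formula (Dill et al., 2011):
$$ r_p = 0.0514\,M_w^{0.392}, \tag{1} $$
while for RNA we used Equation (2) (Werner, 2011):
$$ r_p = 0.0566\,M_w^{0.38}. \tag{2} $$
The dependence of the hydrodynamic radius of linear, circular or supercoiled DNA on molecular weight [Equations (3)-(5), respectively] was obtained from DCs of DNA constructs (Robertson et al., 2006) using Equation (6):
$$ r_p = 0.024\,M_w^{0.57}, \tag{3} $$
$$ r_p = 0.0125\,M_w^{0.59}, \tag{4} $$
$$ r_p = 0.0145\,M_w^{0.57}. \tag{5} $$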
Radii of amino acids and sugars have been calculated, assuming that the hydrodynamic radius r p corresponds to the van der Waals radius r w calculated according to the procedure described elsewhere (Zhao et al., 2003).
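For each probe, we used the literature values of $D_{\mathrm{cyto}}$, while the values of $D_0$ (where not available) were calculated using the Stokes-Sutherland-Einstein equation:
$$ D_0 = \frac{k_B T}{6\pi\eta_0 r_p}, \tag{6} $$
where $k_B$ is the Boltzmann constant and $T$ is the absolute temperature.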
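Equations (1)-(6) can be sketched in Python as follows. This is a minimal illustration, not the authors' code. It assumes $M_w$ in Da and $r_p$ in nm, since these units reproduce the 2.8 nm radius quoted for GFP (27 kDa). It also assumes $T = 310$ K and $\eta_0 = 0.7$ mPa·s, as in the text. The prefactors $C$ and exponents $\alpha$ are the ones listed in the caption of Fig. 4. The function and dictionary names are hypothetical.

```python
import math

# Power-law fits r_p = C * M_w**alpha, Equations (1)-(5).
# Assumed units: r_p in nm, M_w in Da.
RADIUS_PARAMS = {
    "protein": (0.0514, 0.392),
    "rna": (0.0566, 0.38),
    "dna_linear": (0.024, 0.57),
    "dna_circular": (0.0125, 0.59),
    "dna_supercoiled": (0.0145, 0.57),
}

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)
T = 310.0           # temperature, K (assumed, as in the text)
ETA_0 = 0.7e-3      # viscosity of water at 310 K, Pa*s (approximate value from the text)


def hydrodynamic_radius_nm(mw_da: float, kind: str = "protein") -> float:
    """Hydrodynamic radius r_p (nm) from molecular weight M_w (Da)."""
    c, alpha = RADIUS_PARAMS[kind]
    return c * mw_da ** alpha


def d0_um2_per_s(r_p_nm: float, eta0: float = ETA_0, temp: float = T) -> float:
    """Stokes-Sutherland-Einstein D_0 = k_B T / (6 pi eta_0 r_p), returned in um^2/s."""
    d_m2_per_s = K_B * temp / (6.0 * math.pi * eta0 * r_p_nm * 1e-9)
    return d_m2_per_s * 1e12  # m^2/s -> um^2/s


if __name__ == "__main__":
    r_gfp = hydrodynamic_radius_nm(27_000)  # GFP, 27 kDa
    print(f"GFP: r_p = {r_gfp:.2f} nm, D_0 = {d0_um2_per_s(r_gfp):.0f} um^2/s")
```

For GFP this gives $r_p \approx 2.8$ nm, which is consistent with the value used later in the text. It also gives $D_0$ of roughly 116 μm²/s in water at 310 K.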

Calculation of DCs of various molecules in the cytoplasm of E. coli
Using the molecular weights from the UniProt protein database (Apweiler et al., 2011;Jain et al., 2009), we calculated the DCs for the complete proteome of E. coli (K12 strain). We identified the cellular localization of each protein as well as its quaternary structure (a single polypeptide chain or multichain aggregates or complexes). For membrane and periplasmic proteins, we assumed that, after synthesis, the proteins diffuse through the cytoplasm to their target in the membrane via one of two transport pathways [twin-arginine translocation (TAT) or the general secretion system (Sec)] (Driessen and Nouwen, 2008;Sargent, 2007). Consequently, these proteins were treated as single polypeptide chains (the TAT pathway) or as complexes with the SecB or Tig proteins (the Sec pathway). The hydrodynamic radius of proteins was determined using Equation (1). When a protein was composed of several subunits, the molecular weights of all its polypeptide chains were summed. From the cumulative molecular weight of the complex, we calculated the hydrodynamic radius $r_p$ of the protein and then its DC $D_0$ [Equations (1) and (6)]. Finally, using Equation (7), we calculated the relative DCs $D_0/D_{\mathrm{cyto}}$ of all analysed proteins and, from them, the DCs $D_{\mathrm{cyto}}$ of the proteins in the cytoplasm. The calculated DCs of all proteins in the cytoplasm are summarized in Supplementary Table S1.
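The pipeline for a multi-subunit protein can be sketched as follows. Subunit masses are summed and converted to $r_p$ with Equation (1) and to $D_0$ with Equation (6). $D_{\mathrm{cyto}}$ then follows from the sdVRC. The code assumes that Equation (7) has the scale-dependent viscosity form of Kalwarczyk et al. (2011): $D_0/D_{\mathrm{cyto}} = \exp[(R_{\mathrm{eff}}/\xi)^a]$ with $R_{\mathrm{eff}}^{-2} = R_h^{-2} + r_p^{-2}$. It uses the fitted E. coli parameters. Units are assumed to be nm and Da. The 30 kDa subunit mass is a hypothetical placeholder, not a UniProt value.

```python
import math

K_B, T, ETA_0 = 1.380649e-23, 310.0, 0.7e-3  # J/K, K, Pa*s (assumed conditions)
XI, R_H, A = 0.51, 42.0, 0.53                # fitted sdVRC parameters for E. coli (nm, nm, -)


def radius_protein_nm(mw_da):
    # Equation (1): r_p = 0.0514 * M_w**0.392 (assumed r_p in nm, M_w in Da)
    return 0.0514 * mw_da ** 0.392


def d0_um2_per_s(r_p_nm):
    # Equation (6), Stokes-Sutherland-Einstein, converted from m^2/s to um^2/s
    return K_B * T / (6 * math.pi * ETA_0 * r_p_nm * 1e-9) * 1e12


def viscosity_ratio(r_p_nm, xi=XI, r_h=R_H, a=A):
    # Assumed form of Equation (7): D0/Dcyto = eta/eta_0 = exp[(R_eff/xi)**a]
    r_eff = (r_h ** -2 + r_p_nm ** -2) ** -0.5
    return math.exp((r_eff / xi) ** a)


def d_cyto_of_complex(subunit_masses_da):
    """Dcyto (um^2/s) of a complex: sum subunit masses, then Eqs (1), (6), (7)."""
    r_p = radius_protein_nm(sum(subunit_masses_da))
    return d0_um2_per_s(r_p) / viscosity_ratio(r_p)


# Hypothetical example: a monomer of 30 kDa versus a homotetramer of the same subunit
monomer = d_cyto_of_complex([30_000])
tetramer = d_cyto_of_complex([30_000] * 4)
print(f"monomer {monomer:.1f} um^2/s, tetramer {tetramer:.1f} um^2/s")
```

Under these assumptions the 30 kDa monomer comes out at about 9 μm²/s. The tetramer is noticeably slower, both because its radius is larger and because it experiences a higher effective viscosity.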

Construction of the scale-dependent viscosity reference curve
We collected the literature data (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007) for DCs of different solutes and macromolecules in the cytoplasm of E. coli (Fig. 2 and Table 1) and fitted them with Equation (7) using the least-squares method (Kalwarczyk et al., 2011). To predict the DCs of molecules in the cytoplasm, it is essential to select correctly the probes used to determine the reference curve. Next, one needs to measure the DC of each selected probe in water (buffer), $D_0$, and in the cytoplasm of the studied cell, $D_{\mathrm{cyto}}$. The pairs of $D_0$ and $D_{\mathrm{cyto}}$ are used to construct the sdVRC. To predict the DC of a given molecule, one needs its hydrodynamic radius $r_p$ or its $D_0$. Although the sdVRC depends on both $r_p$ and $D_0$, in practice either parameter can be calculated from the other [Equation (6)]. Finally, substituting $r_p$ and $D_0$ into the sdVRC [Equation (7)] gives the DC in the cytoplasm, $D_{\mathrm{cyto}}$. In Equation (7), $r_p$ is the hydrodynamic radius of the probe, and $R_h$ and $\xi$ are length scales characterizing the cytoplasm. $\xi$ (the average distance between surfaces of proteins), $R_h$ (the average hydrodynamic radius of the largest crowders) and $a$ (a constant of the order of one) are fitting parameters. For the cytoplasm of E. coli their values are $\xi = 0.51 \pm 0.09$ nm, $R_h = 42 \pm 9$ nm and $a = 0.53 \pm 0.04$. From the sdVRC, we directly determined the macroscopic viscosity of the cytoplasm, $\eta_m = 17.5$ Pa·s, which is 26 000 times greater than the viscosity of water ($\eta_0 \approx 0.7$ mPa·s at 310 K). $R_h$ is comparable to the radius of the loops of protein-covered DNA (Kim et al., 2004).
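The least-squares construction of the sdVRC can be sketched as follows. The sketch assumes the Kalwarczyk et al. (2011) form of Equation (7), $\ln(D_0/D_{\mathrm{cyto}}) = c\,R_{\mathrm{eff}}^{a}$ with $c = \xi^{-a}$ and $R_{\mathrm{eff}}^{-2} = R_h^{-2} + r_p^{-2}$. For fixed $R_h$ and $a$ the model is linear in $c$. The code therefore scans a grid of ($R_h$, $a$) and solves for $c$ in closed form. This simple variable-projection scheme is chosen here for brevity, and the authors' own fitting routine may differ. The grid ranges and the synthetic probe data are hypothetical.

```python
import math


def fit_sdvrc(radii_nm, ratios, rh_grid=None, a_grid=None):
    """Least-squares fit of ln(D0/Dcyto) = c * R_eff**a, with c = xi**(-a)
    and R_eff = (R_h**-2 + r_p**-2)**-0.5 (assumed form of Equation (7)).

    For every (R_h, a) on the grid the optimal c has a closed form.
    Returns (xi, R_h, a, sum of squared residuals in ln(D0/Dcyto)).
    """
    if rh_grid is None:
        rh_grid = [float(r) for r in range(5, 201)]              # R_h candidates, nm
    if a_grid is None:
        a_grid = [round(0.30 + 0.01 * i, 2) for i in range(61)]  # a from 0.30 to 0.90
    y = [math.log(q) for q in ratios]
    best = None
    for r_h in rh_grid:
        r_eff = [(r_h ** -2 + r ** -2) ** -0.5 for r in radii_nm]
        for a in a_grid:
            x = [re ** a for re in r_eff]
            # Closed-form least-squares solution of y = c * x for fixed (R_h, a)
            c = sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)
            if c <= 0:
                continue
            sse = sum((v - c * u) ** 2 for u, v in zip(x, y))
            if best is None or sse < best[3]:
                best = (c ** (-1.0 / a), r_h, a, sse)
    return best


def sdvrc_ratio(r_p, xi, r_h, a):
    """D0/Dcyto = eta/eta_0 under the assumed sdVRC form."""
    r_eff = (r_h ** -2 + r_p ** -2) ** -0.5
    return math.exp((r_eff / xi) ** a)


# Synthetic, noise-free check: 20 probes spaced log-uniformly from 0.2 to 200 nm
radii = [0.2 * 1000 ** (i / 19) for i in range(20)]
ratios = [sdvrc_ratio(r, 0.51, 42.0, 0.53) for r in radii]
xi_fit, rh_fit, a_fit, sse = fit_sdvrc(radii, ratios)
print(f"xi = {xi_fit:.3f} nm, R_h = {rh_fit:.0f} nm, a = {a_fit:.2f}")
```

With noise-free synthetic data, the fit recovers the generating parameters. Real data would first have to be converted to pairs ($r_p$, $D_0/D_{\mathrm{cyto}}$) using Equation (6).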
The second length scale determined from the sdVRC, $\xi$, is comparable to the average distance between surfaces of proteins. $R_h$ sets the length scale above which the viscosity ceases to depend on the size of the probe and reaches its macroscopic value. A probe smaller than $\xi$ experiences a viscosity comparable to that of water.
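The limiting regimes described above can be checked numerically. The sketch assumes the Kalwarczyk et al. (2011) form of Equation (7), $\eta/\eta_0 = \exp[(R_{\mathrm{eff}}/\xi)^a]$ with $R_{\mathrm{eff}}^{-2} = R_h^{-2} + r_p^{-2}$, and uses the fitted E. coli parameters. The probe radii are arbitrary illustrative values.

```python
import math

XI, R_H, A = 0.51, 42.0, 0.53   # fitted parameters for E. coli (nm, nm, -)


def viscosity_ratio(r_p_nm, xi=XI, r_h=R_H, a=A):
    """eta/eta_0 (= D0/Dcyto) under the assumed sdVRC form exp[(R_eff/xi)**a]."""
    r_eff = (r_h ** -2 + r_p_nm ** -2) ** -0.5
    return math.exp((r_eff / xi) ** a)


# Macroscopic limit r_p >> R_h: R_eff tends to R_h, so eta_m/eta_0 = exp[(R_h/xi)**a]
macro_ratio = math.exp((R_H / XI) ** A)

for r in (0.1, 0.51, 3.0, 10.0, 42.0, 200.0, 2000.0):
    print(f"r_p = {r:7.2f} nm   eta/eta_0 = {viscosity_ratio(r):10.1f}")
print(f"r_p >> R_h limit: eta_m/eta_0 = {macro_ratio:.3g}")
```

Probes much smaller than $\xi$ see a viscosity within a factor of about two of water. The ratio then grows steeply between $\xi$ and $R_h$ and saturates above $R_h$. With the rounded parameters, the saturation value is a few times $10^4$, the same order as the ratio of about 26 000 quoted above. The exponential form makes this number very sensitive to rounding of $\xi$, $R_h$ and $a$.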
We then used the sdVRC obtained in this way [Equation (7)] to predict the DCs of all known proteins of E. coli strain K12 (Blattner et al., 1997), as well as of other molecules and macromolecules.

Interpretation of sdVRC
Diffusion of various proteins in the cytoplasm of E. coli has been studied for more than a decade (Table 1) (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007). These experimental data show that the DCs depend exponentially on the size of the diffusing molecule. For example, GFP, with molecular weight $M_w = 27$ kDa and hydrodynamic radius $r_p = 2.8$ nm, has a cytoplasmic DC of $D_{\mathrm{cyto}} = 7.7 \pm 2.5$ μm²/s (Elowitz et al., 1999). In contrast, consider the large oligomeric protein consisting of four subunits of GFP-tagged β-galactosidase, (β-gal-GFP)$_4$, whose radius is almost three times that of GFP ($M_w \approx 580$ kDa, $r_p = 7.3$ nm). Its DC is $0.7 \pm 0.22$ μm²/s (Mika et al., 2010). These differences are explained in terms of the scale-dependent viscosity (Kalwarczyk et al., 2011) experienced by the diffusing molecule [cf. sdVRC, Equation (7)]. Equation (7) is an empirical equation originally found for synthetic systems such as polymer or micellar solutions (Holyst et al., 2009;Kalwarczyk et al., 2011;Szymański et al., 2006a, b). The interpretation of the four parameters in Equation (7) ($R_h$, $\xi$, $\eta_m$ and $\eta_0$) is taken from those studies (Holyst et al., 2009;Kalwarczyk et al., 2011;Szymański et al., 2006a, b). In synthetic systems, $\xi$ is the average distance between the macromolecular components of the complex liquid, and $R_h$ equals the hydrodynamic radius of a polymer random coil or of a micelle. In the sdVRC, both $\xi$ and $R_h$ determine the viscosity experienced by a probe diffusing in the investigated liquid. For $r_p \gg R_h$, the probe experiences the macroscopic viscosity $\eta_m$. A probe of radius $r_p$ smaller than $\xi$ experiences the viscosity of the solvent, $\eta_0$.
A probe with $r_p > \xi$, on the other hand, experiences a viscosity higher than that of the solvent. Finally, the effective viscosity experienced by a probe with a radius between $\xi$ and $R_h$ ($\xi < r_p < R_h$) depends exponentially on $r_p$. In the cytoplasm of mammalian cells, $R_h$ corresponds to the hydrodynamic radius of the filaments forming the cytoskeleton throughout the cytoplasm (Kalwarczyk et al., 2011). The bacterial cytoskeleton (Shih and Rothfield, 2006), however, lies directly next to the inner membrane (Pogliano, 2008). We can therefore assume that it contributes little to the viscosity experienced by proteins diffusing across the cytoplasm. This assumption is also supported by the fitted value $R_h = 42 \pm 9$ nm. This value is similar to the radius of the objects identified as fragments of the bacterial nucleoid (around 40 nm) (Kim et al., 2004), i.e. loops of DNA covered with structural proteins. For comparison, consider the filaments forming the bacterial cytoskeleton (Hou et al., 2012;Pogliano, 2008), taken as fragments of length L = 100 nm and radius r = 2.5 nm. Their hydrodynamic radius is ~17 nm (Vandesande and Persoons, 1985), well below the fitted $R_h$. Therefore, the length scale $R_h$ is correlated neither with the hydrodynamic radius of the filaments nor with that of the proteins, the largest of which have hydrodynamic radii of about 10 nm. The value of $\xi$ in the cytoplasm of E. coli, $0.51 \pm 0.09$ nm, is comparable with the average distance between proteins. The parameters of the sdVRC ($\xi$ and $R_h$) depend on the internal structure of the cytoplasm (protein density, size of the nucleoid, etc.). Thus, each cell type will be characterized by a differently shaped reference curve (owing to differences in $\xi$ and $R_h$). The mathematical form of the sdVRC, however, will not change, so such a curve can be constructed for other cell types.
Fig. 2. … (Table 1) of radii from 0.16 nm to 203 nm (closed squares). The cytoplasmic DCs $D_{\mathrm{cyto}}$ of the probes were taken from the literature (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007) (cf. Table 1). By fitting the data with Equation (7) (solid line), we determined two length scales: $\xi = 0.51 \pm 0.09$ nm and $R_h = 42 \pm 9$ nm. We also determined the macroscopic viscosity of the cytoplasm, $\eta_m = 17.5$ Pa·s, i.e. 26 000 times higher than the viscosity of water $\eta_0$ at 310 K. Shading represents the maximum error of fitting.

Other models of diffusion in the cytoplasm
We compared our results with three models of diffusion in the cytoplasm of E. coli available in the literature (Figures 3 and 4). McGuffee and Elcock (2010) proposed two models of diffusion in the cytoplasm. The 'steric' model takes into account only steric interactions between diffusing proteins. The 'full' model includes steric, electrostatic and hydrodynamic interactions between diffusing entities. Comparison of the results (Figure 3) shows that the model we propose accounts for possible interactions between the diffusing probes and the surrounding environment. Moreover, we show that the full information needed to build the sdVRC can be obtained only by including probes whose $r_p$ greatly exceeds $R_h$. For example, the simulations of McGuffee and Elcock (2010) include the proteins most abundant in the cytoplasm, but the absence of large objects such as the nucleoid leads to underestimated values of $D_0/D_{\mathrm{cyto}}$. The effect becomes significant for probes with $r_p > 10$ nm, for which the values of $D_0/D_{\mathrm{cyto}}$ are an order of magnitude lower than the experimental results.
We also compared our results with the model proposed by Mika and Poolman (2011), in which $D_{\mathrm{cyto}} \propto M_w^{-0.7}$. This power-law dependence of $D_{\mathrm{cyto}}$ on $M_w$ may also lead to underestimated values of $D_0/D_{\mathrm{cyto}}$. For example, for the 70S ribosome, the experimentally measured $D_0/D_{\mathrm{cyto}}$ is five times higher than the value predicted by the power law. Therefore, the power law proposed by Mika and Poolman (2011) holds only for proteins in the narrow molecular-weight range of 20-30 kDa and, moreover, does not apply to macromolecules other than proteins. This is because each type of macromolecule (DNA, RNA, proteins, polymers, etc.) has a different shape and thus a different dependence of $r_p$ on $M_w$ [Equations (1)-(5)]. The shape of the macromolecule, and consequently its radius, determines its DC. Figure 4 shows the dependence of the DC $D_{\mathrm{cyto}}$ of different types of macromolecules (proteins, RNA and DNA) on their molecular weight.
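The point that a single power law in $M_w$ cannot cover all macromolecule types can be illustrated with a short sketch. It evaluates Equations (1)-(3) and (6) for a protein, an RNA and a linear DNA of equal molecular weight, together with an assumed Kalwarczyk et al. (2011) form of Equation (7), $D_0/D_{\mathrm{cyto}} = \exp[(R_{\mathrm{eff}}/\xi)^a]$. The results are contrasted with a $D_{\mathrm{cyto}} \propto M_w^{-0.7}$ curve. Anchoring that power law at GFP (27 kDa, 7.7 μm²/s) is an arbitrary choice made here for illustration, and units of nm and Da are assumed.

```python
import math

K_B, T, ETA_0 = 1.380649e-23, 310.0, 0.7e-3    # J/K, K, Pa*s (assumed)
XI, R_H, A = 0.51, 42.0, 0.53                  # fitted sdVRC parameters (nm, nm, -)
RADIUS_PARAMS = {"protein": (0.0514, 0.392),   # r_p = C * M_w**alpha (nm, Da)
                 "rna": (0.0566, 0.38),
                 "dna_linear": (0.024, 0.57)}


def d_cyto(mw_da, kind):
    """Predicted Dcyto (um^2/s) for a macromolecule of the given type."""
    c, alpha = RADIUS_PARAMS[kind]
    r_p = c * mw_da ** alpha                                   # Equations (1)-(3)
    d0 = K_B * T / (6 * math.pi * ETA_0 * r_p * 1e-9) * 1e12   # Equation (6), um^2/s
    r_eff = (R_H ** -2 + r_p ** -2) ** -0.5
    return d0 / math.exp((r_eff / XI) ** A)                    # assumed form of Eq. (7)


def d_power_law(mw_da, d_ref=7.7, mw_ref=27_000.0):
    """Dcyto proportional to M_w**-0.7, anchored (arbitrarily) at GFP."""
    return d_ref * (mw_da / mw_ref) ** -0.7


for mw in (3e4, 3e5, 3e6, 2e7):
    row = ", ".join(f"{k}: {d_cyto(mw, k):.3g}" for k in RADIUS_PARAMS)
    print(f"M_w = {mw:.0e} Da -> {row}; power law: {d_power_law(mw):.3g} um^2/s")
```

At equal $M_w$, the more extended DNA has a larger $r_p$ and is predicted to be far slower than a compact protein. For a plasmid-sized linear DNA ($2 \times 10^7$ Da), the anchored power law overestimates $D_{\mathrm{cyto}}$ by more than two orders of magnitude relative to this sketch.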

Accuracy of the model
The accuracy with which the sdVRC can be determined strongly depends on the amount of available data. One would expect additional data for probes with $r_p \gg R_h$ and $r_p < \xi$ to decrease the maximum error of the sdVRC significantly (compare Fig. 2). To test the accuracy of the presented method, we analysed the error of the calculated DC $D_{\mathrm{cyto}}$ of GFP as a function of the number of experimental data points. Using Equation (7), we generated 10 datasets in which the number of data points ranged from 6 to 100. The generated data were uniformly distributed on a logarithmic scale. They were drawn randomly, assuming that the measurement error follows a normal distribution with standard deviation $\sigma = 0.1$. We assumed that the error of $r_p$ equals 5%. We found that 20 data points are sufficient to obtain $D_{\mathrm{cyto}}$ of GFP with an accuracy of 20% (averaged over the 10 generated datasets). In comparison, the uncertainty of $D_{\mathrm{cyto}}$ obtained from the analysis of the literature data was about 40% (cf. Fig. 2). This is mainly due to the small number of available experimental data. Furthermore, most of the experimental data cover a narrow range of hydrodynamic radii (around 3 nm, cf. Fig. 2), which is unfavourable for this type of analysis. To date, however, no experimental data are available that would improve the accuracy of the sdVRC. To improve the accuracy, therefore, additional experiments are needed that cover a wider range of probe radii $r_p$, and the uncertainties of $D_0$, $D_{\mathrm{cyto}}$ and $r_p$ should be minimized.
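The synthetic-data accuracy test can be sketched as below. Several details are not specified in the text and are assumptions here:
- The noise with $\sigma = 0.1$ is applied to $\ln(D_0/D_{\mathrm{cyto}})$.
- The 5% error of $r_p$ is Gaussian and relative.
- Probe radii span 0.2 to 200 nm log-uniformly.
- Ten datasets are drawn for each dataset size.
- Equation (7) takes the Kalwarczyk et al. (2011) form $\ln(D_0/D_{\mathrm{cyto}}) = (R_{\mathrm{eff}}/\xi)^a$.

The fit scans a grid of ($R_h$, $a$) and solves for $c = \xi^{-a}$ in closed form, as a stand-in for the authors' least-squares routine. GFP is represented by $r_p = 2.8$ nm. The numbers this sketch produces should not be read as a reproduction of the 20% figure above.

```python
import math
import random

XI, R_H, A = 0.51, 42.0, 0.53   # "true" parameters used to generate the data
R_GFP = 2.8                      # hydrodynamic radius of GFP, nm


def log_ratio(r, xi, r_h, a):
    """ln(D0/Dcyto) under the assumed sdVRC form (R_eff/xi)**a."""
    r_eff = (r_h ** -2 + r ** -2) ** -0.5
    return (r_eff / xi) ** a


def fit(radii, y):
    """Grid over (R_h, a); closed-form least squares for c = xi**(-a)."""
    best = None
    for r_h in range(10, 101, 2):                  # R_h candidates, nm
        r_eff = [(r_h ** -2 + r ** -2) ** -0.5 for r in radii]
        for k in range(16):
            a = 0.40 + 0.02 * k                    # a candidates, 0.40 ... 0.70
            x = [re ** a for re in r_eff]
            c = sum(u * v for u, v in zip(x, y)) / sum(u * u for u in x)
            sse = sum((v - c * u) ** 2 for u, v in zip(x, y))
            if c > 0 and (best is None or sse < best[0]):
                best = (sse, c ** (-1.0 / a), float(r_h), a)
    return best[1:]  # (xi, R_h, a)


def gfp_error(n_points, n_sets=10, sigma=0.1, r_err=0.05, seed=1):
    """Mean relative error of the fitted Dcyto of GFP over n_sets synthetic datasets."""
    rng = random.Random(seed)
    true_y = log_ratio(R_GFP, XI, R_H, A)
    errors = []
    for _ in range(n_sets):
        radii = [0.2 * 1000 ** rng.random() for _ in range(n_points)]   # log-uniform, 0.2-200 nm
        y = [log_ratio(r, XI, R_H, A) + rng.gauss(0.0, sigma) for r in radii]
        measured_r = [r * (1.0 + rng.gauss(0.0, r_err)) for r in radii]  # 5% radius error
        xi, r_h, a = fit(measured_r, y)
        # Dcyto = D0 / exp(y), so D0 cancels in the relative error of Dcyto
        errors.append(abs(math.exp(true_y - log_ratio(R_GFP, xi, r_h, a)) - 1.0))
    return sum(errors) / n_sets


if __name__ == "__main__":
    for n in (6, 20, 50):
        print(f"{n:3d} probes: mean relative error of GFP Dcyto = {gfp_error(n):.1%}")
```

Changing `sigma`, `r_err` or the radius range shows how the error depends on the spread and quality of the probe data. This is the dependence the paragraph above argues for.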

DCs of proteins
When preparing a database of DCs of the entire proteome, one should keep in mind that about 45% of the proteome consists of proteins forming larger macromolecular complexes. These include homo- and hetero-oligomers and complexes of membrane proteins with translocation proteins. Thus, DCs should also be calculated for protein complexes. The UniProt protein database (Apweiler et al., 2011;Jain et al., 2009) contains information on the molecular weight of proteins, their quaternary structure and their location in the cell. Using these data and the sdVRC (cf. Fig. 2), we calculated the DCs $D_{\mathrm{cyto}}$ of all E. coli proteins (Supplementary Table S1) present in the cytoplasm as monomers (single polypeptide chains), as multimers (homo- or hetero-oligomers) or as complexes composed of many chains (Fig. 5). Figure 5A shows the histogram of molecular weights of cytoplasmic proteins, including homo- and hetero-multimers. The molecular weights follow a log-normal distribution with probability density function
$$ q(M_w) = \frac{1}{\sqrt{2\pi}\,\sigma M_w}\exp\left[-\frac{\left(\ln(M_w/\mu)\right)^2}{2\sigma^2}\right], $$
where $\sigma = 0.825 \pm 0.007$ is the standard deviation of $\ln M_w$ and $\mu = 31.9 \pm 0.3$ kDa is the geometric mean (median) molecular weight. The relationship between the DC and the molecular weight of a protein is given by Equations (1) and (7). A histogram of the DCs of cytoplasmic proteins is shown in Figure 5B (same proteins as in Fig. 5A). The distribution follows the probability density function
$$ p\big(D_{\mathrm{cyto}}(M_w)\big) = q(M_w)\left|\frac{dM_w(D_{\mathrm{cyto}})}{dD_{\mathrm{cyto}}}\right|. $$

Fig. 4. Comparison of measured and predicted $D_{\mathrm{cyto}}$ as a function of the molecular weight of the investigated probes. The predicted dependencies shown in the graph are given by Equation (7). The hydrodynamic radius $r_p$ of each type of macromolecule is given by the relationship
$$ r_p = C\,M_w^{\alpha}, $$
where $M_w$ is the molecular weight of the macromolecule. For proteins, $C = 0.0514$ and $\alpha = 0.392$ [Equation (1)]; for RNA, $C = 0.0566$ and $\alpha = 0.38$ [Equation (2)]; for linear DNA, $C = 0.024$ and $\alpha = 0.57$ [Equation (3)]; for circular DNA, $C = 0.0125$ and $\alpha = 0.59$ [Equation (4)]; and for supercoiled DNA, $C = 0.0145$ and $\alpha = 0.57$ [Equation (5)]. For comparison, we present experimental data on DCs of proteins (Cluzel et al., 2000;Elowitz et al., 1999;English et al., 2011;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009), RNA (Golding and Cox, 2004), a plasmid (Campbell and Mullins, 2007) and the 30S and 70S ribosomes (Bakshi et al., 2012). The dashed-dotted straight line indicates the relationship $D \propto M_w^{-0.7}$ proposed by Mika and Poolman (2011). When applied to large plasmids ($M_w \sim 2 \times 10^4$ kDa), this dependence overestimates the DC by several orders of magnitude.

Fig. 3. … (Bakshi et al., 2012;Campbell and Mullins, 2007;Cluzel et al., 2000;Elowitz et al., 1999;English et al., 2011;Golding and Cox, 2004;Jasnin et al., 2008;Konopka et al., 2006;Kumar et al., 2010;Mika et al., 2010;Mullineaux et al., 2006;Nenninger et al., 2010;Slade et al., 2009;van den Bogaart et al., 2007). The black solid line represents Equation (7) with parameters $\xi = 0.51 \pm 0.09$ nm, $R_h = 42 \pm 9$ nm and $a = 0.53 \pm 0.04$. We compared our results with data generated by McGuffee and Elcock (2010) and Mika and Poolman (2011). The data generated by McGuffee and Elcock (2010) were fitted by Equation (7), yielding the following parameters. For the 'full' model: $\xi = 0.2 \pm 0.2$ nm, $R_h = 20 \pm 48$ nm and $a = 0.32 \pm 0.12$ (dotted circles, dotted line). For the 'steric' model: $\xi = 3.57 \pm 0.1$ nm, $R_h = 17 \pm 6$ nm and $a = 0.59 \pm 0.05$ (open diamonds, dashed line). The model proposed by Mika and Poolman (2011), in which $D_{\mathrm{cyto}} \propto M_w^{-0.7}$, is plotted as a dashed-dotted line.
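The log-normal molecular-weight distribution reported for Fig. 5A, and its mapping onto the distribution of $D_{\mathrm{cyto}}$ in Fig. 5B, can be sketched as follows. The code uses Equation (1) and Equation (6) (assumed units nm and Da, $T = 310$ K, $\eta_0 = 0.7$ mPa·s). It also uses an assumed Kalwarczyk et al. (2011) form of Equation (7), $D_0/D_{\mathrm{cyto}} = \exp[(R_{\mathrm{eff}}/\xi)^a]$. Both routes to $p(D_{\mathrm{cyto}})$ are shown. The first is the change-of-variables formula with a numerical derivative. The second samples $M_w$ and maps each draw. The function names are hypothetical.

```python
import math
import random

MU_KDA, SIGMA = 31.9, 0.825                    # log-normal parameters (kDa, -)
K_B, T, ETA_0 = 1.380649e-23, 310.0, 0.7e-3    # J/K, K, Pa*s (assumed)
XI, R_H, A = 0.51, 42.0, 0.53                  # fitted sdVRC parameters (nm, nm, -)


def q(mw_kda):
    """Log-normal probability density of molecular weight, per kDa."""
    z = math.log(mw_kda / MU_KDA)
    return math.exp(-z * z / (2 * SIGMA ** 2)) / (math.sqrt(2 * math.pi) * SIGMA * mw_kda)


def d_cyto(mw_kda):
    """Dcyto (um^2/s) from Eq. (1), Eq. (6) and the assumed form of Eq. (7)."""
    r_p = 0.0514 * (mw_kda * 1e3) ** 0.392                     # nm
    d0 = K_B * T / (6 * math.pi * ETA_0 * r_p * 1e-9) * 1e12   # um^2/s
    r_eff = (R_H ** -2 + r_p ** -2) ** -0.5
    return d0 / math.exp((r_eff / XI) ** A)


def p_of_d(mw_kda, h=1e-4):
    """Density of Dcyto at D = d_cyto(M_w): q(M_w) / |dD/dM_w| (central difference)."""
    dm = h * mw_kda
    slope = (d_cyto(mw_kda + dm) - d_cyto(mw_kda - dm)) / (2 * dm)
    return q(mw_kda) / abs(slope)


# Sampling route: draw M_w from the log-normal and map each draw to Dcyto
rng = random.Random(0)
samples = sorted(d_cyto(rng.lognormvariate(math.log(MU_KDA), SIGMA)) for _ in range(10_000))
print(f"median Dcyto = {samples[len(samples) // 2]:.1f} um^2/s")
```

$D_{\mathrm{cyto}}$ decreases monotonically with $M_w$. The median of the sampled DCs therefore equals the DC of the median protein, roughly 8 μm²/s under these assumptions. The absolute value in $p$ accounts for the negative slope $dD_{\mathrm{cyto}}/dM_w$.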

We also calculated $D_{\mathrm{cyto}}$ of membrane proteins, which make up ~30% of the proteome of E. coli. After synthesis by the ribosome, membrane proteins are transported to the membrane by one of two pathways. In the TAT pathway (Sargent, 2007), proteins are transported as single polypeptides in a folded state. In the Sec pathway (Driessen and Nouwen, 2008), unfolded proteins are complexed mainly by one of two proteins, SecB or Tig.
We created a database (Supplementary Table S1) listing the DCs of all proteins, including their monomeric forms, their possible homo- and hetero-multimers and, for membrane proteins, also their complexes with the translocation proteins SecB and Tig. Apart from the DCs of proteins, we calculated $D_{\mathrm{cyto}}$ of small molecules such as amino acids and sugars, and of macromolecules such as RNA and DNA (linear, circular and supercoiled). The calculated DCs are listed in Table 2.
The predicted DCs refer only to three-dimensional diffusion. In cells, particularly eukaryotic ones, there are also other types of transport, such as transport by molecular motors (Vale, 2003). Nevertheless, mobility, irrespective of the type of motion, is inversely proportional to the viscosity of the surrounding environment. Since the viscosity depends on the length scale (Holyst et al., 2009;Kalwarczyk et al., 2011;Szymański et al., 2006a, b), every type of motion will depend exponentially [Equation (7)] on the size of the moving object.

Application of DC database in studies of biochemical processes occurring in cells
Using the database of DCs, one can determine quantitatively whether a protein diffuses freely or interacts and forms complexes with much larger macromolecules, e.g. plasmids. Capoulade et al. (2011) performed diffusion measurements in the nucleus of eukaryotic cells. They showed that euchromatin creates domains of high and low affinity for heterochromatin protein 1 (HP1). A different kind of analysis was performed by Elf et al. (2007). They compared the in vivo DCs of the lactose repressor in its native form and of a lactose repressor lacking the DNA-binding domain. The order-of-magnitude difference between the DCs of the two proteins led to the conclusion that the native lactose repressor spends 87% of the time bound to DNA. This shows that attractive interactions between diffusing particles slow down the diffusion of molecules.
To illustrate the method, consider a hypothetical protein of hydrodynamic radius $r_p = 3$ nm. Its DC calculated from the sdVRC is approximately $D_{\mathrm{cyto}} = 8.7$ μm²/s. The cell volume of E. coli is $V \sim 0.6$ μm³ (Kubitschek, 1990). The time the protein needs to visit every place in this volume is approximately $t = V/(4\pi D_{\mathrm{cyto}} r_p) \approx 1.8$ s. Now suppose that the protein binds to a plasmid with a molecular weight of 10 000 kDa, whose DC is of the order of $D_{\mathrm{plasm}} = 10^{-4}$ μm²/s. Suppose further that the protein spends only about one-eleventh of the time diffusing freely ($\tau_f$) and the remaining ~90% of the time ($\tau_c = 10\,\tau_f$) in a complex with the plasmid. The effective DC is defined as
$$ D_{\mathrm{eff}} = \frac{D_{\mathrm{cyto}} + D_c\,\tau_c/\tau_f}{1 + \tau_c/\tau_f}. $$
Assuming $D_c = D_{\mathrm{plasm}}$, it is then nearly an order of magnitude lower than the predicted $D_{\mathrm{cyto}}$: $D_{\mathrm{eff}} \approx 0.8$ μm²/s. Based on this analysis, we can assume that any deviation of an experimentally measured DC from the sdVRC prediction results from intermolecular interactions such as specific or non-specific binding.
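The arithmetic of this example is reproduced below. The inputs are the numbers used in the text, with the volume in μm³, the DCs in μm²/s and $r_p = 3$ nm written as 0.003 μm. The function names are hypothetical.

```python
import math


def visit_time_s(volume_um3, d_um2_per_s, r_p_um):
    """Time for a molecule of radius r_p to visit the whole volume: t = V / (4 pi D r_p)."""
    return volume_um3 / (4 * math.pi * d_um2_per_s * r_p_um)


def d_effective(d_free, d_complex, tau_ratio):
    """Time-weighted DC: (D_cyto + D_c * tau_c/tau_f) / (1 + tau_c/tau_f)."""
    return (d_free + d_complex * tau_ratio) / (1 + tau_ratio)


t = visit_time_s(0.6, 8.7, 0.003)      # V = 0.6 um^3, D_cyto = 8.7 um^2/s, r_p = 3 nm
d_eff = d_effective(8.7, 1e-4, 10.0)   # D_plasm = 1e-4 um^2/s, tau_c = 10 * tau_f
print(f"t = {t:.2f} s, D_eff = {d_eff:.2f} um^2/s")  # t = 1.83 s, D_eff = 0.79 um^2/s
```

Because $D_{\mathrm{plasm}}$ is negligible, $D_{\mathrm{eff}}$ is essentially $D_{\mathrm{cyto}}/(1 + \tau_c/\tau_f)$. A measured DC that falls below the sdVRC prediction can therefore be read as an estimate of the bound fraction.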

Diffusion in the cytoplasm and the diffusion in organelles of eukaryotes
Prokaryotic cells are small [the volume of E. coli is approximately $V \sim 0.6$ μm³ (Kubitschek, 1990)]. Diffusion measurements in these cells are performed over the entire volume of the cytoplasm. The effective DC measured in such experiments is therefore an average over the whole cytoplasm. Because the sdVRC was constructed from these DCs, for E. coli the curve is also averaged over the entire volume of the cell. It should be stressed that the sdVRC should not be used to describe diffusion in the cell membrane. The membrane differs structurally from the cytoplasm, and diffusion in it is two-dimensional. The small size of prokaryotic cells also affects the long-time behaviour of diffusing objects, a phenomenon known as confined diffusion (Ochab-Marcinek and Holyst, 2011). Nevertheless, useful conclusions can be drawn from the normal, three-dimensional DCs (short-time diffusion). For example, English et al. (2011) characterized the catalytic cycle of the RelA protein on the basis of short-time diffusion measurements.
Eukaryotic cells are much larger than bacteria. Therefore, diffusion measurements in these cells are easier and can be performed in individual organelles [e.g. the nucleus (Pederson, 2000)]. In previous work, we showed that a reference curve can be constructed for the cytoplasm of mammalian HeLa and Swiss 3T3 cells (Kalwarczyk et al., 2011). Lukacs et al. (2000), however, obtained results for both the cytoplasm and the nucleus of HeLa cancer cells. Comparing these results, we expect the sdVRC to differ between cellular organelles. Furthermore, because the sdVRC depends on the structure of the environment in which diffusion occurs, it should be unique to a given cell or even organelle.

CONCLUSION
The method presented above has high predictive power. Although the error of the method is still large (40% for proteins), the experimentally measured DCs agree remarkably well with the predicted ones (cf. Fig. 4). Therefore, measuring the DCs of several inert probes in a single cell type makes it possible to determine the DCs of thousands of proteins and other (macro)molecules. A correctly designed experiment would require several experimental techniques (NMR, FRAP, FCS, particle tracking) and probes spanning a wide range of sizes.
For each probe, one needs to know its DC in water and/or its hydrodynamic radius. In addition, the diffusion of the same probe must be measured in the cytoplasm of the cell. The sizes of the selected probes should be uniformly distributed on a logarithmic scale. We showed that only 20 measurements are required to predict the cytoplasmic DC of a typical protein with 20% accuracy.
Analysis of the sdVRC makes it possible to determine the characteristic length scales $R_h$ and $\xi$, as well as the DC of any (macro)molecule in the cytoplasm. For the cytoplasm of E. coli, we found that $R_h$ is surprisingly well correlated with the average radius of the DNA loops forming the nucleoid. This suggests that the nucleoid is the main crowding agent (responsible for the macroscopic viscosity) in the cytoplasm of E. coli.
Finally, it should be noted that constructing an analogous database of DCs in other systems, such as the nucleus or mitochondria of eukaryotic cells, has no additional requirements apart from experimental data. We also believe that the sdVRC can easily be adapted to calculate other types of mobility, including one-dimensional sliding, the velocity of molecular motors, etc., as they are all inversely proportional to the viscosity.