In vivo single-molecule kinetics of activation and subsequent activity of the arabinose promoter

Using a single-RNA detection technique in live Escherichia coli cells, we measure, for each cell, the waiting time for the production of the first RNA under the control of the PBAD promoter after induction by arabinose, and the subsequent intervals between transcription events. We find that the kinetics of the arabinose intake system affect the mean and diversity in RNA numbers long after induction. We observed the same effect on the Plac/ara-1 promoter, which is inducible by arabinose or by IPTG. Importantly, the distribution of waiting times of Plac/ara-1 is indistinguishable from that of PBAD if and only if it is induced by arabinose alone. Finally, RNA production under the control of PBAD is found to be a sub-Poissonian process. We conclude that inducer-dependent waiting times affect the mean and cell-to-cell diversity in RNA numbers long after induction, suggesting that intake mechanisms have non-negligible effects on the phenotypic diversity of cell populations in natural, fluctuating environments.


Construction of the pMK-BAC vector
To construct pMK-BAC (PBAD-mRFP1-96 binding site (96 bs) array), the following plasmids were used: a plasmid carrying mRFP1 plus the 96 bs array region in the BAC vector (Plac/ara-1-mRFP1-96 bs), originally designed and generously provided by Prof. Ido Golding (32), and the pGLO vector (Biorad). To amplify the construct containing the araC and PBAD promoter region from the pGLO vector, a primer set was designed as follows: Ara_AatII-Fw: 5´ CCTAAGACGTCATCGATGCATAATGTGCC 3´ and Ara_AatII-Rv: 5´ CCTTGATGACGTCATGTATATCTCCTTCTTAAAGTTA 3´. The target BAD promoter region, along with the AraC coding region from the pGLO vector, was amplified and inserted into the pIG-BAC vector by standard molecular biology techniques. The construct was verified by sequencing with the appropriate primers and transformed into the E. coli DH5α-PRO strain carrying the bacterial expression vector pPROTET.E (Clontech) coding for MS2d-GFP. For more details see Supplementary Figures 1 and 2.

Plate reader experiment
The mean fluorescence of RFP under the control of PBAD was measured with a microplate fluorometer (Fluoroskan Ascent, Thermo Scientific). 200 ml of cells at OD600 0.5 were induced with 0.1 % or 1 % L-arabinose and placed on a 96-well microplate. The relative fluorescence levels of the mRFP1 protein were then measured for 2 hours (excitation and emission wavelengths were 584 nm and 607 nm, respectively). The cell density was kept identical in all wells of the plate for all conditions.

Quantitative PCR for mean mRNA quantification
The change in the rate of transcription of the genes araB and mRFP was studied using qPCR. E. coli DH5α-PRO cells containing the constructs were grown as described in the section describing the microscopy measurements. Cells were grown overnight at 30 °C with aeration, diluted into fresh medium and allowed to grow at the appropriate temperature of the experiment until an optical density (OD600) of 0.3-0.5 was reached. For the experiment, 5 ml of cells were pre-incubated with 100 ng/ml of aTc to induce the expression of MS2d-GFP. The BAD promoter was induced with 1 % L-arabinose, and the first sample was taken 30 minutes after induction. From then onwards, samples were taken at intervals of 60 minutes. Rifampicin was added to the samples immediately, so as to prevent further transcription, and the cells were fixed with RNAprotect reagent, immediately followed by enzymatic lysis using Tris-EDTA lysozyme buffer (pH 8.3). RNA was purified from each sample with the RNeasy mini-kit (Qiagen). The total RNA was separated by electrophoresis through a 1 % agarose gel and stained with SYBR Safe DNA Gel Stain. The RNA was found to be intact, with discrete bands for the 16S and 23S ribosomal RNAs. To ensure the purity of the RNA samples, they were treated with RNase-free DNase to remove residual DNA. The yield of RNA obtained was 0.4-0.6 mg/ml. Approximately 40 ng of RNA was used for cDNA synthesis using iSCRIPT reverse transcription supermix (Biorad) according to the manufacturer's instructions.
Quantification of cDNA was performed by real-time PCR using SYBR-green supermix, with primers for the amplification of target and reference genes at a concentration of 200 nM. Primers specific to the araB (Forward: 5' GGTACTTCCACCTGCGACAT 3', Reverse: 5' CAACCTGACCGCAAATACCT 3') and mRFP genes (Forward: 5' TACGACGCCGAGGTCAAG 3', Reverse: 5' TTGTGGGAGGTGATGTCCA 3') were designed using PRIMER3 (39). The amplicon length for the target and reference genes was kept at 90 bp. The primers for the reference gene, 16S rRNA (EcoCyc Accession Number: EG30090), were Forward: 5' CGTCAGCTCGTGTTGTGAA 3' and Reverse: 5' GGACCGCTGGCAACAAAG 3'. All primers were obtained from Thermo Scientific. The level of 16S rRNA was used to normalize the expression data of each target gene. 10 ng of cDNA was used as a template. The cycling protocol used was 94 °C for 15 s, 51 °C for 30 s, and 72 °C for 30 s, for up to 39 cycles. The amplification was monitored in real time by measuring the fluorescence intensities at the end of each cycle. The experiment was performed in triplicate, along with no-RT and no-template controls. The volume used for each reaction was 25 µl, in low-profile tube strips in a MiniOpticon Real-Time PCR system (Biorad). The Cq values were obtained from the CFX Manager™ Software, and the fold change in expression of the target gene was analysed by normalizing against the reference gene according to the Livak method (40). See Supplementary Figure 3 for the results.
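The normalization against the reference gene can be illustrated with a minimal sketch of the Livak (2^-ΔΔCq) calculation. The Cq values, variable names and the choice of the 30-minute sample as calibrator are hypothetical and serve only as an example; the method itself assumes that target and reference amplify with near-equal, close to 100 % efficiency.

```python
def mean(values):
    """Arithmetic mean of technical replicates."""
    return sum(values) / len(values)


def livak_fold_change(cq_target_sample, cq_ref_sample,
                      cq_target_calibrator, cq_ref_calibrator):
    """Fold change of a target gene by the 2^(-ddCq) (Livak) method.

    Each argument is a mean Cq value. The reference gene (here 16S rRNA)
    normalizes the target within each sample; the calibrator sample then
    serves as the baseline for the fold change.
    """
    d_cq_sample = cq_target_sample - cq_ref_sample
    d_cq_calibrator = cq_target_calibrator - cq_ref_calibrator
    dd_cq = d_cq_sample - d_cq_calibrator
    return 2.0 ** (-dd_cq)


# Hypothetical triplicate Cq values (NOT measured data).
target_30min = [24.1, 24.3, 24.2]   # target gene, first sample (calibrator)
ref_30min = [12.0, 12.1, 11.9]      # 16S rRNA, first sample
target_90min = [22.0, 22.2, 22.1]   # target gene, later sample
ref_90min = [12.0, 12.0, 12.0]      # 16S rRNA, later sample

fold = livak_fold_change(mean(target_90min), mean(ref_90min),
                         mean(target_30min), mean(ref_30min))
print(f"fold change relative to the 30 min sample: {fold:.2f}")
```

A lower normalized Cq in the later sample (ΔΔCq < 0) yields a fold change above 1, i.e. more target transcript relative to the calibrator.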

Normalization between samples of the distributions of time intervals
The observation time for the production of RNAs is two hours. In some cells, the intervals between transcription events (Δt) are of this order of magnitude. This causes shorter intervals to be 'favored'. This is more likely to occur in cells where the waiting time for the first RNA to be produced (t0) is longer, since the remaining observation time is shorter. This introduces an artificial anti-correlation between t0 and Δt in individual cells. Similar correlations are introduced by different division times as well, i.e., shorter division times hamper the collection of longer Δt samples.
Thus, before determining whether any real correlation exists between t0 and Δt in individual cells, it is necessary to remove these artificial sources of anti-correlation, which are due to the limits of the measurement period. For this, in all cells, intervals between consecutive RNAs were collected only within a time window of size tc after the previous production. The value of tc is identical in all cells. This causes the probability of appearance of the next RNA molecule during that period to be uniform for all cells, if the underlying process is in fact identical in all cells.
This restriction in the collection of values of Δt is made when assessing correlations between t0 and Δt and when comparing these two distributions between conditions. When imposing the restriction, we thus consider only cells that produce at least 2 RNA molecules during their lifetime and measurement period. The value of tc was selected so as to maximize the number of data points collectable from the data sets. Here, tc was set to 39 minutes (see Supplementary Figure S6).
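The windowing can be sketched as follows. This is one possible reading of the procedure, not the code used for the analysis: an interval is kept only if the next RNA appears within tc of the previous one, and, as an added assumption, only if the whole window of size tc fits before the end of the observation of that cell (end of measurement or division). The function name and the example times are hypothetical.

```python
def windowed_intervals(production_times, t_end, t_c):
    """Return (t0, kept_intervals) for one cell, or None if it has < 2 RNAs.

    production_times: times (min after induction) at which RNAs appear.
    t_end: end of observation for this cell (measurement end or division).
    t_c: size of the collection window after each production event.
    """
    times = sorted(production_times)
    if len(times) < 2:
        return None  # cells with fewer than 2 RNAs are not considered
    t0 = times[0]
    kept = []
    for previous, following in zip(times, times[1:]):
        # Assumption: the full window must be observable, so that every
        # retained window has the same chance of capturing the next event.
        window_fits = previous + t_c <= t_end
        if window_fits and (following - previous) <= t_c:
            kept.append(following - previous)
    return t0, kept


# Hypothetical cells observed for 120 min, with t_c = 39 min.
print(windowed_intervals([10, 30, 80, 115], t_end=120, t_c=39))  # (10, [20, 35])
print(windowed_intervals([100, 110], t_end=120, t_c=39))         # (100, [])
print(windowed_intervals([50], t_end=120, t_c=39))               # None
```

In the first cell the 50-minute interval is discarded because it exceeds tc; in the second, the interval is discarded because a window of 39 minutes starting at minute 100 would extend past the observation period.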

Fitting the empirical distributions to a sum of d-exponential variates
The arabinose intake mechanism can be described by a single Michaelis-Menten function (41). Since the backward reaction of the intake process is slower than the forward reaction (12), the intake process is modeled, roughly, by a sequence of non-reversible reactions. Interestingly, we found from the measurements and the inference procedure evidence of two steps at this stage (each exponential in duration), which is in agreement with the number of forward steps assumed in other studies for this process (12). Finally, transcription initiation, which follows the intake process, can also be modeled by a 3-step exponential model according to recent in vivo measurements (9, 10). Thus, we fit the measured distributions of t0 to a 5-step exponential model.
To fit the empirical distribution with a sum of d exponential variates (of possibly unequal rates), we select the exponential rate parameters $\lambda_1, \ldots, \lambda_d$ such that the Kolmogorov-Smirnov (K-S) statistic is minimized. That is, the parameters are selected as $\hat{\lambda} = \arg\min_{\lambda_1, \ldots, \lambda_d} \sup_x |F_n(x) - F(x; \lambda_1, \ldots, \lambda_d)|$, where $F_n$ is the empirical distribution function of the data and $F$ is the distribution function of the sum of the d exponential variates. The parameter values are found using a nonlinear numerical optimizer. This method is convenient, since if the K-S test rejects the fit for the parameters $\hat{\lambda}$, it would also reject it for any other set of parameters in this family of fitted distributions, indicating that these distributions are inappropriate models of the data. The results of the fitting are shown in Table S1.
As a final note, the model assumed above can be considered the simplest possible, i.e., each step is an elementary reaction of the form $A \xrightarrow{c} B$, with a constant probability of occurring per unit time.
This entails that the distributions of intervals between steps are exponential (42). Notably, the inferred distributions and the experimental data are statistically indistinguishable by the K-S test, which implies that there is no evidence to assume that the model is wrong (see Table S1).
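The fitting criterion can be illustrated with a minimal sketch that evaluates the K-S statistic of a sum of exponential variates against a sample and picks the rates that minimize it. Several simplifications are assumptions of this sketch, not of the analysis: the data are simulated, only two steps are fitted instead of five, the closed-form distribution function requires pairwise distinct rates, and a coarse grid search stands in for the nonlinear numerical optimizer.

```python
import math
import random


def hypoexp_cdf(x, rates):
    """CDF of a sum of independent exponentials with pairwise distinct rates."""
    if x <= 0:
        return 0.0
    survival = 0.0
    for i, rate_i in enumerate(rates):
        weight = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                weight *= rate_j / (rate_j - rate_i)
        survival += weight * math.exp(-rate_i * x)
    return 1.0 - survival


def ks_statistic(sample, rates):
    """sup_x |F_n(x) - F(x)|, evaluated at the jumps of the empirical CDF."""
    xs = sorted(sample)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        f = hypoexp_cdf(x, rates)
        worst = max(worst, abs((i + 1) / n - f), abs(i / n - f))
    return worst


def fit_two_step(sample, grid):
    """Grid search for the rate pair (a < b) minimizing the K-S statistic."""
    best = None
    for a in grid:
        for b in grid:
            if a < b:
                stat = ks_statistic(sample, (a, b))
                if best is None or stat < best[0]:
                    best = (stat, (a, b))
    return best


# Simulated waiting times (min): two sequential steps, rates 0.05 and 0.2 /min.
random.seed(1)
sample = [random.expovariate(0.05) + random.expovariate(0.2) for _ in range(300)]
grid = [0.01 * k for k in range(1, 41)]  # candidate rates 0.01 ... 0.40 /min
best_stat, best_rates = fit_two_step(sample, grid)
print(best_stat, best_rates)
```

The minimized statistic can then be compared with the critical value of the K-S test: if the best-fitting member of the family is rejected, every other member is rejected as well.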

CME solution
To estimate the effect of the intake on the cell-to-cell diversity in RNA numbers, we made use of direct integration of the Chemical Master Equation (CME) of the model described in the previous section, using the Finite State Projection algorithm (43). This method truncates the infinite state space of the CME such that the amount of probability outside the truncated region is negligible. In all cases, we truncated the state space at 20 RNA molecules. This number sufficed for this space to contain virtually all of the total probability in the system. The probability mass vector at each time moment is then solved by numerically integrating the truncated CME. From this distribution over time, we calculate the mean, variance, and Fano factor of the RNA numbers of the model at each moment.

Figure S1. Plasmids used for the pMK-BAC construction. The pMK-BAC(PBAD-mRFP1-96bs) plasmid was engineered by linking the amplified region, containing the PBAD promoter and the araC gene, obtained from pGLO, to the pIG-BAC expression vector, without the lac/ara-1 promoter, obtained from pIG-BAC(Plac/ara-1-mRFP1-96 bs)-V.

Table S1. Results of the K-S fitting. Asymptotic p-values of the Kolmogorov-Smirnov goodness-of-fit test when fitting the empirical distribution with a sum of 5 exponential variates in the case of t0 and of 3 exponential variates in the case of Δt. We compare these p-values with a standard value of 0.05.
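The truncated-CME integration described under "CME solution" can be sketched for a deliberately simplified model. The assumptions here are ours: a chain of irreversible activation steps precedes RNA production, production itself is a single-step (Poisson) process without degradation, all rate values are hypothetical, and a fixed-step Runge-Kutta scheme stands in for whichever integrator was actually used. Probability that flows past the truncation at 20 RNAs is lost, so the missing mass reports the truncation error.

```python
def cme_rhs(p, step_rates, k_rna):
    """Time derivative of the probability vector of the truncated CME.

    States 0..m-1: activation stages (0 RNAs); state m+n: active, n RNAs.
    """
    m = len(step_rates)
    dp = [0.0] * len(p)
    for i, rate in enumerate(step_rates):      # irreversible activation chain
        flux = rate * p[i]
        dp[i] -= flux
        dp[i + 1] += flux
    for idx in range(m, len(p)):               # RNA production
        flux = k_rna * p[idx]
        dp[idx] -= flux
        if idx + 1 < len(p):
            dp[idx + 1] += flux                # else: leaves truncated space
    return dp


def rk4_step(p, dt, step_rates, k_rna):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [x + h * s for x, s in zip(base, slope)]
    k1 = cme_rhs(p, step_rates, k_rna)
    k2 = cme_rhs(shifted(p, k1, 0.5 * dt), step_rates, k_rna)
    k3 = cme_rhs(shifted(p, k2, 0.5 * dt), step_rates, k_rna)
    k4 = cme_rhs(shifted(p, k3, dt), step_rates, k_rna)
    return [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]


def rna_moments(step_rates, k_rna, t_end, n_max=20, dt=0.05):
    """Mean, variance, Fano factor of RNA numbers and truncation error."""
    m = len(step_rates)
    p = [0.0] * (m + n_max + 1)
    p[0] = 1.0                                 # all cells start uninduced
    for _ in range(int(round(t_end / dt))):
        p = rk4_step(p, dt, step_rates, k_rna)
    probs = [sum(p[:m]) + p[m]] + p[m + 1:]    # marginal over RNA numbers
    total = sum(probs)
    mean = sum(n * q for n, q in enumerate(probs)) / total
    var = sum(n * n * q for n, q in enumerate(probs)) / total - mean ** 2
    fano = var / mean if mean > 0 else float("nan")
    return mean, var, fano, 1.0 - total


# Hypothetical rates (per min): two intake steps, then RNA production.
mean, var, fano, leak = rna_moments([0.1, 0.1], k_rna=0.02, t_end=120.0)
print(mean, var, fano, leak)
```

Even with Poissonian production, the variability of the activation time alone pushes the Fano factor above 1 in this sketch, which illustrates how waiting times before the first RNA can contribute to cell-to-cell diversity.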