Inferring delays in partially observed gene regulation processes

Abstract Motivation Cell function is regulated by gene regulatory networks (GRNs) defined by protein-mediated interaction between constituent genes. Despite advances in experimental techniques, we can still measure only a fraction of the processes that govern GRN dynamics. To infer the properties of GRNs using partial observation, unobserved sequential processes can be replaced with distributed time delays, yielding non-Markovian models. Inference methods based on the resulting model suffer from the curse of dimensionality. Results We develop a simulation-based Bayesian MCMC method employing an approximate likelihood for the efficient and accurate inference of GRN parameters when only some of their products are observed. We illustrate our approach using a two-step activation model: an activation signal leads to the accumulation of an unobserved regulatory protein, which triggers the expression of observed fluorescent proteins. With prior information about observed fluorescent protein synthesis, our method successfully infers the dynamics of the unobserved regulatory protein. We can estimate the delay and kinetic parameters characterizing target regulation including transcription, translation, and target searching of an unobserved protein from experimental measurements of the products of its target gene. Our method is scalable and can be used to analyze non-Markovian models with hidden components. Availability and implementation Our code is implemented in R and is freely available with a simple example data at https://github.com/Mathbiomed/SimMCMC.

Consider a biochemical reaction network with u species Z 1 , . . ., Z u and v reactions R 1 , . . ., R v where the reaction R k is represented as with the stoichiometric coefficients of species Z j , p kj and q kj .
As shown by Choi et al. (2020), the approximate likelihood function is given by where Here, fk is the reaction completion propensity of the k-th reaction, and z is the collection of vectors for the number of species, i.e., the protein counts, at the discrete measurement time points.The collection of parameters θ = {θ k } and ∆ = {∆ k } define the reaction initiation propensities and delay distributions, respectively, and r = (r 1 , . . ., r v ) is the collection of vectors for reaction counts completing within each of the time intervals, i.e., r k = (r k1 , . . ., r kT ), and r ki is the count of reaction of type k that completes within the time interval (i − 1, i]. The indicator function, χ(z|r), has value one if the trajectory of species matches the counts of completed reactions and zero otherwise.For example, for a simple birth-death process ∅ → Z → ∅ with the initial condition Z(0) = 0, let r 1 and r 2 be the number of birth (i.e., ∅ → Z) and death (i.e., Z → ∅) reactions, respectively, and suppose that we measured the time series of Z at time t = 0, 1, 2, and 3.If r 1 = (3, 2, 4) and r 2 = (0, 1, 2), then z = (0, 3, 4, 6).The vector z is unique for a given r, and thus χ(z|r) equals one only for this specific z, and equals zero for all other trajectories.

Derivation of the Metropolis-Hastings acceptance probability for reaction counts in the simulation-based MCMC method
In the conventional Metropolis-Hastings algorithm, the acceptance probability for a proposed state x * , ρ(x * , x), is given as follows: where g(x) is a target distribution that one intends to sample from and q(x * |x) is a proposal distribution for given x.
After canceling common terms, we obtain Eq. ( 10) in the main text: Explicit formula of the approximate likelihood for the two-step activation model In the following, we use the notation introduced in the main text.From Eq. ( 9) in the main text, we have L(r, θ, ∆|y obs , σ) = L(ȳ(r)|y obs , σ) × L (r, θ, ∆|x(r), ȳ(r)) . (S4) For the two-step activation model, the unobserved and observed species are denoted by X and Y, respectively, and the reaction counts are denoted by r = (r 1 , r 2 , r 3 , r 4 ) where r 1 , r 2 , r 3 , and r 4 are the vectors of counts of the production of X, the decay of X, the production of Y, and the decay of Y, respectively, completing within each of the time intervals.The vector of the kinetic parameters, θ, is given by (A X , K M , A Y , B), and the delay parameters ∆ is given by (α X , β X , α Y , β Y ).For notational simplicity, we let Xr (i), Ȳr (i), and Y obs (i) be the values of x(r), ȳ(r), and y obs evaluated at time i, respectively.Since the observational noise, ϵ(t), is i.i.d. and follows N (0, σ 2 ), the likelihood function for the observed processes, L(y obs |ȳ(r)) is given by T i=0 p(Y obs (i) − Ȳr (i); 0, σ 2 ) where p(•; µ, σ 2 ) is the probability density function of the normal distribution with the mean µ and the variance σ 2 .Then Eq. (S4) can be re-expressed as follows: where fk is the approximate completion propensity for the k-th reaction given as follows: Here, Ga(t; α X , β X ) is the cumulative density function of the gamma distribution with the shape parameter α X and the rate parameter β X , so A X i i−1 Ga(t; α X , β X )dt represents an activation rate reflecting the delay compared with the full activation, A X .Note that if there are multiple observed trajectories, we can regard them as independent, so the product of the likelihood function for each trajectory can be the likelihood for the given multiple trajectories.
To compute C(m) and D(m), let us recall the reaction completion propensity formula Eq. ( 6) for the production of Y: When we estimate the kinetic and delay parameters, we fix the parameter for the observational noise, σ.To obtain estimates from simulated data (Figs. 3 and 4), we set σ = σ e = 10.The generative model and the inference model are thus mismatched because the noise in the generative model, proportional to the state of Y (t), i.e., N (0, Y (t)), is not taken into account in the inference model.We made this choice intentionally to mimic the real experiment (Fig. 5) where it is hard to quantify the exact noise structure.Therefore, we fix σ for inference using experimental data as the basal noise, which is separately estimated using the observed time series before the introduction of the inducer, IPTG (i.e., before t=0).Specifically, we use standard deviation of time series before t = 0 as the estimated σ value.

Description of the simulation-based Bayesian MCMC method for a two-step activation model
We illustrate our simulation-based Bayesian MCMC method using a stochastic two-step activation model (Fig. 3a).This method is basically a Metropolis-Hastings (MH) within Gibbs sampling approach.However, the difference is that we use a stochastic simulation to generate proposal reaction counts r while we use direct sampling or the MH algorithm to sample the other parameters.
The posterior distribution of the parameters for measured trajectories y obs (t) at discrete time points, t = 0, . . ., T , is given by π where π(A X ), . . ., π(β Y ) are the prior distributions, and L is the approximate likelihood function.For all the parameter inferences, we used non-informative gamma priors, Γ(10 −6 , 10 −6 ).Because the likelihood functions for A X , A Y , and B are proportional to gamma distributions, we can directly sample them from each of the conditional posteriors given on the other parameters.Specifically, Xr (i − 1) + Xr (i) + Ȳr (i − 1) + Ȳr (i) 2 + 10 −6 .
(S6) For the rest parameters, K M , α X , β X , α Y , β Y , their conditional posterior distributions are not well-known distributions, so we use the Robust Adaptive Metropolis algorithm (Vihola, 2012) to sample from their conditional posterior distributions.For reaction counts r, we use a stochastic simulation, τ -leaping method (Gillespie, 2001).
In sum, our simulation-based Bayesian MCMC for the two-step activation model is described by the following steps.
Step 1. Initialize the kinetic and delay parameters, Y ,and β (0) Y ; and the reaction counts r (0) using the initial parameters and the stochastic simulation.
Step 2. Using the parameter set at the j-th iteration, we generate the proposal reaction counts r * , which automatically determines corresponding trajectories X * r and Ȳ * r .We then accept the proposal reaction counts based on the acceptance probability derived in the previous section (Eq.( S2)) Step 3. Draw posterior samples for A X , A Y , B from their conditional gamma posterior distributions given by Eq. (S6).For the rest parameters, K M , α X , β X , α Y , β Y , we use the Robust Adaptive Metropolis algorithm (Vihola, 2012) to draw from their conditional posterior samples.
Step 4. Repeat Steps 2-3 until a convergence criterion is met.
We typically used 110,000 iterations of Steps 2-4.The first 10,000 iterations were discarded as a burn-in period to avoid the initial condition dependency.The last 100,000 iterations were thinned by a factor of 100 to obtain independent samples from a posterior distribution.Thus, the remaining 1,000 samples were saved to estimate the parameters.Note that if there are multiple observed trajectories, we propose reaction counts for each trajectory and accept or reject them individually.
Explicit formula of the approximate likelihood for a more complex model (Fig. S5) From Eq. ( 9) in the main text, we have L(r, θ, ∆|y obs , σ) = L(ȳ(r)|y obs , σ) × L (r, θ, ∆|x(r), ȳ(r)) . (S7) Based on Eq. (S7), we derive the approximate likelihood for a model describing a network of three genes (Fig. S5).We assume that we can observe two types of proteins, Y 1 and Y 2 , and we cannot observe protein X.Each protein promotes another gene's expression.
Since the observational noise, ϵ(t), is i.i.d. and follows N (0, σ 2 ), the likelihood function for the observed processes, L(ȳ l (r)|y l,obs , σ) is given by T i=0 p(Y l,obs (i) − Ȳlr (i); 0, σ 2 ) for l = 1, 2 where p(•; µ, σ 2 ) is the probability density function of the normal distribution with the mean µ and the variance σ 2 .Then Eq. (S7) can be re-expressed as L(r, θ, ∆|y 1,obs , y 2,obs , σ) where fk is the approximate completion propensity for the k-th reaction given as follows: We can compute C k (m) and D k (m) for k = 1, 2, 3 using the same procedure as when computing C(m) and D(m) in Eq. (S5).These are given by Norm.Norm.
Norm.S5: Illustration of the scalability of our method using a regulatory network with three genes.(a) Diagram of the gene regulatory network.We assume that while product X of gene x is unobserved, the counts of proteins Y 1 and Y 2 are observed.(b) Ten trajectories of measurements, Y 1, obs (t) and Y 2, obs (t) with kinetic parameters A X = A Y1 = A Y2 = A Y2→X = 10min −1 , B = 0.05min −1 , K M1 = K M2 = K M3 = 100, and delay τ X ∼ Γ(18/5, 3/5), τ Y1 ∼ Γ(18/5, 3/5),τ Y2 ∼ Γ(18/5, 3/5), and observational noises ∼ N (0, Y l (t) + 10) for l = 1, 2. (c) Using these trajectories, we estimated kinetic parameters A X , A Y1 , A Y2 , A Y2→X , K M1 , K M2 , K M3 , and mean delay µ τ X by assuming that B, τ Y1 , and τ Y2 are known.The posterior distributions of A Y1 , A Y2 , K M1 , K M2 , and K M3 are biased, and those of A Y2→X and K M3 showed the high variances.(d) Such biases and variances are mainly due to the strong correlations between the pairs of parameters: (A Y1 , K M1 ), (A Y2 , K M2 ), and (A Y2→X , K M3 ).The strong correlations lead to identifiability issues between the parameters, which is also seen in the two-step activation model (Fig. 3).(e) Consequently, the estimate of the mean delay µ τ X was also biased.(f ) To resolve this identifiability issue, we assumed that K M1 , K M2 , and K M3 are known as done in the two-step activation model (Fig. 3).We then obtained accurate estimates of kinetic parameters, A X , A Y1 , A Y2 , A Y2→X .(g) Consequently, we also obtained an accurate estimate of the mean delay, µ τ X .
Fig.S3: Estimation results of parameters A X /K M , A Y , τ X , and B using the 40 trajectories in Fig.3bin the main text.The estimates became more accurate and precise when the time interval between measurements was shortened.
Fig.S5: Illustration of the scalability of our method using a regulatory network with three genes.(a) Diagram of the gene regulatory network.We assume that while product X of gene x is unobserved, the counts of proteins Y 1 and Y 2 are observed.(b) Ten trajectories of measurements, Y 1, obs (t) and Y 2, obs (t) with kinetic parametersA X = A Y1 = A Y2 = A Y2→X = 10min −1 , B = 0.05min −1 , K M1 = K M2 = K M3 =100, and delay τ X ∼ Γ(18/5, 3/5), τ Y1 ∼ Γ(18/5, 3/5),τ Y2 ∼ Γ(18/5, 3/5), and observational noises ∼ N (0, Y l (t) + 10) for l = 1, 2. (c) Using these trajectories, we estimated kinetic parameters A X , A Y1 , A Y2 , A Y2→X , K M1 , K M2 , K M3 , and mean delay µ τ X by assuming that B, τ Y1 , and τ Y2 are known.The posterior distributions of A Y1 , A Y2 , K M1 , K M2 , and K M3 are biased, and those of A Y2→X and K M3 showed the high variances.(d) Such biases and variances are mainly due to the strong correlations between the pairs of parameters: (A Y1 , K M1 ), (A Y2 , K M2 ), and (A Y2→X , K M3 ).The strong correlations lead to identifiability issues between the parameters, which is also seen in the two-step activation model (Fig.3).(e) Consequently, the estimate of the mean delay µ τ X was also biased.(f ) To resolve this identifiability issue, we assumed that K M1 , K M2 , and K M3 are known as done in the two-step activation model (Fig.3).We then obtained accurate estimates of kinetic parameters, A X , A Y1 , A Y2 , A Y2→X .(g) Consequently, we also obtained an accurate estimate of the mean delay, µ τ X .
dt where ga(s; α Y , β Y ) is a probability density function of the gamma distribution with the shape parameter α Y and the rate parameter β Y .By simplifying this equation, we obtain Eq. (S5) with C(m) and D(m) given by