Inferring transcriptional logic from multiple dynamic experiments

Abstract Motivation The availability of more data of dynamic gene expression under multiple experimental conditions provides new information that makes the key goal of identifying not only the transcriptional regulators of a gene but also the underlying logical structure attainable. Results We propose a novel method for inferring transcriptional regulation using a simple, yet biologically interpretable, model to find the logic by which a set of candidate genes and their associated transcription factors (TFs) regulate the transcriptional process of a gene of interest. Our dynamic model links the mRNA transcription rate of the target gene to the activation states of the TFs assuming that these interactions are consistent across multiple experiments and over time. A trans-dimensional Markov Chain Monte Carlo (MCMC) algorithm is used to efficiently sample the regulatory logic under different combinations of parents and rank the estimated models by their posterior probabilities. We demonstrate and compare our methodology with other methods using simulation examples and apply it to a study of transcriptional regulation of selected target genes of Arabidopsis Thaliana from microarray time series data obtained under multiple biotic stresses. We show that our method is able to detect complex regulatory interactions that are consistent under multiple experimental conditions. Availability and implementation Programs are written in MATLAB and Statistics Toolbox Release 2016b, The MathWorks, Inc., Natick, Massachusetts, United States and are available on GitHub https://github.com/giorgosminas/TRS and at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software. Supplementary information Supplementary data are available at Bioinformatics online.


Alternative regulation models
In this section we discuss various alternative regulation models that are used to describe the transcription rates of target genes in the current literature. We first consider the models that describe the transcription rate, τ , of a target gene as a regression function of the regulators' profiles. The general form of those models is where β a parameter vector and p 1 (t), p 2 (t), . . . , p n (t) the regulators profiles. Two commonly used models of this type (see Madar et al. [2010]; Yip et al. [2010]; Huynh-Thu et al. [2010]; Wang et al. [2006]; Ou-Yang et al. [2017]) are the linear model, and the non-linear model .
Note that in both models the effects of the two regulators in the transcription rate are additive.
That is, the terms corresponding to each regulator are added to derive a weighted sum of their individual effects with weights controlling their effect to the value of the transcription rate.
Consider the example in Fig. I1 where two regulators form a repressed activation regulation network and suppose for simplicity that the regulator profiles are reduced to binary. Then the four states of the regulation model are given in Table 1.
p 1 p 2 τ 0 0 τ 0 0 1 τ 0 1 0 τ 0 + τ 1 1 1 τ 0 Table 1: Logic table of the repressed activation regulation model. Here p 1 is the profile of the activator and p 2 the profile of the repressor of this activation.
Here the transcription rates are such that τ 1 is much larger than the basal rate τ 0 > 0. We can easily see that the linear model is not able to describe the above regulation. For instance, if β 2 is taken to be −β 1 , then the model would fit for all states except p 0 = 0, p 1 = 1.
Note that the same principle applies to any regulation mechanism where interactions between the regulators are involved. For example AND activations as in Table 2 cannot be described by the linear model. It is easy to extend the case for continuous scale profiles.
p 1 p 2 τ 0 0 τ 0 0 1 τ 0 1 0 τ 0 1 1 τ 0 + τ 1 Similarly, AND repressions, XOR activations or repressions and other regulation mechanisms that involve interaction between multiple regulators cannot be described by the above linear model. It is important to stress here that the above limitation does not arise because of the linear relation of τ with p 1 and p 2 , but because the terms that describe the effect of each regulator to τ are added to obtain the transcription regulation model. The non-linear model described above cannot model the above interactive regulation mechanism. It can only model non-interactive non-linear regulations. For example, the non-linear model can describe regulation of a target gene by two regulators that independently activate the target (say from a different binding site) and they also appear to cooperate and produce some additional activation of the target when both are present.
It is necessary to add an interaction term to the model in order to describe interactive regulation using this type of regression model. That is, for the linear model with two regulators, we can use the following model of the transcription rate function τ (t) = β 0 + β 1 p 1 (t) + β 2 p 2 (t) + β 12 p 1 (t)p 2 (t).
However, the inclusion of the interaction term increases the number of parameters of the model and this can be prohibiting when the model involves a large number of regulators. In particular, the four parameters of the latter model would not be identifiable unless sufficient amount of data is observed from all four combinations of different activation states.
Another approach is to use a Boolean logic model that can describe interactive regulations.
However, the Boolean logic would not be able to distinguish between regulators with different strength of the interaction. The arbitrariness of the discretisation algorithm which is performed as a pre-processing step in methods of statistical inference based on Boolean logic models (see for example [Bornholdt, 2008;Han et al., 2014]) can also be problematic, particularly in cases where the switches of the target gene from 0 (OFF) to 1 (ON) states and vice versa are not clear in the continuous scale.
2 Approximation of TF protein profile from mRNA expression data In most experimental protocols that are currently used for inferring transcription regulation, the putative regulators are observed simultaneously with target genes only in terms of their mRNA expression level. The TF protein level is typically not observed simultaneously with the target mRNA expression. It is generally accepted that in eukaryotic organisms mRNA of regulating genes first translate to protein before regulating the target's transcription. As the TF protein level is typically unobserved, it has to be approximated using mRNA expression.
One standard model for protein translation is the following simple ODĖ that describes the translation of the mRNA expression M (t) at rate α to protein P (t) that also degrades with rate δ P .
More complicated models are also commonly used. In particular, a commonly used model for eukaryotic transcription takes into account that the translated protein is first located in the cytoplasm and then moves to the nucleus to initiate transcription. Export of the protein from the nucleus to the cytoplasm is also possible while the protein can degrade both in the cytoplasm and the nucleus. These reactions can be described by the following ODEṡ where P c(t) and P n(t) the cytoplasmic and nuclear concentrations, respectively, and k i and k e the import and export rates, respectively. This model is used for producing the simulated data in our examples (see Sect. 5).
On the other hand, a simple approach is to assume that the TF profile is simply a delayed version of the mRNA expression profile and take where h the delay time. This is an implicit assumption used in many reverse-engineering methods (e.g. Madar et al. [2010]; Yip et al. [2010]; Huynh-Thu et al. [2010]; Wang et al. [2006]; Ou-Yang et al. [2017]) that use the mRNA expression level of the regulating gene to describe the regulation of the target gene transcription rate.
In the examples used below, we consider both the ODE model in (1) and the delay model (3) with fixed parameter values for approximating the TF protein profiles P f (t) of the candidate regulators.
To derive the mRNA expression profile, M (t), of the regulating genes from the observed data, we use a smoothing spline kernel. More specifically, we use the MATLAB 2016b function To characterise the noise levels of the mRNA expression, we use the Wild bootstrap method Wu [1986]. In summary, a smoothing spline, s(t), is fitted to observations Y (t i ), i = 1, 2, . . . , n.
The residuals e i = Y (t i ) − s(t i ) are obtained and used to repeatedly generate normally distributed errors,ê i = e i Z, where Z ∼ N (0, 1). These are used to compute bootstrap sampleŝ Smoothing splinesŝ(t) are fitted to each of these bootstrap samples which gives a sample of smoothing splines. As we explain below, this bootstrap samples are used for constructing informative priors. As we discuss in I, the use of these priors is optional. We run the simulation studies and the real data examples with s = 0.1 because we wish to eliminate noisy local changes in the profiles of the candidate regulators. However, in Section S5, we provide the results of running the TRS algorithm with s = 0.5.

Prior distributions
We begin by describing the informative prior distributions for the set of regulators and their threshold levels briefly introduced in I.
As discussed in Sect. I2.2, these prior distributions are derived under the assumption that during an experiment candidates with more dynamic profiles are more likely to be regulators than candidates with less dynamic or "flat" profiles. Similarly, transcriptional switches are more likely to be caused by substantial changes in the regulator levels than by small noisy changes. The overall dynamics of any candidate regulator are measured by the range of its smoothed expression profile across experiments while the gradient (first derivative) is used to measure the temporal dynamics at any given time. We account for nonlinear effects such as saturation (i.e. effect of larger values saturates after some high level) by applying sigmoid functions, S 1 (·) and S 2 (·) to the range and gradient value, respectively. The noise levels of the observed regulators profile are taken into account by standardizing the value of the range and gradients with respect to their variance estimated by the Wild bootstrap method [Wu, 1986].
The prior probability, π(f ∈ Φ), of each of the candidate regulators, f ∈ F, to be a regulator is set proportional to the sigmoid function, S 1 (·), of the length of its standardized range (see section S3.1 for more details). To derive the prior distribution π(ρ φ ) of any given value of the threshold level, ρ φ , in the profile range, R φ , of the regulator φ, we first derive the intersections of the constant line ρ φ to the profile, P φ (t), across the multiple experiments. If a single intersection is derived, we compute the value of the gradient at this intersection and standardize it based on the bootstrap estimate of variance. If there are multiple intersections, the minimum of their standardized gradient values are derived to penalise threshold levels which give any crossings at non-dynamic levels. The prior distribution π(ρ φ ) is then set proportional to the sigmoid function S 2 (·) of the minimum value of the standardized gradients at the intersections produced by ρ φ unless a range with no intersections exists. Such ranges occur if multiple experiments are considered and the regulator consistently lies higher in one experiment and lower in another. Such differences may explain changes in transcriptional activity of the target gene, particularly, if the length of these "gap" ranges is substantial. In case such ranges occur in R φ of regulator φ, the prior distribution π(ρ φ ) is a continuous mixture distribution with uniform mixture component for the gap ranges and the above gradient-based distribution for the covered ranges. The mixture weight for each range is set proportional to the length of its range (see section S3.2 for more details).

Set of regulators
T ] be the (smoothed) profile of a candidate regulator f ∈ F in experiment k, k = 1, 2, . . . , K. We consider the length of the range of the regulator profile across experiments We use the Wild bootstrap method [Wu, 1986] to derive an estimate of its standard deviation σ L f . The standardised range is then where L is the median range of all candidate regulators. A sigmoid function S 1 (·) is applied tō L f . Here we use the cumulative distribution function of the Gaussian distribution with mean and variance respectively equal to the overall median and variance ofL f for all candidates f ∈ F. The variance is multiplied by a non-zero factor (possibly equal to one) to allow for increasing or decreasing non-linearities and saturation effects. The prior probability for each candidate, f ∈ F, to belong to the regulator set Φ is then These probabilities are used to derive the prior probability for each regulator set where P(Φ) the set of permutations of the elements of the regulators set Φ.

Threshold level of regulators
f (t), of its standard deviation. This gives the standardized gradient (accounting for noise in the observed profile) where g the overall median gradient value across F and over time.
Take a threshold level ρ f and let C ρ f be the set of crossings of the constant line at ρ f on the profiles P (k) f (t), k = 1, 2, . . . , K. This set is finite with probability 1 for noisy profiles P (k) f (t). If it is non-empty, take p(ρ f ) = S 2 (min c∈Cρ φḡ f (c)), where S 2 (·) the Gaussian cumulative distribution function with mean g and standard deviation the overall median of σ g (k) We also set a cut-off value,ḡ min , for the standardised gradient, so that if min c∈Cρ φḡ f (c) <ḡ min , then p(ρ f ) = 0. This allows for eliminating the possibility of setting thresholds at levels which the changes are too small for causing switches in the target's transcription. If no between-experiment gaps are observed in the profile range, f , of each of the gap and covered ranges, R (1) f and mixing components either a uniform distribution in R (i) φ if the latter is a gap range or the gradient-based distribution computed above, but restricted in R

The parametric weighted least squares method
To express heterogeneity across observations, we set the observation variance of experiment dependent function and ψ k ∈ [0, 1] an unknown parameter. In matrix form, the vector of time-series observations y = (y k (t i )) has covariance Σ which is a block-diagonal matrix with blocks Σ k = σ 2 k Q k and Q k a diagonal matrix with main diagonal entries equal to v k (t i ), i = 1, 2, . . . , n k . Let Q be the block diagonal matrix with blocks Q k .
Define the vector of parameters β = (M 1 (t 0 ), . . . , M K (t 0 ), τ α 1 , . . . , τ αq ) with M k (t 0 ) the initial values of experiment k and τ α j the transcription rates associated with regulator activation state α j , j = 1, . . . , q. Let X be the matrix of covariates of the linear model in (I2). Assume first that ψ k , k = 1, . . . , K, is fixed and thus Q is also a fixed matrix. Then, weighted least squares (wls) is performed as ordinary least squares (ols) but X and y are replaced by Q −1/2 X and Q −1/2 y, that isβ where y k = (y k (t 1 ), . . . , y k (t n k )) andM k = (M k (t 1 ), . . . , M k (t n k )) the observed and predicted (with parameter valuesβ) mRNA profile of the target gene in experiment k and q k the number of distinct transcription rates in experiment k, q k ≤ q. After computing the above, we can derive the full conditional posterior distributions σ 2 k |β, y which are scaled inverse-χ 2 distributions with n k,0 + n k degrees of freedom and scale parameters (n k,0 σ 2 This is used in step 5 of the MCMC algorithm. The full conditional posterior distribution of β is The estimateβ is used in step 2 of the MCMC algorithm, instead of the full distribution.
The posterior distribution of the parameter ψ k is Alternatively, as explained below, the parameters ψ k , k = 1, 2, . . . , K can be computed by least squares estimation within the MCMC sampler. Let e k (t) = y k (t) −M k (t) be the residuals at time t of experiment k. Based on our normality assumption, the residuals are and thus we can get ψ k from the linear regression of We would also like to note that the least square estimation of the parameters in β makes them deterministic functions of the rest of the TRS model parameters. Therefore, the term often referred as Jacobian that typically appears in the acceptance ratio of trans-dimensional moves is equal to 1 because the proposed parameters do not alter the value of the current parameter vector except from the parameters in β that, because they are deterministically derived from the rest of the current parameters, have a degenerate proposal distribution.

Simulation studies
We next provide details related to the simulation studies in Sect. I3.1 and I3.2. In both studies, the simulated data satisfy where Ω = 1000 and σ 1 = 20. The values of Ω and σ 1 are chosen so that the magnitude and than those of f 6 . More details for each simulation study are given below.

Repressed activation network
The logic of the interaction in this simulation study is that a TF f 1 activates the target T in the absence of a repressor TF, f 2 . When f 2 is present, the activation is repressed and prevented. The other candidate regulators, f 3 , f 4 , f 5 and f 6 are not involved in the interaction but they have profiles that, as we discussed in Sect. I3.1, can provide alternative options to the regulation set Φ = {f 1 , f 2 }. Figure 3 provides a logic diagram of this network. The profiles, h i (t), where i = 1, 2, . . . , 6 and T (target gene) using which we generate the artificial data Y i (t) as in (4) for each gene in the network are derived as solutions of the following ODE system.ḣ h 6 = τ 6 (t) − δ 6 h 6 (t), τ 6 (t) = τ 6 g 6 (t), As we explained in the previous section, the transcription regulation of the target gene by The values of the protein translation, degradation, import and export rates are α = log(2)/0.5, δ P = log(2)/2, k i = 2, k e = 1, respectively. The values of the mRNA degradation rates are δ 1 = δ 2 = δ 3 = δ T = log(2)/1.25, δ 4 = δ 6 = log(2)/2 and δ 5 = log(2)/0.5.
These give half-life between 0.5 − 2h. The different values are due to the various different profiles that we wish to give to the alternative candidate regulators f 3 , f 4 , f 5 and f 6 compared to genes that are part of the repressed activation network. More specifically, the regulator f 4 has slower degradation to provide a profile that remains constant at low or high levels in the  Figure 4: Hill-type functions of the form P n h /(P n h +k h ) for activation (blue) and k h /(P n h +k h ) for repression (red) as a function of the nuclear protein levels P n for parameters k = 5 and h = 10.
first and second experiment, respectively. The regulator f 5 has larger degradation to provide a profile similar to a reflection of regulator f 2 , which degrades quickly (f 2 transcripts quickly).
Finally, the regulator f 6 has slower degradation as this provides similar form with f 1 but with slower and smaller scale changes.
The transcription rate functions τ i (t), i = 1, 2, . . . , 6 are of the form the latter, f 6 , changes slower than f 1 . The functions g i (t) provide the form of the profiles. As previously discussed, we wish that the regulators f 1 and f 6 have the same profile and thus in the first experiment g 1 (t) = g 6 (t) = 1, t ∈ [0, 9) and g 1 (t) = g 6 (t) = 0, t ∈ [9, 10] and in the second experiment g 1 (t) = g 6 (t) = 1, t ∈ [0, 1) and g 1 (t) = g 6 (t) = 0, t ∈ [1, 10]. This provides the activation of the target gene in the first experiment, while in the second experiment the form of g 1 (t) implies that the regulator quickly starts to degrade causing a fall of the target transcription rate to basal levels at about the middle of the observed time interval. We also wish that the regulator f 3 has the same profile in both experiments and this is the same with the profile of f 2 in experiment 1. Thus, in both experiments g 3 (t) = 0, t ∈ [0, 3) and g 3 (t) = 1, t ∈ [3, 10]. The regulator f 2 has the same profile with f 3 in the first experiment and g 2 (t) = 1, t ∈ [0, 10] in the second experiment. This produce repressions of the target in both experiments with the repression occurring earlier in the second experiment. To produce a profile constantly at low levels in the first experiment for f 4 , g 4 (t) = 0.1, t ∈ [0, 10], while in the second experiment g 4 (t) = 1, t ∈ [0, 10] that produces a profile constantly at high levels (in combination with a high initial value, see below). For the fifth candidate, f 5 , which has a profile similar to a reflection of the profile of f 2 , the profile in the first experiment is g 5 (t) = 1, for t ∈ [0, 5) and g 5 (t) = 0, for t ∈ [5, 10]. In the second experiment the profile of f 5 is switched off for the whole observed time-interval, g 5 (t) = 0, t ∈ [0, 10].
A MATLAB script file that contains all the above parameter specifications is provided with the TRS software.

Prior distributions
The prior distribution used for the results reported in Sect. I3.1.
• for the threshold priors the cut-off valueḡ min is set equal to the median of the standardized gradient valuesḡ  Figure 6: Prior distributions for the number of regulators π(ν)(bottom left), the probability of each candidate, f , f = f 1 , f 2 , . . . , f 6 , T to be in the regulators' set π(f ∈ Φ) (bottom right) and for the threshold value of each regulator, π(ρ f ) (right panel, rows 1-7). The left and centre panel of rows 1-7 provide the smoothed regulators profiles, P (k) f (t), k = 1, 2 (solid line) and their 90% bootstrap confidence envelope (grey area) in the first and second experiment, respectively. All the plots in each of rows 1-7 correspond to the same regulator and they have the same y-axis. The hyperparameters for all the displayed prior distributions are provided in the section above. These priors are used for deriving the results of the simulation study discussed in Sect. I3.1. We also provide the running averages of various parameters, namely the degradation rate, the precision in each experiment and the threshold levels during the MCMC run as well as the running estimate of the probability of the number of selected TFs. This clearly show the quick convergence of these parameters during the MCMC run. π(ρ f |y) All plots in each of rows 1-3 have the same y-axis. The two a posteriori most likely models are marked with red and green color. The transcription profiles and fit to target data of these two models are respectively displayed in the left and centre panel of the two bottom rows.

Results for less informative prior distributions
In this section we report results of TRS algorithm for two different sets of prior distributions.
Both of these set of prior distributions use the same priors for the degradation rate and precision parameters as in Sect. S5.1.2. The first set of these prior distributions also has the same Poisson(λ = 0.15) prior for the number of regulators as in Sect. S5.1.2, but uses uniform distributions both for the choice of regulators (discrete uniform) and the threshold of the regulators (uniform in the covered range of each regulator). The second set of these prior distribution uses a larger λ = 1 parameter for the number of regulators with all other priors being the same as the first set. Fig. S10 present the Monte Carlo Markov Chains a resulted from a run of 1M iterations of the TRS algorithm using the first set of prior distributions. and larger posterior probability of selecting its alternative regulator f 6 . This is due to that the priors in Fig. S9 give larger preference to the more dynamic profile of f 1 . Figure 10: Monte Carlo Markov Chains of the parameters degradation rate (δ), precision (σ −2 k ) and threshold level (ρ f j , j = 1, 2, . . . , 15) of the TRS algorithm for the simulation example of repressed activation network with less informative priors for the regulator set and their thresholds as explained above.  π(ρ f |y) Figure 13: Posterior inference for the simulation study of the repressed activation network using the TRS model. Here less informative priors for the regulator set and their thresholds as explained above and Poisson(λ = 1) prior for the number of regulators. The setup of the figure is the same as Fig. S9.

Results for smaller values of smoothing parameter
Following the discussion for the stability of the TRS algorithm to the smoothing splines in Sect. S2, we present the results of running the TRS algorithm with a smaller value of the smoothing parameters. This implies a better fit to the observed mRNA expression levels of the candidate regulators but less smooth regulator profiles. The same set of hyperparameters for the informative prior distributions is used as in the results in Sect. I3.1. However, the change in the splines slightly affects the prior distributions for the choice of regulators and their thresholds (see Fig. S14). This is reflected with small changes in the results of the TRS algorithm (see Fig. S15 and S16). Note that the prior distributions are very similar to those in Fig. S6 and the results are very similar to those in Fig. S7 and S9. . . , f 6 , T to be in the regulators set π(f ∈ Φ) (bottom right) and for the threshold value of each regulator, π(ρ f ) (right panel, rows 1-7). The left and centre panel of rows 1-7 provide the smoothed regulators profiles, P (k) f (t), k = 1, 2 (solid line) and their 90% bootstrap confidence envelope (grey area) in the first and second experiment, respectively. All the plots in each of rows 1-7 correspond to the same regulator and they have the same y-axis. The hyperparameters for all the displayed prior distributions are the same as in section 5.1.2. The only difference is that the value of the smoothing parameter is smaller (0.5 as opposed to 0.9) which makes the spline estimates of the mRNA profiles of the candidate regulators less smooth but better fit to the mRNA expression data. π(ρ f |y)  Table 3 provides the output of the GRNInfer tool.  Table 3: The output of the GRNInfer tool for the data produced in this simulation study. The (i, j) entry of the table corresponds to the coefficient of the regulation of the i-th gene by the j-th gene where the names of the regulating and target genes are given in the first row and first column, respectively.

Flowering time network
Here we consider the gene regulation network related to the flowering time of A. thaliana Next, we provide an ODE system that represents this regulation. As in the previous simulation study, regulation of the target gene (either activation or repression) is modelled by Hill functions where the input is the nuclear concentration of the regulator protein associated with the regulating gene. The solutions, h i (t), i = 1, 2, . . . , 8, provide the profiles of the genes in the network. As we explained above, these profiles are then used to generate data that satisfy equation (S4).
As in the previous simulation study, the values of the protein translation, degradation, import and export rates are α = log(2)/0.5, δ P = log(2)/2, k i = 2, k e = 1 for all 8 genes. Here, the values of the transcription and mRNA degradation rates are also the same for all genes, The profiles g i (t), i = 1, 2, 6 and the initial conditions listed next are chosen to generate the various dynamic profiles in different hypothetical experiments that are presented in Fig. S18.

Prior distributions
The prior distribution used for the results reported in Sect. I3.2. Note that the same hyperparameters are used for the repressed activation simulation study.
• for the threshold priors the cut-off valueḡ min is set equal to the median of the standardized gradient valuesḡ π(ρ f ) Figure 19: Prior distributions for the simulation study of the flowering network of A. thaliana. The setup of the figure is the same as in Fig. S6 adapted to the larger number of candidate regulators and experimental conditions.  π(ρ f |y) Figure 21: Posterior inference for the simulation study of the repressed activation network using the TRS model. This is a summary figure for the results discussed in Sect. I3.2. The setup of the figure is the same as Fig. S9.

Prior distributions
The hyperparameters used in this example are the same as in the two simulation studies.
Target: ANAC092 π(ρ f ) time time time Figure 23: Prior densities for the thresholds of the first ten candidate regulators of ANAC092. The smoothed regulators profile (solid line) and Bootstrap 95% confidence envelopes (grey areas) of the candidate regulators are displayed in the first three columns, while the priors for their thresholds, π(ρ f ), are displayed are provided in the last column. π(ρ f ) time time time Figure 24: Prior densities for the thresholds of the last ten candidate regulators of ANAC092. The smoothed regulators profile (solid line) and Bootstrap 95% confidence envelopes (grey areas) of the candidate regulators are displayed in the first three columns, while the priors for their thresholds, π(ρ f ), are displayed are provided in the last column.
Target: SCL3 Similarly, with the application related to the regulation of the target gene ANAC092, the hyperparameters used in this example are also the same as in the two simulation studies.