Optimal design of stimulus experiments for robust discrimination of biochemical reaction networks

Motivation: Biochemical reaction networks in the form of coupled ordinary differential equations (ODEs) provide a powerful modeling tool for understanding the dynamics of biochemical processes. During the early phase of modeling, scientists have to deal with a large pool of competing nonlinear models. At this point, discrimination experiments can be designed and conducted to obtain optimal data for selecting the most plausible model. Since biological ODE models have widely distributed parameters due to, e.g. biologic variability or experimental variations, model responses become distributed. Therefore, a robust optimal experimental design (OED) for model discrimination can be used to discriminate models based on their response probability distribution functions (PDFs). Results: In this work, we present an optimal control-based methodology for designing optimal stimulus experiments aimed at robust model discrimination. For estimating the time-varying model response PDF, which results from the nonlinear propagation of the parameter PDF under the ODE dynamics, we suggest using the sigma-point approach. Using the model overlap (expected likelihood) as a robust discrimination criterion to measure dissimilarities between expected model response PDFs, we benchmark the proposed nonlinear design approach against linearization with respect to prediction accuracy and design quality for two nonlinear biological reaction networks. As shown, the sigma-point outperforms the linearization approach in the case of widely distributed parameter sets and/or existing multiple steady states. Since the sigma-point approach scales linearly with the number of model parameter, it can be applied to large systems for robust experimental planning. Availability: An implementation of the method in MATLAB/AMPL is available at http://www.uni-magdeburg.de/ivt/svt/person/rf/roed.html. Contact: flassig@mpi-magdeburg.mpg.de Supplementary information: Supplementary data are are available at Bioinformatics online.


INTRODUCTION
Mathematical models of complex biological processes provide the basis for systems understanding. They are of vital importance for generating predictions of systems behavior based on hypothesized mechanisms. In the case of limited experimental access to biological systems, models can help to find missing links and provide a tool to aggregate existing knowledge and data. In this work, we focus on mathematical models in the form of coupled ordinary differential equations (ODEs), which are being used to describe dynamics of biochemical reaction networks, e.g. signal transduction, metabolic or genetic regulation, on a deterministic, (semi-)mechanistic basis. Here, scientists are often facing limited or even contradicting knowledge about the underlying mechanisms, confined experimental possibilities, large biological variability as well as measurement noise. This leads to largely distributed model parameter sets, which in combination with several plausible alternative model structures renders model-based prediction highly uncertain. Therefore, model-based experimental design (e.g. optimal stimulus or additional experimental readout selection) is used to generate experiments that yield optimal experimental data to (i) reduce the spread in the model parameters (¼optimal parameter identification) and/or (ii) reduce the pool of plausible models (¼optimal model discrimination).
Much work on optimal experimental stimulus design (OESD) for biological systems focuses on information maximization with respect to parameter identification (Balsa-Canto et al., 2008;Bandara et al., 2009;Heine et al., 2008;Raue et al., 2010;Schenkendorf et al., 2009a). Here, for a given pool of plausible ODE models, OESD is used to find experimental conditions that reduce the parameter uncertainties and thus model response variabilities. The methods used to quantify parameter uncertainties and model response variabilities include linearization, sigma-points (Julier et al., 2000), profile likelihood (Raue et al., 2009) and Markov chain Monte Carlo (MCMC, Geyer, 1992;Vanlier et al., 2012). An OESD aiming at model discrimination for biological systems has been addressed by few authors (Apgar et al., 2008;Melykuti et al., 2010;Skanda and Lebiedz, 2010). Although measurement noise has been included, a rigorous consideration of model response variabilities due to distributed parameters has been missing so far. For chemical reaction kinetics and biotechnology, there exists some work on designing dynamic stimuli for the purpose of model discrimination (Asprey and Macchietto, 2002;Box and Hill, 1967;Kremling et al., 2004). As has been illustrated, the consideration of model response variabilities strongly improves the designed experiments and experimental data quality (Chen and Asprey, 2003;Donckels, 2012;Michalik et al., 2010). Therefore, linearization of the system's parameter mapping has been used. However, by using the so-called Sigma-Point method (Julier et al., 2000), (Heine et al., 2008), (Schenkendorf et al., 2009a) showed that the performance of a linear OED for best parameter estimation is rather poor for nonlinear systems having widely distributed parameters.
An experimental design aimed at model discrimination is typically generated at a point where existing data do not provide further discriminative information for a pool of competing, validated and identifiable models. With these models, robust experimental designs can be generated, which presumably yield data with optimal discrimination information. Since predictions with ODE models depend on the model parameters, identifiability is important as it ensures the existence of a unique solution to the parameter estimation problem and, consequently, unique model predictions (Raue et al., 2009). A non-identifiable model would yield non-unique model predictions under altered experimental design conditions, as there exists a set of several solutions to the parameter estimation problem. Robustness of the experimental design is achieved by considering (i) pure uncertainty about the model itself, (ii) distributed model predictions that arise from distributed model parameters and (iii) design variabilities (e.g. variations of the applied stimulus) during the conduction of the experiment.
In this work, part (ii) of design robustification for nonlinear models is considered, focusing on computational efficient and accurate estimation of the nonlinear propagated parameter probability distribution function (PDF) in an optimal control framework. We suggest using the sigma-point method as an alternative to the classical linearization approach. Therefore, both approaches are presented, applied and compared in the light of OESD for robust model discrimination, assuming perfect experimental conduction. In the following section, we describe the essential parts of our design approach, including (i) ODE modeling of biological reaction networks with and extended interpretation of distributed determinism, (ii) the model overlap as a PDFbased discrimination criterion, (iii) linearization and sigma-point methods and (iv) two numerical approaches to solve optimal control problems. Then, using two numerical examples, we formulate the optimization problem and benchmark the linearization approach against the sigma-point approach (Section 3).

Dynamic modeling of biochemical reaction systems
ODEs provide the modeling basis to describe the dynamics of biochemical reaction networks. The dynamics of the internal states xðt, uðtÞ, h x Þ 2 A x & R nx , e.g. protein concentrations, is determined by the solution of an initial value problem of the form d dt xðtÞ ¼ fðxðtÞ, uðtÞ, h x Þ ð 1Þ with initial system states xðt 0 Þ ¼ x 0 and right-hand side function fðxðtÞ, uðtÞ, h x Þ describing biologic interaction mechanisms, which depends on the system states x(t), (multiple) inputs u(t) (¼stimulus) and kinetic parameter set h x . Assuming f to be Lipschitz in x(t), u(t) and continuous in t, the readout variables are determined by where the function g-assumed to be sufficiently smooth-relates the internal system states to the readouts of the experiment with corresponding readout parameters h y , which together with dynamic parameters and initial conditions are merged into the model parameter vector h ¼ ½h x , h y T , with redefined dynamic parameter vector h x ½h x , x 0 T . (1) and (2) can be understood as a time-dependent mapping from the model parameter space

Distributed determinism
Although biological systems might follow deterministic rules, repeated measurements, even though with very accurate measurement techniques, will yield different results. The reasons for that are manyfold. Additionally to unavoidable measurement errors, biologic variability, i.e. systems with intrinsically distributed parameters, can induce a large spread in the transient dynamics and stationary behavior. In the case of the existence of multiple steady states, this spreading effect might even be more pronounced. Varying parameters during the measuring procedure and local parameter perturbations by non-stationary noise also contribute to a distributed measurement signal (Lorenz et al., 2007). Complex, nonlinear models of biological systems might also behave chaotic, further contributing to distributed response measurements. Thus, the conventional sharp, deterministic system representation needs to be extended by the notion of distributed determinism, i.e. although the system might completely be deterministic, its perceived signals are distributed realizations of the underlying deterministic mechanisms. This can be done by considering the parameters, and hence, the model responses as random variables Â and Y, respectively, each characterized by a PDF. We point out that, within this interpretation, the system and hence the model is assumed to naturally possess distributed parameters. Consequently, a distributed response is not solely explained by additive measurement noise but also by other sources of variations, which may be represented as distributed parameter sets. Let the model parameters be distributed according to some welldefined PDF Â ðhÞ, with h 2 A h being a realization of Â. The PDF of the random model response Y at time t can be derived from the normalized integral over all possible parameter and corresponding response realizations, weighted with the parameter PDF, i.e.
with % Y ðy, tÞ ¼ where 1 y ½hðt, hÞ represents the indicator function The normalization employs the L 1 -norm with respect to y 2 A y . Note that y represents an arbitrary possible realization of Y in A y , whereas hðt, hÞ describes the model response at time t for a fixed stimulus time course uðt 0 ! tÞ given parameter realization h. Consequently, for every single point in time, the shape of Equation (5) is determined by the parameter PDF, the choice of model [Equations (1) and (2)] and stimulus time course (¼experimental design). Consequently, the discrimination process can be robustified by discriminating competing models based on their model response PDFs [Equation (5)], accounting for variabilities in the parameters and model-specific mapping to the response space.

Robust design criteria for model discrimination
Here, we present the model overlap as a robust discrimination criterion, measuring dissimilarities of model response PDFs used to rate the discriminative power of a design during optimization. We define the general model overlap as the probability product kernel used in vector machine learning to measure statistical distances for the sake of discriminative learning (Jebara et al., 2004). Following Jebara et al., (2004), the probability product kernel of two multivariate PDFs p YjA , p YjB 2 L 2 ðA y Þ is defined as the scalar product with the densities being raised to some power p 2 R þ nf0g. It provides a bounded, positive-definite measure (Jebara et al., 2004) of similarity between distributions on the set A y , whereas the parameter p controls the weighting of regions with small versus large densities. From this, the average overlap of the time course is where D 2 D ¼ T Â U Â Y represents the experimental design within the feasible design region D, encompassing for instance selection of discrete measurement time points t k 2 T, stimulus design uðtÞ 2 U and readout design g 2 Y. Using the model response PDFs Yjm ðy, t, uðtÞÞ from Equation (5) for two competing models, m 2 fA, Bg in Equation (9) provides us with a measure of average model dissimilarities. For p ¼ 1, the overlap is the expected model response probability of model A under model B and vice versa [Equation (8)]. In this case, assuming one of the models to be true, the overlap yields the expected likelihood of the other model depending on the experimental design D. Consequently, an optimal model discrimination design D y minimizes Equation (9). Singh (1998) proposed to use Equation (8) with p ¼ 1/2-known as Bhattacharyya's affinity between distributions-for discriminating nonlinear regression models. It is closely related to Hellinger's distance, which represents a symmetrized approximation to the Kullback-Leibler divergence (Topsoe, 2000). In this work, we use the overlap as defined in Equation (9) with p ¼ 1, which we refer to as model overlap, as this directly represent the time-averaged, expected likelihood of one model under the other. In this form, it has been applied by Lorenz et al., (2007) to discriminate dynamic models based on the distributional fit performance. In Schenkendorf et al., (2009b), it is used to optimize an initial condition for a substrate uptake model to discriminate two competing kinetic approaches.

Estimation of nonlinear PDF mapping
If the solution hðt, hÞ can be obtained in closed form, it is straightforward to derive the model response PDF for a given parameter PDF using Equation (5). However, in most of the cases, the model response hðt, hÞ for a specific parameter realization is obtained by numerical integration.
Here, besides random sampling techniques based on Monte-Carlo simulations, the approximate model response PDF may also be obtained by deterministic sampling, e.g. by simple enumeration, optimized Latin hypercubes of the parameter space and application of Equation (5).
For an infinite number of samples, the true model response PDF can be constructed from these samples, which can be used for a subsequent evaluation of Equation (9) to judge the quality of a given design. Such procedures become computational inefficient for increasing number of model parameters and cannot be used in an optimization framework. Therefore, an approximation has to be made.
From initial data, one may obtain accurate estimates for the true parameter PDF, e.g. by MCMC sampling, which can be approximated by unimodal normal PDFs Â ðhÞ ' N ð Â , AE Â Þ, possibly multivariate. The model response PDF can also be approximated by Yjm ðy, t, uðtÞÞ ' N Yjm ðt, uðtÞÞ, AE Yjm ðt, uðtÞÞ À Á . Then, the integration in Equation (8) can be performed to yield the approximate overlap where Here, m ¼ {A, B} represents the model index, Yjm Yjm ðt, uðtÞÞ the n Ydimensional true mean and AE Yjm AE Yjm ðt, uðtÞÞ the true variancecovariance matrix of the corresponding model response PDFs, which both depend on the model structure m, measurement time point t, stimulus u(t) and readout selection. For details about the derivation, see for instance, Jebara et al. (2004). For n t discrete measurement time points t k , the approximate mean overlap is then This approximation dramatically reduces the computational costs as only the two first statistical moments (i.e. expectation and variancecovariance) need to be estimated. The task of solving Equation (5) to obtain model-response PDFs and subsequent integration of Equation (11) to evaluate the discriminative power of a given design in an optimization framework is then reduced to estimate the time course of mean vector Yjm ðt, uðtÞÞ and variance-covariance matrix AE Yjm ðt, uðtÞÞ of two model response PDFs for given parameter expectation Âjm and variance-covariance AE Âjm (Fig. 1). As true mean and variance-covariance of the parameters are unknown, these are replaced by their estimates, i.e. Â ! E½Â and AE Â ! C½Â, which we will do in the following. Note that for skewed PDFs one should apply a transformation, e.g. Box-Cox or inverse hyperbolic sine transformation to achieve normality of the PDF (Box, 1964;Johnson, 1949). In this way, our presented robust OESD method is not restricted to normal PDFs only.
Estimates of response expectation and variance-covariance can be obtained by linearizing the system at additional computational costs that scale linear with the number of parameters using forward-sensitivity analysis. But this approach can become suboptimal or yield even misleading designs (Section 3.2). On the additional expense of Oðn 2 h Þ estimates may be improved by a quadratic approximation of the system, which may become infeasible for larger systems, as do Monte Carlo-based approaches. Worst-case approaches yielding a minimax design have also been proposed to take model response variabilities into account (Walter and Pronzato, 1997;Skanda and Lebiedz, 2012). However, these are in general NP-hard (Du and Pardalos, 1995). The sigma-point method has an additional computational expense of Oðn h Þ, which compares to linearization.

Estimation based on linearization
The classical approach to estimate model response variabilities is linearization of the nonlinear model mapping hðt, hÞ with respect to the parameters. The linearization of the model response is given by applying the chain rule to Equation (2) with response sensitivity matrix Sðt, yÞ ¼ @hðt, hÞ @x S x ðt, xÞ þ @hðt, hÞ @h and statesensitivity matrix S x ðt, xÞ ¼ @x @h , which can be obtained by solving d dt S x ðt, xÞ with initial condition S x ðt 0 , x 0 Þ along the systems dynamics, which is known as the forward-sensitivity matrix equation. The additional computational effort is of order Oðn hx Þ, as only n hx additional ODEs have to be solved in Equation (13), since @x @hy ¼ 0 nxÂnh y . One may also formulate an adjoint system to derive the state sensitivities in a backward manner or use numerical differentiation.
Having determined the parameter sensitivities of the system, the linear estimates of expectation and variance-covariances of the model response PDF can be calculated to yield For nonlinear models, the estimate of the expectation is typically biased, i.e. Bi ¼ E L t ½Y À Y 6 ¼ 0, and errors are introduced at second and higher orders. The quality of the predicted variance-covariance cannot readily be judged as the errors are of fourth and higher order, whereas the contributions depend on the system. Notice that the linear design approach yields a local estimate in the parameter space, i.e. parameter-dependent coexisting stable states will be missed, resulting in significant estimation errors in both moments (Section 3.2). The estimators are exact for linear systems, as higher-order terms vanish. Julier et al. (2000) introduced the sigma-point method for advanced Kalman filtering and state estimation. It is based on the idea that with a fixed set of parameters (sigma-points), it is easier to approximate a nonlinearly transformed PDF by a Gaussian distribution than the nonlinear transformation itself. Julier et al. (2000) show that expectation and variance-covariance of a random variable Y, given by a transformation Y ¼ hðt, ÂÞ, possibly nonlinear, of a random variable Â with expectation E½h and variance-covariance C½h can be estimated according to the following procedure:

Estimation based on sigma-points
(1) Select 2n h þ 1 sigma-points in the original domain according to where ffiffiffiffiffiffiffiffiffiffi C½Â p ðiÞ is the ith column of the square root of the variance-covariance matrix.
(2) Propagate these points through the model Þ: (3) Estimated expectation and variance-covariance of the transformed variable based on the sigma-points are given by the linearly weighted sums with weights w ð0Þ ¼ nhþ , w ðAEiÞ ¼ 1 2ðnhþÞ and ¼ 2 ðn h þ Þ À n h : According to Julier et al. (2000), the errors of the expectation estimate is of fourth and higher order, whereas the variance-covariance estimates have an error of fourth and higher order. This, however, only holds for scalars, i.e. n h ¼ 1, as pointed out by Gustafsson and Hendeby (2008). For n h 41, the sigma-point parameters (, , ) can be used to tune the estimated moments by including a priori knowledge about the PDFs, i.e. and allow to account for higher-order moments of the parameter PDF and should be set to ¼ 2 for an initial Gaussian, whereas for n h 43 one should choose ¼ 0. Further, controls the sigma-point spread and should lie within 05 1 (Julier et al., 2000). The sigma-point has several advantages: no need to calculate derivative information (neither Jacobian nor Hessian have to be available or need to exist), which makes this method numerically robust and applicable to a wide range of system classes, use of curvature information of the system, deterministic sampling method with computational effort that scales linearly with the number of distributed variables, i.e. Oðn h Þ, since each sigma-point is independently propagated, parallelization can easily be applied to speed up estimate calculation of the transformed expectation and variance-covariance.

Robust optimal stimulus design
The problem of finding an optimal stimulus design can be stated as an optimal control problem. Given a nonlinear dynamic system of the form Equations (1) and (2) and corresponding parameter set (expectation and variance-covariance), an optimal stimulus is an admissible control defined over an interval ½t 0 , t f , say experimental time window, at which a cost function assumes its infimum (or supremum) with the set of all admissible controls. Robustness of such a control with respect to distributed model responses can be achieved by incorporating expectation and variance-covariance into a robust design criterion (e.g. model overlap). Within the sigma-point approach, variabilities in the stimulus conductions can also be accounted by interpreting a design u(t) as a time-dependent mean of a distributed variable U. Then, for a design u(t), the spread in the model response PDF is determined by the propagation of sigma-points given by mean and variance-covariances of (i) model parameters and (ii) stimulus. The problem of finding an optimal control may be solved by (i) Hamilton-Jacobi-Bellman, (ii) variational or (iii) NLPbased approaches (Nevistic nonlinear programming, 1997). We use the following two direct NLP-based approaches (Biegler, 2007), which can easily be combined with the methods discussed in Sections 2.4 and 2.4.1 for mapping distributed parameters onto the design criterion: Direct sequential approach: A control vector parameterization in combination with numerical integration of the model equations. This approach is suited for design problems without nonlinear Direct simultaneous: A full discretization of the problem, e.g. control vector and state/response vector parameterization, is based on orthogonal collocation on finite elements. If the design problem includes nonlinear path constraints, this solution approach can be beneficial, because feasibility of the solution is ensured at the collocation points of each finite element.
Both NLP approaches are typically non-convex, i.e. there exist several local and possibly one global optimal design solution. Therefore, resulting solutions to the NLP problem are local optima. Global optimality of the design can be achieved-but is not ensured-by (i) performing local optimizations from different initial starting points and/or (ii) deterministic/ stochastic/heuristic global optimizers (Floudas and Gounaris, 2009;Horst et al., 2000;Zabinsky, 2003). We point out that optimal design solutions need not necessarily be global in real-life applications. Local optimal solutions can be very close to the global solution with respect to the design criterion. Therefore, non-convexity allows to account for further experimental constraints-restricting the degrees of freedom in the design space-without losing, e.g. discriminative power. In the applications, we use global stochastic and multistart optimizations to avoid a biased comparison between linearization and sigma-point approach.

Application I: signaling cascade
The highly conserved mitogen-activated protein kinase signaling cascade (Pearson et al., 2001) with two different hypothesized negative feedbacks is used as a nonlinear test system for benchmarking the two design approaches with respect to estimation accuracy and design quality. Multistep signaling cascades are enrolled in many signal-transduction processes of cells to sense and react to external signals. Upon an external stimulus, e.g. growth factor, hormones or stress signals, cascades transduce the signal from the cell membrane to the nucleus to start different cell programs. The respective ODE systems-adapted from Behar et al. (2007)-of two model candidates m 2 fA, Bg that describe the change in protein concentration of the phosphorylated forms are and model B: no x 4B ðtÞ, x Ã 4B ðtÞ: For both models, we assume x Ã im ðt 0 Þ ¼ 0 with the total concentration of each species x tot im as an additional model parameter and i 2 f1, 2, 3, ð4Þ B g.
The measurement response signals are defined as where h represents additive measurement noise, which we assume to be normally distributed with zero mean and variance 2 . We assume that the response signals can be measured at n t specific time points. Based on an initial stimulus design, we adjust the model parameters so that both model responses match up to a small error, which does not allow to prefer one over the other model, to mimic the starting point of an OESD for model discrimination. Identifiability of the models has been checked with the software tool DAISY in combination with global sensitivity analysis (see Supplementary Material, Bellu et al., 2007;Sobol', 1993). Because biological systems often follow a log-normal distribution, we apply a log-normal transformation to the response to improve the normal approximation used in the estimation approaches for the response PDF (Sections 2.4 and 2.4.1). Therefore, we redefine the response signal Equation (18) used for the overlap calculation as with i ¼ 1, 2, 40. For each model, we assume that all dynamic parameters are log-normally distributed, with nominal value being the expectation E log ½Â¼h and diagonal covariance matrix ffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi C log ½Â p ¼ diagðE Log ½ÂÞ, with scaling parameter . The measurement noise is typically independent on the stimulus design and thus held constant at ¼ 0:01. The sigma-points for the log-normal parameter PDF are obtained in the following way: In the parameter space, we derive the normal equivalents of log-normal expectation and covariance to calculate the normal sigma-points, which we then exponentiate (see Supplementary Material). The log-normal sigma-points are propagated through the model, including Equation (19), to obtain the normal estimates via Equations (16) and (17). In the following, we drop the tilde from the redefined response signal in Equation (19).
Following the direct sequential approach, the stimulus (single input) is parameterized as uðU, tÞ ¼ u k for t k t t kþ1 , with ½U k ¼ ðu k , dt k Þ T , k 2 f0, . . . , n u g, whereas dt nu 0. Here, u k represents the amount of stimulus between the time point t k and t kþ1 ¼ dt k þ P k j¼0 t j . If for the last time point, we have t nu 5t f , we put uðU, t nu 5t t f Þ ¼ u nu . On the other hand, if t nu 4t f , the design is given a penalty. Depending on the estimation method E 2 fL, Sg, the resulting optimization problem for discriminating between models A and B is formulated as an NLP problem (see Supplementary Material for details), i.e.
subject to systems dynamic, additional constraints and method to estimate E E tAjB ½Y and C E tAjB ½Y. The number of optimization parameters is n utot ¼ 39, which allows 20 stimulations u k with 19 stimulus durations dt k . Because the problem is non-convex, we use a hybrid optimization strategy, consisting of the evolutionary-based CMA-ES algorithm (Covariance Matrix Adaptation Evolution Strategy Hansen and Ostermeier, 2001), in combination with a subsequent gradient-based optimizer. Because of the stochastic nature, the hybrid optimization is performed 40 times for each parameter variance level, which we derive from the scaling parameter . The benchmark is based on a Monte Carlo verification of the resulting optimal stimulus designs. For each optimal design, the overlap, including expectation and variance-covariance of the model responses, is calculated based on sampling the parameter space 10 4 times for each model and corresponding optimal stimulus design (see Supplementary Material for details). The relative mean-squared error (MSE) of the moment estimates are given by with M E being the moment estimates of the best designs (expectation E E tAjB ½Y i , variance-covariance split into variance VAR E tAjB ½Y i and covariance terms COV E tAjB ½Y i ). The Monte Carlo reference is represented by M M .
In Table 1, we see that for all parameter variance levels, both methods have negligible relative MSE in the mean response estimates (maximal MSE: he E i L 510 À7 ; he E i S 510 À9 ). In contrast, the relative MSEs for linearization increases with parameter variance levels up to 0.03 for the variance and 0.18 for the covariance estimates. Here, the sigma-point approach performs better with maximal relative MSE of the variance 0.007 and covariance 0.096. In this application, both approaches estimate mean responses of the models very well, although the maximal MSE of the sigma-points is still two orders of magnitudes smaller than the maximal MSE for linearization. For the (co)variance estimates, the sigma-point approach consistently outperforms linearization approach for increasing parameter variance level.
In the lower part of . However, for widely distributed parameters (starting at ¼ 0:3) sigma-point based designs perform up to 1.3 times better than linearization-based designs and their estimates coincide with the MC validation, which is not the case for linearization. For both methods, optimization time for one design is 1:3 AE 0:1h on a standard desktop computer (4 GB RAM, 3 GHz quad core processor) and mainly determined by the optimizer itself, whereas the validation time (10 4 MC samples) of a single optimal design is 0:4 AE 0:1h.
In Figure 2, we show the best designs based on the estimates for two levels of parameter variances and the first response component. For small parameter variances, both designs yield the same discriminative power. For widely distributed parameters, we see that both methods tend to minimize the amount of stimulation, as this results into little response variances at maximal distances between expected model responses. Although the linear design is characterized by a strong stimulation right at the beginning, with a subsequent plateau of little stimulation, the sigma-point design starts with an even stronger initial pulse, followed by two small pulses.

Application II: Schlo¨gl model
In this section, we compare the performance of the nonlinear design based on the sigma-point approach to the linear design in the presence of multiple steady states. The Schlo¨gl model is a canonical example of a biochemical reaction system exhibiting bistability (Schlo¨gl, 1972). It describes an autocatalytic, Table 1. Relative MSEs of moment estimates and overlap (scaled 10 5 ) of the best designs based on linearization/sigma-point estimation and corresponding Monte Carlo verification where we assume the four kinetic parameters k 1 À k 4 as well as the system parameters a and b to be distributed Â / N ðE½Â, diag(E[Â]) 2 ), the initial condition as X 0 / N ðE½X 0 , ðE½X 0 Þ 2 Þ. The mean parameter values are taken from Vellela and Qian (2009), which also discuss the consequences of bi-(multi)stability for biological modeling in the light of non-equilibrium thermoand stochastic dynamics. The model alternatives simply differ in the input layer s m ðuðtÞÞ. Parameters a and b represent the concentration of two reaction partners a and b of species x, which both are in constant exchange with a material reservoir. For an initial, suboptimal experiment with stimulus uðtÞ ¼ 1, models A and B cannot be distinguished, given y m ðtÞ ¼ x m ðtÞ þ h to be the response signal, where h represents additive measurement noise with zero mean and 2 . The stimulus is thought to control the concentration in the reservoir of species a to find an optimal discriminative stimulus, whereas subsequent stimulations can only be applied after a minimal time period has passed to account for possible control limitations (see Supplementary Material for details). Such nonlinear constraint optimization can efficiently be solved within the direct simultaneous approach, which we apply using orthogonal collocation on 100 finite elements (each with three collocation points) to discretize control and system states. The objective of the resulting non-convex NLP problem is the same as in Equation (20), however, subjected to different constraints, i.e. system dynamics in form of a nonlinear algebraic equation system and additional constraints (details see Supplementary Material). For the linear design strategy, we implement the sensitivity Equation (13) and corresponding constraints. For the sigma-point design, the constraints have to simultaneously hold for all ð2n h þ 1Þ sigma-points. The solver AMPL in combination with the optimizer CONOPT is used to solve the aforementioned NLP problem (Drud, 1994). For a given optimization setup ( and estimation method), the solution takes about 2 min on a standard desktop computer. Because CONOPT yields local solutions, the optimization is performed for 1000 different randomized initial designs for a given optimization setup, from which the best solution is selected.
In Figure 3, As can be seen in Figure 3 (e,f), the non-local propagation property of the sigma-points enables the optimizer to find a stimulus that mostly drives model B to the upper steady state.

CONCLUSION
Biological variability in combination with experimental measurement noise results into widely distributed response signals, which is one of the main challenges when modeling biological system deterministically with nonlinear ODEs. The parameter set needs to be extended to a parameter distribution. In this way, natural variability in the dynamic parameters as well as measurement noise can be readily accounted for. However, an exact quantification is computationally expensive and infeasible in an optimization framework for large systems. Therefore, approximate descriptions of the PDFs and nonlinear mapping process between parameter and model response space have to be used. Here, we have presented a nonlinear design approach based on the sigma-points within the application of model-based OESD aimed at model discrimination. We illustrate the application and performance in combination with two numerical approaches from optimal control for two nonlinear model examples. Using the overlap as a robust design criterion based on the model response PDFs, we show that in the case of nonlinear models with widely distributed parameter PDFs, the sigma-point predictions and designs consistently outperform a linear design approach. In the case of bi-(multi)stability, we further illustrate the benefit of the nonlocal propagation property. Finally, the sigma-points come with several numerical advantages, including linear scaling of the numerical costs with respect to distributed parameters and derivative free estimation of nonlinearly mapped expectation and variance-covariance. The latter property allows applying a robust OESD to dynamic models that have non-smooth right-hand side functions, e.g. cybernetic models of cellular metabolism (Ramkrishna, 1982).