- Split View
-
Views
-
Cite
Cite
Helge Hass, Carolin Loos, Elba Raimúndez-Álvarez, Jens Timmer, Jan Hasenauer, Clemens Kreutz, Benchmark problems for dynamic modeling of intracellular processes, Bioinformatics, Volume 35, Issue 17, September 2019, Pages 3073–3082, https://doi.org/10.1093/bioinformatics/btz020
- Share Icon Share
Abstract
Dynamic models are used in systems biology to study and understand cellular processes like gene regulation or signal transduction. Frequently, ordinary differential equation (ODE) models are used to model the time and dose dependency of the abundances of molecular compounds as well as interactions and translocations. A multitude of computational approaches, e.g. for parameter estimation or uncertainty analysis have been developed within recent years. However, many of these approaches lack proper testing in application settings because a comprehensive set of benchmark problems is yet missing.
We present a collection of 20 benchmark problems in order to evaluate new and existing methodologies, where an ODE model with corresponding experimental data is referred to as problem. In addition to the equations of the dynamical system, the benchmark collection provides observation functions as well as assumptions about measurement noise distributions and parameters. The presented benchmark models comprise problems of different size, complexity and numerical demands. Important characteristics of the models and methodological requirements are summarized, estimated parameters are provided, and some example studies were performed for illustrating the capabilities of the presented benchmark collection.
The models are provided in several standardized formats, including an easy-to-use human readable form and machine-readable SBML files. The data is provided as Excel sheets. All files are available at https://github.com/Benchmarking-Initiative/Benchmark-Models, including step-by-step explanations and MATLAB code to process and simulate the models.
Supplementary data are available at Bioinformatics online.
1 Introduction
Dynamic models based on ordinary differential equations (ODEs) have become a widely used tool in systems biology to quantitatively describe regulatory processes in living cells. Within this approach, known biochemical interactions of important compounds can be translated into rate equations describing the temporal evolution of the state of biological processes. Experimental data is then used to estimate parameters like rate constants or initial concentrations and to validate or improve the model structure.
The dimensionality and nonlinearity of these models constitute a challenge for numerical and statistical methods regarding parameter estimation and identification of the most plausible model structure. For that reason, a multitude of new modeling techniques have been developed within recent years. However, they are often not well-tested in realistic application settings and, therefore, their performance benefits or limitations are unknown (Degasperi et al., 2017; Hug et al., 2013; Lillacci and Khammash, 2010; Maier et al., 2016; Raue et al., 2013; Stapor et al., 2018; Vyshemirsky and Girolami, 2008). Since the performance of computational approaches depends on model characteristics such as nonlinearity, number of parameters or amount of experimental data, it is essential to have a reasonably large set of benchmark problems that covers these characteristics. Only if the collection of benchmark problems is representative, the results of performance studies will generalize to new modeling projects.
One frequent limitation is that realistic measurements are typically not available for evaluations. Simulated data, as an example, is often much more informative in terms of number of data points (Tönsing et al., 2014) and does not have a complex noise structure (Villaverde et al., 2015) like measurements from living cells. Moreover, in most cases experimental measurements require augmenting the equations of the dynamic model with so-called observation functions containing scalings- and/or offset parameters, together with transformations of the data such as a log-transformation.
Benchmark collections are used in many scientific fields, but there are currently only a limited number of benchmark problems for modeling intracellular processes and they cover only a small set of application setups: (i) Six benchmark models have been published by Villaverde et al. (2015), however for most of them, only simulated data are provided. For the models with experimental data, one has less data points than parameters, and the other provides its equations only in a compiled version, which limits their use for model evaluation. (ii) Additional benchmark problems were defined within the DREAM6 (Dialogue on Reverse-Engineering Assessment and Methods) and DREAM7 challenges. However, both challenges only had simulated data available because the models do not represent real biological networks occurring in specific living cells. In addition, abundances of the molecular compounds were assumed as known initial values and the dynamic variables were assumed as directly measured without observation functions which renders these problems as rather unrealistic. (iii) Public repositories, e.g. the Biomodels database (Le Novere et al., 2006) provide a large number of realistic/published models. Unfortunately, for most models the measured data used for calibration is not or only partly provided. Moreover, if the data is published, the description of the link between model and data is often not sufficient, i.e. the noise model and observation functions are not comprehensively defined as required for a non-ambiguous benchmark problem. One major reason for this might be that current standards for defining models like the Systems Biology Markup Language (SBML) (Hucka et al., 2003) only comprise the biological part of the model but do not contain equations for observations and noise models used to estimate parameters. Standards for the encoding of experimental descriptions, such as the Simulation Experiment Description Markup Language (SED-ML) (Waltemath et al., 2011), are unfortunately not yet used widely and only supported by a fraction of the available tools.
In this manuscript, 20 models of biochemical reaction networks which should serve as a comprehensive set of benchmark problems enabling testing of a multitude of data-based modeling approaches are presented. The models have different complexity ranging from 9 to 269 parameters. All models comprise measured data (21 to 27 132 data points per model). We also provide measurement errors either determined experimentally or from an underlying error model.
2 Methodology
2.1 Pathway models
The initial values x(0) of Eq. (1) might be known. However, in most applications some elements of x(0) are unknown and defined as parameters, i.e. , or functions of parameters, i.e. . Mathematically, we distinguish between three classes:
The initial conditions might be known or given, e.g. zero before treatment.
The initial conditions might be analytical functions of the parameters, e.g. analytical solutions to a steady-state constraint (Rosenblatt et al., 2016).
The initial conditions might be non-analytical expressions of the parameters, e.g. the result of a pre-simulation of an experimental condition (Fiedler et al., 2016; Rosenblatt et al., 2016).
For a detailed discussion we refer to Rosenblatt et al. (2016) and Fiedler et al. (2016).
2.2 Measurement errors
Note that in the chosen notation, index i enumerates each observation/data point yi at a specific time point and each corresponding standard deviation σi of the measurement error individually.
We consider two broad classes of error models:
The standard deviation σi of measurement errors might be determined as part of the experiment and processing of raw data, e.g. by computing standard errors across replicates. In this case, each data point yi has a given, fixed value σi specifying the accuracy of the measurement.
Standard deviations might be unknown and therefore described as error models with error parameters which might be jointly estimated with other model parameters. The function can depend on parameters, state variables or both.
While class 1 yields a parameters estimation problem with fewer parameters, class 2 does not require the calculation of σi from a potentially small number of replicates and the statistical model accounts for imperfect knowledge of σi (Raue et al., 2013).
2.3 Parameters
Dynamic models in systems biology comprise up to three classes of parameters:
Dynamic parameters that determine the initial states x(0) and the dynamics of the process, see Eq. (). These parameters are rate constants such as association/dissociation rates or -constants, translocation rates between intra- or extracellular compartments, or parameters like Michaelis-Menten- and Hill-coefficients, efficiencies of genetic perturbations or parameters of input functions. We note that the dynamic parameters do not change over time, although the name might suggest otherwise.
Observation parameters that describe the relationship between concentrations of intracellular compounds with outputs, e.g. intensities in an assay. These parameters are for example scaling factors or offsets (Weber et al., 2011).
Error parameters that describe the unknown noise levels (see Section 2.2).
Since dynamic parameters depend on the biological context and observation- and error parameters are determined by the experimental setup, there is often only a limited amount of prior knowledge about parameters available. For the benchmark models, upper- and lower bounds are defined for all parameters. In most cases, these bounds cover eight orders of magnitude or even more. In some cases, additional prior knowledge in terms of prior distributions or penalties is available for specific parameters.
The parameters of the biological process are often transformed to improve the convergence of optimization (Raue et al., 2013) and to eliminate structural non-identifiabilities (Maiwald et al., 2016). A common practice is the transformation of the parameters from linear to logarithmic scale. However, there are also problem-specific transformations as described in the supplements of Bachmann et al. (2011) and Becker et al. (2010).
2.4 Inputs
Inputs u describe the dependence of the biochemical reaction network on external factors as well as perturbations. Examples are externally controlled concentration of ligands or nutrients, or genetic perturbations like knockouts or overexpression. Time-dependent inputs are often parameterized functions such as polynomials, splines (Schelker et al., 2012) or control vectors (Banga et al., 2005). Parameter dependence of inputs is in the following indicated by writing .
3 Model and data formats
For a thorough evaluation of computational methods, we provide a set of 20 published models and their corresponding datasets. The models have been extracted from the literature and have been developed by more than 10 different research groups. The information is stored in an easily accessible and standardized format, including an Excel file with general specifications of the model and its fit results. Measurements and model equations are stored as separate Excel files and for each experiment individually. In the data files, measurements with uncertainties and results from the corresponding model simulations are stored. The model files contain finalized ODEs including experiment-specific parameter assignments and observation functions, and are provided as user-readable Excel file and in the standardized, machine-readable SBML standard (Hucka et al., 2003). For a detailed description of the provided files, we refer to Supplementary Section S1.
4 Results
4.1 Benchmark collection
The main focus of this paper is to introduce a comprehensive collection of benchmark problems and their formulation in a standardized format. A comprehensive overview of the benchmark problems is provided in Table 1.
Name . | Description . | Biochemical species . | Observables . | Data points . | Experimental conditions . | Parameters . | Features . |
---|---|---|---|---|---|---|---|
Bachmann | The model by Bachmann et al. (2011) describes JAK2/STAT5 regulation via two transcriptional negative feedbacks, CIS and SOCS3 in murin blood forming cells | 25 | 11 | 542 | 23 | 113 | C, , NI, |
Becker | The model by Becker et al. (2010) shows that rapid EpoR turnover and large intracellular receptor pools enables linear ligand response. | 6 | 4 | 85 | 13 | 16 | |
Beer | The model by Beer et al. (2014) uses Escherichia coli as chassis to demonstrate heterologous T domain exchange in non-ribosomal peptide synthetases (NRPSs). | 4 | 2 | 27 132 | 19 | 72 | , ev, NI, |
Boehm | The model by Boehm et al. (2014) evaluates possible homo- and heterodimerization of the transcription factor isoforms STAT5A and STAT5B. | 8 | 3 | 48 | 1 | 9 | C, |
Brannmark | The model by Brännmark et al. (2010) describes insulin signaling in adipocytes. | 9 | 2 | 43 | 8 | 22 | , ev, NI, u(t), |
Bruno | The model by Bruno et al. (2016) investigates the activity of Arabidopsis carotenoid cleavage dioxygenase 4 (AtCCD4) as regulator of carotenoid of seeds. | 7 | 6 | 77 | 6 | 13 | Ex, |
Chen | The model by Chen et al. (2009) describes signaling in ErbB-activated MAPK and PI3k/Akt pathways, including seven receptor dimers and two ErbB ligands. | 500 | 3 | 105 | 4 | 154 | , ev, NI, |
Crauste | The model by Crauste et al. (2017) describes CD8 T cell differentiation after virus infection. | 5 | 4 | 21 | 1 | 12 | Ex, NI, |
Fiedler | The model by Fiedler et al. (2016) describes Raf/MEK/ERK signaling in synchronized HeLa cells upon stimulation with MEK and ERK inhibitors. | 6 | 2 | 72 | 3 | 19 | , NI, |
Fujita | The model by Fujita et al. (2010) describes the epidermal growth factor (EGF)-dependent Akt pathway in PC12 cells. | 9 | 3 | 144 | 6 | 19 | Ex, ev, NI, |
Hass | The model by Hass et al. (2017) establishes early Reelin-induced signaling and identifies Src family kinases (SFKs) as crucial part for Dab1 signaling. | 9 | 6 | 221 | 17 | 49 | Ex, ev, |
Isensee | The model by Isensee et al. (2018) describes the Protein Kinase A (PKA)-II cycle in primary sensory neurons and its response to multiple stimuli, e.g. forskolin and cAMP analogues and is based on quantitative microscopy and Western blotting. | 25 | 3 | 713 | 109 | 46 | C, , ev, NI, |
Lucarelli | The model by Lucarelli et al. (2018) describes activation of Smad proteins upon TGFβ stimulation, identifies the relevant complexes and linked them to target genes. | 33 | 43 | 1755 | 12 | 84 | , Ex, NI, |
Merkle | The model by Merkle et al. (2016) describes Epo-induced signaling simultaneously for CFU-E and H838 cells, with a parsimonious set of differing parameters. | 23 | 22 | 1141 | 62 | 197 | C, , Ex, ev, NI, u(t), |
Raia | The model by Raia et al. (2011) describes interleukin-13 (IL13)-induced activation of the JAK/STAT signaling pathway for B-cells and two lymphoma cell lines. | 14 | 8 | 205 | 4 | 39 | C, , NI, |
Schwen | The model by Schwen et al. (2015) describes binding of insulin to primary mouse hepatocytes based on flow cytometry and ELISA data. | 11 | 4 | 292 | 7 | 30 | , NI, |
Sobotta | The model by Sobotta et al. (2017) presents IL-6-induced JAK1-STAT3 signal transduction and expression of target genes in hepatocytes. | 13 | 11 | 2220 | 110 | 260 | C, , ev, |
Swameye | The model by Swameye et al. (2003) demonstrates that rapid shuttling of STAT5 from the nucleus back to the cytoplasm following Epo stimulus is recognized as a remote sensor. | 9 | 3 | 46 | 1 | 13 | C, Ex, NI, |
Weber | The model by Weber et al. (2015) describes the interactions of PKD, PI4KIIIβ and CERT at the trans-Golgi network of mammalian cells. | 7 | 8 | 135 | 3 | 36 | , ev, NI, u(t), |
Zheng | The model is adapted from Zheng et al. (2012) and describes methylation at histone H3 lysines 27 and 36. | 15 | 15 | 60 | 1 | 46 | , ev, NI, |
Name . | Description . | Biochemical species . | Observables . | Data points . | Experimental conditions . | Parameters . | Features . |
---|---|---|---|---|---|---|---|
Bachmann | The model by Bachmann et al. (2011) describes JAK2/STAT5 regulation via two transcriptional negative feedbacks, CIS and SOCS3 in murin blood forming cells | 25 | 11 | 542 | 23 | 113 | C, , NI, |
Becker | The model by Becker et al. (2010) shows that rapid EpoR turnover and large intracellular receptor pools enables linear ligand response. | 6 | 4 | 85 | 13 | 16 | |
Beer | The model by Beer et al. (2014) uses Escherichia coli as chassis to demonstrate heterologous T domain exchange in non-ribosomal peptide synthetases (NRPSs). | 4 | 2 | 27 132 | 19 | 72 | , ev, NI, |
Boehm | The model by Boehm et al. (2014) evaluates possible homo- and heterodimerization of the transcription factor isoforms STAT5A and STAT5B. | 8 | 3 | 48 | 1 | 9 | C, |
Brannmark | The model by Brännmark et al. (2010) describes insulin signaling in adipocytes. | 9 | 2 | 43 | 8 | 22 | , ev, NI, u(t), |
Bruno | The model by Bruno et al. (2016) investigates the activity of Arabidopsis carotenoid cleavage dioxygenase 4 (AtCCD4) as regulator of carotenoid of seeds. | 7 | 6 | 77 | 6 | 13 | Ex, |
Chen | The model by Chen et al. (2009) describes signaling in ErbB-activated MAPK and PI3k/Akt pathways, including seven receptor dimers and two ErbB ligands. | 500 | 3 | 105 | 4 | 154 | , ev, NI, |
Crauste | The model by Crauste et al. (2017) describes CD8 T cell differentiation after virus infection. | 5 | 4 | 21 | 1 | 12 | Ex, NI, |
Fiedler | The model by Fiedler et al. (2016) describes Raf/MEK/ERK signaling in synchronized HeLa cells upon stimulation with MEK and ERK inhibitors. | 6 | 2 | 72 | 3 | 19 | , NI, |
Fujita | The model by Fujita et al. (2010) describes the epidermal growth factor (EGF)-dependent Akt pathway in PC12 cells. | 9 | 3 | 144 | 6 | 19 | Ex, ev, NI, |
Hass | The model by Hass et al. (2017) establishes early Reelin-induced signaling and identifies Src family kinases (SFKs) as crucial part for Dab1 signaling. | 9 | 6 | 221 | 17 | 49 | Ex, ev, |
Isensee | The model by Isensee et al. (2018) describes the Protein Kinase A (PKA)-II cycle in primary sensory neurons and its response to multiple stimuli, e.g. forskolin and cAMP analogues and is based on quantitative microscopy and Western blotting. | 25 | 3 | 713 | 109 | 46 | C, , ev, NI, |
Lucarelli | The model by Lucarelli et al. (2018) describes activation of Smad proteins upon TGFβ stimulation, identifies the relevant complexes and linked them to target genes. | 33 | 43 | 1755 | 12 | 84 | , Ex, NI, |
Merkle | The model by Merkle et al. (2016) describes Epo-induced signaling simultaneously for CFU-E and H838 cells, with a parsimonious set of differing parameters. | 23 | 22 | 1141 | 62 | 197 | C, , Ex, ev, NI, u(t), |
Raia | The model by Raia et al. (2011) describes interleukin-13 (IL13)-induced activation of the JAK/STAT signaling pathway for B-cells and two lymphoma cell lines. | 14 | 8 | 205 | 4 | 39 | C, , NI, |
Schwen | The model by Schwen et al. (2015) describes binding of insulin to primary mouse hepatocytes based on flow cytometry and ELISA data. | 11 | 4 | 292 | 7 | 30 | , NI, |
Sobotta | The model by Sobotta et al. (2017) presents IL-6-induced JAK1-STAT3 signal transduction and expression of target genes in hepatocytes. | 13 | 11 | 2220 | 110 | 260 | C, , ev, |
Swameye | The model by Swameye et al. (2003) demonstrates that rapid shuttling of STAT5 from the nucleus back to the cytoplasm following Epo stimulus is recognized as a remote sensor. | 9 | 3 | 46 | 1 | 13 | C, Ex, NI, |
Weber | The model by Weber et al. (2015) describes the interactions of PKD, PI4KIIIβ and CERT at the trans-Golgi network of mammalian cells. | 7 | 8 | 135 | 3 | 36 | , ev, NI, u(t), |
Zheng | The model is adapted from Zheng et al. (2012) and describes methylation at histone H3 lysines 27 and 36. | 15 | 15 | 60 | 1 | 46 | , ev, NI, |
Note: The models are abbreviated with the last name of the first author. Many models are based on Western blot data. Number of parameters denotes unknown parameters that are estimated in the model. The number of experimental conditions is specified as the number of different simulation conditions. The feature abbreviations denote the following: C = several compartments, = constant error parameters, Eq. (3), = error model of Eq. (4), = error model of Eq. (5), Ex = known measurement errors, ev = events, NI = non-identifiable parameters, u(t) = time dependent input function, = input function with unknown parameter(s). Initial values are specified according to the following order: = known initial values, = initial condition given by unknown parameters, = parameter dependent functions and = pre-equilibration for initial steady state conditions. The models are described in more detail in Supplementary Section S4.
Name . | Description . | Biochemical species . | Observables . | Data points . | Experimental conditions . | Parameters . | Features . |
---|---|---|---|---|---|---|---|
Bachmann | The model by Bachmann et al. (2011) describes JAK2/STAT5 regulation via two transcriptional negative feedbacks, CIS and SOCS3 in murin blood forming cells | 25 | 11 | 542 | 23 | 113 | C, , NI, |
Becker | The model by Becker et al. (2010) shows that rapid EpoR turnover and large intracellular receptor pools enables linear ligand response. | 6 | 4 | 85 | 13 | 16 | |
Beer | The model by Beer et al. (2014) uses Escherichia coli as chassis to demonstrate heterologous T domain exchange in non-ribosomal peptide synthetases (NRPSs). | 4 | 2 | 27 132 | 19 | 72 | , ev, NI, |
Boehm | The model by Boehm et al. (2014) evaluates possible homo- and heterodimerization of the transcription factor isoforms STAT5A and STAT5B. | 8 | 3 | 48 | 1 | 9 | C, |
Brannmark | The model by Brännmark et al. (2010) describes insulin signaling in adipocytes. | 9 | 2 | 43 | 8 | 22 | , ev, NI, u(t), |
Bruno | The model by Bruno et al. (2016) investigates the activity of Arabidopsis carotenoid cleavage dioxygenase 4 (AtCCD4) as regulator of carotenoid of seeds. | 7 | 6 | 77 | 6 | 13 | Ex, |
Chen | The model by Chen et al. (2009) describes signaling in ErbB-activated MAPK and PI3k/Akt pathways, including seven receptor dimers and two ErbB ligands. | 500 | 3 | 105 | 4 | 154 | , ev, NI, |
Crauste | The model by Crauste et al. (2017) describes CD8 T cell differentiation after virus infection. | 5 | 4 | 21 | 1 | 12 | Ex, NI, |
Fiedler | The model by Fiedler et al. (2016) describes Raf/MEK/ERK signaling in synchronized HeLa cells upon stimulation with MEK and ERK inhibitors. | 6 | 2 | 72 | 3 | 19 | , NI, |
Fujita | The model by Fujita et al. (2010) describes the epidermal growth factor (EGF)-dependent Akt pathway in PC12 cells. | 9 | 3 | 144 | 6 | 19 | Ex, ev, NI, |
Hass | The model by Hass et al. (2017) establishes early Reelin-induced signaling and identifies Src family kinases (SFKs) as crucial part for Dab1 signaling. | 9 | 6 | 221 | 17 | 49 | Ex, ev, |
Isensee | The model by Isensee et al. (2018) describes the Protein Kinase A (PKA)-II cycle in primary sensory neurons and its response to multiple stimuli, e.g. forskolin and cAMP analogues and is based on quantitative microscopy and Western blotting. | 25 | 3 | 713 | 109 | 46 | C, , ev, NI, |
Lucarelli | The model by Lucarelli et al. (2018) describes activation of Smad proteins upon TGFβ stimulation, identifies the relevant complexes and linked them to target genes. | 33 | 43 | 1755 | 12 | 84 | , Ex, NI, |
Merkle | The model by Merkle et al. (2016) describes Epo-induced signaling simultaneously for CFU-E and H838 cells, with a parsimonious set of differing parameters. | 23 | 22 | 1141 | 62 | 197 | C, , Ex, ev, NI, u(t), |
Raia | The model by Raia et al. (2011) describes interleukin-13 (IL13)-induced activation of the JAK/STAT signaling pathway for B-cells and two lymphoma cell lines. | 14 | 8 | 205 | 4 | 39 | C, , NI, |
Schwen | The model by Schwen et al. (2015) describes binding of insulin to primary mouse hepatocytes based on flow cytometry and ELISA data. | 11 | 4 | 292 | 7 | 30 | , NI, |
Sobotta | The model by Sobotta et al. (2017) presents IL-6-induced JAK1-STAT3 signal transduction and expression of target genes in hepatocytes. | 13 | 11 | 2220 | 110 | 260 | C, , ev, |
Swameye | The model by Swameye et al. (2003) demonstrates that rapid shuttling of STAT5 from the nucleus back to the cytoplasm following Epo stimulus is recognized as a remote sensor. | 9 | 3 | 46 | 1 | 13 | C, Ex, NI, |
Weber | The model by Weber et al. (2015) describes the interactions of PKD, PI4KIIIβ and CERT at the trans-Golgi network of mammalian cells. | 7 | 8 | 135 | 3 | 36 | , ev, NI, u(t), |
Zheng | The model is adapted from Zheng et al. (2012) and describes methylation at histone H3 lysines 27 and 36. | 15 | 15 | 60 | 1 | 46 | , ev, NI, |
Name . | Description . | Biochemical species . | Observables . | Data points . | Experimental conditions . | Parameters . | Features . |
---|---|---|---|---|---|---|---|
Bachmann | The model by Bachmann et al. (2011) describes JAK2/STAT5 regulation via two transcriptional negative feedbacks, CIS and SOCS3 in murin blood forming cells | 25 | 11 | 542 | 23 | 113 | C, , NI, |
Becker | The model by Becker et al. (2010) shows that rapid EpoR turnover and large intracellular receptor pools enables linear ligand response. | 6 | 4 | 85 | 13 | 16 | |
Beer | The model by Beer et al. (2014) uses Escherichia coli as chassis to demonstrate heterologous T domain exchange in non-ribosomal peptide synthetases (NRPSs). | 4 | 2 | 27 132 | 19 | 72 | , ev, NI, |
Boehm | The model by Boehm et al. (2014) evaluates possible homo- and heterodimerization of the transcription factor isoforms STAT5A and STAT5B. | 8 | 3 | 48 | 1 | 9 | C, |
Brannmark | The model by Brännmark et al. (2010) describes insulin signaling in adipocytes. | 9 | 2 | 43 | 8 | 22 | , ev, NI, u(t), |
Bruno | The model by Bruno et al. (2016) investigates the activity of Arabidopsis carotenoid cleavage dioxygenase 4 (AtCCD4) as regulator of carotenoid of seeds. | 7 | 6 | 77 | 6 | 13 | Ex, |
Chen | The model by Chen et al. (2009) describes signaling in ErbB-activated MAPK and PI3k/Akt pathways, including seven receptor dimers and two ErbB ligands. | 500 | 3 | 105 | 4 | 154 | , ev, NI, |
Crauste | The model by Crauste et al. (2017) describes CD8 T cell differentiation after virus infection. | 5 | 4 | 21 | 1 | 12 | Ex, NI, |
Fiedler | The model by Fiedler et al. (2016) describes Raf/MEK/ERK signaling in synchronized HeLa cells upon stimulation with MEK and ERK inhibitors. | 6 | 2 | 72 | 3 | 19 | , NI, |
Fujita | The model by Fujita et al. (2010) describes the epidermal growth factor (EGF)-dependent Akt pathway in PC12 cells. | 9 | 3 | 144 | 6 | 19 | Ex, ev, NI, |
Hass | The model by Hass et al. (2017) establishes early Reelin-induced signaling and identifies Src family kinases (SFKs) as crucial part for Dab1 signaling. | 9 | 6 | 221 | 17 | 49 | Ex, ev, |
Isensee | The model by Isensee et al. (2018) describes the Protein Kinase A (PKA)-II cycle in primary sensory neurons and its response to multiple stimuli, e.g. forskolin and cAMP analogues and is based on quantitative microscopy and Western blotting. | 25 | 3 | 713 | 109 | 46 | C, , ev, NI, |
Lucarelli | The model by Lucarelli et al. (2018) describes activation of Smad proteins upon TGFβ stimulation, identifies the relevant complexes and linked them to target genes. | 33 | 43 | 1755 | 12 | 84 | , Ex, NI, |
Merkle | The model by Merkle et al. (2016) describes Epo-induced signaling simultaneously for CFU-E and H838 cells, with a parsimonious set of differing parameters. | 23 | 22 | 1141 | 62 | 197 | C, , Ex, ev, NI, u(t), |
Raia | The model by Raia et al. (2011) describes interleukin-13 (IL13)-induced activation of the JAK/STAT signaling pathway for B-cells and two lymphoma cell lines. | 14 | 8 | 205 | 4 | 39 | C, , NI, |
Schwen | The model by Schwen et al. (2015) describes binding of insulin to primary mouse hepatocytes based on flow cytometry and ELISA data. | 11 | 4 | 292 | 7 | 30 | , NI, |
Sobotta | The model by Sobotta et al. (2017) presents IL-6-induced JAK1-STAT3 signal transduction and expression of target genes in hepatocytes. | 13 | 11 | 2220 | 110 | 260 | C, , ev, |
Swameye | The model by Swameye et al. (2003) demonstrates that rapid shuttling of STAT5 from the nucleus back to the cytoplasm following Epo stimulus is recognized as a remote sensor. | 9 | 3 | 46 | 1 | 13 | C, Ex, NI, |
Weber | The model by Weber et al. (2015) describes the interactions of PKD, PI4KIIIβ and CERT at the trans-Golgi network of mammalian cells. | 7 | 8 | 135 | 3 | 36 | , ev, NI, u(t), |
Zheng | The model is adapted from Zheng et al. (2012) and describes methylation at histone H3 lysines 27 and 36. | 15 | 15 | 60 | 1 | 46 | , ev, NI, |
Note: The models are abbreviated with the last name of the first author. Many models are based on Western blot data. Number of parameters denotes unknown parameters that are estimated in the model. The number of experimental conditions is specified as the number of different simulation conditions. The feature abbreviations denote the following: C = several compartments, = constant error parameters, Eq. (3), = error model of Eq. (4), = error model of Eq. (5), Ex = known measurement errors, ev = events, NI = non-identifiable parameters, u(t) = time dependent input function, = input function with unknown parameter(s). Initial values are specified according to the following order: = known initial values, = initial condition given by unknown parameters, = parameter dependent functions and = pre-equilibration for initial steady state conditions. The models are described in more detail in Supplementary Section S4.
The benchmark problems cover a wide range of model and dataset sizes (Fig. 1A). Some of these properties are correlated, e.g. log-transformed numbers of data points and parameters (, P-value ). A local identifiability analysis (see Supplementary Section S12) using the identifiability test by radial penalization (ITRP) (Kreutz, 2018) revealed that most benchmark models possess non-identifiable parameters. Furthermore, we found that initial conditions are specified in multiple ways, e.g. as equilibrium points of an unperturbed condition, and that different types of noise models and input functions are used (Fig. 1B). This results in a large number of combinations which have to be covered by computational modeling tools.
Although our collection is not unbiased, the spectrum of properties in the published models reveals requirements to be covered by modeling and parameter estimation tools. In the following, we will use the benchmark collection to assess a few common questions and statements.
4.2 Log-transformation of model parameters
A variety of studies in the systems biology field advocate the use of log-transformed parameters, , for optimization:
‘For parameters that are by definition non-negative a log-scale should be used in the parameter estimation’. (Raue et al., 2013)
Recent evaluations verified that this can improve computational efficiency (Kreutz, 2016; Villaverde et al., 2019). A comprehensive evaluation on application problems is however missing and the precise reason for the improvement is still unclear. Here, we used the compiled benchmark collection to confirm the finding for multi-start local optimization (Fig. 2A) and to assess whether changes in the objective function landscape might be a potential reason. The performance metric is the average number of converged starts per minute (see Villaverde et al., 2019). Starts are considered to be converged if the objective function value differs at most by from the best objective function value found across all runs for the given benchmark problem, whereby we only included the models for which the best value was found more than once. An assessment of the influence of the convergence threshold is provided in the Supplementary Figure S30.
Accordingly, the fraction of triples for which (6) holds provides a measure of convexity. We evaluated this measure for different combination of sampling strategies for and (lin or log scale, indicated in the x-axis of Fig. 2C), and checking the connecting line between the two parameters in lin or log scale (see Supplementary Section S7). For each combination, we sampled 1000 triples. Our comparison revealed that for most application problems, log-transformation increases the considered measure of convexity (Fig. 2C). Indeed, some problems appear to be completely convex when using log-transformed parameters. This provides a mechanistic explanation for the observed improvement in optimizer convergence.
4.3 Performance of local optimization methods
The no free lunch theorem for discrete optimization states that
‘[…] what an algorithm gains in performance on one class of problems is necessarily offset by its performance on the remaining problems’. (Wolpert and Macready, 1997)
This implies that effective optimization relies on a fortuitous matching between an optimization method and an optimization problem. Similarly, it was shown for continuous optimization problems, for which the no free lunch theorem does not apply, that there are optimal algorithms for certain problem classes (Auger and Teytaud, 2010). Here, we used the benchmark collection to assess the performance of the trust-region-reflective and the interior-point algorithm in the MATLAB function fmincon (The MathWorks, 2016) to parameter optimization problems encountered in systems biology to assess which one is generally more suited. These local optimizers are widely used. Indeed, there are studies using both optimizers to exploit their individual benefits and performance differences (Stapor et al., 2018). The choice of the optimizer has direct implication for multi-start local optimization methods (Raue et al., 2013) and meta-heuristics (Villaverde et al., 2019), but also for uncertainty analysis using profile likelihoods (Raue et al., 2009).
For fmincon, mainly the default settings provided by MATLAB were chosen, which can be obtained by . Therein, the algorithm was chosen as trust-region-reflective or interior-point, respectively. Additional changes to the default settings comprise:
A user-defined gradient and Hessian for Gauss-Newton optimization.
The tolerance on first-order optimality was set to 0.
Termination tolerance on the parameters was set to .
As subproblem-algorithm, cg (conjugate gradient) was always chosen.
The maximum number of iterations was set to 10 000.
The trust-region-reflective algorithm is tailored to optimization problems with linear constraints. The trail step of the optimizer is obtained by minimizing a quadratic approximation of the objective function within the trust region (which is chosen adaptively). Parameter bounds are handled in the step construction by scaling and reflection. The interior-point algorithm is a general purpose method (and the MATLAB default) for optimization problems with linear and nonlinear constraints. It solves a sequence of approximate optimization problems with barrier functions. In each iteration, a direct step obtained by solving the so-called Karush-Kuhn-Tucker condition or conjugate gradient step using a trust region is performed. For details we refer to the MATLAB documentation (The MathWorks, 2016).
We performed multi-start local optimization with 1000 fits for all benchmark models. Our results revealed that for the considered benchmark problems the trust-region-reflective algorithm tends to outperform the interior-point algorithm (Fig. 3 and Supplementary Figs S7–S26). Indeed, the trust-region-reflective algorithm achieved a higher number of converged starts per computation time for 18 of the 20 benchmark problems and is for 9 benchmark problems the only algorithm finding the optimal solution. However, the optimal solutions for 2 benchmark problems were only obtained using the interior-point algorithm. Accordingly, although the trust-region-reflective algorithm (which is not the MATLAB default) achieves the higher reliability and performance, it can be beneficial to test alternative local optimizers. Additional information of the multi-start fits and the computation time for each model, as well as a comparison of the trust-region-reflective and the interior point method with the least-squares solver implemented in the MATLAB function lsqnonlin can be found in Supplementary Sections S3, S8 and S9.
4.4 Number of steps for local optimizers
Common questions in practical applications are (i) for how many steps (or iterations) a local optimizer should be run, and (ii) how the number of steps depends on the number of the parameters. For many local optimization algorithms, such bounds and results for scaling properties are available. For interior-point algorithms it has been shown that for convex problems
‘[…] the number of Newton steps hardly grows […] with m [the number of constraints - author’s note] (or any other parameter, in fact)’. (Boyd and Vandenberghe, 2004, Section 11.5.6)
Similar findings are reported for other methods (see, e.g. Nesterov, 2013). As the independence of the number of optimization steps from the number of parameters might be surprising, we set out to assess the properties on the benchmark collection. For each problem, the trust-region-reflective algorithm implemented in the MATLAB function lsqnonlin was run, without constraints on the maximum number of function evaluation.
Our assessment of the average number of optimizer steps (Fig. 4) revealed that on average 391 ± 19 iterations were taken. There is—as predicted by theory for convex problems—no significant dependence on the number of parameters (, P-value ). Accordingly, our analysis on the benchmark collection validated for the first time that the theoretical results also hold for application problems in systems biology (which are in general not convex).
In contrast to the number of iterations, the computation time of local optimization depended on the number of parameters (, P-value ). For the trust-region-reflective algorithm using forward sensitivities for gradient calculation, we observed a roughly quadratic dependence (). As the objective and its gradient are evaluated simultaneously using forward sensitivity analysis, the increased computation time is not caused by an increase number of function evaluations (Supplementary Fig. S32). Instead, the computation time per function evaluation tends to increase as the number of parameters increases.
4.5 Incidence of sloppiness
The reliability of parameter estimates is limited by the quality and amount of available experimental data. This dependency translates to parameter sensitivities and to the Fisher information matrix (see Supplementary Material). For ODE models, Gutenkunst et al., 2007 suggested that
‘sloppy sensitivity spectra are universal in systems biology models’.
While the concept of sloppiness was controversially discussed in the literature (Apgar et al., 2010; Chis et al., 2016; Tönsing et al., 2014; Transtrum et al., 2015), an assessment of sloppiness for a large collection of models with experimental data appears to be missing. Here, we used the benchmark collection, calculated the eigenvalue spectra of the Fisher information matrix at the maximum likelihood estimate and thereby assessed incidence of sloppiness in our benchmark models. Details are provided as Supplementary Material.
Figure 5 shows that 19 out of our 20 models exhibit a sloppy spectrum, i.e. the eigenvalues spread over more than 6 orders of magnitude. For most models, the spread was even >15 orders which partially occurs because of non-identifiability. One model (Bruno et al., 2016) has a non-sloppy spectrum covering only 2.13 orders of magnitude which illustrates that the spread of the eigenvalues is a matter of experimental design.
5 Discussion
Mechanistic dynamical models are used to describe and analyze biochemical reaction networks, to determine unknown parameters, gain biological insights and perform in-silico experiments. Novel methods to address these challenging tasks are proposed on a regular basis, however, a thorough assessment is often problematic. To address this problem, we compiled a collection of 20 benchmark problems. Reusability was ensured by providing the models in the machine-readable SBML format and the experimental data in structured Excel files. In addition, all aforementioned models are included in the open-source MATLAB toolbox Data2Dynamics (Raue et al., 2015) and the analysis scripts are provided as Supplementary Material.
To ensure that the benchmark problems are realistic and practically relevant, we exclusively included published models and measured experimental data. This is a key difference to existing benchmark collections which mostly considered models with simulated data (Ballnus et al., 2017; Villaverde et al., 2015). The benchmark models possess a broad spectrum of properties (e.g. different types of initial conditions, noise models and inputs), as well as challenges (e.g. structural and practical non-identifiabilities, and objective functions with multiple minima and valleys). The size of the benchmark problems ranges from roughly 20 data points, 10 parameters to be optimized and a single experimental condition to large models with more than 1000 data points, over 200 parameters and up to 110 distinct experimental conditions. This facilitates the assessment of the scaling behavior of novel algorithms.
We illustrated the value of the benchmark collection by performing three different analyses: (i) Our study of parameter transformations confirmed that optimization benefits from log-transformed parameter space. Furthermore, it suggested that the reason could be a significant increase of convexity of most problems, which provides a more benign setting for local optimizers. The observed change in the convexity appears to be the first mechanistic explanation for the observed improvement in optimizer performance. (ii) Our comparison of trust-region-reflective and interior-point algorithms revealed that the former is better suited for most parameter estimation problems encountered in systems biology. (iii) Our analysis of the scaling behavior confirmed theoretical results showing that the number of optimizer steps does not depend on the number of model parameters. The results of analyses (i)–(iii) could not have been obtained without the benchmark collection, which provided the means for a fair comparison. Indeed, the reliability of the findings depends directly on the size and the representativeness of the benchmark collection. Amongst others, previous studies were not able to provide an assessment of the scaling properties (Ballnus et al., 2017; Raue et al., 2013; Villaverde et al., 2015).
Beyond the analysis carried out in this manuscript, the benchmark collection can be used to address questions such as: How do other local, global and hybrid optimization methods perform in practice? How does the number of iterations of global and hybrid optimization methods depend on the problem dimensions? How should the number of starting points depend on the dimension of the parameter space or properties of model and dataset (e.g. identifiability or oscillatory dynamics)? How do profile likelihood calculation and Markov chain Monte Carlo sampling methods perform? We expect that the assessment of these and other questions will pinpoint practically relevant limitations of existing methods. This will facilitate a targeted improvement of existing and development of new methods. Apparently, for more detailed questions and a more fine-grained analysis, more benchmark models will be required.
In conclusion, we think that the compiled benchmark collection will be an important resource for the systems biology community. It will facilitate the thorough evaluation of novel computational methods and support an unbiased assessment. In the future, the benchmark collection should be integrated with public resources such as the BioModels database (Le Novere et al., 2006). Furthermore, the collection should be extended to enable a more fine-grained analysis and to fill apparent gaps, such as the lack of models and datasets with sustained oscillations. Therefore, we encourage researchers to provide further models and datasets, e.g. by uploading them to our GitHub repository to obtain an even more powerful collection of benchmark models. Information about ways to contribute are provided on the GitHub page.
Funding
This work was supported by the German Ministry of Education and Research by the grant EA: Sys (FKZ 031L0080), the grant SYS-Stomach (01ZX1310B) and the grant INCOME (FKZ 01ZX1705A), as well as by the German Research Foundation (DFG) via grant TRR179. The authors also acknowledge support for high-performance computing by the state of Baden-Württemberg through the bwHPC initiative, which is supported by DFG grant INST 35/1134-1 FUGG.
Conflict of Interest: none declared.
References
The MathWorks (
Author notes
The authors wish it to be known that, in their opinion, Helge Hass and Carolin Loos authors should be regarded as Joint First Authors.