Comparison of approaches for parameter identifiability analysis of biological systems

Motivation: Modeling of dynamical systems using ordinary differential equations is a popular approach in the field of Systems Biology. The amount of experimental data that are used to build and calibrate these models is often limited. In this setting, the model parameters may not be uniquely determinable. Structural or a priori identifiability is a property of the system equations that indicates whether, in principle, the unknown model parameters can be determined from the available data. Results: We performed a case study using three current approaches for structural identifiability analysis for an application from cell biology. The approaches are conceptually different and are developed independently. The results of the three approaches are in agreement. We discuss strength and weaknesses of each of them and illustrate how they can be applied to real world problems. Availability and implementation: For application of the approaches to further applications, code representations (DAISY, Mathematica and MATLAB) for benchmark model and data are provided on the authors webpage. In addition, profiles that have a unique minimum but do not cross the confidence threshold reveal six practically non-identifiable parameters: DecoyR_binding, SOCS3_degradation, SOCS3_translation, pRec_degradation, pRec_intern and scale_pIL4Ra_obs. The remaining parameters are both structurally and practically identifiable and have finite confidence intervals


INTRODUCTION
The dynamics of cellular processes such as signal transduction pathways can be described by models consisting of ordinary differential equations (ODEs). We used a model of IL13-induced JAK/STAT signaling (Raia et al., 2011) to present a comprehensive comparison of three approaches for structural identifiability analysis. Mathematically, such dynamical models can be characterized by the ODE _ xðtÞ ¼ fðxðtÞ, uðtÞ, hÞ with xð0Þ ¼ x 0 ¼ gðhÞ ð 1Þ In this case, the vector of state variables xðtÞ describes the dynamics of 15 molecular components involved in the JAK/STAT signaling pathway (for details see equation (3)). The function uðtÞ represents possibly time-dependent experimental treatments; in this case it represents a constant treatment with the cytokine IL13. The vector h contains all parameters of the dynamical model, such as the reaction rate constants. For a specific cell type or biological context, the parameters h are often not available from literature and have to be estimated from experimental data. The initial concentrations x 0 can be known or unknown; in the latter case they have to be estimated from experimental data as well. Each possible measurement is mathematically represented by a functional mapping yðtÞ ¼ hðxðtÞ, uðtÞ, hÞ ð 2Þ that might include additional parameters and thus increase the dimension of h, such as scaling parameter for relative data obtained by immunoblotting in our case. The inference problem of concern is the determination of the model parameters h from the measurements y and inputs u described by Equation (2). In general, this may not be possible. Structural or a priori identifiability (Bellman and Å stro¨m, 1970) is a property of the systems (1) and (2) that guarantees that the unknown model parameters can, in principle, be determined from generic input and output functions of the model, provided they satisfy certain minimal conditions called 'persistence of excitation' [for a broad introduction and classical approaches Walter (1987)]. Structural identifiability is a prerequisite and necessary condition for any estimation procedure to render the recovery of h from input-output measurements as a well-posed problem and to return meaningful results about h. Well-known approaches to structural identifiability analysis often have problems with the realistically sized models (Karlsson et al., 2012;Sedoglavic, 2002), but there are approaches that can handle them.
Let us consider a simple model for the illustration of structural identifiability. The model describes a transition between two components _ x 1 ðtÞ ¼ À 1 x 1 _ x 2 ðtÞ ¼ þ 1 x 1 We assume that x 1 ð0Þ ¼ 2 and x 2 ð0Þ ¼ 0, and only the measurement y 1 ðtÞ ¼ 3 x 2 ðtÞ, are available. In this case, we can solve for *To whom correspondence should be addressed.
x 2 ðtÞ analytically and substitute into the measurement equation. The result y 1 ðtÞ ¼ 3 2 expðÀ 1 tÞ Á ðexpð 1 tÞ À 1Þ shows that 3 and 2 are structurally non-identifiable. 1 and only the product of 3 and 2 are structurally identifiable. Such redundant parameterization can be detected directly for very simple examples. For analysis of realistic models, one has to resort to more sophisticated approaches.
We compared three current conceptually different approaches for identifiability analysis using the model and data of Raia et al.
(2011) as case study. In particular, we compared the Differential Algebra Identifiability of Systems (DAISY) approach proposed by Saccomani et al. (2003), the Exact Arithmetic Rank (EAR) approach implemented by Karlsson et al. (2012) and the Profile Likelihood (PL) approach proposed by Raue et al. (2009). The results of all three approaches are in good agreement; however, each approach has specific strength and weaknesses that will be discussed.

METHODS
Approaches for identifiability analysis can be classified according to several criteria. The main difference is between a priori versus data-based type approaches. A priori approaches can be applied irrespective of which input functions are used and before the availability of experimental data. Data-based type approaches can be applied if actual experimental data are available or can be simulated under reasonable assumptions. Some a priori approaches allow to test global identifiability, a property holding for all possible parameter values, i.e. independently of the actual parameter value. Other approaches allow to test local identifiability, holding around a point in the parameter space. Some data-based approaches also allow for conclusions about practical non-identifiability (Raue et al., 2009) that is caused by limited quality of experimental data. The classification of the three approaches investigated is shown in Table 1.

DAISY approach
This approach implements a differential algebra algorithm to perform a global parameter identifiability analysis for dynamic models described by polynomial or rational equations (Bellu et al., 2007). The basic idea is that of manipulating algebraic differential equations as polynomials depending also on derivatives of the variable. Ritt's algorithm permits to eliminate the non-observed state variables x from the system of equations and to find the input-output relation of the system: a set of polynomial differential equations involving only the variables u and y, thus describing all input-output pairs satisfying (1) and (2). The inputoutput relation is linearly parameterized by certain algebraic functions of the unknown parameters called the exhaustive summary, which can be easily extracted. These functions lead to a system of algebraic nonlinear equations in the unknown . By applying a computer algebra algorithm, i.e. the Buchberger algorithm, it is possible to check whether there is one or multiple solutions and hence distinguish between global, or local identifiability or non-identifiability of the original dynamic system. An additional advantage of using this computer algebra tool is that it does not require expertise on mathematical modeling by the experimenter.

EAR approach
This approach is based on applying the inverse function theorem to the system of algebraic equations relating higher order derivatives of the output y with respect to time at the initial time with the initial state and parameters (Pohjanpalo (1978). Using a differential algebra approach, an upper bound of the order of differentiation can be given, resulting in a non-linear algebraic system of equations in the parameters. The rank of the Jacobian matrix for this system of equations gives information about its solvability, and in case of a rank-deficient matrix, a more detailed analysis of the Jacobian provides information about which parameters are involved in relations rendering the system nonidentifiable. The EAR method provides means to efficiently compute the generic rank of the Jacobian matrix to return a conclusive result if the system is structurally identifiable. It considers local identifiability but around a generic point, i.e. the computations are carried out for a random specialization of the unknown parameters, and initial state to integer values and a random input in terms of a truncated integer coefficient power series. Local structural identifiability is an almost everywhere property by definition, i.e. it holds everywhere apart from possibly on a set of measure zero. The EAR approach is based on a method for local algebraic observability (Sedoglavic, 2002). The EAR analysis is implemented as a fully automatic Mathematica function. The user simply inputs the equations and gets the answer. Based on the EAR identifiability analysis, it is also possible to find minimal sets of outputs giving identifiability (Anguelova et al., 2012).

PL approach
This approach checks for non-identifiability by posing a parameter estimation problem using real or simulated data. The central idea is that nonidentifiability manifests as a flat manifold in the parameter space of the estimation problem, e.g. the likelihood function. A profile can be calculated for each parameter i individually by repeated optimization of all parameters f j j8j 6 ¼ ig for a series of fixed values of the parameter i . A flat profile indicates a structurally non-identifiable parameter. For detecting structural non-identifiability, simulated data are sufficient. In case real experimental data are available, practical non-identifiability can also be detected and confidence intervals for the parameter estimates can be calculated. The traces in parameter space that correspond to the profiles can be used to analyze the reason for non-identifiability and point to missing experimental information and interrelated parameters. It was demonstrated in a study by Raue et al. (2010) that the approach facilitates an iterative experimental design strategy and Kreutz et al. (2012)  extended the approach to detect non-observability of the dynamics directly. The approach was applied to biological data in Bachmann et al. (2011) and Becker et al. (2010).

Equations of benchmark model
In the following, we describe the equations of the IL13-Induced JAK/ STAT signaling model (Raia et al., 2011) that correspond to Equations (1) and (2). These equations determine the structural identifiability of the model parameters. The ODE system determining the time evolution of the state variables is given by where x is the 14D state vector, the dot denotes the time derivative, u 1 is an input functions and c 1, 2 are constants. The initial conditions of the state variables are xð0Þ ¼ ½1:3, 23 , 0, 0, 0, 2:8, 0, 165, 0, 0, 0:34, 0, 0, 0 ð 4Þ the values are given in units of molecules per cell (Â1000). The set of measurement equations is defined by the following equation The components x, u and c of Equation (3) and y of Equation (5) are described in Table 2. The model Equations (3-5) contain 23 unknown parameters h that are described in Table 3. They need to be determined from the available measurements given in Equation (5).

RESULTS
We present results on the identifiability for the benchmark model (Raia et al., 2011) using the three approaches described in the Section 2: the Differential Algebra Identifiability of Systems (DAISY) approach proposed by Bellu et al. (2007); EAR approach implemented by Karlsson et al. (2012) and the PL approach proposed by Raue et al. (2009). For the last approach, we also use the original data of Raia et al. (2011). The application of the three approaches for identifiability analysis will be described in detail. In summary, all three approaches consistently classify five parameters as structurally non-identifiable: 11 , 15 , 17 , 21 , 22 .

DAISY approach
The differential-algebra-based approach provides a direct check of global identifiability of the above model, showing the nonidentifiability of some model parameters. The approach suggests a reduction of the model to minimal form so that fundamental system theoretic properties, such as accessibility, hold and the model is more suitable for further mathematical investigations.
Thus, before checking model identifiability from the designed experiment, it is convenient to check its minimality. In this case, just by visual inspection, it is easy to see that some of the 14 model equations defining the model are redundant.
In particular, the equations for x 6 , x 8 , x 11 are the same as those for x 7 , x 9 , x 12 with the opposite sign. For example, from the eighth and ninth equations, one obtains _ x 9 ¼ À _ x 8 . By integrating and using the known initial conditions, one arrives at x 9 ¼ Àx 8 þ 165, thus eliminating one differential equation. Same procedure is followed for x 7 , x 12 . Finally, x 14 can be expressed as x 14 ¼ x 10 17 = 11 , and the last equation is also redundant. The model can then be rewritten in a simplified form involving only 10 state variables. This is done not only for the sake of mathematical simplification, but also to satisfy some important structural property, such as minimality and accessibility (Saccomani et al., 2003). This is important, because the lack of minimality of the model may lead to spurious nonidentifiability results for some parameters that may not occur in a minimal model. Also, if the model has two or more differential equations dependent on each other, as in this case, the model would not be accessible, making more difficult the identifiability check of the model from the given initial conditions.
To check global identifiability of this simplified 10D model with DAISY, the user has to write in the input file the ordered list of the output and state variables, the list of the unknown parameters, the model equations and the known initial conditions. Later in the text, the results will be illustrated. For explanations of the technical terms, one may consult Bellu et al. (2007).
DAISY automatically ranks the input, output, state variables and their derivatives, starts the pseudodivision algorithm, i.e. the Ritt algorithm, and calculates the characteristic set of the model. This is a minimal set of differential polynomials, which provides an equivalent description of the model. The subset made of the first eight (i.e. the number of model outputs) differential polynomials does not depend on the state variable x and provides the so-called input-output relation of the model. In particular, these involve higher derivatives of the input and output signals.  Note: Parameter values are given on a log 10 -scale and are allowed to vary between À5 and þ3.
To speed up the algorithm, it may be advisable to add derivatives of the actual output functions to the system equations. This is legitimate, as Ritt algorithm is based on differentiation, besides the usual algebraic operations. After a suitable normalization, the input-output polynomials can be rendered monic, and their coefficients provide a set of rational functions of the unknown parameter h, which form the so-called exhaustive summary of the model. Identifiability is tested by checking injectivity of the exhaustive summary with respect to the parameter h. We should, in principle, calculate the range set of these functions. This could be done in symbolic language evaluating them at a symbolic parameter value. In practice, but only to speed up the process, instead of choosing a symbolic parameter value, we can use a set of randomly chosen numerical points in the range set. Solution of these algebraic equations is done by computing a Gro¨bner basis, by applying the Buchberger algorithm. The results show that all the parameters are uniquely identifiable except for 11 , 15 , 17 , 21 and 22 , which have an infinite number of solutions. Thus, the model is non-identifiable.
The analysis actually provides some hint on how to simplify the model to make it globally identifiable. For example, by assigning known values to parameters 21 and 22 , the model would become globally identifiable.
Further information about DAISY and instructions on how to obtain it can be found at: http://www.dei.unipd. it/wdyn/?IDsezione¼4364.

EAR approach
The identifiability of a dynamic model is closely related to the properties of the Jacobian matrix containing the derivatives of signals assumed to be measured (i.e. model outputs) and their time derivatives, with respect to the parameters. Furthermore, structural identifiability is a generic property of the symbolic form of the system and measurement equations, and hence it is sufficient to analyze this property for a specific (generic) point in parameter space. In the EAR approach, this is utilized using exact modular integer arithmetics for fast computations, i.e. the specialization of parameter values to random integers only serves the purpose of fast computation of structural properties and has nothing to do with biological feasibility. The direct approach of first deriving the entries of the Jacobian matrix in symbolic form, inserting integer values and then computing the matrix rank is not feasible for anything but very small systems, because of extensive swell of the size of symbolic expressions. Instead, it can be shown that the numerical values of the entries of the Jacobian matrix can be computed efficiently by computing power series solutions to the original ODEs augmented by their corresponding parametric sensitivity differential equations, followed by insertion into the output sensitivity expressions the obtained truncated power series solutions of the state and state sensitivities. To prevent the need for computation with rational numbers and the inherent swell in size of numerators and denominators, all computations are carried out modulo a large prime. To summarize, the above approach is based on exact (modular) arithmetics for obtaining the entries in the Jacobian as well as for the subsequent rank computation; hence, the name EAR. It is implemented in terms of a fully documented Mathematica application and is completely automatic once a specific system description has been provided.
Here is an outline of the steps of the algorithm (1) Identifiability is a generic property of the symbolic form of the system and measurement equations. Therefore, any generic point can be analyzed. First, specializations of parameters and initial conditions are generated.
(2) Specialize inputs to truncated random integer coefficient power series.
(3) Truncated power series solutions of x, @x @xð0Þ and @x @ are computed, using the system and sensitivity system equations.
(4) The power series are inserted in the expressions for the derivatives of the outputs with respect to the initial state and with respect to the parameters resulting in power series representations of the output sensitivity derivatives.
(5) Identification of the coefficients of the truncated power series of the output derivatives with the coefficients of a general Taylor expansion of the output sensitivity derivatives gives the higher order time-derivatives of the output sensitivity derivatives, i.e. the entries of the specialized Jacobian matrix.
(7) If the matrix is rank-deficient, the non-identifiable parameters are found using the fact that removing the corresponding columns from the matrix do not change the rank.
The analysis using the Mathematica package is fully automatic. Further information about the package and instructions on how to obtain it can be found at: http://www.fcc.chalmers.se/ sys/products/identifiabilityanalysis. Included in the package is, apart from the identifiability test demonstrated above, also functionality for automatically finding minimal sets of output expressions that guarantee a structurally identifiable model. This functionality is described in Anguelova et al. (2012).

PL approach
The PL approach determines the identifiability of the model parameters by posing a parameter estimation problem. Here, we use the original data of Raia et al. (2011) and investigate both the structural and practical identifiability of the model parameters. For parameter estimation, maximum likelihood estimation (MLE) is applied. The likelihood LðhÞ describes the probability of the data given certain parameter values h. The MLE fit of the dynamic model to the experimental data for MedB-1 cell is shown in Figure 1, and the MLE parameter values are given in Table 3. Likelihood profiles were calculated as described in Raue et al. (2009) by where for each fixed value of parameter i , all other parameters j 6 ¼ i are reoptimized. Figure 2 shows the results of the analysis. A perfectly flat profile indicates a structural non-identifiable parameter. Perfectly flat profiles reveal the five structural nonidentifiable parameters. The change of the parameters j along a profile of a structurally non-identifiable parameter i can be used to determine functionally related groups between the structurally non-identifiable parameters (Fig. 3). Here, the five structurally non-identifiable parameters are functionally related in two groups. In the first group, indicates that the concentration scale of SOCS3 mRNA (x 10 ) is not fixed by measurements. Using this information, two new experiments that determine the respective concentration scales can be used to resolve these structural non-identifiabilities.
The likelihood profiles can also be used to assess practical identifiability and to calculate confidence intervals of the model parameters. A threshold in the likelihood, measured from the MLE point, can be used to compute likelihood-based confidence intervals [for details on the statistics, see in Raue et al. (2009)]. Profiles that have a unique minimum, but do not cross the confidence threshold, reveal six practically non-identifiable parameters: 3 , 4 , 12 , 14 , 16 and 19 . The likelihood profiles can also be used to design experiment that resolve practical nonidentifiabilities [for an illustrative example, see Raue et al. (2010)]. The remaining parameters are both structurally and practically identifiable, and have finite confidence intervals; see Table 3 for values.
The PL approach is implemented in the freely available MATLAB software packages Data2Dynamics (Raue et al., 2013) and PottersWheel (Maiwald and Timmer, 2008). For the Data2Dynamics software packages, the Raia et al. (2011) model and data are included in the software as an example application. The software package is open source and freely available on the Web site: https://bitbucket.org/d2d-development/d2d-software.

DISCUSSION
The results of all three approaches are in good agreement for the benchmark application considered here. Five of 23 parameters are consistently classified as structurally non-identifiable. The procedure to reproduce and interpret the results obtained by each of the three approaches was presented and can serve as reference for further application.
The strength of the DAISY approach is to check for global identifiability, i.e. it checks the uniqueness of the parameter solution. Thus, it is able to distinguish between global and local identifiability. Being based on differential algebra methods, DAISY can directly deal only with polynomial or rational functions f and h. However, the method can be generalized to deal with some non-polynomial functions, e.g. exponential functions. Although the program is usually very fast, in the order of few seconds, for complex models the algorithm may not successfully terminate because of a lack of memory of the system. In most cases, however, it is possible to simplify the calculations required by the algorithm by eliminating redundant model equations by hand. This was done for the benchmark application considered here.
The main advantage of the EAR approach is that it is fast and can handle large and complex systems. The system analyzed in this article is considered small and simple for this approach. Systems on the scale of 100 states and 100 parameters can be handled. One limitation of the EAR approach is that it requires the vector valued functions f, g and h to be rational functions of their arguments. This limitation is not as restrictive as it may first sound, however, as it can be shown that any function, which in itself is the solution to an equation such as (1), can be handled through an extended state space approach (Lindskog 1996).
Even if-statements can be closely approximated using rational functions.
The strength of the PL approach is that it does not pose any restrictions on the algebraic form of the model equations. Even non-algebraic constructs such as if-statements or constraints on the parameters or model dynamics can be handled. It also allows for statements on practical identifiability and confidence intervals. Besides the structurally non-identifiable parameters, six parameters are practically non-identifiable considering the data available from Raia et al. (2011). However, owing to the underlying parameter estimation problem, issues such as local minima have to be handled with care. In such case, it might be necessary to repeat profile calculations for multiple minima detected in the objective function to enhance robustness of the results (Raue et al., 2013). Likelihood profiles for all 23 model parameters. The parameters are allowed to vary between À5 and þ3 on a log 10 -scale. The MLE point is indicated by asterisks. The red dashed line corresponds to a threshold that indicates a 95% confidence level. The points of pass-over of profile and threshold determine likelihood-based confidence intervals. Perfectly flat profiles reveal five structural non-identifiable parameters: CD274mRNA_production, SOCS3_accumulation, SOCS3mRNA_production, scale_CD274mRNA_obs and scale_SOCS3mRNA_obs. In addition, profiles that have a unique minimum but do not cross the confidence threshold reveal six practically non-identifiable parameters: DecoyR_binding, SOCS3_degradation, SOCS3_translation, pRec_degradation, pRec_intern and scale_pIL4Ra_obs. The remaining parameters are both structurally and practically identifiable and have finite confidence intervals

CONCLUSION
The presented approaches allow for comparable and reliable conclusions about structural identifiability and can readily be used in Systems Biology applications. Software implementation of all approaches is freely available. It is important and good practice to double check results with different, but comparable, approaches. Here, we provide a case study that can serve as reference and hand-on guide to apply and interpret the results of three current approaches to structural identifiability analysis. The results of identifiability analysis can be helpful to provide guidelines on how to simplify the model structure or design additional experiments that enhance the predictive power of a mathematical model.
All three approaches examined in this paper are useful for real application examples. in many cases, all three approaches work equally well, but in some cases one of the three is preferred. If the system is very large and/or if the analysis must be fast, then EAR is the preferred approach. If it is of importance to get truly global identifiability, DAISY is the preferred approach. If practical identifiability is important, or if the equations include nonrational expressions like if-statements, then PL should be used. Using combinations of approaches can sometimes give the strengths of all approaches to the analysis.