pyPESTO: a modular and scalable tool for parameter estimation for dynamic models

Abstract
Summary: Mechanistic models are important tools to describe and understand biological processes. However, they typically rely on unknown parameters, the estimation of which can be challenging for large and complex systems. pyPESTO is a modular framework for systematic parameter estimation, with scalable algorithms for optimization and uncertainty quantification. While tailored to ordinary differential equation problems, pyPESTO is broadly applicable to black-box parameter estimation problems. In addition to its own implementations, it provides a unified interface to various popular simulation and inference methods.
Availability and implementation: pyPESTO is implemented in Python and is open-source under a 3-Clause BSD license. Code and documentation are available on GitHub (https://github.com/icb-dcm/pypesto).


Introduction
In many research areas, including computational biology, mathematical models are important tools to study complex systems and understand underlying mechanisms (Kitano, 2002). While there are a variety of formalisms to describe biological systems, ordinary differential equation (ODE) models are popular because they provide a natural means to describe and explain dynamic changes after perturbations, which are common in experimental biology (Fröhlich et al., 2019). However, models usually have unknown parameters whose values and uncertainties need to be estimated from observed data (Tarantola, 2005). We present pyPESTO, a Python-based parameter estimation tool that provides various inference approaches in a modular manner via a streamlined pipeline (Figure 1; see the Supplementary Information, Section 1, for a tool comparison). pyPESTO can be applied to various problem types, including large-scale problems.

Problem definition
To ease application to biological systems, pyPESTO supports the community standard PEtab for the specification of parameter estimation problems (Schmiester et al., 2021), and in particular interfaces with the ODE simulation and sensitivity engine AMICI (Fröhlich et al., 2021). Furthermore, pyPESTO allows for user-defined continuous parameter estimation problems given via scalar objective functions or vector-valued residual functions. This includes objective functions based on the log-likelihood or log-posterior, for which estimated values and uncertainties can be interpreted statistically. As many inference methods benefit from objective function derivatives, pyPESTO supports user-supplied functions that compute derivatives, but also provides adaptive finite differences.
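
As a minimal sketch of such a user-defined problem (plain Python, not pyPESTO's API), the following example defines a Gaussian negative log-likelihood for a hypothetical one-parameter decay model and compares a user-supplied analytic gradient with a central finite-difference approximation. The data, the noise level, and all function names are illustrative assumptions, and the fixed step size is a simplification of the adaptive scheme mentioned above.

```python
import math

# Hypothetical toy data for the model y(t) = exp(-k * t).
T = [0.0, 1.0, 2.0, 3.0]
Y = [1.02, 0.59, 0.38, 0.21]
SIGMA = 0.05  # assumed known noise standard deviation


def neg_log_likelihood(theta):
    """Gaussian negative log-likelihood; theta = [log(k)]."""
    k = math.exp(theta[0])
    nll = 0.0
    for t, y in zip(T, Y):
        res = (y - math.exp(-k * t)) / SIGMA
        nll += 0.5 * res ** 2 + math.log(SIGMA) + 0.5 * math.log(2.0 * math.pi)
    return nll


def analytic_gradient(theta):
    """User-supplied derivative of the objective with respect to log(k)."""
    k = math.exp(theta[0])
    grad = 0.0
    for t, y in zip(T, Y):
        sim = math.exp(-k * t)
        # Chain rule: d(sim)/d(log k) = -t * k * sim.
        grad += (y - sim) / SIGMA ** 2 * (t * k * sim)
    return [grad]


def finite_difference_gradient(fun, theta, step=1e-6):
    """Central finite differences with a fixed step (not adaptive)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += step
        down[i] -= step
        grad.append((fun(up) - fun(down)) / (2.0 * step))
    return grad


theta0 = [math.log(0.5)]
grad_exact = analytic_gradient(theta0)
grad_fd = finite_difference_gradient(neg_log_likelihood, theta0)
```

Estimating the parameter on a logarithmic scale, as done here, is a common design choice for rate constants because it keeps the parameter positive without explicit bounds.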

Optimization
Finding globally optimal parameters as point estimates is a common starting point in parameter estimation. As many problems are non-convex and possess multiple local optima, globalization strategies are necessary. To this end, pyPESTO provides interfaces to global optimizers as well as a multi-start globalization strategy for local and global optimizers, which performs well for biological problems (Raue et al., 2013). pyPESTO provides a unified interface to local and global optimization libraries such as Ipopt, Dlib, PySwarms, pycma, SciPy, NLopt, and Fides (see the Supplementary Information, Section 2.1). Extending it to other tools is straightforward. Moreover, pyPESTO provides a hierarchical approach to efficiently handle relative data and noise parameters (Schmiester et al., 2019a) and ordinal data (Schmiester et al., 2019b).
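
The multi-start idea can be illustrated with a short, self-contained sketch. It is not pyPESTO's implementation: the objective is a hypothetical one-dimensional function with two local minima, and plain gradient descent stands in for a real local optimizer. Starting points are drawn uniformly within assumed bounds, each is refined locally, and the results are sorted by final objective value.

```python
import random


def objective(x):
    """Hypothetical non-convex objective with two local minima."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the objective above."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_descent(x0, step=0.05, n_iter=500):
    """Plain gradient descent, standing in for a real local optimizer."""
    x = x0
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x


def multi_start(n_starts=20, lb=-2.0, ub=2.0, seed=0):
    """Run the local optimizer from random starts; return sorted results."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        x0 = rng.uniform(lb, ub)
        x_opt = local_descent(x0)
        results.append((objective(x_opt), x_opt, x0))
    results.sort()  # best (lowest) objective value first
    return results


results = multi_start()
best_f, best_x, best_x0 = results[0]
```

Sorting the final objective values reveals plateaus of starts that converged to the same optimum. Several starts reaching the best value gives some confidence, though no guarantee, that the global optimum was found.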

Uncertainty analysis
For uncertainty analysis, pyPESTO implements the optimization-based (frequentist) profile likelihood and (Bayesian) profile posterior approaches using the aforementioned interfaces to different optimization libraries (Raue et al., 2009). Moreover, pyPESTO implements the Bayesian sampling algorithms adaptive Metropolis (Haario et al., 2001) and adaptive parallel tempering (Miasojedow et al., 2013), and provides a unified interface to the popular sampling tools Emcee, PyMC, and Dynesty (see the Supplementary Information, Section 2.2), supporting in particular gradient-based sampling.
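
To give an impression of how adaptive Metropolis sampling works, the following one-dimensional sketch adapts the proposal variance to the empirical variance of the chain so far, scaled by 2.4^2/d with d = 1. It is a simplified illustration in plain Python, not pyPESTO's sampler; the Gaussian target, the initial proposal variance, the adaptation start, and the burn-in length are all assumptions chosen for the example.

```python
import math
import random


def adaptive_metropolis(log_post, x0, n_samples, seed=0, eps=1e-6,
                        adapt_start=100):
    """1D adaptive Metropolis: proposal variance tracks the chain variance."""
    rng = random.Random(seed)
    scale = 2.4 ** 2  # scaling 2.4^2 / d for dimension d = 1
    x, lp = x0, log_post(x0)
    mean, m2 = 0.0, 0.0  # running mean and sum of squared deviations
    prop_var = 1.0  # initial proposal variance (assumption)
    chain = []
    for i in range(1, n_samples + 1):
        cand = x + rng.gauss(0.0, math.sqrt(prop_var))
        lp_cand = log_post(cand)
        # Metropolis acceptance; 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
        chain.append(x)
        # Welford update of the running chain statistics.
        delta = x - mean
        mean += delta / i
        m2 += delta * (x - mean)
        if i > adapt_start:
            # eps keeps the proposal variance strictly positive.
            prop_var = scale * (m2 / (i - 1) + eps)
    return chain


def log_target(x):
    """Hypothetical log-posterior: Gaussian with mean 2 and std 0.5."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2


chain = adaptive_metropolis(log_target, x0=0.0, n_samples=20000)
kept = chain[2000:]  # discard burn-in
est_mean = sum(kept) / len(kept)
est_std = math.sqrt(sum((v - est_mean) ** 2 for v in kept) / len(kept))
```

In practice the chain would typically be initialized at an optimization result rather than at an arbitrary point, which shortens the burn-in phase.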

Further aspects
pyPESTO provides various routines to visualize and analyze obtained results. Results can be saved, recovered, and shared in a compact storage format based on HDF5. Moreover, pyPESTO supports shared-memory parallelization via multi-threading and multi-processing. Thus, pyPESTO can be deployed flexibly on desktop machines as well as high-performance computing infrastructure.
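
Because independent optimizer runs do not share state, they parallelize naturally. The sketch below distributes hypothetical local optimizations over a thread pool from the Python standard library and checks them against a serial run. It does not use pyPESTO's own parallelization layer, and the objective gradient, step size, and starting grid are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def gradient(x):
    """Derivative of a hypothetical objective (x^2 - 1)^2 + 0.3 x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def local_descent(x0, step=0.05, n_iter=500):
    """Plain gradient descent, standing in for a real local optimizer."""
    x = x0
    for _ in range(n_iter):
        x -= step * gradient(x)
    return x


# Fixed grid of starting points so that both runs are directly comparable.
starts = [-2.0 + 0.5 * i for i in range(9)]

# Serial reference run.
serial = [local_descent(x0) for x0 in starts]

# The same tasks distributed over worker threads; map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(local_descent, starts))
```

Note that in CPython, threads rarely speed up CPU-bound pure-Python code such as this toy example; a process pool is usually the better fit in that case, whereas threads suit workloads that spend their time outside the interpreter.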

Implementation and availability
pyPESTO is implemented in Python and is open-source under a 3-Clause BSD license. The code, designed to be modular and extensible, is hosted on GitHub and can be installed from PyPI. Extensive documentation is hosted on ReadTheDocs, including numerous notebooks containing tutorials and outlining pyPESTO's functionality. We ensure correctness during development via unit tests and continuous integration.

Discussion
pyPESTO has already been used in at least 12 publications and is continuously developed by at least 4 core contributors at 3 institutions. In the future, we plan to implement additional optimization and uncertainty quantification algorithms, to interface pyPESTO with further popular tools, and to extend and further standardize the supported parameter estimation workflows. We anticipate that pyPESTO will continue to be useful in a variety of computational biology applications and beyond. pyPESTO originated as a reimplementation of the MATLAB tool PESTO (Stapor et al., 2018), but now offers a streamlined pipeline with a much broader spectrum of modern methods and interfaced tools.

Comparison of simulation and inference tool features
Table 1: Comparison of features of various systems biology simulation and inference tools (line break for readability). This overview was compiled to the best of our knowledge, in May 2023.

List of interfaced tools