FitMultiCell: simulating and parameterizing computational models of multi-scale and multi-cellular processes

Abstract Motivation Biological tissues are dynamic and highly organized. Multi-scale models are helpful tools to analyse and understand the processes determining tissue dynamics. These models usually depend on parameters that need to be inferred from experimental data to achieve a quantitative understanding, to predict the response to perturbations, and to evaluate competing hypotheses. However, even advanced inference approaches such as approximate Bayesian computation (ABC) are difficult to apply due to the computational complexity of the simulation of multi-scale models. Thus, there is a need for a scalable pipeline for modeling, simulating, and parameterizing multi-scale models of multi-cellular processes. Results Here, we present FitMultiCell, a computationally efficient and user-friendly open-source pipeline that can handle the full workflow of modeling, simulating, and parameterizing for multi-scale models of multi-cellular processes. The pipeline is modular and integrates the modeling and simulation tool Morpheus and the statistical inference tool pyABC. The easy integration of high-performance infrastructure allows to scale to computationally expensive problems. The introduction of a novel standard for the formulation of parameter inference problems for multi-scale models additionally ensures reproducibility and reusability. By applying the pipeline to multiple biological problems, we demonstrate its broad applicability, which will benefit in particular image-based systems biology. Availability and implementation FitMultiCell is available open-source at https://gitlab.com/fitmulticell/fit.


Introduction
Biological tissues are complex entities composed of cells and extracellular components.Tissues occur in different developmental stages and compositions, are highly dynamic and often heavily structured.Specific tissue properties are relevant for a broad range of processes, including tissue homeostasis, viral infection, and tumor development and treatment.For the experimental analysis of biological tissues, imaging techniques are widely used.Common approaches include light and fluorescence microscopy, but more recently also imaging mass cytometry, spatial transcriptomics and related methods are employed [see Lewis et al. (2021) for a review].These experimental techniques provide a variety of quantitative information about biological tissues.Yet, mechanisms underlying specific pattern formation or tissue dynamics often remain elusive.To address these aspects, computational modeling has established itself as a key element to obtain a comprehensive understanding of causal relationships in multi-cellular spatio-temporal systems.Computational models of multi-cellular processes usually capture multiple spatial and temporal scales and describe the emergence of the system's behavior based on individual building blocks, e.g.individual cells and their interactions.There are several modeling approaches, including discrete, continuous, and hybrid model formalisms (Anderson and Quaranta 2008, Waclaw et al. 2015, Meyer et al. 2020).Cells and their interactions can, for instance, be described using (energy-based) Cellular Potts Models (CPM) or (force-based) vertex models.Several software packages have been developed, including MCell (Kerr et al. 2008), FLAME (Richmond et al. 2010), CompuCell3D (Swat et al. 2012), Chaste (Mirams et al. 2013), Morpheus (Starruß et al. 2014), and PhysiCell (Ghaffarizadeh et al. 2018) to simplify and standardize the demanding task of model formulation and implementation.Yet, while computational modeling has substantially improved the understanding of multi-cellular systems, it remains a challenging task (Fletcher and Osborne 2022).
For multi-scale models of multi-cellular processes, a key challenge is parameter estimation, the process in which values for the unknown model parameters are determined by fitting the model simulations to experimentally observed data.Parameter estimation is necessary to obtain quantitative models of processes, to analyse processes, to compare competing hypotheses about processes, and to predict the dynamics of processes (e.g. in response to perturbations) (Toni et al. 2011, MacLean et al. 2014, Jagiella et al. 2017, Imle et al. 2019, Durso-Cain et al. 2021, Joslyn et al. 2021).Systematic, rigorous and uncertainty-aware parameter estimation is only just becoming accessible for multi-scale models with advanced methods and growing computational resources.Reasons for this are that (i) the simulation of multi-scale models accounts for different biophysical processes (e.g.intra-and extracellular signaling as well as physical interaction), which requires computationally efficient simulation algorithms (e.g.hybrid discrete-continuum approaches), and that (ii) the stochasticity of most models (e.g.due to randomness arising from small cell numbers) necessitates repeated simulations.
To cope with the challenges of parameter estimation for advanced computational models, approximate Bayesian computation (ABC) methods have been developed and applied (Pritchard et al. 1999, Beaumont et al. 2002, Sisson et al. 2018).These methods generate samples from an approximation of the Bayesian parameter posterior distribution, without evaluating a likelihood function which can quickly become inaccessible for complex stochastic models.To enable the application of ABC methods to multi-scale models, they have been parallelized on high-performance computing (HPC) infrastructure (Johnston et al. 2014, Sottoriva et al. 2015, Babtie and Stumpf 2017, Jagiella et al. 2017).Generic implementations are provided by, in particular, pyABC (Klinger et al. 2018), ABCpy (Dutta et al. 2017), and ELFI (Kangasra ¨a ¨sio ët al. 2016).Yet, while simulation and inference tools are available, the parameter estimation for multi-scale models of multi-cellular processes still requires a high level of technical expertise.The available tools are not interfaced and code from published application examples is often difficult to reuse and extend.Thus, there is a need for a platform that facilitates and streamlines the entire workflow, from the construction of multi-scale models of various types based on biological principles, to systematic uncertainty-aware data-driven parameter estimation (Hasenauer et al. 2015).A first software tool tackling this problem is the spatial model editor (SME) (https:// github.com/spatial-model-editor)(Keegan et al. 2023).However, this only supports partial differential equation models of reaction-diffusion systems, while stochastic and multi-scale models cannot be considered and parameter uncertainty analysis is not supported.
In this work, we introduce the FitMultiCell pipeline, an open-source, user-friendly, and scalable end-to-end platform that integrates modeling, simulation, and parameter estimation, to simplify the analysis of multi-scale and multi-cellular systems.FitMultiCell integrates the state-of-the-art tools Morpheus (Starruß et al. 2014) for model building and simulation, and pyABC (Scha ¨lte et al. 2022) for parameter estimation.It builds on an extension of the PEtab standard (Schmiester et al. 2021) for the specification of parameter estimation problems that we additionally introduce in this manuscript.We demonstrate and evaluate the FitMultiCell pipeline using models of viral infection, tumor growth, and liver regeneration.

Problem description
We consider the problem of developing quantitative computational models of multi-cellular processes.These models might account for a broad range of biochemical and biophysical processes, including • cellular signal transduction, metabolism, and gene regulation, • cell movement, proliferation and death, and • cell-cell communication (e.g. via direct cell-cell-contact areas or the secretion/uptake of biochemical substances).
The spectrum of modeling frameworks for such integrated processes is broad and the models are often stochastic, like the behaviors and decisions by individual biological cells are inherently stochastic.Mathematically, we can write any such model as where y denotes the vector of observed properties and h denotes the vector of unknown properties of biochemical, biophysical, or observation processes (e.g.reaction rates).Process noise-arising, for instance, from low molecular numbers-as well as measurement noise is described via the vector of random variables n.Marginalizing the model simulation over the random variables n yields the likelihood pðyjhÞ, which is the conditional probability of observing y given h.
Quantitative mathematical modeling requires the inference of the unknown parameters h from data.In this study, we consider Bayesian inference and aim to approximate the Bayesian posterior distribution pðhjy obs Þ / pðy obs jhÞ Á pðhÞ of the unknown model parameters for experimentally observed data y obs and prior knowledge pðhÞ.The posterior distribution encodes all available information about the model parameter and, hence, allows for the assessment of parameter and prediction uncertainties as well as for the design of validation experiments.Experimental data used for the parameterization of the considered models are often obtained using imaging techniques.These can provide information about, e.g.spatial profile, temporal changes and changes between conditions.Overall, the spectrum of experimental setups is broad.

FitMultiCell pipeline
To facilitate quantitative computational modeling of multicellular processes, we developed the FitMultiCell pipeline (Fig. 1).This pipeline supports its users in 1) formulating the modeling problem, i.e. a) defining computational models and b) defining data obtained from experiments, 2) estimating unknown model parameters, and 3) evaluating parameter and model predictions (including planning validation experiments).
This pipeline streamlines the parameter estimation by providing a tight integration of tools for model specification and simulation, and parameter estimation and reporting.This is achieved using a Python interface with a broad spectrum of functionalities.The current version supports (i) multi-scale modeling and simulation using Morpheus (Starruß et al. 2014) and (ii) parameter estimation and uncertainty quantification using pyABC (Klinger et al. 2018).These state-of-theart tools cover modeling applications; yet, other modeling, simulation, and parameter estimation tools can be easily interfaced.Besides this Python interface, FitMultiCell offers a GUI to control pipeline execution and to analyse results.It also provides a collection of scripts and encapsulation of Morpheus that streamline pipeline execution on large computer clusters and cloud resources, making them accessible to non-expert users.Morpheus has been extended by GUI features for easily selecting parameters for estimation (auto-generated and interactive model graph to guide the user to relevant model parameters and automatic XPath creation to reference selected parameters) and by major performance improvements including adaptive parallel solvers for intracellular and partial differential equation (PDE) sub-models and the statistically exact multicore parallelization of the CPM formalism (see Supplementary data).Regarding pyABC, the construction of summary statistics has been simplified and robust adaptive distance functions were added.Lastly, PEtab-MS was developed to ensure reproducibility and ease of problem definition, often eliminating the need for coding.
The FitMultiCell pipeline enables parameter estimation for application problems with multiple experimental conditions, datasets, and data types.Furthermore, it offers a broad range of built-in visualization and analysis tools.To facilitate reproducibility and reusability, the FitMultiCell pipeline supports the MorpheusML and the PEtab-MS standards.
In the following, we describe the key components of the FitMultiCell pipeline and their features.

Standardization
The FitMultiCell pipeline uses standardized data formats to ensure interoperability, reproducibility, and reusability-important aspects of the FAIR principles (Wilkinson et al. 2016).The multi-cellular models are encoded using MorpheusML, an established XML-based standard at https:// doi.org/10.25504/FAIRsharing.78b6a6.The parameter estimation problems are encoded using the PEtab-MS format, a newly developed tsv-file-based standard.
We developed PEtab-MS as an extension of the Parameter Estimation tabular format (PEtab) (Schmiester et al. 2021), which was developed in the field of ordinary differential equation (ODE) based modeling and is already supported by various simulators and parameter estimation toolboxes.PEtab-MS conserves most core components of PEtab, including the tables defining model parameters and experimental conditions.Yet, PEtab-MS allows for model expression using MorpheusML, flexible data tables (including references to image files), as well as functions for computing summary statistics.These aspects are essential for multi-cellular processes and were not captured by PEtab.The specification of PEtab-MS and a tool for the validation of files can be found at https://gitlab.com/fitmulticell/libpetab-python-MS.

FitMultiCell
As imaging data are often processed to obtain informative summary statistics, the FitMultiCell pipeline includes a class definition for the construction of summary statistics.Several common summary statistics for imaging data are already provided (see Section 3) and customized statistics can be implemented.Furthermore, it is possible to automatically construct informative summary statistics (Fearnhead andPrangle 2012, Scha ¨lte andHasenauer 2022).

Simulation
The FitMultiCell pipeline is designed for the simulation-based analysis of multi-cellular processes.Accordingly, we allow for simulation of advanced models M for cells and tissues (using cellular automata and cellular Potts models), extracellular concentration fields (using partial differential equations) and intracellular dynamics (using ordinary or stochastic differential equations or continuous-time Markov jump processes).
The FitMultiCell pipeline implements a comprehensive interface to Morpheus, including parameter mapping and simulation result extraction.This renders the functionalities of Morpheus (e.g.model construction, simulation, and visualization) as well as features such as a comprehensive GUI for rapid prototyping available within the FitMultiCell pipeline.Morpheus is broadly applicable, widely used for a variety of biological problems, and computationally efficient [see, e.g.Additionally, the modular architecture of the FitMultiCell pipeline allows for the interfacing of additional modeling and simulation toolboxes, as well as user-provided simulation codes.

Inference
The FitMultiCell pipeline is designed for simulation-based inference, a class of approaches circumventing the evaluation of the likelihood function.This is important for the study of stochastic multi-cellular processes, but also allows for the application to deterministic models.
The FitMultiCell pipeline implements a comprehensive interface to pyABC (Scha ¨lte et al. 2022), including parameter and condition mapping for multi-experiment and multi-data type inference.This renders the state-of-the-art approximate Bayesian computation sequential Monte Carlo (ABC-SMC) method easily accessible to users of the FitMultiCell pipeline.ABC-SMC methods approximate the posterior distribution by constructing a sequence of particles which resembles the data (in terms of summary statistics) more and more closely (Toni and Stumpf 2010).Well-tested adaptation methods, e.g. for acceptance thresholds, proposal distributions, and population sizes, render pyABC accessible to non-expert users.Furthermore, pyABC allows for exact inference (Scha ¨lte and Hasenauer 2020) and we recently implemented automatic data normalization, summary statistics construction, and measurement noise handling to enable the robust estimation of parameters of multi-scale models (Scha ¨lte et al. 2021, Scha ¨lte and Hasenauer 2022).For the combined use of Morpheus and pyABC, we implemented an early rejection mechanism that discards simulations as early as possible, e.g. if they exceed an upper time limit.This makes the analysis robust to unexpectedly long-running simulations, e.g.due to an excessive number of cells.
We provided the interface to pyABC as this tool has been successfully used in a broad spectrum of applications, e.g.cardiac electrophysiology (Cantwell et al. 2019), human atrial cells (Houston et al. 2020), cancer (Colom et al. 2021), gene expression (Coulier et al. 2021), universe expansion (Bernardo and Said 2021), and bee colonies (Minucci et al. 2021).This underlines its applicability to a broad spectrum of inference tasks for multi-cellular processes.Additionally, the modular architecture of the FitMultiCell pipeline allows for the interfacing of additional inference tools, as well as userprovided inference algorithms.

Distributed execution
Parameter estimation often requires thousands to millions of stochastic model simulations, which is computationally demanding for complex multi-cellular processes.Distributed execution of the computational tasks is thus paramount (Fig. 2).Within the FitMultiCell pipeline, parallelization can happen on three levels to efficiently exploit HPC infrastructure: • "Individual simulations" can be parallelized within the employed simulation toolbox.Morpheus supports the use of multiple threads using OpenMP.In addition to ODEs and PDEs, Morpheus also provides a parallel and exact solver for CPMs. 4 Alamoudi et al.
• "Individual summary statistics evaluations" can be parallelized within the FitMultiCell pipeline if multiple individual simulations are required.This is for instance the case if experimental replicates are available for stochastic processes and multiple experimental conditions are considered.As the respective simulations are independent, they can be trivially parallelized across multiple threads.
• "Parameter estimation" can be parallelized by parameter estimation toolbox.PyABC supports single-machine multicore execution and multi-machine distributed execution.
A main process manages the communication across SMC generations and post-processing of accepted particle populations, while the computation-heavy simulations and summary statistic evaluations are delegated to a set of parallel worker processes (see Fig. 2).pyABC provides two parallelization strategies, static and dynamic scheduling, which distribute work across computational resources, to minimize the overall CPU time, or the wall-time, respectively (Klinger et al. 2018).
To facilitate the setup of the required server-worker architecture, the FitMultiCell pipeline comes with bash scripts that handle the resource allocation and communication between nodes, and which can be used to run the pipeline on HPC clusters with minimum code modification.While parallelization is available at multiple levels, the most efficient resource use will often be achieved by dedicating many workers to the particle population and setting Morpheus to single-threaded simulation mode.

Analysis and visualization
The FitMultiCell pipeline provides a GUI for the analysis of the result of the parameter estimation process (see Fig. 3).This GUI can load the shareable SQL database generated by pyABC, but is flexible enough to visualize the results for other sampling tools after the result files were reformatted.Among other things, the GUI allows for the visualization of (i) samples using uni-and bi-variate plots, (ii) sampling diagnostics like the decrease of the acceptance threshold over generations, and (iii) the ability to generate the code for any plot that is generated.A detailed documentation for the GUI is available on pyABC (https://pyabc.readthedocs.io/en/latest/visualization.html) and FitMultiCell (https://fitmulticell.readthedocs.io/en/latest/youtube.html) documentation pages.

Implementation and development
The FitMultiCell pipeline is written in Python and available on GitLab (https://gitlab.com/fitmulticell/fit)and Zenodo (https://doi.org/10.5281/zenodo.10040172)under a BSD-3-Clause license.It is being developed by contributors from three institutions and others are invited to contribute.Code quality is monitored via unit tests and continuous integration.It can be installed from the Python Package Index (PyPI).Detailed documentation of the platform is available at https:// fitmulticell.readthedocs.io,which covers installation, setting up the modeling and estimation problem, and running the parameter estimation process, including on HPC and cloud infrastructures.
To ensure ease of use, FitMultiCell pipeline integrates currently only tools which are easy to install and available under permissive licenses:

Results
To assess the performance of the FitMultiCell pipeline, we evaluated it using a broad spectrum of test and application examples.In the following, we present its core properties for a few selected models of multi-cellular processes.All the following results were obtained on the JUWELS standard compute nodes of the Supercomputing Center in Ju ¨lich, Germany.Details on the technical specifications are provided in the Supplementary data.

Standards supported by FitMultiCell pipeline allow for re-implementation of published application problems
The FitMultiCell pipeline allows for the standardized description of models, datasets and parameter estimation problems.To ensure that the supported standards are practically useful, we assessed to which degree we can implement already published application problems.
We considered three applications capturing different biological and technical problems: (M1) a model of virus transmission via cell-free virions and cell-to-cell contact (Kumberger et al. 2018); (M2) a model of tumor spheroid growth (Jagiella et al. 2017); and (M3) a model describing the mechano-sensing of the metabolic status during liver regeneration (Meyer et al. 2020).The models M1 and M2 were originally not available in MorpheusML and were re-implemented for the purpose of this study, while M3 was already published in MorpheusML.For model M1, Morpheus allowed for a one-to-one re-implementation of the original setup up to the numerical simulation schemes.For model M2, some reformulations were necessary.Here, we aimed to (i) resemble the biological processes (and subprocesses) contained in the original models, (ii) preserve the meaning of the model parameters, and (iii) achieve similar dynamics for key process characteristics.The most substantial change was the replacement of the representation of cells and modeling based on an unstructured grid by a cellular Potts model.To confirm the validity of the re-implementation, we assessed a broad spectrum of model characteristics, including statistics over space and time (such as radial profiles and growth curves for model M2).We observed a good resemblance for model M1-as expected-and minor difference for model M2 (results not shown).For all applications, we created the PEtab-MS description of the parameter estimation problems.PEtab-MS was flexible enough to describe all parameter estimation problems.This suggests that the standards employed in the FitMultiCell pipeline are FitMultiCell sufficient for a broad spectrum of applications.Details on the application problems are provided below and in the Supplementary Section 3. Furthermore, the Morpheus and PEtab-MS files provide a comprehensive standardized description.

Parallelization in FitMultiCell pipeline provides substantial wall-time reduction
As the efficient use of computing resources is of crucial importance for parameter estimation, we evaluated the parallel efficiency achieved using the FitMultiCell pipeline.We considered a model of the spread of a viral infection within a tissue distinguishing between virus transmission via cell-free virions and direct cell-to-cell contact (Kumberger et al. 2018) (Fig. 4A and Supplementary Section 3.1).In the FitMultiCell pipeline, parallelization is available within and across individual simulation particles.For the considered problems, we parallelize exclusively on the population level to provide the most efficient resource usage; however, this may depend on the specific problem.
To assess the parallel efficiency, we performed parameter estimation using the ABC-SMC algorithm implemented in pyABC with the dynamic scheduling option.The algorithms were run for three generations with two different population sizes: (Scenario A) 1000 particles; and (Scenario B) 10 000 particles.As higher population sizes improve the posterior approximation but are computationally more demanding, both scenarios are relevant.We performed this task using 1, 2, 4, 8, and 16 compute nodes, each consisting of 48 CPU cores.
For both Scenarios A and B, we observed a substantial reduction in wall-time when using higher numbers of nodes, to a fraction of the single-node execution wall-time which already uses 48 CPU cores (Fig. 4B).Yet, the parallel efficiency (PE), defined as PE ¼ fwall À time for run on single compute nodeg fwall À time for run on i compute nodesg Á i ; dropped (Fig. 4C).For the case of 16 nodes (768 cores), we observed a parallel efficiency of 49% and 75% in Scenario A and Scenario B, respectively.The higher parallel efficiency in Scenario B can be explained by the longer time required for individual generations, which reduces the effect of the idle time that usually arises at the end of each generation due to differences in computation times for individual particles [see also Klinger et al. (2018)].In summary, our evaluation confirmed an overall good scaling of the FitMultiCell pipeline, yielding a wall-time reduction of several 10-fold compared to a single-node execution and several hundred-fold compared to single-core execution.As the number of proposed particles required to generate a specific number of accepted particles increases over generations, the parallel efficiency for production runs with the usual 15 to 40 generations should be higher than what has been observed here.

FitMultiCell facilitates the study of heterogeneous datasets via automatically tuned algorithms
To assess the benefit of the tight integration of modeling and parameter estimation tools in the FitMultiCell pipeline, and the availability of advanced inference algorithms, we considered the model of tumor spheroid growth (Jagiella et al. 2017) (Fig. 5 (left)).The original model was used to study growth control mechanisms and the effect of nutrient supply (Jagiella et al. 2016(Jagiella et al. , 2017).Yet, the source code of this multicellular model entangled with its simulator is highly specific and no longer maintained, complicating the use of this model in further studies.
We implemented the model in MorpheusML and made it available through a public model repository (https://identi fiers.org/morpheus/M0007).The cell population was described using a stochastic CPM which accounts for cell division and death as well as cell-cell interactions, while the concentrations of extracellular substances were described using PDEs.The estimation problem was encoded using PEtab-MS, following the original publication by Jagiella et al. (2017).We selected the same seven unknown parameters that were also considered in Jagiella et al. (2017) based on a sensitivity analysis and biological relevance.This ensures comparability of our results against the reference publication.We considered all available datasets: a time-course for the tumor spheroid radius determined by bright field microscopy; and snapshots for radial profiles of markers for proliferating cells as well as extracellular matrix (ECM) abundance determined from fluorescence microscopy on day 17.

FitMultiCell
In our previous publication considering the parameter estimation problem (Jagiella et al. 2017), we manually defined weights to quantify the relative importance of different summary statistics for the heterogeneous dataset.As this was a time-consuming process, here we employed a fully automatic approach for summary statistics weighting based on inverse regression (see Supplementary Section 3.2.1 for details), which is available via the FitMultiCell pipeline.The specification of the parameter estimation task does only require a few lines of Python code given MorpheusML and PEtab-MS files.
We ran the ABC-SMC algorithm with a population size of 500 particles and set as stopping condition a maximum walltime of 48 h, within which the pipeline was able to finish 46 generations.The comparison of simulated and observed summary statistics showed that the model is able to fit the data accurately, with substantially improving fit quality in later ABC-SMC generations (Fig. 5 (middle)).Similar to the original study, the exceptions lie in the model's overestimation of the fraction of proliferating cells and underestimation of the ECM density at short distances from the rim.This corresponds to the findings of the original analysis (Jagiella et al. 2017).The analysis of credible intervals reveals that with the exception of the initial fraction of quiescent cells q init , all parameters are identifiable with small uncertainties.The parameter estimates and uncertainty intervals obtained using our model are in excellent agreement with those in the original analysis (Fig. 5 (right)), even matching the distribution of the least certain parameter q init .Hence, the FitMultiCell pipeline was able to reproduce the original results, while simplifying implementation and eliminating the need for manual tuning.

FitMultiCell pipeline facilitates parameter estimation for new applications
The parameters of multi-scale processes are often still adapted manually, e.g.due to the difficulty of setting up proper parameter estimation and its computational cost.As this involves manual work, it can lead to non-reproducible results and does not provide any information about parameter uncertainties.We assessed whether the FitMultiCell pipeline can easily replace manual parameter tuning.We studied a model describing the mechano-sensing of the metabolic status during liver regeneration after partial hepatectomy in mice, simulating the reaction network dynamics in hepatocytes along the central-portal axis of a liver lobule by a spatial array of coupled ODEs with spatially heterogeneous inputs (Meyer et al. 2020).For this model, the parameters were previously determined manually using data for two observables: the concentration of the YAP protein in the nucleus of hepatocytes (NYAP), and the total concentration of YAP protein in hepatocytes (TYAP), see Fig. 6 (left).
The model was already implemented in MorpheusML and is available from a public model repository (https://identifiers. org/morpheus/M7990).Yet, to avoid the simplifying assumptions that were used in the original study due to lack of appropriate parameter estimation tools and computational resources, we (i) dropped the quasi-steady-state assumption of the intracellular dynamics and performed a full spatiotemporal simulation, (ii) extended the biochemical model by an observer model of the microscopy setup that introduces additional scaling parameters between protein concentration and observed fluorescence intensity, and (iii) included a statistical model for the measurement noise.Complementary, we created the PEtab-MS files encoding the estimation problem for the 14 unknown parameters.Previously, due to the employed quasi-steady state assumption, only the ratios of forward and backward rate constants were identifiable.
We ran the ABC-SMC inference with a population size of 1000 particles and a wall-time limit of 24 h.To save computational resources, we employed an early rejection strategy to reject particles based on a maximum runtime of 15 min for individual simulations not matching the data.The simulated trajectories in later generations of the inference process fitted the data for TYAP and NYAP mostly well (Fig. 6 (middle)).An exception was the pericentral locations (CV-PV zones 1-5) at the last time point t ¼ 5.0 days, where the simulations couldn't match the low concentrations present in the data, hinting at a transition to homeostasis and a time-dependence of some parameters over longer time scales of many days.The credible intervals indicate that all 14 parameters are identifiable (Fig. 6 (right)).Interestingly, the parameter describing the fluorescence intensity normalization (k14) was estimated close to one, which justifies the previous assumption that both microscopy-based observables can be treated with the same normalization (Meyer et al. 2020).Moreover, the ratios of rate constants for reversible reactions were found to match very well with the previous point estimates using the quasisteady state assumption in (Meyer et al. 2020).Extending previous results, we here obtained estimates for the four rate constants k1, k5, k7, k8 and quantified the uncertainty of each estimate.Hence, our study underpinned previous conclusions (Meyer et al. 2020), but also demonstrated that the FitMultiCell pipeline allows directly for an uncertainty-aware study of multi-cellular processes.

Discussion
Quantitative data-based modeling of multi-cellular processes is challenging, because the models are often stochastic and computationally demanding.Furthermore, due to a lack of standards, the reusability of codes and pipelines is limited.Motivated by these issues, we developed the FitMultiCell pipeline, which covers the entire workflow of model development, simulation, and systematic parameter inference (on HPC or cloud infrastructures) based on standardized input formats.Thereby, it contributes to the accelerated testing of biological hypotheses.
We used the FitMultiCell pipeline to study various application problems.This demonstrated that the proposed pipeline is widely applicable and can recover and confirm previous results.Further, we showed that it can replace manual parameter tuning, thus accelerating and solidifying the quantitative modeling of multi-cellular processes.Its modular implementation scales to HPC infrastructures and thereby facilitates inference for computationally expensive problems.
While the FitMultiCell pipeline already provides the necessary features, there are multiple directions in which it can be developed further.(i) Statistical inference for computationally expensive models can become challenging even when using massive parallelization.To address this, the use of cheaper surrogate models could be investigated for guiding parameter search and reducing the number of model simulations (Prangle 2016, Prescott andBaker 2021).Furthermore, one could make use of the idle time at the end of generations, e.g. by presampling the next generation and subsequently correcting for bias.(ii) Methods for the automatic construction of summary statistics need to be explored.While it is possible to construct summary statistics semi-automatically (Blum et al. 2013) for the type of image data frequently accompanying models of multi-cellular processes, automatic construction based on machine learning approaches, more specifically convolutional networks with prior information gained from transfer learning, could substantially improve statistics quality at decreased training cost.(iii) Extension of the FitMultiCell pipeline to additional modeling, simulation and parameter estimation frameworks will increase its applicability.While Morpheus and pyABC cover already a wide range of applications, an integration of further simulation tools [e.g.In conclusion, we illustrated that standardized workflows for the quantitative modeling of multi-cellular processes are feasible.The FitMultiCell pipeline and the standard PEtab-MS provide starting points for further development.Already in its current form, a broad spectrum of projects can profit from them to achieve systematic and scalable inference of unknown parameters.

Figure 1 .
Figure 1.Overview of the FitMultiCell pipeline.First row: User perspective on the biological system, experiment, model formulation, and parameter estimation problem.Second row: Tool and format overview.Third row: Features and details for all components of the workflow.Fourth row: Visualization of the ABC-SMC parameter estimation algorithm.

Figure 2 .
Figure 2. Illustration of the parallel framework of FitMultiCell.Inference level: A main process manages the pyABC analysis workflow and SQL data storage.Population level: In each ABC-SMC generation, the main process delegates to a number of distributed worker processes the task of sampling parameters (from proposal distributions pðxÞ) and synthetic datasets, until sufficiently many fulfil the generation-specific acceptance criterion (such as the acceptance thresholds E).Work distribution, communication, and storage of intermediate results across various computational nodes and processing units are managed via a central Redis server.Particle level: Each single simulation of synthetic data (or groups of simulations for related perturbation scenarios in a multiexperiment setting) is performed using Morpheus, optionally with sharedmemory parallelization via OpenMP.

Figure 3 .
Figure 3. Several screenshots of the visualization and diagnostics GUI.(A) The main screen, (B) stacked bar plot of the number of required samples per generation, (C) 2D kernel density estimation for samples from approximate posterior distribution, (D) parallel coordinate plot for the development of the credibility intervals over generations, (E) line plot of the epsilon threshold per generation, and (F) line plot of the effective sample size per generation.

Figure 4 .
Figure 4. Performance of the parallelized FitMultiCell pipeline.(A) Illustration of the used model M1 for viral transmission within the host tissue.(B) Wall-time of two different fit scenarios across different numbers of used compute nodes, each consisting of 48 CPU cores.(C) Parallel efficiency of the FitMultiCell pipeline.The dashed line indicates perfect efficiency, i.e. a wall-time directly inversely proportional to the number of nodes, compared to single-node execution.For each choice of nodes, three consecutive ABC-SMC generations were run, with population sizes of N ¼ 1000 (Scenario A) and N ¼ 10 000 (Scenario B).

Figure 5 .
Figure 5. Result of FitMultiCell pipeline for the model of tumor spheroid growth.Left: Illustration of the model.Middle: Experimental data (gray) and model fit (green).The fit for ABC-SMC generation 1 (T1), generation 25 (T25) and final generation 51 (T51) is indicated using the 95% credibility interval of the accepted model simulations.Right: Credible intervals for the seven estimated parameters obtained using the original implementation (Jagiella et al. 2017) (gray) and the FitMultiCell pipeline (green).

Figure 6 .
Figure 6.Result of FitMultiCell pipeline for the model of mechano-sensing of the metabolic status during liver regeneration.Left: Illustration of the model.Middle: Experimental data (gray) and simulation results (green) for different combinations of tissue location (from 1 to 10) and time point (t ¼ 0 to t ¼ 5.0 days).The fit for ABC-SMC generation 1 (T1), generation 7 (T7) and final generation 15 (T15) is indicated using the 95% credibility interval of the accepted model simulations.Right: Credible intervals for the 14 estimated parameters compared to the result of Meyer et al. (2020) (circle).Note that the four parameters k1, k5, k7, k8 were not estimated in Meyer et al. (2020), therefore no comparison is available for those.

•
Morpheus is available as a git repository (https://gitlab.com/morpheus.lab/morpheus)under the BSD-3-Clause license.It is written in Cþþ and available for all major operating systems as pre-compiled packages together with training materials and a model repository at https://mor pheus.gitlab.io/.•pyABC is hosted on GitHub (https://github.com/icb-dcm/pyabc) under the BSD-3-Clause license.It is written in Python and can be installed directly via (PyPI).Its many features are documented at https://pyabc.readthedocs.io.