CellSium: versatile cell simulator for microcolony ground truth generation

Abstract
Summary: To train deep learning-based segmentation models, large ground truth datasets are needed. To address this need in microfluidic live-cell imaging, we present CellSium, a flexibly configurable cell simulator built to synthesize realistic image sequences of bacterial microcolonies growing in monolayers. We illustrate that the simulated images are suitable for training neural networks. Synthetic time-lapse videos with and without fluorescence, using programmable cell growth models, and simulation-ready 3D colony geometries for computational fluid dynamics are also supported.
Availability and implementation: CellSium is free and open source software under the BSD license, implemented in Python, available at github.com/modsim/cellsium (DOI: 10.5281/zenodo.6193033), along with documentation, usage examples and Docker images.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.


S.1 Cell Shape Geometries
CellSium allows customization of the individual cell shape, its appearance, and its evolution over time. Each shape is modelled as a closed, non-overlapping polygon in 2D, and cells are rendered and placed in a physically consistent manner, without overlap or collision. This way, the set of simulated morphologies can be easily extended and dynamically adapted over time. To demonstrate these capabilities, we simulated an artificial growing tree colony (see Fig. S.1).

Fig. S.1: Artificial tree colonies without (left) and with (right) contours highlighted.
CellSium comes with several predefined microbial shapes that are commonly studied with microfluidic live-cell imaging. Figure S.2 shows these ready-to-use bacterial cell geometries implemented in CellSium.
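To make the polygon representation concrete, the following is a minimal sketch of how a rod-shaped outline could be built as a closed 2D polygon. The function names rod_outline and polygon_area, as well as the choice of semicircular caps, are illustrative assumptions and are not taken from the CellSium code base.

```python
import math


def rod_outline(length, width, arc_points=16):
    """Closed 2D polygon (counter-clockwise) of a rod with semicircular caps.

    Hypothetical helper for illustration; lengths are in arbitrary units.
    """
    radius = width / 2.0
    half_straight = length / 2.0 - radius
    points = []
    # Right cap: sweep from -90 degrees to +90 degrees around (half_straight, 0).
    for i in range(arc_points + 1):
        angle = -math.pi / 2 + math.pi * i / arc_points
        points.append((half_straight + radius * math.cos(angle),
                       radius * math.sin(angle)))
    # Left cap: sweep from +90 degrees to +270 degrees around (-half_straight, 0).
    for i in range(arc_points + 1):
        angle = math.pi / 2 + math.pi * i / arc_points
        points.append((-half_straight + radius * math.cos(angle),
                       radius * math.sin(angle)))
    return points


def polygon_area(points):
    """Shoelace formula; the result is positive for counter-clockwise order."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


outline = rod_outline(length=3.0, width=1.0)
area = polygon_area(outline)
```

Because the polygon is inscribed in the smooth outline, its area lies slightly below the analytic value of (length - width) * width + pi * (width / 2)^2; increasing arc_points tightens the approximation.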

S.2 Usage Example: Simulation of Time-lapse Sequences to Visualize and Analyze Cellular Growth Behaviour
How cells maintain a stable size over time, despite stochasticity in the underlying cellular processes, is a question that has fascinated cell biologists for decades. Here, we show how CellSium can be used to produce synthetic time-lapse image sequences that render prominent cell size models, and to derive characteristic distributions, such as that of the cell length. This is exemplified for the so-called "Timer" and "Sizer" models (Taheri-Araghi et al., 2015).
The "Timer" model assumes fixed time intervals between division events, whereas the "Sizer" model prescribes cell division once a specific cell size is reached. Python encodings of both models are given in Listings 1 and 2.
offspring_a.length = offspring_b.length = self.length / 2
Listing 2: SizerCell class implementing a simple "Sizer" model.
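As a rough, standalone illustration of the two division rules (not the CellSium listings themselves), the sketch below implements a toy "Timer" and a toy "Sizer" in plain Python. All class names, the linear elongation law, the division time of 1 h, the division length of 3 µm, and the elongation rate range are assumptions made for this example only.

```python
import random


class Cell:
    """Toy rod-shaped cell; all names here are illustrative, not CellSium's API."""

    def __init__(self, length, elongation_rate):
        self.length = length                    # µm
        self.elongation_rate = elongation_rate  # µm per hour
        self.age = 0.0                          # hours since birth

    def grow(self, dt):
        self.length += self.elongation_rate * dt
        self.age += dt

    def divide(self, rng):
        # Symmetric division: each daughter gets half the length and draws
        # its own elongation rate (the only source of randomness).
        return [type(self)(self.length / 2, rng.uniform(1.2, 1.8))
                for _ in range(2)]


class TimerCell(Cell):
    DIVISION_TIME = 1.0  # hours (assumed value)

    def should_divide(self):
        # Small tolerance guards against floating-point drift in the age.
        return self.age >= self.DIVISION_TIME - 1e-9


class SizerCell(Cell):
    DIVISION_LENGTH = 3.0  # µm (assumed value)

    def should_divide(self):
        return self.length >= self.DIVISION_LENGTH


def simulate(cell_type, hours, dt=0.05, seed=0):
    """Return the cell count after every time step, starting from one cell."""
    rng = random.Random(seed)
    cells = [cell_type(rng.uniform(1.5, 2.5), rng.uniform(1.2, 1.8))]
    counts = []
    for _ in range(round(hours / dt)):
        next_generation = []
        for cell in cells:
            cell.grow(dt)
            if cell.should_divide():
                next_generation.extend(cell.divide(rng))
            else:
                next_generation.append(cell)
        cells = next_generation
        counts.append(len(cells))
    return counts


timer_counts = simulate(TimerCell, hours=6.0)
sizer_counts = simulate(SizerCell, hours=6.0)
```

In this toy setting all "Timer" cells share the same age, so the population doubles in discrete steps, whereas in the "Sizer" variant the individual elongation rates determine when each cell reaches the division length.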
In Fig. S.3 the results of a simulation (0 ≤ t ≤ 12 h) with the two vanilla "Timer" and "Sizer" models are shown, as given in the listings. Both models produce exponential growth with a growth rate of approximately 0.68 h⁻¹ (R² > 0.99). As long as no randomness is added apart from the elongation rate, all cells of the "Timer" model double synchronously, resulting in step-wise increasing cell numbers. In contrast, the "Sizer" model produces a smoother growth curve. Comparing the cell length distributions derived from the simulations with the two models in Fig. S.3 (bottom row) shows that, despite the differences in the distributions, both simulations result in very similar average cell lengths of approximately 2.15 µm. Furthermore, various stochastic effects can easily be added to the vanilla models. This may be useful for inferring measured distributions.
Fig. S.3: Starting with a single, randomized cell, the "Timer" and the "Sizer" models were simulated with CellSium. Simulation snapshots were taken for 12 h in 0.25 h intervals. The "Timer" model yielded a growth rate of 0.681 h⁻¹, the "Sizer" model a growth rate of 0.685 h⁻¹. In the bottom row, the corresponding cell length distributions are shown. For the "Timer" example, a mean cell length of 2.15 ± 0.46 µm over 13 275 cells was observed, whereas for the "Sizer" the average length was 2.15 ± 0.48 µm over 17 862 cells.
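Growth rates and R² values such as those quoted for Fig. S.3 can be estimated from cell counts by a log-linear least-squares fit of N(t) = N0 · exp(µ t). The sketch below shows one way to do this in plain Python; the function name fit_growth_rate and the noise-free synthetic counts are assumptions for illustration, and the fitting procedure actually used for the figure is not specified here.

```python
import math


def fit_growth_rate(times, counts):
    """Least-squares fit of ln(N) = ln(N0) + mu * t; returns (mu, r_squared)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((y - mean_y) ** 2 for y in logs)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    mu = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return mu, r_squared


# Synthetic example: snapshots every 0.25 h for 12 h with an assumed rate.
times = [0.25 * i for i in range(49)]
counts = [math.exp(0.68 * t) for t in times]
mu, r_squared = fit_growth_rate(times, counts)
doubling_time = math.log(2) / mu  # hours
```

A growth rate of 0.68 h⁻¹ corresponds to a doubling time of ln(2)/0.68, i.e. roughly 1.02 h.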

S.3 Usage Example: Cell and Microcolony Geometries for Computational Fluid Dynamics Simulations
Computational fluid dynamics (CFD) studies can shed light on the impact that physical conditions, such as nutrient concentrations or flow velocity, have on growing cells and microcolonies; see, for example, Westerwalbesloh et al. (2015). Here, we show how CellSium is used to automatically create input geometries of a bacterial microcolony from data acquired by live-cell imaging and image analysis. To this end, the .stl output is used, which represents cells as 3D solids of revolution obtained by rotating their 2D representation. An example is given in Fig. S.4. Given the 3D geometry of the microcolony, we produced a mesh using COMSOL® Multiphysics (ver. 5.3) and simulated glucose diffusion. The cells were placed within a cuboid of 40 µm × 60 µm × 1.2 µm "filled" with medium. A constant glucose concentration of 222 mol m⁻³ was specified at the top/bottom surfaces of the cuboid (i.e. at y_min and y_max in Fig. S.5) to model substrate replenishment via the nutrient channels. A more detailed description of the microfluidic chip design is given in Westerwalbesloh et al. (2015). The simulated glucose concentration distribution is visualized in Fig. S.5.
Fig. S.5: Glucose distribution at steady state in a rectangular growth chamber, simulated using COMSOL® Multiphysics. Cells were assumed to take up glucose at a rate of 1.14·10⁻⁶ mol s⁻¹ m⁻², with a glucose diffusion coefficient of 5.4·10⁻¹⁰ m² s⁻¹ at 30 °C (Westerwalbesloh et al., 2015). The mesh was generated using COMSOL® with the 'extra fine' pre-set for general physics, yielding a total of 114 843 elements.
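The idea of turning a 2D cell outline into a 3D solid of revolution and exporting it as STL can be sketched as follows. This is a simplified, self-contained illustration under stated assumptions: the helper names (rod_profile, revolve_profile, mesh_volume, to_ascii_stl), the hemispherical caps, and the tessellation parameters are hypothetical and do not reproduce CellSium's actual .stl output code.

```python
import math


def rod_profile(length, radius, cap_steps=8):
    """Half-outline of a rod with hemispherical caps as (x, r) pairs."""
    left = []
    for i in range(cap_steps + 1):
        theta = 0.5 * math.pi * i / cap_steps  # 0 at the pole, pi/2 at the cylinder
        left.append((radius * (1.0 - math.cos(theta)), radius * math.sin(theta)))
    right = [(length - x, r) for x, r in reversed(left)]
    return left + right


def revolve_profile(profile, segments=32):
    """Rotate the profile about the x axis; return outward-facing triangles."""
    rings = []
    for x, r in profile:
        rings.append([(x,
                       r * math.cos(2.0 * math.pi * k / segments),
                       r * math.sin(2.0 * math.pi * k / segments))
                      for k in range(segments)])
    triangles = []
    for ring_a, ring_b in zip(rings, rings[1:]):
        for k in range(segments):
            k_next = (k + 1) % segments
            a0, a1 = ring_a[k], ring_a[k_next]
            b0, b1 = ring_b[k], ring_b[k_next]
            for tri in ((a0, b1, b0), (a0, a1, b1)):
                if len(set(tri)) == 3:  # drop degenerate triangles at the poles
                    triangles.append(tri)
    return triangles


def mesh_volume(triangles):
    """Signed volume via the divergence theorem (positive if normals point out)."""
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) in triangles:
        total += (x0 * (y1 * z2 - z1 * y2)
                  + y0 * (z1 * x2 - x1 * z2)
                  + z0 * (x1 * y2 - y1 * x2))
    return total / 6.0


def to_ascii_stl(triangles, name="cell"):
    """Serialize triangles as an ASCII STL string."""
    lines = ["solid " + name]
    for v0, v1, v2 in triangles:
        u = [v1[i] - v0[i] for i in range(3)]
        w = [v2[i] - v0[i] for i in range(3)]
        n = [u[1] * w[2] - u[2] * w[1],
             u[2] * w[0] - u[0] * w[2],
             u[0] * w[1] - u[1] * w[0]]
        norm = math.sqrt(sum(c * c for c in n)) or 1.0
        lines.append("  facet normal {:e} {:e} {:e}".format(*(c / norm for c in n)))
        lines.append("    outer loop")
        for v in (v0, v1, v2):
            lines.append("      vertex {:e} {:e} {:e}".format(*v))
        lines.append("    endloop")
        lines.append("  endfacet")
    lines.append("endsolid " + name)
    return "\n".join(lines)


triangles = revolve_profile(rod_profile(length=3.0, radius=0.5))
stl_text = to_ascii_stl(triangles)
```

The signed mesh volume serves as a quick sanity check: for a closed, consistently oriented surface it should be positive and slightly below the analytic volume of the rod, since the triangulation is inscribed in the smooth shape.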

S.4 Usage Example: Simulating Imaging Challenges - Focus Loss
Imaging microbial development is challenging, and several difficulties occur in practice. For example, a low contrast-to-noise ratio, illumination changes, and focus loss due to temperature changes or mechanical movement are common in microfluidic live-cell imaging. These phenomena can be mimicked in the CellSium simulator by applying custom modifications to the generated images. As an example, the result of a simulated focus loss with different degrees of blurring is shown in Fig. S.6. Such images, along with their ground truth, help to diversify the training data set.
Fig. S.6: Simulating focus loss. A normalized box filter (5 × 5 kernel) is applied 1, 5 and 10 times to the original image generated with CellSium to simulate various degrees of out-of-focus images.
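The repeated box filtering described for Fig. S.6 can be sketched in plain Python as shown below. The function names and the clamped border handling are assumptions made for this example; image libraries typically offer an equivalent, much faster filter, and their default border modes may differ.

```python
def box_blur(image, size=5):
    """Normalized box filter on a 2D list of floats; borders are clamped."""
    height, width = len(image), len(image[0])
    half = size // 2
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-half, half + 1):
                yy = min(max(y + dy, 0), height - 1)
                for dx in range(-half, half + 1):
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
            blurred[y][x] = total / (size * size)
    return blurred


def simulate_focus_loss(image, passes, size=5):
    """Apply the box filter `passes` times; more passes give a stronger blur."""
    for _ in range(passes):
        image = box_blur(image, size)
    return image


# Toy input: a single bright pixel in the centre of a dark 21 x 21 image.
sharp = [[0.0] * 21 for _ in range(21)]
sharp[10][10] = 1.0
slightly_blurred = simulate_focus_loss(sharp, passes=1)
strongly_blurred = simulate_focus_loss(sharp, passes=3)
```

Each pass spreads the intensity over a wider neighbourhood while keeping the total intensity constant, as long as the blurred region does not reach the image border.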

S.5 CellSium Feature: Fluorescence Image Generation
To create synthetic fluorescence images with CellSium, the values of a simplified point spread function (PSF) are repeatedly drawn onto an empty canvas. In the implementation, random coordinates within each cell are picked and used as emitter positions. A Gaussian function serves as the PSF and is added to the image canvas at every emitter position. Fluorescence intensity is modelled by scattering more or fewer emitters into the cells, normalized by cell area and dependent on the fluorescence parameter of the cell. This approach models fluorophores that are uniformly distributed within the bacterial cytosol. An example with two differing levels of fluorescence intensity is shown in Fig. S.7.
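The emitter-scattering procedure can be illustrated with the following minimal sketch. It makes several simplifying assumptions that are not taken from CellSium: cells are approximated by axis-aligned rectangles, the emitter count is taken as proportional to cell area times the fluorescence parameter, and the function name render_fluorescence and all parameter values are hypothetical.

```python
import math
import random


def render_fluorescence(shape, cells, emitters_per_area=2.0, sigma=1.5, seed=0):
    """Draw Gaussian PSFs at random emitter positions inside each cell.

    cells: list of (cx, cy, half_width, half_height, fluorescence) tuples
    describing axis-aligned rectangles in pixel coordinates (an assumption).
    """
    height, width = shape
    rng = random.Random(seed)
    canvas = [[0.0] * width for _ in range(height)]
    reach = int(math.ceil(3 * sigma))  # truncate the Gaussian beyond ~3 sigma
    for cx, cy, half_w, half_h, fluorescence in cells:
        area = 4.0 * half_w * half_h
        n_emitters = round(emitters_per_area * area * fluorescence)
        for _ in range(n_emitters):
            ex = rng.uniform(cx - half_w, cx + half_w)
            ey = rng.uniform(cy - half_h, cy + half_h)
            x0, y0 = int(round(ex)), int(round(ey))
            for y in range(max(0, y0 - reach), min(height, y0 + reach + 1)):
                for x in range(max(0, x0 - reach), min(width, x0 + reach + 1)):
                    d2 = (x - ex) ** 2 + (y - ey) ** 2
                    canvas[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return canvas


# Two equally sized cells; the right one is three times as fluorescent.
canvas = render_fluorescence((40, 60), [(15, 20, 8, 3, 1.0), (45, 20, 8, 3, 3.0)])
dim_total = sum(sum(row[:30]) for row in canvas)
bright_total = sum(sum(row[30:]) for row in canvas)
```

Since every emitter contributes approximately the same total intensity, the summed signal of a cell scales with its emitter count, so the ratio of bright_total to dim_total is close to the ratio of the fluorescence parameters.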

S.6 Similarity of Real and Synthesized Images
A quantitative comparison of the similarity between real microscopic images and synthetic images generated with CellSium is based on their intensity histograms. For this, we pick image crops containing similar numbers of cells and similar cell/background ratios. Figure S.8 shows the resulting histograms for the synthetic and real images, and their correlation. For the histograms, we obtain a Pearson correlation coefficient (PCC) of 75%, indicating high similarity. The PCC of the histogram pairs can be used as a metric to optimize the parameters of CellSium to support other microscopy modalities (e.g., bright-field imaging), or to develop tailored noise models.
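A histogram-based similarity score of this kind can be computed as in the sketch below. The bin count, the intensity range, and the function names are assumptions for illustration; the exact histogram settings behind Fig. S.8 are not stated here.

```python
import math


def intensity_histogram(pixels, bins=256, value_range=(0, 256)):
    """Normalized intensity histogram of a flat sequence of pixel values."""
    low, high = value_range
    counts = [0] * bins
    for value in pixels:
        index = int((value - low) * bins / (high - low))
        counts[min(max(index, 0), bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def histogram_pcc(real_pixels, synthetic_pixels):
    """PCC between the intensity histograms of two image crops."""
    return pearson(intensity_histogram(real_pixels),
                   intensity_histogram(synthetic_pixels))


# Toy crops: dark cells on a bright background, with a slight intensity shift.
real_crop = [40] * 300 + [41] * 100 + [180] * 600
synthetic_crop = [40] * 250 + [41] * 150 + [180] * 500 + [181] * 100
score = histogram_pcc(real_crop, synthetic_crop)
```

Used as an objective, such a score could be maximized over the rendering parameters; note that it compares only the intensity distributions and is insensitive to where in the image the intensities occur.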