Modeling genotypes in their microenvironment to predict single- and multi-cellular behavior

Abstract A cell's phenotype is the set of observable characteristics resulting from the interaction of the genotype with the surrounding environment, determining cell behavior. Deciphering genotype-phenotype relationships has been crucial to understanding normal and disease biology. Analysis of molecular pathways has provided an invaluable tool to such understanding; however, typically it does not consider the physical microenvironment, which is a key determinant of phenotype. In this study, we present a novel modeling framework that enables the study of the link between genotype, signaling networks, and cell behavior in a three-dimensional microenvironment. To achieve this, we bring together Agent-Based Modeling, a powerful computational modeling technique, and gene networks. This combination allows biological hypotheses to be tested in a controlled stepwise fashion, and it lends itself naturally to model a heterogeneous population of cells acting and evolving in a dynamic microenvironment, which is needed to predict the evolution of complex multi-cellular dynamics. Importantly, this enables modeling co-occurring intrinsic perturbations, such as mutations, and extrinsic perturbations, such as nutrient availability, and their interactions. Using cancer as a model system, we illustrate how this framework delivers a unique opportunity to identify determinants of single-cell behavior, while uncovering emerging properties of multi-cellular growth. This framework is freely available at http://www.microc.org.

Introduction A comprehensive understanding of living organisms, including their development and the occurrence and progression of disease, requires systematic efforts into deciphering the link between the multitude of different genotypes and phenotypes, that is the set of observable characteristics resulting from the interaction of the genotype with the surrounding environment, determining cell morphology and behaviour.
Efforts to characterise such relationships have proliferated in recent years, thanks in part to the increased capability to efficiently collect the necessary data in an ever higher number of organisms and individuals.
Such efforts have spanned from cataloguing genetic variation in thousands of individuals [see e.g. (1)] and searching for genotypes-phenotypes associations, to silencing or inactivating thousands of genes in laboratory high-throughput screens to study their function [see e.g. (2)]. However, efficient instruments to achieve a comprehensive understanding of the causal nexus between a given genotype and the observed phenotype are still lacking, and this is particularly true when such a nexus is complex and multifactorial (3)(4)(5). One promising means to achieve such an understanding has been the characterisation and modelling of biological pathways (5).
Several of the key biological pathways regulating cellular function are increasingly understood, along with their dysregulation in multiple diseases (6)(7)(8)(9). However, much less is known about how these pathways interact and determine the behaviour of individual cells and multi-cellular systems. This has fuelled methodological development of efficient representations of such interactions in order to facilitate the study of the underlying potential mechanisms. A number of different approaches have been proposed, including modelling by differential equations, and network modelling methods, such as petri nets and logical networks (10)(11)(12). Such methods have produced encouraging results (13)(14)(15)(16), but it is becoming increasingly evident that modelling molecular pathways and signalling, or gene, networks in isolation, dissociated from the cellular context, does not reflect the crucial impact of the microenvironment in determining the phenotype (17)(18)(19). Additionally, as single-cell sequencing and imaging technologies are providing new in-depth information about the genotype and phenotype of single, or small groups of, cells (20), modelling approaches that enable consideration of cells both as independent entities and as a population, are becoming increasingly attractive.
To address the above needs, we have developed a novel computational framework which combines Agent-Based Modelling (ABM) and gene network modelling. This framework tackles the fundamental challenge of integrating genotype with phenotype data, while accounting for certain important aspects of the physical microenvironment, a key determinant of the phenotype. The most innovative aspect of this framework is that it enables to build models of the genotype-phenotype relationship in a 3D spatially-aware microenvironment, including aspects such as signalling to and from the microenvironment, and signalling between cells. Importantly, the collective behaviour and evolution of cellular populations emerges from the properties and behaviour of individual cells, which in turn are governed by the underling dynamics of specific signalling networks, and the interactions with the surrounding cells and microenvironment. This 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 permits to predict the behaviour of individual cells, and the entire population of cells, and it also enables to study of the possible causative mechanisms of such behaviour.
In this paper, we introduce this modelling framework, illustrate the capabilities of microC, a first cloudbased implementation of the framework, and we illustrate the range of potential applications that it enables.
Specifically, we perform perturbation experiments of increasing complexity, in which we monitor over time the three-dimensional (3D) growth and evolution of mixed populations of cells.
To achieve this, we chose the example of cancer: a complex disease where methods to study the link between genotype and phenotype are particularly and urgently needed (4). To inform our choices for the gene network and the model parameters, we exploited previously acquired data on gene interactions and cell growth from a number of independent publications. We then built our initial model in a mechanistic "bottom-up" fashion, progressing from the individual elements to the whole system. Following this strategy, we simulated the 3D growth of cell spheroids focusing on pathways underlying the main hallmarks of cancer, including sustained proliferative signals, resistance to cell death and evasion of growth suppressors (9).
We thus considered a set of alterations amongst the most frequently observed across all cancer types, namely EGFR activation, p53 loss-of-function and PTEN loss-of-function (21). We then asked to what extent this initial model, which has been built based on general assumptions and not optimized for a specific cancer type, could reproduce experimental results not used for the model construction and obtained in a cancer which often displays these mutations. For this we chose basal-like triple negative breast cancer. This type of breast cancer lacks receptors for the hormones oestrogen and progesterone, and the Her2 protein, and thus is not responsive to treatments targeting these. Importantly, an immortal though not tumorigenic mammary tumour cell line exists, MCF10A, which is considered a suitable premalignant model (22), allowing us to assess the effect of inducing these alterations as single drivers, or together.
With this model, we gradually increased the complexity of our simulations to investigate the effect that varying genetic and microenvironmental parameters has on the model predictions, and on the resulting clonal competition, signalling to and from the microenvironment, and cell-cell interaction.

Rationale for the framework
Previous studies have modelled cells as computational agents and started to consider replacing ABM conditional statements, that drive cellular behaviour, with gene networks, that can represent more realistically the dynamical features of the intra-cellular system. In particular, small scale predefined logical networks have been used to model the cell-cycle arrest in a model of avascular spheroid growth (23) , and   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 to study cell differentiation in a hyper-sensitivity reaction model (24). However, these networks were considered as static entities, providing simple rules and not fully embedded in the ABM simulation. In another example, cells have been equipped with relatively more complex decision making models, represented as differential equation (25), but this approach was also not fully embedded in the ABM modelling, and is further limited by the requirement of prior knowledge for the many kinetic parameters to represent pathways accurately. microC fully exploits ABM technology. Specifically, the cellular behaviour is determined by a gene network, encapsulated within the cell, and by the interactions of the gene network with the surrounding microenvironment ( Figure 1). Both the cells and the inner networks are simulated using ABM. As a result, cellular behaviour may be studied both at the single-cell level, decoding the link between a cell genotype and its phenotypic realization in a given contextual environment, but also at the more global level, allowing the search for emerging properties of the multi-cellular system.

Key elements of the framework
To define a specific model in the microC modelling framework, a number of inputs and parameter values need to be provided. These define the environment of the simulation, some of the cell's characteristics, the number of replicates considered in each simulation and the length of the simulation, the choice of the gene network, the cell mutation profiles, and how a cell interacts with the environment and other cells. These parameters, and how they define the specific models, are presented below, further discussed in the microC protocol available via the Protocol.IO repository (26). Of note, in this work, we set the parameter values for the individual elements of the model based on previously published data (see sections below). We then carry out sensitivity analysis to understand the implications of these choices for our specific case.

Cells as computational agents
Cells are represented in microC as computational agents acting and interacting in a three-dimensional (3D) space. Cells are arranged in a rectangular 3D grid where each voxel may be occupied by only one cell.
They may interact with the microenvironment by consuming oxygen, and with other cells, via diffusible substances, such as cytokines, chemokines and growth factors/hormones, whose production is defined by the network dynamics. The parameters regulating these aspects of the physical microenvironment, namely the grid dimensions, the presence of neighbouring cells, and local chemical concentrations, can be customised by the user based on specific assumptions suitable for the system to be modelled (the details for each parameter are discussed in the following sections). These assumptions give rise to certain aspects of physical dimension and spatial competition among cells. Furthermore, cells are also metaagents, namely each cell is itself populated by a community of computational agents, the genes and molecules, which act and interact as a network inside the cell agent.

Cell mutation profiles
The microC modelling framework enables to input mutation profiles, representing the specific mutations present in each simulated clone. Thus, it might be used to represent the clonal make-up of specific 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 samples, such as 3D mixed-population cellular spheroids. Users may define mutation profiles via an input file uploaded to microC. Such files define the (sub)population of cells where specific gene mutations are present. By default, if not mutation profile is provided, the status of each gene in each cell is initialized randomly as active/inactive. If a mutation profile has been defined, this random initialization is overwritten by the specific mutations. The mutations are introduced in the model as constraints on the gene status, such as a constantly active/inactive status (e.g. constitutively activating or inactivating mutations), or they can be introduced as a change in the rules regulating the gene behaviour (e.g. amplified or conditional behaviour). There is no limitation with respect to the number of mutation profiles that can be simulated simultaneously, or the number of mutated genes . In the extreme case, each cell can have a distinct mutation profile, and multiple mutations can occur in each gene of the encapsulated network.

Subcellular gene networks
Gene networks are encapsulated within cell agents, and drive their decisions. Although any type of network model may be used in this framework, our current implementation exploits logical Boolean networks. The latter have been shown to preserve key dynamical characteristics of the gene network (27) and they can be designed quickly, without the need for accurate estimates for a large number of parameters. Briefly, the (gene) nodes of such logical networks can have only two states (active or inactive). Nodes may be connected with other nodes via links. All nodes are assigned logical rules that determine the current and future state of the node. We use asynchronous network update, first described by Thomas (28), and widely adopted since in Boolean gene networks. We distinguish between four types of nodes: genes, receptors In the specific model developed here, we considered as a starting point a previously developed large mitogen-activated protein kinase (MAPK) network (29). This is a Boolean network that has been assembled in a knowledge driven, mechanistic, manner by considering gene-gene relations (e.g. activation/repression) as reported in the published literature. As such, this network it is not specific to a given context, cell line or genotype; however, it is likely to be somewhat biased towards signalling and interactions observed in cancer cell lines, as the pathways modelled are key to the hallmarks of cancer.
We then extend this network to introduce a hypoxia responsive module; this new module and the full network can be explored in Figure 3    Indefinite proliferation is unlikely to happen due to spatial constrains (all neighbouring slots will be occupied at some point) that will in turn force the cell into a growth arrest state.
 Apoptosis. The cell dies and is removed from the simulation.
 Necrosis. The cells dies; it remains in the simulation and occupies space, but it does not otherwise actively participate.
 Growth Arrest. The subcellular network simulation for a cell in growth arrest is "paused" for a given period of time (3 rounds of simulations in our initial model). Furthermore, the cell interacts with the microenvironment at a reduced rate, for example the consumption rate drops to half of the normal rate.
 No decision. The cell takes no action; the simulation continues until a decision is made.
Actions associated with apoptosis and necrosis are executed immediately after the cell-fate decision has been made, whereas actions associated with proliferation and growth arrest decision are executed shortly after the end of a time frame defined by the user (decision window). During this time frame the network in each cell is simulated and the activation status of cell-fate decision nodes may change.
A spatially-aware environment One of the most novel aspects of microC is that cell-microenvironment and cell-cell interactions are modelled, via exchange of diffusible substances between cells, or between a cell and its microenvironment.
Importantly, the environment in microC is modelled as computational agent patches. Concentrations of the diffusible substances are transferred through step functions and may activate receptor nodes of the network. Those receptors may in turn trigger an autocrine or paracrine interaction, influencing cells towards specific cell-fate decisions, or the production of certain substances defined by output nodes.
Of note, in microC each 3D voxel is defined as an agent. These voxel or "patch" agents are assigned rules defining their geometry and the interaction with the cells they contain, and how they communicate with other voxel agents. This means that different parts of the environment could be defined by different patches, hence different rules. However, our in initial model and implementation the rules are set as equal for all voxels (e.g. shape, size, number of cell per patch, diffusion). Cell-microenvironment (cell agentvoxel agent) interactions are associated with a list of environmental resources (for example oxygen and growth factors), whereas cell-cell interactions may be the result of user-defined substances, such as cytokines, chemokines, growth factors, and hormones. Diffusible substances are also simulated in microC as agents, and their behaviour is simulated following the diffusionreaction equation: where, u is the concentration of the diffusible substance, Du is the diffusion coefficient of substance u, and  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63 64 65 rectangular lattice. The grid cell size may be adjusted to include 1 (1x1x1) or 27 (3x3x3) cells using the "grid sparsity" parameter.
Cells are modelled as sinks that consume oxygen at a rate proportional to the local oxygen concentration.
In particular, oxygen consumption is modelled through the equation: where R0 is the initial consumption rate, C is the concentration of oxygen in the specific grid cell, CN,f is a threshold value that determines the lowest possible oxygen concentration (currently fixed at 80% of the oxygen activation threshold), and finally CO2,opt is an optimal oxygen concentration, currently set to 0.28mM. The latter two parameters have predetermined values in microC, whereas the initial consumption rate (R0) and the oxygen activation threshold may be set by the user. The latter is a precondition that triggers the necrotic cell-fate decisions. We set the necrosis threshold at 0.02mM (23). Cells in growth arrest, consume oxygen at half the initial rate, whereas necrotic cells do not consume oxygen. Parameters such as the diffusion coefficient and the initial/boundary conditions of oxygen concentration can be adjusted as required by the specific application.
Cell-cell interaction is modelled using the diffusionreaction equation (equation 1), with Su representing the sources and sinks (cells) of hormones, cytokines and any user-defined diffusible substance.
Both the consumption of diffusible substances (consumption rate: R0) and the production of these substances by cells (production rate: P0) are defined by the user and considered to be constant.
Production/consumption of diffusible substances is conditional to the activation status of the corresponding receptor/output node, shown here as a Boolean function Iijk = 0/1, with i,j,k denoting the position of the cell in the 3D lattice.

Spheroids growth measures
In our growth curves simulations, we use the number of cells as an indicator of the size of the spheroids at any given time, and we have used sphericity to assess their degree of roundness: where ψ is sphericity, V is the volume of the object and A its surface area. The radius of a spheroid is determined at each stage of growth as the average distance between the coordinates of the initial centre point of the simulation, and the outermost cells of the growing spheroids. 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 The models presented in this study are freely accessible via a web interface (30). This interface also enables modification of the models and input parameters to conduct experiments other than those discussed in this paper. We have prepared a detailed protocol ( is a widely-used XML-based standard exchange format for sharing data; it is a flexible data model that can be used for object-relational data and a wide variety of graphs (31). GraphML is another XML-based, widely-used, data sharing format for graphs (32). GINML is an extension of GXL and can be produced for example by the logical model editor GINsim (14). The web server converts any of the above formats to the

Results
In-silico growth of a heterogeneous population of cells communicating within a threedimensional microenvironment We present a modelling framework, microC, which enables simulations of individual cells or group of cells, where each cell contains an inner gene network that simulates the cascade of signalling events occurring in each cell upon stimulation, and determines the cell behaviour ( Figure 1). To illustrate microC, we report the different ways in which it can be applied (Figure 1 A-E). Choosing cancer spheroids as a model system, we illustrate how a specific model can be constructed from its individual elements; we carry out simulations under different perturbations (e.g. mutations, nutrients availability); we discuss how our predictions agree or disagree with results from wet-lab studies; and we conduct sensitivity analyses to assess the impact of the different parameter choices on the model predictions .   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 As initial step, we assessed our ability to import the required gene networks, and confirmed the faithfulness of our ABM encoding of these networks. We then scaled up the simulation by increasing the genetic and microenvironmental heterogeneity in a stepwise manner. Specifically, we considered a population of homogenous pre-cancerous cells growing in a 3D environment; then, we gradually introduced mutations in onco-and tumour suppressor genes ( Figure 1A) to study how they affect the multi-cellular growth ( Figure   1B). Subsequently, we allowed the resulting clones (carrying different single or multiple mutations) to grow in competition ( Figure 1C), and studied the parameters affecting their evolution. Finally, we investigated the additional effects of introducing extrinsic perturbations such as lack of oxygen and presence of growth factors, and enabling cell-microenvironment ( Figure 1D) and cell-cell interactions ( Figure 1E). We report the parameter values chosen for each experiment in Table S1, while a more detailed description of the parameters may be found in the supplementary protocol and in the online documentation file in microc.org.

Evaluation of the dynamics properties of the inner-cell gene networks
Whilst there is a large number of methods, from traditional statistics to emerging deep-learning approaches, which permit accurate prediction of the behaviour of a population of cells given some initial inputs; such methods often constitute a black-box when it comes to interpreting mechanistically the results.
They emphasize the predictive ability of the model, with respect to the study of the possible causative mechanisms of the predicted behaviour. In this first example, we show how for each single cell microC enables monitoring of the paths which have followed to arrive at a given prediction. In particular, we show how microC enables studying the dynamical properties of the gene networks within each cell.
Firstly, we verified that microC could correctly import and execute a gene network developed in the GINSim interface (14). This involves to reformat and recodify the network so that it can be executed using our agent-based framework ( Figure 2). As we have automated this reformatting and recoding in microC, we needed to check its faithfulness. We asked how intrinsic perturbations, namely mutations commonly observed in cancer cells, and specifically in breast cancer, affect the functioning of the network. To this end, we compared the gene activation profiles across populations of cells carrying different mutations and we determined the differences in the resulting fate of the cell, focusing on the three possible decisions of proliferation, growth arrest and apoptosis, and then necrosis when we introduce oxygen diffusion. This first simulations showed that we could reproduce the stable configurations (or stable states) which were previously reported for this network in the published stable-state analysis (Figure 2A and Figure S1A). This illustrates the ability of microC to execute faithfully a previously defined logical network using a ABM approach. Since our protocols are standard, this opens the technical possibility of reusing any such networks developed using the same standard formats.
In addition to the published results, and thanks to the ABM approach, we could also evaluate the probability of occurrence and temporal profile for cell-fate decisions by performing hundreds of replicated simulations ( Figures 2B, Figure S1B, S2). This reveals that not all stable-states are equally likely to occur, and it uncovers transient states that would remain undetected in a standard stable state analysis. For example, we found that there was a contained but measurable (7%) probability of proliferation for cells with a single loss-of-function mutation in p53 (p53-) ( Figure 2B). Often transient states have been dismissed as 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 non-biologically relevant. However, if and when they are, or not, is neither a trivial nor fully answered question. Thus, to detect such states in a network analysis and highlight their potential impact on the different fate decisions, is important in order to start to address their relevance.

Effect of mutations on gene network dynamics and multi-cellular growth
The range of abilities, or hallmarks, that a cell needs to acquire in order to progress to become a cancer cell have been extensively described, and the role of somatic mutations in such processes is well documented (9). However, the chain of events from a single mutation occurring in a normal cell to the acquisition of such hallmarks, and then to the occurrence and progression of cancer, is extremely complex and not fully understood. Here, we used microC to dissect such question by asking how the occurrence of single and multiple mutations might impact on the behaviour of signalling networks, and how this results in different patterns of single-cell and multi-cellular growth.
We focused on mutations frequently occurring across cancer types, namely loss-of-function/inactivating mutations in the well-known tumour suppressor genes p53 and PTEN, and the activation of the known cancer driver EGFR. We then monitor the 3D growth patterns of cells where these mutations were introduced as single mutations (EGFR activating mutation, EGFR+; p53 loss-of-function, P53-; or PTEN inactivation, PTEN-), or in combination of two or three. We compare the resulting growth curves with those predicted using a wild type (WT) simulation, namely cells not carrying any of these mutations. Of note, for these initial simulations we consider a media with no growth factors nor other added nutrients, simulating a condition of cell starvation (see Table S1 for all parameter choices).
In these conditions, our model predicts that clones carrying multiple mutations are characterised by a significantly more aggressive phenotype with faster growth with respect to clones carrying single mutations in these genes ( Figure 2C and 2D, and Figure S3). Furthermore, our model predicts that activated EGFR signalling is a determinant for rapid growth under starved conditions. Specifically, in our simulations all EGFR+ clones (EGFR+, EGFR+PTEN-, EGFR+p53-, EGFR+p53-PTEN-) all exhibit initial exponential growth while clones with no activated EGFR signalling did not grow or grew at a much slower rate ( Figure 2D).
To assess the usefulness and accuracy of our predictions we considered experimental data obtained using cells where mutations in EGFR, p53 and PTEN have been induced either alone or in combination. We  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 showed increased growth, and the triple mutant grew significantly more rapidly and formed significantly more colonies than either of the matched double mutants. These experimental results agrees with our simulations. We though observed also a discrepancy between Pires et al experiments and our predictions as in their hands PTEN as individual oncogene could stimulate 2D growth, but not 3D growth, of MCF10A cells in the absence of exogenous growth factors. In our predictions we did not observe any significant difference between the 2D and 3D growth of the PTEN clones.
We then compare the sphericity as further geometrical property of the spheroids growth in a 3D environment, other than simply size in time. Sphericity is a measure of how near an object is to a perfect sphere (see Methods). This showed that spheroids formed by the most aggressive clones (higher proliferations rates), namely EGFR+p53-and EGFR+p53-PTEN-, not only grew faster but they also grew in a more symmetrical manner, with higher sphericity than other clones ( Figure S4). This is understandable from a geometrical point of view, as in our model the clones which tend to have proliferation as their main action, rather than apoptosis or growth arrest, can make best use of the space around them. However, the accuracy of this prediction would need to be confirmed. In this respect, the difficulty of growing some of the less aggressive cell lines means that there is not published evidence on these morphological aspects, and they are hard to assess in an experimental context. However, this is an intriguing result which suggest that morphological characteristics might be a useful aspect to consider in future studies linking the way a spheroid grows with its clonal composition.
These results confirm first that cancer can occur from multiple evolutionary paths, but they also suggest that a proliferation stimulus, namely EGFR activation, followed by the loss of a tumour suppressor, p53 loss-of-function, and then PTEN loss-of-function, results in the most rapid evolution. This appears to support the preponderance of experimental data indicating that p53 mutations is a relatively late, rather than cancer initiating, event in a number of cancers (see e.g. (37)). However, the evidence on this point is contrasting, and it is well known that p53 is affected by multiple mutations, with different functional implications (for a review see e.g. (38)), so further studies are needed to include such differences.
Importantly, and differently from other approaches to study clonal evolution, microC enables exploration of how differences in growth rates between cells carrying different mutations might result from the underlying characteristics of the network (Figure 2 and S5). For example, EGFR activation directly affects the status of a large group of genes including ELK1, CREB, MYC and RAS that promote proliferation and block apoptosis ( Figure S6), so to acquire this mutation early would be very advantageous.
Finally, growth rates depend also on dynamical network parameters, such as the speed of the specific intercellular processes. We performed a thorough sensitivity analysis by changing the value of the the ranking of the clones with respect to their ability to proliferate was maintained ( Figure S7). This was also reflected in the activation/inactivation profiles for the different states in the different clones, where a value of 100 was the optimal choice in order to preserve dynamical behaviour of the model, while minimizing computational intensity of the simulations ( Figure S8).
In the multi-clonal simulations, we reveal that clones with aggressive phenotypes systematically take over the free surface area of the spheroids, thus restricting the rest of the clones in the central parts of the spheroid. This is particularly evident in the 2D geometry ( Figure 4A). This is a striking finding and it implies that the population of the aggressive phenotypes not only increases rapidly against the rest of the clones due to faster growth, but it also creates a physical barrier to for any further proliferation of the inner, slower growing, population.
This effect significantly affects both the growth curves ( Figure 4C) and the final population fractions ( Figure   4D) achieved by the same clones grown in competition and in isolation. The most aggressive clones The population of cells that decrease the most (by 54.5%) are the EGFR+ clones.
Interestingly, these findings can be examined in light of the proliferation rates and the dynamical properties of the different clones ( Figure 2B). In particular, EGFR+p53-and EGFR+p53-PTEN-clones, are the only ones characterized by a single possible outcome for the cell fate decision, which is proliferation. This is a strong advantage when competing with clones that are characterized by multiple decision outcomes. For example, EGFR+ clones have a higher probability of undergoing apoptosis or growth arrest, which constitutes a strong disadvantage under direct competition since apoptotic events free space that are likely to be rapidly occupied by more aggressive clones.

Interaction between the genotypes and the microenvironment affects multi-cellular growth and clonal selection
Although the mechanisms are not yet fully elucidated, increasing evidence points to a key role of the microenvironment in clonal selection, hence multi-cellular growth of heterogeneous populations. To illustrate how microC can address this, we focused on hypoxia simulations as this is one of the major microenvironmental differences between cancer and normal tissue (39)(40)(41)(42). Specifically, we aim to study the formation of necrotic cores in larger spheroids ( Figure 5A-B), and their growth under artificially uniform well-oxygenated conditions (our Control configuration) with respect to growth when oxygen consumption and diffusion are enabled (we refer to this as the Hypoxia configuration) (see also Methods section). To achieve this, we consider environmental agents that simulate the diffusion of oxygen in the microenvironment and have the ability to trigger the hypoxia responsive module (see Methods); we also introduce a new cell fate decision, necrosis, as response to extremely low oxygen concentrations (0.02mM O2,). We then simulate this model in our control and hypoxia configurations.
Of note, we can modify this element and either dissociate EGF from the receptor (so EGF present but not sensed, representing for example a drug treatment inactivating the receptor) or we can decide not to simulate EGF (so EGF not in the environment, representing a knock-out of the EGF gene). This allows us to check the effect of introducing signalling between cells in the model, and conversely the effect of interfering with this signalling ( Figure S3).
The first simulations described below are carried out with this signalling switched off and simply assess the diffusion parameters and occurrence of necrosis. In the next session we will evaluate the EGFR and cellcell signalling component. Interestingly, we observe that EGFR+ clones are the only ones with the potential to outgrow a critical spheroid size that triggers necrosis. For a configuration setup with initial consumption rate 0.005mM.s -1 and diffusion coefficient 10 -9 m 2 .s, the necrotic core that was formed ranged between 20 -360 cells ( Figure 5C) in agreement with previous reports (43). Accordingly, spheroids with necrotic cores had on average 4.8% -23.6% less living cells than spheroids without necrotic cores.
We then study the combined effect of hypoxia (as defined above), and clonal competition (see section 3).
Namely, we introduce all eight clones in the simulation environment and compared growth in isolation and competition, under either hypoxic and control conditions ( Figure S9). Overall, our simulations showed that spheroids where hypoxia was enabled had reduced clonal diversity ( Figure 5D). In particular, we noticed that the EGFR+p53-and EGFR+p53-PTEN-clones increased their presence in the total population. This was additional to the initial enrichment of this population due to competition (white bar series in Figure 5D).
On the contrary, the rest of the clones decreased their presence in the spheroids by an average of 4 -5%.
This selection pressure, and consequent reduction of clonal diversity due to hypoxia can be explained by the spatial distribution of clones ( Figure 4A). Less aggressive clones are more likely to be segregated to the central part of the spheroids, that eventually will become necrotic under hypoxia. Of note, in this example we have not introduced possible mutations which might occur as a result of hypoxia, which could impact on the observed clonal diversity.
Finally, we perform a sensitivity analysis on the parameters involved in the diffusion-reaction equation (Equation 1), namely we change the oxygen consumption rate and the diffusion coefficient to one of a broad range of values ( Figure S10). We found that the consumption rate was the only parameter which, when changed, significantly affects oxygen concentration ( Figure S10A), which may impact spheroid growth indirectly by triggering necrosis. This prediction is in agreement with previous evidence showing oxygen consumption to be the most critical kinetic parameter (43).
We study EGF signalling ( Figure 5E), a response triggered by many stress factors, including hypoxia. We compare growth under two experimental conditions: Oxygen diffusion simulations with disabled EGF signalling (which we call Hypoxia -no signalling, same as Hypoxia in section 5), and Oxygen diffusion simulations with enabled EGF signalling (which we call Hypoxia -signalling), thus where the cells are producing growth factors which are sensed by the nearby cells. For this experiment we used clones that did not constitutively activated EGFR.
We observed a significant increase in the population of cells under the Hypoxiasignalling condition. This increase ranged between 130 -870 living cells, that is 26 -174% of the spheroid size ( Figure 5F). These effects were consistent with the aggressiveness of the clones that we have seen in the previous examples; namely, p53-and p53-PTEN-clonal populations increased significantly more than the PTEN-and WT clones under hypoxia when EGF signalling was accounted for. Interestingly, enabling EGF signalling between cells further increased the tendency of hypoxia to reduce clonal diversity ( Figure 5D and 5G). This reveals that the reduction in clonal diversity attributable to central necrosis ( Figure 5D), is further increased due to the different proliferation rates of clones which are sensing EGF released under hypoxia.
A sensitivity analysis of the parameters determining the strength and the length of the EGF response, namely the activation threshold of the stimulus receptor and consumption rate of the growth factor, showed that low activation threshold values are more likely to activate the EGF receptor for any given EGF concentration above the threshold value, whereas lower consumption rates of EGF are more likely to activate the EGF receptor for longer ( Figure S10B). Both events increase the probability of proliferation, that in turn affects the size of the spheroid ( Figure S10C).

Discussion
To respond to the need to deepen understanding of the intricate relationship between cell genotype and phenotype, we have developed microC, which combines a powerful stochastic method, ABM, and gene networks modelling, into a novel framework for in-silico experimentation. Specifically, microC addresses the challenge of modelling and simulating the dynamics and evolution of a heterogeneous population of cells within a changing microenvironment. ABM is increasingly used to simulate the dynamics and evolution of complex systems in applications ranging from engineering to ecology (44)(45)(46)(47); this approach makes microC a naturally multiscale framework, enabling assessment of both multi-cellular systems and individual cells, and their physical and molecular properties.
One of the most innovative aspects of microC is that it substantially extends the capacity of ABM simulations of living cells, considering each individual cell as a meta-agent. Each cell is considered as a community of computational agents, the genes and molecules, acting and interacting within each cell. This enables the user to modify and/or replace gene networks in the cells, to define new constrains representing 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63 64 65 different types of mutations in different cells or in the same cell, and to customize cell-cell and cellmicroenvironment interaction parameters. As a consequence, the proposed framework is naturally suited to study how the behaviour of a specific perturbation, such as a mutation, occurring in individual cells within a three dimensional, dynamic microenvironment, affects the collective behaviour of other, similar or different, cells and the whole system. This reveals in some cases unexpected patterns of collective behaviour, not a-priori defined in the model, and which could not be predicted by observing the individual elements. Instead, there are emerging properties of the system as a whole and affect the evolution of the cell population and the behaviour of the single elements in return. microC comes with an example signalling network that can be used as it is, be modified to model in more detail specific pathways, or be completely replaced with a new one by the user so that it is more specific to system studied. Using this signalling network, we illustrated microC features through the study of growth patterns of 3D in-silico spheroids, affected by perturbation of intrinsic (mutations) and extrinsic (oxygen and growth factor availability) factors.
In agreement with experimental results obtained by inducing mutations in the MCF10A pre-cancer model, we demonstrated that clones carrying concomitant mutations in the well-known tumour suppressor genes p53 and PTEN, together with the activation of the known cancer driver EGFR, had a growth advantage with respect to clones carrying single mutations, or combinations of any two mutations. We also predict that the order in which these mutations are acquired can have a significant impact on the spheroid growth rate and final size. Both of these results held true both when the clones were grown in isolation and when they were allowed to compete with other clones. We also showed that this effect increased under more extreme microenvironmental conditions, namely lack of oxygen. Interestingly, we observed that morphological characteristics of spheroids of the same clone varied considerably and may be a source for significant variability in in-vitro experiments (48). We show how the dynamics of a gene network which is specified by the genotype and microenvironmental characteristics, affects the proliferation rate, which in turn has a significant impact on the overall spheroids' size and shape, irrespective of other elements such as adhesion forces which are long known to affect morphology (49).
Finally, we observed expected emerging properties, such as the formation of necrotic cores in the larger spheroids, and new unpredicted emerging properties, such as the effect of hypoxia and EGF signalling on clonal diversity of the spheroids. Importantly, these properties were not encoded at the cellular level, but rose from the cell agents competition for nutrients, space and cellular interaction.
Notwithstanding the modelling possibilities that this novel framework opens, it is important to recognise that the current implementation of the methodology has some limitations. Firstly, although microC accounts for certain important aspects of physical modelling such as the three-dimensionality, the presence of neighbouring cells, and local concentrations of chemicals and molecules, it does not account in its current implementation for physical factors such as matrix porosity, stiffness, and topographical cues. Given the importance of modelling these factors (50,51), future extension of microC will need to include them. On the other hand, we chose to model the cellular environment using ABM; namely, our 3D voxels are themselves agents whose shape and behaviour is defined by rules (see Methods). These rules can be different for 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 different parts of the environment and can change in time, facilitating the dynamic modelling of physical factors using this framework.