Simulated Ablation for Detection of Cells Impacting Paracrine Signalling in Histology Analysis

Intra-tumour phenotypic heterogeneity limits accuracy of clinical diagnostics and hampers the efficiency of anti-cancer therapies. Dealing with this cellular heterogeneity requires adequate understanding of its sources, which is extremely difficult, as phenotypes of tumour cells integrate hardwired (epi)mutational differences with the dynamic responses to microenvironmental cues. The later come in form of both direct physical interactions, as well as inputs from gradients of secreted signalling molecules. Furthermore, tumour cells can not only receive microenvironmental cues, but also produce them. Despite high biological and clinical importance of understanding spatial aspects of paracrine signaling, adequate research tools are largely lacking. Here, a partial differential equation (PDE) based mathematical model is developed that mimics the process of cell ablation. This model suggests how each cell might contribute to the microenvironment by either absorbing or secreting diffusible factors, and quantifies the extent to which observed intensities can be explained via diffusion mediated signalling. The model allows for the separation of phenotypic responses to signalling gradients within tumour microenvironments from the combined influence of responses mediated by direct physical contact and hardwired (epi)genetic differences. The differential equation is solved around cell membrane outlines using a finite element method (FEM). The method is applied to a multi-channel immunofluorescence in situ hybridization (iFISH) stained breast cancer histological specimen and correlations are investigated between: HER2 gene amplification; HER2 protein expression; and cell interaction with the diffusible microenvironment. This approach allows partial deconvolution of the complex inputs...

Intra-tumour phenotypic heterogeneity limits accuracy of clinical diagnostics and hampers the efficiency of anti-cancer therapies. Dealing with this cellular heterogeneity requires adequate understanding of its sources, which is extremely difficult, as phenotypes of tumour cells integrate hardwired (epi)mutational differences with the dynamic responses to microenvironmental cues. The later come in form of both direct physical interactions, as well as inputs from gradients of secreted signalling molecules. Furthermore, tumour cells can not only receive microenvironmental cues, but also produce them. Despite high biological and clinical importance of understanding spatial aspects of paracrine signaling, adequate research tools are largely lacking. Here, a partial differential equation (PDE) based mathematical model is developed that mimics the process of cell ablation. This model suggests how each cell might contribute to the microenvironment by either absorbing or secreting diffusible factors, and quantifies the extent to which observed intensities can be explained via diffusion mediated signalling. The model allows for the separation of phenotypic responses to signalling gradients within tumour microenvironments from the combined influence of responses mediated by direct physical contact and hardwired (epi)genetic differences. The differential equation is solved around cell membrane outlines using a finite element method (FEM). The method is applied to a multi-channel immunofluorescence in situ hybridization (iFISH) stained breast cancer histological specimen and correlations are investigated between: HER2 gene amplification; HER2 protein expression; and cell interaction with the diffusible microenvironment. This approach allows partial deconvolution of the complex inputs that shape phenotypic heterogeneity of tumour cells, and identifies cells that significantly impact gradients of signalling molecules.

I. INTRODUCTION
Phenotypic heterogeneity of malignant cells within tumours represents a major clinical challenge as it complicates diagnosis and underpins therapy resistance. This heterogeneity arises as the result of interplay between: a) cellintrinsic differences stemming from genetic heterogeneity and stable epigenetically defined states; b) stochastically arising variability in gene expression; c) environmental inputs in form of physical forces, contact-mediated signals from neighbouring cells, the extracellular matrix (ECM), and diffusible signals in form of gradients of growth factors, cytokines, oxygen and metabolites [23]. Understanding exact sources of variability of clinically relevant phenotypic features is of paramount importance. However, deconvolution of the relative impact of inputs that shape phenotypes is extremely challenging as we lack appropriate research tools.
When considering anti-cancer therapeutics, the primary issue that has recently emerged is that heterogeneity is a critical biomarker of tumour prognosis, as greater heterogeneity provides a clear advantage in the face of the evolutionary bottleneck of anti-cancer therapy -there are simply more paths available to facilitate the generation of resistance [21]. The design of therapies must consider the existing heterogeneity in a tumour and account for the most aggressive cells that may exist; and in failing to do so, one may select for pre-existing resistant cells [22]. Furthermore, a critical generator of phenotypic heterogeneity involves microenvironmental variability within the tumour, as a varying environment contributes greatly to different modes of adaption, upon spatially organised subgroups of cells [32]. In fact, it has been shown that heterogeneity of oxygen distribution within a tumour leads to the adaption of subsets of cells in microenvironmental niches to the hypoxic microenvironment, which has been shown to result in poorer prognosis and more aggressive tumours [19]. Lastly, heterogeneity has also been implicated as an important consideration in the field of immunotherapy, where neo-antigen generation is a function of polypeptide heterogeneity, which is a critical determinant of the success of immunotherapy, and reflects underlying genetic heterogeneity [31].
Microscopy imaging of histological specimens is widely and routinely used in clinical diagnostics of cancers, as well in experimental studies aiming to understand the underlying biology and responses to anti-cancer therapies. After fixation and placement on glass slides, tissues are subjected to chromogenic or fluorescent staining using chemical or antibody-based stains [26]. Staining intensity reflects concentration of the chemical moiety, to which the stain binds. Therefore, digitisation of chromogenic or fluorescent signals allows quantification of concentration of the these chemicals. Since histological slides retain spatial information, they are suitable for the analysis of not only cellular phenotypes and genotypes, but also microenvironmental factors that shape cellular heterogeneity [15]. Statistical approaches have been focused on the task on feature extraction, feature selection, and dimension reduction of large histological images. This field is sometimes known as whole-slide imaging (WSI). Feature extraction can happen at both at, a pixel-level, largely uninterpretable by a human (e.g., pixel intensities compared to neighbours), and at an object-level, recovering features that a pathologist would naturally be interested in (e.g., circularity of cell nuclei, location of blood vessels). Feature selection (along with the preceding staining procedure) can be carried out depending on the histopathology of the specific disorder, or based on dimensionality reduction (e.g., using principle component analysis) and irrelevant features can be ignored. Feature analysis is a statistical/machine learning problem, where the aim is to link the collective cell (or pixel) properties to macroscopic disorders/clinical outcomes [13]. Modern trends include considering ecologically motivated spatial statistics [15,25]. Histology analysis is both cheap and also highly clinically relevant, and a pathologist can diagnose based off an image. However, a histology slide is static and one is seldom able to elucidate any mechanism underlying observations.
Using mathematical modelling, we present a quantitative approach to calculate how each cell alters the local microenvironment using only histological images. Our approach allows us specify two things. First, to what degree a cell is either absorbing from, or secreting into, a local diffusible signalling environment; and second, how much of the observed staining intensities are explainable via diffusion. Therefore, when one stains a histology slide to measure the expression of a specific protein, one is then able to quantify the extent to which that protein contributes to microenvironmental signalling. The method presented is based on postulating that expression levels (as determined by staining intensity) of targets of analyses are determined by an effect of a field of secreted environmental signalling molecules. This signalling field (SF) abstraction integrates all of the secreted factors that impact the expression of a particular phenotypic trait in a paracrine manner (cytokines, gases, metabolites). Our approach ignores physical interactions between cells (e.g., force interactions through the ECM) and relies on availability of multi-channel staining data.
Substantial mathematical modelling efforts have been focused on understanding the impact of the microenvironment on phenotypic heterogeneity using biophysical principles; common themes include diffusion of cytokines, and construction of chemical reaction networks. Efforts at modelling the microenvironment in a spatial setting have largely been focused on forward modelling, where one makes assumptions and rules for a model, initiates the model in some starting configuration, and then evolves it forward in time. Due to the complexity of the models involved, cellular automata approaches have often been utilised [3-5, 8, 30]. Such approaches, depending on the complexity of the rules built into the model, are able to reproduce many experimental/clinical observations, and in some cases are capable of making experimentally validated predictions [6]. Unfortunately, mathematical modelling approaches are limited by: the frequent need for model iteration, where model components are added and removed; the complexity of rules needed to describe the behaviour of biological systems, such that underlying assumptions may often be untestable; and the experimental difficulty of obtaining relevant measurements required for adequate parametrisation of the underlying mathematical model. Our method should aid with future modelling as then one can "rule in" critical determinants.
Our approach is centred on mimicking the process of ablation within the fields of neuroscience [9,34,36] and embryonic development [7]; this is where one kills or disables a single neuron to observe how the remaining system behaves. Each cell is considered a functional unit that changes the SF from an implied baseline value, to the observed value of the staining intensity. The baseline value we calculate as the SF at the cell's location were the cell not present; this calculation is performed by solving a steady state diffusion equation (Poisson's equation) with decay. Since parameter estimation is likely impossible, we calculate the signal staining intensities that we expect based on the postulate of all of the variability coming from the impact of the signalling field. We then compare the expected (baseline) staining intensity of each cell with the experimentally measured (observed ) values. Differences between the baseline and observed values are interpreted as cell impact: how a cell alters the SF. We test applicability of our approach using histological samples of breast cancer.
The paper is structured as follows: in Section II, we present the mathematical details behind our method including: assumptions made and parameter selection. Section III discusses our method applied to a multi-channel immunofluorescence in situ hybridization (iFISH) stained breast cancer data set, in which HER2 gene and HER2 protein expression were studied concurrently. Section IV lays out a toy problem to demonstrate our approach when one uses artificially generated data. In Section V, we conclude and discuss potential future work.

II. METHOD APPLIED TO PARACRINE SIGNALLING
Our approach consists of two stages: a mathematical modelling step (Section II A), and a parameter selection step (Section II B). We first pose a class of diffusion models governed by partial differential equations (PDEs) for the description of SF in the extracellular space. This class of models has a number of free parameters: the effective diffusion constant; the rate of decay and the type of boundary condition posed on the cell surfaces. We then consider model selection by asking: what is the discrepancy between the expression of target being analysed (the observed staining intensity) and the expected SF were the cell absent from the histological slide (the baseline staining intensity). Using the assumption that the SF should account for most of the variance in signal that the cell produces, we carry out model selection over the space of possible model parameterisations. Finally, we then have a baseline staining signal intensity and an observed signal intensity for each cell.

A. Mathematical Modelling
To model paracrine signalling, we assume that cells communicate via a diffusible species present in the extracellular domain called the Signalling Field (SF), with concentration profile u = u(x), which degrades at rate λ > 0. Cells are stained with a probe that binds internally, the chromogenic or fluorescent intensity is not necessarily representative of the SF; however we specify that production of the SF is linearly related to the stain intensity. We do not model other signalling processes such as direct signalling or forces exerted between cells via direct contact (such as adherent or gap junctions) or indirect contact via the ECM.
A histology slide is a 2-dimensional slice though a 3-dimensional tissue. In reality, the cells visible on this slide would be interacting with cells above and below the side (before sectioning the tissue). However, for the purposes of our reconstruction we consider only cells visible in the slide, and consider the signalling field to diffuse in 2-dimensions only. In principle a 3-dimensional analysis could be performed by considering multiple adjacent pathology slides.
We consider N non-overlapping cells labelled i ∈ N = {1, . . . , N } that occupy volumes D i ⊂ R 2 [27] (see Figure  1(a)), so that extracellular domain is Ω = R 2 \ N i=1 D i . We consider the case in which the timescale of diffusion is faster than any other timescale of interest so that the SF is in steady state. The concentration profile u is then governed by where α 2 = λ/κ and κ > 0 is the diffusion coefficient. We suppose that the production/absorption of the SF is proportional to the difference in the average cellular stain intensity c i (averaged over the cell) and the concentration of the SF at the cell boundary. Thus we write [28] a.) The parameter γ ∈ (0, ∞) is a measure of how strong the cellular response is to a difference between the local SF field and the "target" value c i . When γ = 0, the cell does not react to the signalling field at all; as γ → ∞ the cell actively absorbs/secretes the SF as fast as it needs to in order to maintain the interior of the cell at the fixed observed staining intensity c i .
The problem defined by equations (1)-(2) is well-posed and would allow us to recreate the extracellular SF were parameters (α, γ) known. However, it was our aim to learn how each cell modifies the SF. To this end, for each cell i ∈ N , we will determine the difference between the observed staining intensity (c i ) and the expected signal were the cell not present. Therefore, we define a new SF problem with cell i removed. Denoting the resulting concentration profile by u i , we then have and Thus the SF now diffuses in the region occupied by cell i also; the boundary conditions on the remaining cells stay the same for all j = 1, . . . , i − 1, i + 1, . . . , N . The domain for the modified problem is illustrated in Figure 1(b). The problem is also well posed (see Appendix A). The baseline SF were cell i not present is then defined as the mean value of the SF over the cell We could also average over the cell boundary, however this gives no difference in results qualitatively. We define the cell impact to be which can be interpreted as follows: if f i > 0, the cell is secreting factors locally into the SF; and if f i < 0, then the cell is absorbing factors from the SF. We solve the PDE systems (3)-(4) using a finite element method; see Appendix B for details.

B. Parameter Selection
Parameter selection is the biggest challenge to implementation of our method. For metabolic processes, diffusion constants and decay rates are frequently known, for example, oxygen, glucose, etc. However, in our case, the specific chemicals comprising the signalling field are unknown (we note again that this is not the chemical stained for, but a hypothesised downstream factor).
Rather than trying to estimate p = (α, γ) from experiments, we choose p to give a best fit to the data in the following sense. For each stain, we measure the observed staining intensities for each cell c = {c 1 , . . . , c N }. For a fixed set of model parameters p, the method described above can be used to generate the baseline intensities We then choose parameters p * such that the coefficient of determination (denoted R 2 ) is maximised, i.e., where The coefficient R 2 measures the fraction of the variance of the stain intensities that can be accounted for by our signalling field model.

III. EXPERIMENTAL DATA
We apply our approach to the analysis of breast cancer tissue sections stained with nuclear stain DAPI that reflects cellular DNA content, and two important proteins targets: [29] HER2 , a receptor tyrosine kinase that is amplified in subset of breast cancers and is considered to be a potent 'driver' gene [24], and Oestrogen Receptor (ER), a nuclear receptor that mediates cellular response to oestrogen signalling. Both HER2 and ER staining are expected to have profound influence on cells phenotypes. Additionally, using fluorescence in situ hybridization (FISH), data regarding the amplification status of the HER2 gene is available.
The data used collected in compliance with the Declaration of Helsinki and was approved by the regional ethics committee (REK S-06495b) and the institutional review board of Oslo University Hospital Radiumhospitalet (IRB 2006-53). Full details of the experimental protocol can be found in Refs. [2,33].
The cell outlines were found using the GoIFISH software [33], the pixels that formed the cell outlines were down sampled by a factor of 8, so that the cell edges were given by polygons. For the FEM scheme, nodes within domain Ω were placed using Halton node placing [12]. To simplify parameter searches in γ ∈ (0, ∞) (which crosses many orders of magnitude) we mapped it to the bounded variable β = γ/(γ + 1) ∈ (0, 1).
From the method in Section II, we maximise R 2 given in equation (8) for each of the 3 stains for 3 histology slides. The corresponding values of α * and γ * are found in Table I. From Table I, we see that the R 2 max is small for DAPI as expected: DNA content should be independent of diffusible signalling components. In contrast, for the HER2 and ER stains, Table I suggests that the majority of the variance in the observable data set can be explained by SF.
Note that the large values of γ * indicate a strong coupling between the HER2 staining intensity and the SF. It is reassuring that R 2 max and α * are similar for the same stain across different slides. The range of values of γ * are less consistent. In Figure 2, we plot the observed, baseline and cell impact spatial staining patterns for the DAPI , HER2 and ER stains for sample 1. Additionally, we also plot the solution to equation (1) using the p * parameter set (without Observed intensities a.) removing any cells). This allows us to visualise the hypothesised SF. Plots relating to the same stain use the same scaled colour bar. The general trend one finds is that the observed data set is very heterogeneous with regards to staining intensity, the baseline data set looks like a smoothed version of the observed data set, and the cell impact data set shows variations around the baseline.

A. Possible Identification of Microenvironmental Niches
For Sample 1, ignoring the DAPI stain, we plot the baseline HER2 and baseline ER data sets as a scatter graph in Figure 3(a). One observes approximately 3 clusters of points (which can be found using k-means clustering). When looking at the observed staining intensities, these clusters of points appear as less distinct entities, see Figure 3(b). An interpretation of this finding is that there are 3 distinct microenvironmental niches on the slide. Viewing these clusters spatially, one can see that cells with approximately the same microenvironment are located in adjacent locations, see  The labelling from the k-means clustering shown on the HER2 and ER observed data sets; and c.) the k-means clustering labels applied spatially. Figure 3(c). Samples 2 and 3 seemed to also have niches, but without such immediately identifiable groups.

B. Use of Centromeric Probes to Investigate HER2 Gene Amplification Status
Using the FISH experimental procedure, a HER2 gene specific probe was used in conjunction with a centromeric probe to identify the copy number of the HER2 gene for each cell. We write the copy number of cell i as As the data is noisy, we discard data points that are biologically unreasonable, i.e., correspond to a amplification ratio that is infinite (A i = ∞), or less than a single specific probe detected per centromere (A i < 1). We then put the data into two ordinal sets, either unamplified wildtype (WT) cells where the HER2 gene has not been upregulated (A i = 1), or where the cell has amplified the HER2 gene (A i > 1). For the 3 data sets, after invalid data has been discarded, amplified HER2 cells account for 26%, 9% and 17% (respectively) of the cells on the slide. For the first data set, using a logistic regression, one can use the measured HER2 staining intensities c to predict whether the HER2 gene is amplified A = {A 1 , . . . , A N }. This logistic regression is correct with 66% accuracy. Analysing the binary classified data sets for staining intensity (either low or high staining intensity with the threshold determined by the logistic regression), we find that for high staining intensity cells tend to be net secretors of the SF (f i > 0), and cells with low staining intensity tend to be net absorbers of the SF (f i < 0).
However, when looking at cells with low HER2 staining intensity, but with amplified HER2 gene, we find that the cell impact for the HER2 stain is greater than the mean for all low HER2 stain intensity cells (see Figure 4). By use of z-tests (used for n > 30 samples), we reject the null hypothesis that: the mean cell impact for cells with low HER2 stain intensity and amplified HER2 gene is the same as the population mean cell impact for all cells with low HER2 stain intensity; compared to the alternative hypothesis that the means are not the same. This is found to be statistically significant with p < 0.05. The interpretation of this is that when cells have a low staining intensity, but with an amplified HER2 gene, the cells may not necessarily stain brightly but are net secretors into the SF, (and therefore they likely stain comparatively more intensely than their immediate neighbours).
Additionally, it should be noted for cells with low staining intensity: for amplified HER2 copy number, cells are net producers of the SF impacting ER intensity; and for unamplified HER2 cells, cells are net absorbers of the SF impacting ER intensity (for both cases using z-tests, p < 0.05).

IV. METHOD MOTIVATION VIA TOY SYSTEM
In this section we evaluate the efficacy of our method on synthetic data (where we know the exact mathematical representation of the heterogeneity). We generate this data with the following simple model. For each cell i, we suppose that there is measurable quantity y i = y i (t) corresponding to the staining intensity. We suppose that there are a number of cell phenotypes, labelled by τ in set T . We suppose that a each cell produces a signalling field from other cells as well as to some internal dynamical system within the cell. Thus we write

Copy Number
Unamplified Amplified

HER2 Stain Intensity
Low High  Here µ i represents the internal dynamics, and will account for the cell's phenotype τ , and φ i represents the forcing by the signalling field u. As before we suppose that the dynamics of y i are slower than the diffusion of the SF, so that u is in quasi-steady state. In the method described in Section II, this had the form where β is a scaling constant with units length over time, and u = u(x) obeys equations (1)- (2). The baseline was then found by considering the average concentration at the cell's location were the cell not there. For simplicity, in our synthetic system, we will consider cells occupying negligible area, i.e, they will behave as point sources/sinks of the SF -and therefore we do not need to incorporate boundary conditions. In this case, the baseline (b i ) can be calculated by removing the internal dynamics for cell i, i.e., by setting µ i = 0.
With cells as points, we take φ i as the Green's function solution to equation (1)- (2). Thus, where d ij represents the distance between cells i and j. The form of G n depends on the dimension, and is given by where K 0 is the modified Bessel function of the second kind. The baseline is then given by Notice that as cells are points, there is no γ parameter (though in some sense it has been replaced by β).
We specify that there are three cell phenotypes τ ∈ T = {1, 2, 3}. We take N = 3000. 1500 cells are placed uniformly at random in the unit circle; 1500 cells are placed uniformly at random in the annulus between r = 1 and r = 2 (see Figure 5). Using k-means clustering, cells are assigned into 15 groups. Groups are then assigned a cell type label randomly so that there are 5 groups of each cell type. The internal dynamics of each cell type are chosen to be given by and we use the 3 dimensional Green's function as the interaction term. Here i is a random number drawn from i ∼ N (0, 1) and is the means by which we model heterogeneity within a fixed cell phenotype. We choose β = 4π/N and α = 10. We can interpret the cell phenotypes as: cell phenotype τ = 1 is a cell that averages the signal received by neighbours to settle at a steady state close to some intrinsic value i /4; cell phenotype τ = 2 is a passive cell that mimics nearby cells; and cell phenotype τ = 3 is a cell that averages the signal received by neighbours to settle at a steady state close to some intrinsic value 1+ i /4. From random initial conditions in the unit interval (y i (t = 0) ∈ [0, 1]), the system will reach a steady state. The functional forms chosen are such that there is considerable overlap in the distribution of steady states for each cell phenotype, so that it is not immediately clear which phenotype a cell belongs to given the steady state value c i .
In Figures 5(a,e), we see a spatial plot of the cell centres and a histogram of the observed intensities ( c); Figure 5(b,f) reveals the discrete cell phenotypes that are not immediately apparent when viewing Figures 5(a,e). To calculate the baseline ( b), we assume the functional form of the interactions are as in equation (13), but without specifying the value of α, which is determined via a best fit as in Section II B. In Figure 5(c,g), we show a spatial plot and histogram of the baseline cell intensities; and in Figure 5(d, h) we show the cell impact cell intensities. From Figure 5(f) to Figure  Figure 5(g), we see the emergence of a trimodal distribution indicating the three cell phenotypes. The phenotype τ = 2 is easily identified in Figure 5(h) as the population having zero cell impact: this is to be expected since these cells essentially copy what their neighbours are doing.

V. DISCUSSION
We have presented a method to identify and extract diffusible microenvironment signalling from histology slides. We are also able to quantify what percentage of the data is explainable by diffusion mediated signalling. We applied our method to breast cancer histology slides and found that HER2 amplified cells with low HER2 stain intensity have higher a higher cell impact than the population of all low HER2 stain intensity cells. Finally the method was evaluated on synthetic data generated by a simpler model for cell behaviour.
This initial study suggests a methodology that could be applied to more general problems. For example mechanotransduction is another important method of cellular communication [16], to which our simulated ablation approach could be applied.

A. Method Applicability
In our analyses of clinical samples, we used stains for proteins where we are unsure of the exact sources behind variation in stain intensity, be that genetic, microenvironmental, or stochastic. HER2 amplification is widely believed to be a genetic event and should be heritable. Expression levels of HER2 protein are expected to reflect the gene amplification status. On the other hand, some experimental evidence shows HER2 expression might reflect not only genetic material, but also microenvironmental stimuli, such as Notch and NF-κB (RANK) signalling (with the effect of increased HER2 transcription and regulation of the expansion of cancer stem cells) [17,18,20,38]. The interpretation of the R 2 max values should be as what is the maximum percent of the signal variance explainable via our model of cellular communication for paracrine signalling. Therefore, our method can potentially reveal paracrine interaction even in scenarios where variability of the analysed trait is primarily believed to reflect genetic differences.
Due to this work being only a preliminary study, the presented analysis should be viewed as a promising first step and demonstration of the method, rather than bona fide proof of principle. We envision that the method presented here will be directly applicable to defined scenarios of biological and clinical importance. For example, paracrine signalling is responsible for microenvironment-directed therapy resistance against most of the targeted anti-cancer therapies used in clinics [35]. Yet, the signalling fields as well as impact of individual cells on them have not been studied due to the lack of appropriate tools. For example, the method could be used to interrogate the spatial distribution of c-MET phosphorylation, implied in resistance to ALK targeting tyrosine kinase inhibitors in non-small cell lung cancers [35,37], as c-MET phosphorylation should be reflective of microenvironmental gradients of its ligand HGF, which is primarily produced by cancer-associated fibroblasts.
A clear next step to promote use and acceptance of our method would be to see how our method performs against different types of cancer stained under the same protocol. Particularly, it may be of particular interest to investigate cancers that are known to be genetically homogeneous, and so the majority of the variance in observation may come from diffusion. In contract, one could investigate genetically heterogeneous cancers and therefore proportionally less variance may be accountable by diffusion. Additionally, it would be particularly pertinent to design experiments where stains relating to metabolic activity were selected. Cell types are usually identifiable by eye, and our approach may even aid hypothesis of cell function.

B. Data Limitations
Clearly our method as it stands suffers many drawbacks. Regarding use of data, we are stuck working in two dimensions. Working in three dimensions using complete reconstructions of cell geometry would be possible; however, this would be expensive and not repeatable in a clinical setting.
There will also be edge effect artefacts that we have not accounted for in the model. By this, we mean there will be cells that impact the SF, but were excluded by the biopsy extraction and preparation process. One option is to decrease sample sizes by ignoring a layer of cells at the edge of the histology slice.
When considering the data set used in Sections III, for our method to "pick out" important features of the data, we recommend that R 2 max 40% at a minimum.

C. Technique Refinement
It is likely that other methods for parameter selection are also worthy of exploration. For example, one could also make the opposite assumption: that the contribution of genetic heterogeneity is large, and the microenvironment minimally contributes to the observed data set c -however this does not work practically as then one could then either set α → ∞, or γ = 0 and then b i = 0 and c − b is maximised. Constraints have to be introduced in an intelligent manner.
Our model could also be expanded to include different classes of objects, for instance, it would not be difficult to include blood vessel structures. Additionally, were it the case that a cell was stained multiple times and one had an a priori knowledge for how a cell was supposed to function, relevant constraints could be included in the parameter selection method. In this paper, we did not carry out an exhaustive search on parameter selection techniques.

VI. ACKNOWLEDGEMENTS
and where ∂Ω = i ∂D i and ψ is a function satisfying We now aim to prove continuity and coercivity of a(u, v) on H 1 (Ω), as well as the continuity of l(v) on H 1 (Ω) to satisfy the Lax-Milgram Theorem (Theorem 5.8 on page 83 in [14]). Specifically, we wish to show that Continuity of a(u, v) For u, v ∈ H 1 (Ω), by nature of the modulus function, one can immediately write following inequality The Cauchy-Schwartz inequality allows us then to write According to the Trace Theorem [11], since u, v ∈ H 1 (Ω) and because Ω is C 1 and ∂Ω is bounded, then u |∂Ω , v |∂Ω ∈ H 1 2 (∂Ω) and the Trace operator is continuous from H 1 (Ω) to H 1 2 (∂Ω). So, there exists a positive constant C such that According to the Sobolev Embedding Theorem [39], for an open set U ∈ R N , W m,p (U ) ⊂ L q (U ), ∀1 < q < ∞ if mp = N , which is the case here for U = ∂Ω, m = 1 2 , p = 2 and N = 1. Moreover, this injection is continuous and then, for all 1 < q < ∞, there is a positive constant D q such that Therefore, using (A11) and (A12), we can write From this inequality, we can write (A10) as follows Because ∇u L 2 (Ω) ∇v L 2 (Ω) ≤ u H 1 (Ω) v H 1 (Ω) and u L 2 (Ω) v L 2 (Ω) ≤ u H 1 (Ω) v H 1 (Ω) , then the following inequality can be deduced from (A14) allowing to conclude that the bilinear form a(u, v) is continuous on H 1 (Ω).

Continuity of l(v)
Let v ∈ H 1 (Ω), we can immediately write the following inequality Because the function ψ is bounded on Ω, we can write which can be written Using again the Trace Theorem [11], since v ∈ H 1 (Ω), we know that there exists a positive constant C such that v |∂Ω H 1 2 (∂Ω) According once more to the Sobolev Embedding Theorem [39], there is a positive constant D q such that v |∂Ω L q (∂Ω) ≤ D q v |∂Ω H Then we can write (A21) as follows which proves the continuity of l on H 1 (Ω). Then the Lax-Milgram Theorem [14] ensures that the problem (A4) has a unique solution u on H 1 (Ω).
Remark A.1. The previous proof does not work for the particular case γ = 1, which corresponds to a Dirichlet condition imposed on the surface of each cell. The proof of existence and uniqueness in this particular case would need to proceed differently. This case can be treated in defining the closed and convex set K = {v ∈ H 1 (Ω) | v − ψ ∈ H 1 0 (Ω)} and applying the Stampachia Theorem [10].

Practical Approach to FEM Implementation
To practical implement our FEM scheme, instead of putting in the entries to the matrices L, B, R and vector b in individually, it is much simpler to do it each triangle or boundary edge at a time. We write L and B as a sum over the triangles T in the triangulation T (Ω i ) and thereforeL andD are 3 × 3 matrices. These matrices have well known analytic solutions given as For matrix R and vector r, we sum over edges E that form the discretised boundary E(∂Ω i )