MIAMI: mutual information-based analysis of multiplex imaging data

Abstract Motivation Studying the interaction or co-expression of the proteins or markers in the tumor microenvironment of cancer subjects can be crucial in the assessment of risks, such as death or recurrence. In the conventional approach, the cells need to be declared positive or negative for a marker based on its intensity. For multiple markers, manual thresholds are required for all the markers, which can become cumbersome. The performance of the subsequent analysis relies heavily on this step and thus suffers from subjectivity and lacks robustness. Results We present a new method where different marker intensities are viewed as dependent random variables, and the mutual information (MI) between them is considered to be a metric of co-expression. Estimation of the joint density, as required in the traditional form of MI, becomes increasingly challenging as the number of markers increases. We consider an alternative formulation of MI which is conceptually similar but has an efficient estimation technique for which we develop a new generalization. With the proposed method, we analyzed a lung cancer dataset finding the co-expression of the markers, HLA-DR and CK to be associated with survival. We also analyzed a triple negative breast cancer dataset finding the co-expression of the immuno-regulatory proteins, PD1, PD-L1, Lag3 and IDO, to be associated with disease recurrence. We demonstrated the robustness of our method through different simulation studies. Availability and implementation The associated R package can be found here, https://github.com/sealx017/MIAMI. Supplementary information Supplementary data are available at Bioinformatics online.


Introduction
In recent years, multiplex tissue imaging (Bataille et al., 2006) technologies like, imaging mass cytometry (Ali et al., 2020), multiplex immunohistochemistry (mIHC) (Tan et al., 2020) and multiplexed ion beam imaging (MIBI) (Angelo et al., 2014) have become increasingly popular for probing single-cell spatial biology. The technologies help in understanding the biological mechanisms underlying cellular and protein interactions in a wide array of scientific contexts. MIBI (Ionpath Inc.) (Angelo et al., 2014) platform and the mIHC platforms such as Vectra 3.0 (Akoya Biosciences) (Huang et al., 2013) and Vectra Polaris (Akoya Biosciences) (Pollan et al., 2020) produce images of similar structure. In particular, each image is two-dimensional, collected at cell-and nucleus-level resolution and proteins in the sample have been labeled with antibodies that attach to cell membranes. We will refer to the antibodies as markers in the article. Typically, mIHC images have 6-8 markers, whereas MIBI images can have 40 or more markers.
Many of the above markers are surface or phenotypic markers (Shipkova and Wieland, 2012;Zola et al., 2007), which are primarily used for cell type identification. Additionally, there are functional markers (Ijsselsteijn et al., 2019) such as HLA-DR (Saraiva et al., 2018), PD1, PD-L1 (Alsaab et al., 2017) and CD45RO (Lee et al., 2008) that dictate or regulate important cell functions. Both surface and functional markers are quantified as continuous valued marker intensities. However, in the traditional method, the interaction or co-expression effects of the markers are studied by binarizing them. For every marker, a threshold is chosen to indicate whether a cell (or, a pixel) in the tumor microenvironment (TME) (Binnewies et al., 2018) is positive or negative for that marker. If a cell is positive for two markers, it implies that the markers have co-expressed or co-occurred in that cell. Next, for all the subjects considering every unique pair of markers, the proportion of cells positive for both the markers, the proportion of cells positive for only the first marker and the proportion of cells positive for only the second marker are computed. These proportions can then be used in hierarchical clustering (Murtagh and Legendre, 2014) to group the subjects. It can then be tested if the cluster labels correlate with clinical outcomes, such as disease recurrence or time to death (Jackson et al., 2020;Johnson et al., 2021;Koguchi et al., 2015). The step of binarizing the marker expression profiles can be performed either using compatible commercially available software or manual V C The Author(s) 2022. Published by Oxford University Press. assessment of the segmented images. For example, Johnson et al. (2018) used AQUAnalysis V R software (Dolled-Filhart et al., 2010;McCabe et al., 2005) for binarizing the functional markers, PD1, PD-L1, HLA-DR and IDO, and discovered that their co-expression predicted improved outcomes of anti-PD1 therapies in metastatic melanoma. On the other hand, Patwa et al. (2021) used their own clinical expertise and careful evaluation of the segmented images to determine the binarizing thresholds and discovered the interaction between the markers, PD1, PD-L1, IDO and Lag3 to be associated with disease recurrence. It should be pointed out that such expression-thresholding or binarization is also popular in cell-cell communication or interaction analysis (Jin et al., 2021) in the context of single-cell RNA sequencing data. As an example, Armingol et al. (2021) defined a pair of cells to be interacting if the expressions of both ligand and receptor (Wong et al., 1997) in those cells exceed certain chosen thresholds.
The manual threshold selection process for multiple markers can be extremely challenging and is subjective. Alternatively, commercially available softwares, such as AQUAnalysis V R and inForm (Dolled-Filhart et al., 2010;Kramer et al., 2018), can be used for automated threshold selection in the context of a few particular data types. However, these softwares generally follow a 'black box approach' and can be difficult to interpret. On top of these difficulties, many authors have criticized binarizing continuous random variables in general due to the resulting loss of power (Altman and Royston, 2006;Irwin and McClelland, 2003;Seal et al., 2022a).
In this article, we propose a threshold-free approach for studying marker co-expression. We treat the marker intensities as continuous random variables and use their marginal and joint probability density functions (PDF's) to construct a metric of co-expression based on mutual information (MI) (Cover and Thomas, 2006). Unlike the correlation coefficient, MI is capable of capturing non-linear patterns of dependence between the markers and is easily extendable for more than two markers. However, as the number of markers increases, computing the joint PDF becomes increasingly challenging which makes the computation of MI infeasible as well. Therefore, we use a slightly different formulation of MI known as Euclidean quadratic mutual information (EQMI) (Principe et al., 2000) which has a similar interpretation but can be computed more efficiently. The computation algorithm is discussed in Principe (2010) with a simpler assumption that we further generalize. The vector of estimated values of the EQMI of all the subjects is tested for association with clinical outcomes. With the proposed method, we analyzed an mIHC lung cancer dataset (Seal et al., 2022b) finding that a higher co-expression of the markers, HLA-DR and CK was significantly associated with better 5-year overall survival. We analyzed a MIBI triple-negative breast cancer (TNBC) dataset (Keren et al., 2018) studying the co-expression of two sets of functional markers, (a) HLA-DR, CD45RO, H3K27me3, H3K9ac and HLA-Class-1, and (b) PD1, PD-L1, Lag3 and IDO, which are also known as immunoregulatory proteins (IRP's). We found the co-expression of the IRP's to be significantly associated with recurrence. We demonstrated the robustness of our method over the existing approaches through different simulation studies.

Materials and methods
Suppose there are p markers and N subjects with the j-th subject having n j cells. Let X kij denote the expression of the k-th marker in the i-th cell of j-th subject for k ¼ 1; 2; . . . ; p; i ¼ 1; 2; . . . ; n j , and j ¼ 1; 2; . . . ; N: Note that we focus on cell-level data in this article but the framework is readily usable on pixel-level data as well. Let Y ¼ ðY 1 ; Y 2 ; . . . ; Y N Þ T denote a subject-level outcome vector and C be an N Â S matrix of subject-level covariates. Next, we discuss the existing and proposed methods and also summarize them in Figure 1.

Traditional thresholding-based approach to study co-expression of the markers
For every marker k, we choose a cutoff t k to define cell i of subject j as positive for that marker if X kij > t k : The choice of t k can be guided by prior biological insight or by careful inspection of the marker intensity profile. For example, extreme quantiles like (e.g. the 90th or 95th percentiles) can serve as viable thresholds, as we see in the simulations. However, in most real datasets, it requires user-defined thresholds for achieving a meaningful and interpretable conclusion. A cell can be positive for multiple markers. If a cell is positive for a pair of markers (k r , k s ), it would imply that these markers have co-expressed in that particular cell. For every subject, compute the proportion of such double-positive cells, denoted by k þ r k þ s , the proportion of cells positive for only the first marker, denoted by k þ r k À s , and the proportion of cells positive for only the second marker, denoted by k À r k þ s , for k r ; k s 2 f1; 2; . . . ; pg and k r 6 ¼ k s : Next, the subjects are classified into two or more groups based on their vectors of proportions using hierarchical clustering. Note that if it is biologically relevant, instead of a pairwise analysis, one can also study multiple markers jointly, e.g. with four markers, one can count the cells which are either k þ r k þ s k þ t k þ u or any of the possible ð2 4 À 1Þ ¼ 15 combinations and group the subjects based on these proportions. Fig. 1. Comparison of the workflow of the proposed method with the traditional method. We used segmented cell-level data in this article but the method is applicable on a pixel-level data as well Suppose that the subjects are grouped into M clusters. Let Z ¼ ðZ 1 ; . . . ; Z N Þ T be an N Â M matrix of the cluster labels. Z j is a vector corresponding to the subject j with Z jm ¼ 1 if the j-th subject belongs to group m and 0 otherwise. In most common practices, M ¼ 2 is considered. When Y is a continuous outcome, a standard multiple linear regression model with Z as a predictor can be written as where b; c are fixed effects and is an N Â 1 error vector following multivariate normal distribution (MVN) with mean 0 and identity covariance matrix r 2 I N : The null hypothesis, H 0 : c ¼ 0, can be tested using the Wald test or likelihood ratio test (LRT) (Gourieroux et al., 1982). Similarly, when Y is a categorical outcome, a logistic or multinomial logistic regression model (Kwak and Clayton-Matthews, 2002) can be considered.
Next, we consider the case of Y being a right-censored failure time outcome. Let the outcome of the j-th individual be Y j ¼ minðT j ; U j Þ, where T j is the time to event and U j is the censoring time. Let d j IðT j U j Þ be the corresponding censoring indicator. Assuming that T j and U j are conditionally independent given the covariates for j ¼ 1; 2; . . . ; N, the hazard function for the Cox proportional hazards (PH) model (Andersen and Gill, 1982) with fixed effects can be written as, where k j ðtjC j ; Z j Þ is the hazard of the j-th subject at time t, given the vector of covariates C j and the cluster label Z j and k 0 ðtÞ is an unspecified baseline hazard at time t. To test the null hypothesis: H 0 : c ¼ 0, an LRT (Therneau, 1997) can be considered.
2.2 Proposed method: mutual information-based analysis of marker co-expression 2.2.1 Theory of mutual information MI is an information theoretic measure of dependence between two or more random variables (r.v.'s). In contrast to the linear correlation coefficient, it captures dependences which do not manifest themselves in the covariance (Kraskov et al., 2004). For two random variables, R 1 and R 2 with sample space D, marginal PDFs, f 1 , f 2 and joint PDF f 12 , the MI is defined as the following: MIðR 1 ; R 2 Þ ¼ Ð Ð f 12 ðr 1 ; r 2 Þ log f 12 ðr 1 ; r 2 Þ f 1 ðr 1 Þf 2 ðr 2 Þ dr 1 dr 2 : MI has been used as a tool for feature selection in many different contexts (Hoque et al., 2014;Liu et al., 2009;Song et al., 2021;Yang and Moody, 1999). The measure can be easily generalized for more than two random variables. However, due to the curse of dimensionality (Langrené and Warin, 2019), it becomes extremely difficult to estimate the joint PDF and hence, the MI, as the number of random variables increases. Xu (1998) and Principe et al. (2000) looked into alternative definitions of MI that would remain conceptually similar but can be computed efficiently. They discussed two measures, EQMI and Cauchy-Schwarz quadratic mutual information (CSQMI), defined as below, It is trivial to verify that EQMIðR 1 ; R 2 Þ ! 0 with equality occurring if and only if R 1 , R 2 are independent, i.e. f 12 ðr 1 ; r 2 Þ ¼ f 1 ðr 1 Þf 2 ðr 2 Þ for any r 1 , r 2 2 R. Using the Cauchy-Schwarz inequality, we have CSQMIðR 1 ; R 2 Þ ! 0 with equality happening if and only if R 1 , R 2 are independent. Xu (1998) argued that EQMI shares more properties, such as convexity with respect to the PDF's, of the traditional MI compared to CSQMI and thus, is better suited as a dependence measure. It should also be noticed that the form of EQMI is very similar to the distance covariance measure proposed by Székely et al. (2007). There is one more generalized measure of dependence, known as kernel canonical correlation analysis (Huang et al., 2006), which does not share forumlaic similarity with EQMI but can potentially serve as an alternative for detecting non-linear dependence patterns. In this article, we focus on EQMI and propose its usage in the co-expression analysis of the markers in general multiplex imaging datasets used in the study of spatial biology of TME.

Formulation of EQMI
For every subject j, we assume that the expression of marker k is a continuous random variable, denoted by X kj , with sample space D ¼ ½0; 1. X kj is observed in n j cells as, X k1j ; X k2j ; . . . ; X knjj . Suppose there are p ¼ 2 markers and for every subject j, denote their joint PDF as f 12j ðx 1 ; x 2 Þ and their marginal PDFs as f 1j ðx 1 Þ and f 2j ðx 2 Þ, respectively. Following Equation (2), the EQMI between the markers 1 and 2 for subject j can be defined as, Þ can be interpreted as a generalized measure of co-expression of the markers, capable of capturing non-linear dependences. A large value of EQMIðX 1j ; X 2j Þ will imply that the markers 1, 2 have significantly co-expressed in the TME of subject j. EQMIðX 1j ; X 2j Þ is bounded below by 0 for every j but there is no common upper bound for different j's. Therefore, to compare EQMIðX 1j ; X 2j Þ's across different subjects, we need to appropriately standardize the values so that they lie in the same scale. We define a new measure as, We observe that EQMI Ã ðX 1j ; X 2j Þ lies between ½0; 1 because V C ! 0, and has a similar interpretation as EQMIðX 1j ; X 2j Þ.
It is intuitive to generalize the measure for any p ! 2 markers. For subject j, letting f 12...pj ðx 1 ; x 2 ; . . . ; x p Þ to be the joint PDF and f 1j ðx p Þ; f 2j ðx p Þ; . . . ; f pj ðx p Þ to be the marginal PDFs, EQMI Ã can be defined as, Notice that to estimate EQMI Ã ðX 1j ; X 2j ; . . . ; X pj Þ when the true PDFs are unknown, a naive approach would be to estimate the PDFs first by using a kernel density estimation approach (Silverman, 1981). Then, we use the estimated PDFs to compute the terms V J , V C and V M via numerical integration (Davis and Rabinowitz, 2007). However, such an approach would be computationally infeasible for a large p and will defeat the purpose of considering EQMI instead of the standard form of MI. EQMI Ã ðX 1j ; X 2j ; . . . ; X pj Þ can be estimated efficiently as, and b V k ði; sÞ ¼ G ffiffi 2 p h k ðX kij À X ksj Þ. G h stands for a Gaussian kernel with bandwidth parameter h. This clever way of estimating the terms, b (2010) with a simpler assumption that h k' s are equal for all k, i.e. h k ¼ h for k ¼ 1; 2; . . . ; p: We provide the general derivation (i.e. h k 6 ¼ h k 0 , for k 6 ¼ k 0 ) and other associated details in the Supplementary Material. For choosing the optimal values of h k' s, we generally consider the diagonal multivariate plug-in bandwidth selection procedure (Chacó n and Duong, 2010;Wand et al., 1994). However, when p is large (p > 6), to avoid computational deadlock we suggest using Silverman's rule of thumb (Silverman, 1981) for choosing h k' s individually.

Using mutual information in association analysis
Denote the vector of estimated values of EQMI Ã as Our goal is to test if E is associated with the clinical outcome Y. When Y is a continuous outcome, a standard multiple linear regression model with E as a predictor can be written as, where b; c are fixed effects and is an N Â 1 error vector following MVN with mean 0 and identity covariance matrix r 2 I N : After estimating the parameters, the null hypothesis, H 0 : c ¼ 0, can be tested using the Wald test. Note that we can also add higher order terms of E j , such as E 2 j ; E 3 j ; . . . as in a polynomial regression model (Ostertagová , 2012) to account for non-linear relationship between Y and E j . Similarly, when Y is a categorical outcome, a logistic or multinomial logistic regression model (Kwak and Clayton-Matthews, 2002) can be considered.
Next, we consider the case of Y being a survival or recurrence outcome. Using the same definitions and conditional independence assumptions of T j , U j and covariates as in Section 2.1, the hazard function for the Cox PH model can be written as, where k j ðtjC j ; E j Þ is the hazard of the j-th subject at time t, given the vector of covariates C j and the predictor E j and k 0 ðtÞ is an unspecified baseline hazard at time t. To test the null hypothesis: H 0 : c ¼ 0, an LRT can be considered. In a two-marker scenario (i.e. p ¼ 2), one can also treat the absolute value of the Pearson correlation as a measure of marker coexpression and use it as E j in the earlier equations for testing association with the outcome. However, as we later demonstrate, using correlation instead of EQMI can be sub-optimal in many cases. It is also difficult to generalize to more than two markers.

Real-data analysis
We applied our method to two real datasets, an mIHC lung cancer dataset (Seal et al., 2022b) from Vectra 3.0 platform and a MIBI TNBC dataset (Keren et al., 2018). We also applied the traditional thresholding-based method on both the datasets. Since it was hard to decide the optimal thresholds for binarizing the markers, we ran the method for varying values of the thresholds. For every marker, concatenating the intensity data of all the subjects, we computed the median, 95% and 99% quantiles. Next, three different thresholding-based methods using these quantiles were considered, respectively referred to as Median-Thresholding, Threshold 1 and Threshold 2. Note that Threshold 1 and 2 both captured the difference in the tails of the distributions, whereas Median-Thresholding captured the difference in the centers. For the first dataset, we also performed the correlation-based association analysis, referred to as Corr.

Application to mIHC lung cancer data
In the lung cancer dataset, there were 153 subjects each with 3-5 images (in total, 761 images). Two subject-level covariates, age and sex were available. For every subject, the images were nonoverlapping and from the same TME region. After segmenting the images, the subjects had varying number of identified cells (from 3755 to 16 949). We worked directly with the cell-level data as described next. The cells come from two different tissue regions: tumor and stroma. They were classified into either of the six different cell types: CD14þ, CD19þ, CD4þ, CD8þ, CKþ and Other, based on the expression of the phenotypic markers, CD19, CD3, CK, CD8 and CD14. These markers (and the cell-types) usually have clinical meaning, e.g. CK is a type of a tumor cell marker and CD4 and CD8 are T-cell or cytotoxic T-cell markers. Apart from these, a functional marker HLA-DR (also known as MHCII), was measured in each of the cells. Johnson et al. (2021) classified the subjects into two groups, (a) MHCII: High and (b) MHCII: Low based on the proportion of CK þ tumor cells that are also positive for HLA-DR (i.e. CK þ HLA À DR þ cells). They discovered that group (a) had significantly higher 5-year overall rate of survival (reported P-value of 0.046).
Note that having a large number of CKþHLADRþ tumor cells implies that these two markers had co-expressed in a lot of the tumor cells. In light of that, we studied if the degree of co-expression of these markers in the tumor cells, as quantified by our method, was associated with the survival. Considering the tumor cells of every subject j, we first estimated EQMI Ã ; E j ¼ d EQMI Ã ðX 1j ; X 2j Þ between the two markers, HLA-DR and CK. Next, we tested the association of E j with 5-year overall survival using the Cox-PH model from Equation (5). The coefficient c was À7.26 with the P-value of the LRT being 0.0286. Thus, subjects with high co-expression of the markers in the tumor cells were more likely to survive. The result was thus consistent with Johnson et al. (2021)'s finding. The estimated coefficient, hazard ratio (HR) and P-value of all the methods are listed in Table 1 and further details such as confidence interval of the HR, are provided in the Supplementary Material. The correlation-based analysis (Corr) yielded a negative coefficient estimate but was not statistically significant (at level 0.05). Out of the thresholding-based methods, only Threshold 2 had a significant Pvalue. It demonstrated that the traditional thresholding-based method could vary heavily based on the choice of the thresholds leading to utterly different conclusions. On the other hand, inference using our method would be robust.
We should point out that CK was a phenotypic marker, and studying its co-expression with a functional marker HLA-DR might not have much clinical relevance. However, the goal was to demonstrate that our method obviates the need to binarize the continuousvalued marker expression profiles and is potentially applicable even when one of the markers is a phenotypic marker.

Application to TNBC MIBI data
The TNBC MIBI dataset (Keren et al., 2018) had 38 subjects, each with one image. There were 201 656 cells in total with each of the images having varying numbers of cells (between 1217 and 8212). There were 49 markers in total, the majority of which were lineage or phenotypic markers, used primarily for cell-type identification. We were interested in two sets of functional markers, (a) HLA-DR, CD45RO, H3K27me3, H3K9ac and HLA-Class-1, and (b) PD1, PD-L1, Lag3 and IDO, also known as IRP's. The reason for concentrating on these two sets of markers was the findings of Patwa et al.  (2021). Employing the thresholding-based method with a set of very carefully chosen thresholds, they concluded that the pair-wise co-expression of the functional markers from the sets (a) and (b) were negatively associated with two clinical outcomes, namely disease recurrence and time to death (survival). For set (a), they did not find any statistical significance for either of the clinical outcomes (at level 0.05), whereas, for set (b), they were able to find statistical significance in the association test with recurrence (reported P-value of 0.0058). For the co-expression analysis with the five markers from the set (a), we looked into all possible (2 5 À 5 À 1 ¼ 26) two-way and higher-order combinations of the markers. There were four different types of combination, namely pair (10), triplet (10), quadruplet (5) and quintuple (1). We computed EQMI Ã for each combination of the markers and tested for association with recurrence and survival. In Table 2, we list the results for five marker combinations for which the lowest P-values were observed. We noticed that at a level of 0.05, several marker-combinations were found to be associated with recurrence. All the estimated coefficients were negative, implying that higher co-expression of the markers decreased the chance of disease recurrence. However, once we adjusted the P-values for multiple testing correction using Bonferroni's method (Bonferroni, 1936), only the combinations 'HLA-DR, CD45RO' and 'HLA-DR, CD45RO, H3K9ac' remained significant. Note that, we compared the P-values of every type of combination separately, meaning that for marker pairs and triplets, we compared the P-values at level 0.05/10 since the numbers of pairs and triplets were both 10, and for quadruplets, at level 0.05/5 since the number of quadruplets was 5. The marker-combinations were not independent and thus, a Bonferroni correction probably has been overly conservative in this case. For the survival outcome, we did not detect any statistical significance. But, the negative coefficient estimates hinted at a possible association of better rate of survival with higher co-expression.
For the co-expression analysis with the four IRP's from the set (b), we looked into all possible (2 4 À 4 À 1 ¼ 11) two-way and higher order combinations. There were four different types of combination, namely pair (6), triplet (4) and quadruplet (1). In Table 2, we list the results for five marker combinations for which the lowest P-values were observed. At a level of 0.05, many of the markercombinations were found to be associated with both recurrence and survival. Again, all the estimated coefficients were negative, implying that higher co-expression of the markers decreased the chance of disease recurrence. Upon correcting the P-values for multiple testing using Bonferroni's method, all of the marker combinations listed in Table 2 remained significant for recurrence while for survival, none of them remained significant. We compared the P-values of every type of combination separately, meaning that for marker pairs we compared the P-values at level 0.05/6 since the number of pairs was 6, for triplets we compared the P-values at level 0.05/4 since the number of triplets was 4, and for quadruplet, at level 0.05 since there was just a single quadruplet. Thus, we also arrived at a similar conclusion as Patwa et al. (2021) that the inter-play or coexpression of the IRP's were significantly associated with recurrence and possibly also with survival. One added novelty of our method was that one could easily pinpoint which of the combinations of the IRP's had the most impact.
It should be kept in mind that the sample-size for this dataset was quite small with only 16 events for recurrence and 15 for survival. It might have affected the overall inference which was based on asymptotic distributional properties of the test statistics. For the same reason, we mainly focused on the sign of the coefficient estimates but not their CI's in this particular case. We also applied the simple thresholding-based methods, Median-Thresholding, Threshold 1 and Threshold 2 as described earlier using both the sets of markers. We found only a single statistically significant result which was for Median-Thresholding using set (b) in the association test with recurrence. Associated tables are provided in the Supplementary Material.

Simulation
Next, we compared the performance of the EQMI Ã -based association analysis with the correlation-based association analysis and the thresholding-based methods in different simulation setups. We assumed that there were two groups of subjects, one with subjects having high marker co-expression and the other with subjects having low or almost zero marker co-expression. In Section 4.1, we considered two markers sharing linear and non-linear patterns of co-expression relationship. In Section 4.2, we considered three markers sharing varying degree of linear co-expression relationship.

Simulation using Gaussian copula
We replicated the characteristics of the lung cancer dataset in this simulation. The mean marginal distribution of the markers HLA-DR and CK across the subjects could be approximated by Beta distribution respectively with parameters, ð1:5; 170Þ and ð1:6; 35Þ (refer to the Supplementary Material). We used a Gaussian copula (Masarotto and Varin, 2012) to simulate correlated intensity data for two markers which had the above marginal Beta distributions. The simulation strategy was as follows, 1. A random number I j between 0 and 1 was chosen with probability 0.5 each, respectively standing for group (1), whose subjects had high co-expression of the markers and group (2), whose subjects had mild to none co-expression of the markers. It assigned j-th subject to either of the two groups. 2. The intensity vector of two markers, ðX 1ij ; X 2ij Þ T for every individual j was simulated as follows, a. If I j ¼ 0, simulate a correlation parameter q j from Unifð0:75; 0:9Þ, or else simulate q j from Unifð0; 0:15Þ. b. Consider a correlation matrix, R j ¼ ½ 1 q j q j 1 and simulate ðu 1ij ; u 2ij Þ T $ N 2 ð0; R j Þ for i ¼ 1; . . . ; n j : c. Compute ðv 1ij ; v 2ij Þ T ¼ ðUðu 1ij Þ; Uðu 2ij ÞÞ T , where UðÞ denotes the cumulative distribution function of the standard normal distribution. d. Perform inverse transformation as, e. ðX 1ij ; X 2ij Þ T ¼ ðF À1 1 ðv 1ij Þ; F À1 2 ðv 2ij ÞÞ T , where F 1 and F 2 are Beta distributions with parameters ð1:5; 170Þ and ð1:6; 35Þ. Refer to the Supplementary Material for plots of the true joint densities of the markers for two groups of subjects. 3. The clinical outcome of j-th subject is simulated as, where j $ Nð0; r 2 Þ: This step is repeated 100 times to generate 100 different datasets having different Y vectors but the same intensity vectors, X 1 and X 2 . All the methods are applied on these 100 datasets and empirical power is computed.
Steps 1 À 3 were repeated 20 times and the mean empirical power of different methods were displayed for varying values of the number of cells ðn cells Þ and the number of individuals (N) in Figure 2. The EQMI Ã -based association analysis (EQMI) and the correlation-based association analysis (Corr) achieved comparable performance in all the cases. This particular simulation strategy inherently assumed that the dependence between the markers was linear. Thus, the estimated values of the EQMI Ã and the correlation shared an almost one-to-one relationship making the association analysis using either of them equivalent. Median-Tresholding and Threshold 1 showed similar performance, whereas Threshold 2 had consistently lower power. It showed the importance of choosing a proper threshold in the traditional thresholding-based method for achieving a reasonable performance. All the methods expectedly had the least power when N was the smallest, whereas the value of n cells did not have any major impact. It implied if the co-expression pattern was well captured even through a smaller number of cells, most of the methods would perform well.

Simulation with squared marker co-expression relationship
The last simulation strategy essentially assumed a linear pattern of coexpression between the markers. Next, we simulated a non-linear pattern of co-expression between the markers where we would expect the correlation-based association analysis (Corr) to perform worse since it could only capture a linear dependence pattern. Steps 1 and 3 of the last simulation strategy were kept the same here and the marker-intensity simulation step, i.e. step 2 was changed as follows, 2a. If I j ¼ 0, simulate X 1ij from Unifð0; 0:1Þ, and e ij from Unifð0; 0:0005Þ. Construct X 2ij as, X 2ij ¼ ðX 1ij À 0:05Þ 2 þ e ij : 2b. If I j ¼ 1, independently simulate x 1ij ; x 2ij both from Unifð0; 0:1Þ.
From Figure 2, we noticed that the EQMI Ã -based association analysis performed the best in all the cases. Threshold 1 and 2 achieved comparable performance for large value of n cells , whereas Median-Thresholding mostly yielded poor performance. Note that for both the groups of subjects, the correlation between the markers were close to zero as the dependence pattern was non-linear, squared to be specific. Expectedly, the correlation-based association analysis (Corr) had almost no power in every case. The simulation strategy demonstrated why using a generalized measure of coexpression such as EQMI Ã would be more optimal in many cases.

Simulation with circular marker co-expression relationship
In the last simulation setup, the thresholding-based methods performed well despite the marker co-expression pattern being nonlinear. Next, we looked into a slightly more complicated coexpression relationship for which the thresholding-based approach would suffer. Steps 1 and 3 of the last two simulation strategies were kept the same here and the Step 2 was changed as follows, 2a. If I j ¼ 0, simulate X 1ij from Unifð0; 0:1Þ, e ij from Unifð0; 0:0005Þ and a random number s ij between À1 and 1. Construct X 2ij as, X 2ij ¼ 0:05 þ s ij ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 0:05 2 À ðX 1ij À 0:05Þ 2 q þ e ij : 2b. If I j ¼ 1, independently simulate X 1ij ; X 2ij both from Unifð0; 0:1Þ.
From Figure 2, we noticed that the EQMI Ã -based association analysis performed the best in all the cases. The correlation-based association analysis (Corr) expectedly performed the worst. Threshold 1, unlike the last simulation, performed significantly worse. In this simulation setup, the subjects of one group had a circular pattern of marker co-expression and the others had almost zero co-expression. Recall that in a two-marker scenario, the thresholding-based methods depended on computing the proportions of the cells positive for both the markers and of the cells positive for only one of the markers (Section 2.1). The difference between these proportions across the two groups of subjects became negligible under this setup which made it difficult distinguishing between them. This explained the overall poor performance of all the thresholding-based methods.

Simulation with three markers
Next, we considered three markers and simulated varied degree of linear dependence between them using Gaussian copula. We only performed the EQMI Ã -based association analysis and the thresholding-based methods in this case. There were again two groups of subjects respectively with high and low co-expression. The simulation strategy was as follows, 1. A random number I j between 0 and 1 was chosen with probability 0.5 each, respectively standing for groups (1) and (2). 2. The intensity vector of three markers, ðX 1ij ; X 2ij ; X 3ij Þ T for every individual j was simulated as follows, i. Consider a correlation matrix, R j ¼ 1 q 12j q 13j q 12j 1 q 23j q 13j q 23j 1 2 6 4 3 7

5.
For the subjects in group (1), the off-diagonal elements of the correlation matrix would have high values, whereas they would have low values for the subjects in group (2). We considered three different cases with varying differences between the correlation matrices of the two groups.
Steps 1-3 were repeated 20 times and in Figure 3, the mean empirical power of the methods were displayed. The power of the methods were quite low when N was small. The EQMI Ã -based method outperformed both the thresholding-based methods, Threshold 1 and 2 in every case. Threshold 2 had little to no power in most of the cases. Note that, the cases (a), (b) and (c) differed in how different the marker co-expression pattern of the two groups were. The difference between the marker co-expression pattern of the two groups was the largest in case (a) since all the three correlation parameters, q 12j ; q 23j and q 13j were different across the groups. The difference was the smallest in case (c) as two of the three correlation parameters, q 23j and q 13j were kept to be 0 in both the groups. Quite expectedly, the power of the methods decreased going from case (a) to case (c), as the difference between the groups of subjects reduced. The decrease was more prominent with Threshold 1 suggesting the method's lack of robustness.

Discussion
In multiplex imaging data, studying the interaction or co-expression of multiple functional markers in the cells of the TME can be crucial for subject-specific assessment of risks. The traditional approach requires a complex step of binarizing the continuous valued marker expression profiles which is prone to subjectivity and can be suboptimal in many scenarios. The complexity gets exacerbated as the number of markers increases. In this article, we propose a method for studying interaction or co-expression of multiple markers based on the theory of mutual information (MI). We treat the subjectspecific intensity or expression of every marker as a continuous random variable. We determine how much the markers have coexpressed in the TME of a particular subject by computing a measure known as EQMI, comparing the estimated marginal and joint PDFs of the markers. The formula of EQMI has a similar interpretation as the standard formula of MI but allows a more efficient computation. We adopt and generalize an existing algorithm for computing EQMI that does not require explicitly estimating the joint PDF of the markers, a step which becomes increasingly intractable as the number of markers increases. Next, the subject-level EQMI values are tested for association with the clinical outcomes. The proposed method is free from the subjectivity bias of the traditional thresholding-based method and is readily applicable with any number of markers.
We applied the proposed method to two real datasets, one mIHC lung cancer dataset and one MIBI triple negative breast cancer dataset. In the former, we found high co-expression of the markers, HLA-DR and CK to be associated with the 5-year overall  On the x-axis, the fixed effect size b was varied from low to high survival of the subjects. In the latter, we found high co-expression of the proteins, PD1, PD-L1, IDO and Lag3 to be associated with disease recurrence. We evaluated the performance of our method through several simulation scenarios with two and three markers. In the scenarios with two markers, we showed that all the methods perform well and close to each other if the pattern of dependence (coexpression) between the markers is linear. However, with a more complex non-linear dependence pattern, only the proposed method could achieve respectable power. In the scenarios with three markers, we found that the proposed method performed consistently better than the thresholding-based method and showed superior robustness.
As we have shown in the simulation studies, EQMI can capture both linear and non-linear patterns of co-expression between the markers very well. However, the measure is not well suited for capturing the differences between the patterns. For example, it may happen that one subject has a linear pattern of co-expression, whereas some other subject has a non-linear pattern. The EQMI for both the subjects can be very similar, making it hard to distinguish between them. As a part of our future direction, we would like to improve the method by detecting and incorporating the type of the co-expression pattern. With more than two markers, we studied the co-expression patterns of all possible combinations of the markers and declared significance based on P-values corrected by Bonferroni's method. However, in future, we would like to explore the causal direction between the markers which can then be used to determine a smaller and optimal set of markers and would obviate the need of exploring all possible marker-combinations. In this article, we have not used any information on the spatial locations of the TME cells. As a future direction, we would like to study the MI between the spatial information and the marker expression profiles with a goal to detect spatially variable markers and their spatial patterns.
Our method is available as an R package named MIAMI at this link, https://github.com/sealx017/MIAMI. The package is readily applicable to any multiplex imaging dataset which has cell-level intensity data on two or more markers. In future, we would like to further augment the package's capability by incorporating a pixel-level analysis as well.