- Split View
-
Views
-
Cite
Cite
Ashwani Kumar, Ali Hosseinnia, Alla Gagarinova, Sadhna Phanse, Sunyoung Kim, Khaled A Aly, Sandra Zilles, Mohan Babu, A Gaussian process-based definition reveals new and bona fide genetic interactions compared to a multiplicative model in the Gram-negative Escherichia coli, Bioinformatics, Volume 36, Issue 3, February 2020, Pages 880–889, https://doi.org/10.1093/bioinformatics/btz673
- Share Icon Share
Abstract
A digenic genetic interaction (GI) is observed when mutations in two genes within the same organism yield a phenotype that is different from the expected, given each mutation’s individual effects. While multiplicative scoring is widely applied to define GIs, revealing underlying gene functions, it remains unclear if it is the most suitable choice for scoring GIs in Escherichia coli. Here, we assess many different definitions, including the multiplicative model, for mapping functional links between genes and pathways in E.coli.
Using our published E.coli GI datasets, we show computationally that a machine learning Gaussian process (GP)-based definition better identifies functional associations among genes than a multiplicative model, which we have experimentally confirmed on a set of gene pairs. Overall, the GP definition improves the detection of GIs, biological reasoning of epistatic connectivity, as well as the quality of GI maps in E.coli, and, potentially, other microbes.
The source code and parameters used to generate the machine learning models in WEKA software were provided in the Supplementary information.
Supplementary data are available at Bioinformatics online.
1 Introduction
Large-scale genetic (gene–gene, or epistatic) interaction (GI) maps in human cells (Bassik et al., 2013; Boettcher et al., 2018; Du et al., 2017; Horlbeck et al., 2018) and other organisms (Frost et al., 2012; Horn et al., 2011; Lehner et al., 2006; Ryan et al., 2012), especially the most comprehensive digenic (Costanzo et al., 2010, 2016) and trigenic (Kuzmin et al., 2018) GI maps of the budding yeast Saccharomyces cerevisiae, have revealed extensive insights into genetic redundancy, and identified cellular functionality of varied genes. GI maps in yeast have also revealed the functional wiring in response to DNA damage or autophagy stress (Bandyopadhyay et al., 2010; Guenole et al., 2013; Kramer et al., 2017; Srivas et al., 2013), metabolic adaptation (Szappanos et al., 2011), cellular protein homeostasis (Rizzolo et al., 2017) and the evolution of new traits (Taylor and Ehrenreich, 2014), as well as contributed to the discoveries of new gene functions and uncharacterized protein complexes or pathways (Collins et al., 2007; Hoppins et al., 2011; Jonikas et al., 2009; Schuldiner et al., 2005). Motivated by these efforts in yeast, we (Butland et al., 2008) and others (Typas et al., 2008; van Opijnen et al., 2009) have developed analogous high-throughput GI screening methods in the prokaryotic Gram-negative model organism, Escherichia coli.
Our eSGA (E.coli synthetic genetic array) approach allows for systematic creation and phenotypic scoring of double mutants by exploiting natural homologous recombination and bacterial conjugation to transfer a genetically marked ‘query’ deletion (or partial loss-of-function hypomorphic mutant allele) into a panel of E.coli essential hypomorphic or nonessential single gene deletion mutant ‘recipient’ strains. By employing this method, GIs pertaining to various biological processes (i.e. cell envelope, protein synthesis, genome integrity, nutrient stress), including antibiotic targets, were systematically mapped to reveal the epistatic dependencies and cross-talk (Babu et al., 2014; Cote et al., 2016), as well as the dynamic alteration of GI networks in response to environmental challenges or genotoxic insults (Babu et al., 2011b; Gagarinova et al., 2016; Kumar et al., 2016).
The presence of an epistatic relationship is determined by comparing the growth fitness of double-mutant strains to their single-mutant counterparts, with colony size measurements reflecting strain fitness. In a multiplicative definition (a.k.a. model) used for E.coli and yeast GI mapping, for functionally unrelated genes, which represent the majority of possible gene–gene combinations, the product of the two single-mutant growth fitness measurements is expected to equal the fitness of the corresponding double mutant. Conversely, combining two functionally linked genes would be expected to produce more severe (negative or aggravating/synthetic sick or lethal effects, indicating parallel/redundant pathways) or less severe (positive or alleviating/buffering or suppression, suggesting a linear pathway) fitness defects than the predicted product of single-mutant fitness measurements. The strength and confidence of the GI is then determined using an interaction score for each E.coli-mutant gene pair based on the multiplicative rule originally developed for yeast (Baryshnikova et al., 2010; Collins et al., 2006).
In addition to the multiplicative model, other predefined functions for epistasis, such as minimum (double mutant has the same fitness as the less-fit single mutant), logarithmic (GIs measured on a logarithmic fitness scale) and linear additive (double mutant has the fitness equal the sum of two single-mutant fitness values), have been tested in yeast using the growth-rate measurements of single and double mutants (Mani et al., 2008). This revealed that multiplicative and logarithmic functions are suitable for identifying functional relationships, specifically for growth-related phenotypes (i.e. colony fitness or growth rates), due to the nonlinear nature of cell growth (Mani et al., 2008; Wang et al., 2014). Conversely, additive model was shown to be appropriate for assessing phenotypes that change linearly (Wang et al., 2014). Because these quantitative definitions diverge vividly and lead to different biological inferences when constructing GI networks (Mani et al., 2008), the choice of a model should always depend on the type of phenotype and read-out (Walhout et al., 2012). As with yeast studies (Costanzo et al., 2010, 2016), large-scale E.coli genetic screens have typically used colony growth as a proxy for strain fitness measurement, and the multiplicative model for scoring GIs (Babu et al., 2011, 2014; Butland et al., 2008; Gagarinova et al., 2016; Kumar et al., 2016). While the choice of a model can have a determining effect on the interpretation of the results of a GI study, there has been no systematic evaluation of the GIs that would be differentially predicted in E.coli using other statistical models and machine learning algorithms.
Here, using our recently published large-scale GI map of E.coli genome integrity (Kumar et al., 2016) as a case study, we compared multiple models based on their ability to identify functional relationships among gene pairs in E.coli. We show that a nonparametric machine learning-based Gaussian process (GP) can discriminate between genetically noninteracting and interacting E.coli gene pairs better than the published multiplicative and other machine learning and statistical models tested. We further experimentally verified this directly, testing predicted interactions for more than a dozen gene pairs. Using the GP model, we then re-evaluated existing GI data, identifying numerous interactions that have been missed by the multiplicative algorithm. We predict that the GP will improve the function prediction accuracy for the 16% (702 of 4432) of genes that still lack annotations, and thus can substantially increase the amount of gene functional information derived from the E.coli GI screens.
2 Materials and methods
2.1 Data collection and prediction of GIs
Colony size measurements (as proxies for strain fitness), normalized via the online SGAtool (Wagih et al., 2013), were obtained for 102 255 viable digenic mutants and their corresponding 549 single gene mutants from our standard nutrient-rich condition E.coli GI network (Kumar et al., 2016). In addition, we included 1767 gene pairs that were missed in the original genome integrity GI study (Kumar et al., 2016), yielding quantitative values for a total of 104 022 gene pairs. For statistical reproducibility, each Hfr query mutant in the genome integrity study was screened in quadruplicate against arrayed F−-recipient single-gene deletion mutants, generating four replicates per donor–recipient combination. The standard error (SE) measurements for the normalized colony sizes of the recipient single mutants (Wy; SE: 0 to 1.91×10−1) and double mutants (Wxy; SE: 3.19 × 10−7 to 9.54 × 10−1) were experimentally quantified (Kumar et al., 2016), and fall within the SE range (Supplementary Table S1). Whereas the fitness measurement of each donor mutant (Wx) was represented as the median of normalized colony sizes for all double mutants arising from the corresponding donor’s GI screen.
The normalized fitness measurements of the single (Wx: 0.91–1.20; Wy: 0.52–2.64) and double (Wxy: 0.00–1.96) mutants were then used to compute the expected double-mutant fitness according to predefined (i.e. additive, logarithmic, minimum, multiplicative) and machine learning (i.e. linear regression, LR; support vector regression, SVR; neural network, NN; GP) models, with normalized colony size values of 1.0 equaling wild-type (WT) fitness. To enable comparisons between the chosen predefined and machine learning models, we computed the GI score (ε) to define epistatic relationships among pairs of genes as the difference between expected and observed double-mutant fitness measurements, such that the score of zero would reflect a noninteraction, and most scores would be expected to center on zero. Conversely, aggravating and alleviating GIs would be defined as those whose double-mutant fitness significantly different from the fitness expected based on the two corresponding single-mutant fitness values.
2.2 Predefined GI models
The four predefined formulas employed to delineate GIs are detailed elsewhere (Mani et al., 2008). Briefly, first, in the multiplicative definition GI score Wxy − (Wx * Wy), the product of the two single-mutant (Wx, Wy) growth fitness values is deducted from the observed double-mutant (Wxy) fitness. Second, in the additive definition score Wxy − (Wx + Wy − 1), the sum of single-mutant growth fitness values (minus one) is subtracted from the corresponding observed double-mutant fitness value. Third, in minimum definition GI score Wxy − Min (Wx Wy), the growth fitness value from the slowest-growing single mutant (Min [Wx, Wy]) is subtracted from the respective double-mutant fitness value. Fourth, the logarithmic definition GI scores Wxy – log 2 ((2Wx − 1) (2Wy − 1) + 1) reflect the fitness scale in logarithmic function.
2.3 Defining GIs by machine learning models
GIs measured using four different machine learning algorithms, along with chosen parameters under each definition, built within the WEKA (Smith and Frank, 2016) machine learning software (ver 3.9.3), are summarized, briefly, as follows:
LR: The training data are modeled in a way such that the predicted (dependent) variable is a linear function of the input (independent) variables. Here, the independent variables were Wx and Wy (signifying two single mutants' growth fitness), and the dependent variable was Wxy (double-mutant growth fitness). We used standard WEKA implementation of LR.
SVR: Kernel function is used to map the input data into a higher-dimensional space, in which a linear hyperplane is trained and then used as a model for prediction. Again, the input variables were Wx and Wy, and the output variable was Wxy. We used RegSMOImproved algorithm with the polynomial kernel function under standard parameter settings as defined in WEKA.
NN: Input values are linearly combined and fed through layers of so-called neurons, which pass on their output values (computed via a linear activation function) to the next layer, eventually leading to the output layer which produces the final prediction for the given inputs. In numerous iterations, internal weights are adjusted so that the output values approximate the ones given in the training data. We used the WEKA implementation for training a feed-forward NN (multilayer perceptron) with one hidden layer with one unit, using the sigmoid activation function. For all other parameters, we used the standard setting in WEKA. As always, the input variables were Wx and Wy, and the output variable was Wxy.
GP: This definition that relies on Bayesian principles initializes a posterior probability distribution on the training dataset, and can be best explained by comparing simple linear regression to Bayesian linear regression. In the former, a linear relationship is assumed between the dependent (Wxy) and independent (Wx, Wy) variables, where the intercept and slope were calculated using the parameters and , and an error term ( is added.
Here K = covariance matrix expressing similarity between observed (training) values calculated by a kernel function; = covariance matrix expressing similarity between the training and test values for x; = covariance matrix expressing similarity between the test values for x; and = transpose of . Finally, for training dataset we have observed the corresponding outcome (i.e. denoted as f), whereas for test dataset we predict an outcome (i.e. denoted as ). This is best represented by computing a probability assuming that f and jointly have a Gaussian distribution. Equation (1) means that ( is normally distributed () with mean (, and covariance matrix (. To summarize, a GP is an infinite family of random variables with the property that every finite subset forms a vector of random variables that follows a Gaussian distribution. In machine learning with GPs, a kernel function is used for estimating the similarity between input data points so that similar input leads to comparable predictions. We used the WEKA implementation with the radial basis function kernel. The only parameter of that kernel, gamma, was set to the default value 1.0 for the data from Kumar et al. (2016) or Costanzo et al. (2010), and to 3.0 for the data from Babu et al. (2014).
2.4 Training and test datasets
This led us to construct 12 bins with different normalized fitness values, with 20 distinct gene pairs chosen within each bin. The resulting 336 training gene pairs were subjected to 5-fold cross-validation by distributing them randomly into five sets. We then assessed the performance of each model using the remaining gene pairs in the other four sets. This process was repeated such that the model validation is iteratively performed, and the expected double-mutant fitness values from these iterations were then combined to generate the final model.
These aforesaid factors yielded 400 noninteracting and 400 interacting test gene pairs, comparable in size to the training set, for proper evaluation of the performance of the selected models. Notably, we ensured that our noninteracting test gene pairs did not have any overlap with the training dataset.
2.5 Statistical measures to evaluate model performance
For both MAE and RMSE, the average prediction error ranges from zero to infinity. As a measure of variance, for each model we applied, where a standard deviation (SD) () relative to the mean () error (e) of all gene pairs was expressed as a percentage.
2.6 Decision tree-based classification
Using the C4.5 decision tree algorithm provided within the WEKA software, 800 test gene pairs were partitioned into interacting (aggravating, alleviating) or noninteracting and were incorporated independently into the decision tree classifier with training labels based on multiplicative or GP model. A 5-fold cross-validation was applied to these test pairs for each definition, whereby four-fifths of the data were used for training, and the other one-fifth was held out for testing. Briefly, each of the aggravating, alleviating and noninteracting pairs were assigned specific labels: -ve, +ve, and O, respectively. The resulting 3 × 3 confusion matrix was used to compute the percentage of interacting or noninteracting pairs that was classified correctly.
2.7 Examining GIs for shared functional links using gene set enrichment analysis
Functional enrichment analysis was performed as described previously (Subramanian et al., 2005) using a gene set enrichment analysis (GSEA) (ver. 3.0) tool. Briefly, a pre-ranked gene list of normalized fitness values corresponding to the pairs defined as aggravating (1614), alleviating (2459) and noninteracting (4073) under the GP definition were investigated to determine if the GI pairs share similar Gene Ontology (GO) functions. All GO annotation genesets for E.coli were downloaded from EcoCyc, and GO terms with a minimum of 2 to a maximum of 500 annotated genes were considered for GSEA. Genes that were enriched for a GO term with a P ≤ 0.05 and with a false discovery rate (FDR) of 25% or less were used to assess the functional relationship, as long as an enriched GO term function is held by both genes in a pair.
2.8 Bacterial strains and media
Strains used for experimental validation include the WT E.coli K-12 BW25113; kanamycin-marked F−-recipient single-gene deletion mutant strains from the Keio knockout library (Baba et al., 2006), including the functionally unrelated gene JW5028 (Gagarinova et al., 2012); F− single-mutant strains with hypomorphic alleles of essential genes generated by integrating a C-terminal sequential peptide affinity fusion tag and a kanamycin selection cassette into the 3′-UTR of the respective essential genes to perturb transcript abundance as described previously (Babu et al., 2014; Butland et al., 2008); and chloramphenicol-marked high-frequency recombination Cavalli (HfrC) donor single-mutant strains created using λ-Red recombination as reported earlier (Butland et al., 2008). Strains were grown on Luria-Bertani (LB) medium at 32°C for 24 h. In addition, as indicated, kanamycin (F− recipients and double mutants; 50 μg/ml), chloramphenicol (HfrC donors and double mutants; 34 μg/ml), bleomycin (BLM; 0.25 µM) and methyl methanesulfonate (MMS; 0.05%) were added to the media.
2.9 Validation of predicted GI pairs
Double mutants were created by conjugating a chloramphenicol-marked HfrC donor gene deletion mutant with the kanamycin-marked F−-recipient mutant strains, following the eSGA procedure (Butland et al., 2008). The double mutants and the corresponding single mutants were serially diluted in LB medium and pinned onto LB plates with and without DNA-damaging agents (BLM, MMS) to assess the sensitivity of the strains to DNA damage. Phase-contrast cell morphology micrographs from the cultures of single or double mutants incubated for 2 h in the presence of DNA-damaging agents were captured using a Carl Zeiss LSM700 confocal microscope; and cell lengths were measured using ImageJ.
3 Results and discussion
3.1 GP identified more bona fide GIs than other models
To evaluate the performance of the selected models, we employed RMSE and MAE metrics on the training (336) and test (400) gene pairs based on predicted GI scores. Our hypothesis is that for any noninteracting gene pairs, the observed double-mutant fitness values that are not different from expectation should have a low RMSE and MAE values. Consistent with this notion, just as the noninteracting test gene pairs (Fig. 1A), GP showed smallest RMSE and MAE compared to other definitions (see Supplementary Table S2 for mean and SD from a 5-fold cross-validation of the training set shown for machine learning models in Fig. 1A). As well, in contrast to predefined models, low CV exhibited by GP was comparable to other machine learning models. We also attempted to modify the parameter settings (i.e. gamma and noise parameters) used in GP model to evaluate its performance by measuring the error metrics (RMSE, MAE) on the training dataset (Supplementary Table S2). Parameter sampling was performed manually by testing various gamma (or noise) values that are above or below the default setting. We found no substantial effect on the performance of the trained predictors even when parameter settings were modified. Similarly, since NN model on the training and test datasets exhibited error metrics (Fig. 1A) close to GP, we posited that fine-tuning the parameter settings for NN may achieve better performance than GP. However, even under different parameter settings, we were unable to achieve better error metrics for NN over GP (Supplementary Table S2).
Gene pairs with related biological functions connected by GIs are rare (Mani et al., 2008). We therefore should expect most of the gene pairs to be noninteracting. As a result, observed double-mutant fitness values will yield a distribution that approximates the expected distribution. Quantitative measurement of GIs for those noninteracting gene pairs would then have a tight normal distribution centered around zero (i.e. low dispersion and low bias), indicating that GIs are restricted only to certain pairs. Here, the left and right tails of the distribution represent gene pairs with negative (aggravating) and positive (alleviating) GIs, respectively. By applying this strategy to the training or test double-mutant strains, and to the 104 022 digenic mutants of the genome integrity network (Kumar et al., 2016), we examined the |ε-score| distribution to discern which definition produced values close to zero. As with the training or test dataset (Fig. 1B), we found the distribution of GP for all digenic mutant pairs approximated a normal distribution that is tight and close to zero (i.e. mean = 0.02, SD= 0.22; with lowest bias and dispersion for ε scores; Fig. 1C), conforming to the expectation that most gene pairs are noninteracting. Alternately, other models showed a positive or negative shift in the distribution of |ε-score| with a greater number of scores deviating from zero (i.e. with bigger SDs; Fig. 1B and C).
3.2 GP identifies new GIs missed by the multiplicative model
We next investigated if the performance of GP in terms of negligible deviation of observed double-mutant fitness from expectation and tight |ε-score| distribution for genes involved in genome integrity was consistent when applied to a different E.coli GI dataset not focused on any particular function. To do so, we obtained the growth fitness measurements of 600 000 digenic mutant strains and their respective single mutants from our genome-wide GI study (Babu et al., 2014) in which we crossed 163 functionally unrelated ‘donor’ mutant alleles from diverse core bacterial processes against 3968 nonessential single gene deletions and 149 hypomorphic mutant strains. Using this growth fitness evidence, 504 gene pairs were selected for training set based on the same selection criteria as described above (see Section 2). Here, GP once again exhibited insignificant deviation of double-mutant fitness from expectation as determined by 5-fold cross-validation (Supplementary Fig. S1A), as well as least bias and dispersion (i.e. normal distribution centered on neutrality) compared to other models (Supplementary Fig. S1B). Collectively, these results suggest that, as in yeast (Mani et al., 2008), the choice of model is vital since it can markedly impact our discovery of E.coli GIs, especially when 5% (235 076 of 4 889 920) of total digenic mutants examined to date are solely supported by large-scale genetic screens relying on the multiplicative function (Babu et al., 2011, 2014; Gagarinova et al., 2016; Kumar et al., 2016).
To further analyze the performance of the GP model in identifying GIs, we compared how well the gene pairs among the top 25% ranked (26 005 of 104 022 double mutants) by |ε-score| thresholds for GP (ε ≤ −0.152; ε ≥ 0.153) and the multiplicative (ε ≤ −0.221; ε ≥ 0.203) model were in agreement with the identification of their putative GIs. Here, in order to compare εGP and εMulti, we defined gene pairs with εGP ≤ −0.152 or εMulti ≤ −0.221 as aggravating, εGP ≥ 0.153 or εMulti ≥ 0.203 as alleviating and those that do not pass the threshold as noninteracting. While the interaction type for most (84%, 21 932) of the digenic mutant gene pairs (14 740 alleviating, 7192 aggravating; Supplementary Table S1) displayed by GP and multiplicative definitions were similar, for one-sixth (16%, 4073) of the gene pairs they predicted inconsistently (Fig. 1C inset). The latter observation led us to investigate which model might have misclassified the interacting gene pairs.
A close examination of 3481 gene pairs (category A) that were defined as having aggravating interactions (−2.064 ≤ εMulti ≤ −0.221) by the multiplicative, but not by the GP definition (−0.152 ≤ εGP ≤ 1.00) showed no obvious growth fitness defects in the double mutants relative to their respective single mutants (Fig. 1D-i), indicating that GP accurately followed the expectation that most gene pairs were not aggravating. For example, in contrast to the multiplicative model (εMulti = −0.224), GP classified the association between recA and ssb as close to neutral (εGP = 0). This observation is consistent with previous findings (Cox and Lehman, 1982; Eggler et al., 2003; Kowalczykowski et al., 1987), suggesting that both ssDNA binding proteins, Ssb and RecA, compete for ssDNA substrates in vivo; hence ssb suppresses the nucleation stage of recA filament formation on ssDNA and subsequent homologous recombination. Likewise, 1614 gene pairs (category B) were defined as aggravating by GP (−0.308 ≤ εGP ≤ −0.152) but not by the multiplicative model (−0.220 ≤ εMulti ≤ −0.029). While gene pairs in this category were assigned negative ε scores by both models, the strength and confidence of putative GIs are different. If a gene pair has a strong GI, then it is expected to be within the top-ranked list of gene pairs. In that sense, GIs predicted by the multiplicative model inconsistently classified the pairs as noninteracting (i.e. fell below the chosen cut-off; εMulti ≤ −0.221) despite many double mutants exhibited weak growth fitness compared to their single mutants (Fig. 1D-ii). For example, GP has successfully captured the aggravating interaction (εGP = −0.159) previously reported for recB priA double mutants (Mccool and Sandler, 2001), whereas this interaction (εMulti = −0.108) was precluded under the multiplicative definition as it was below the selected |ε-score| cut-off.
Additionally, among the 592 gene pairs (category C) predicted by the multiplicative model as alleviating (0.203 ≤ εMulti ≤ 0.291), but were classified as noninteracting by the GP definition (0.053 ≤ εGP ≤ 0.151) showed that when double mutants and their respective single mutants had moderate growth fitness defects, the GP model closely matches the expectation of these gene pairs being noninteracting (Fig. 1D-iii). Similarly, based on the growth fitness of double mutants that show better growth than their corresponding single mutants, GP has classified 2459 gene pairs (category D) as alleviating (0.152 ≤ εGP ≤ 1.00) as expected; whereas the multiplicative model showed a greater tendency to place these gene pairs as aggravating or close to neutrality (−1.941 ≤ εMulti ≤ 0.203; Fig. 1D-iv). An illustrative example is when compared to the multiplicative model (εMulti = 0), GP (εGP = 0.215) defines the interaction between the components of cellulose synthase complex, BcsF and BscQ, as alleviating, consistent with the notion that such interactions tend to occur between the subunits of a protein complex (Babu et al., 2014; Bandyopadhyay et al., 2008; Collins et al., 2007; Costanzo et al., 2019).
Although the above results suggested that the classification of gene pairs by GP to appropriate interaction class was consistent with colony growth measurements, we investigated further if the GI pairs (4073; sum of categories A and D) favored by the multiplicative model, that when not in agreement with GP, can be clearly demonstrated by their change in interaction direction, irrespective of their chosen threshold. We found that for less than a quarter (19%; 776 of 4073) of the pairs defined by the multiplicative model as aggravating, GP clearly classified these pairs either as alleviating (103 pairs) or noninteracting (673 pairs; Supplementary Table S1). The interaction trend (i.e. positive or negative) for the remaining three-fourths (81%; 3297 of 4073) of the pairs was reliable under two definitions, but these pairs had neutral (not significant) interactions according to GP, consistent with the observed neutral colony sizes (Fig. 1D) and with the expectation that most pairs would not interact.
Next, our rationale for using top 25% (26 005) of the ranked GI pairs by |ε-score| thresholds for GP and multiplicative model was to obtain large sets of gene pairs to make reasonable biological interpretations about model performance. While this method of utilizing top-ranked gene pairs is widely adopted for evaluating the performance of GI predictions (Ulitsky et al., 2009), and in cross-study validation using multiple transcriptomic datasets to generate a robust prediction model (Kim et al., 2016), we applied statistical thresholds corresponding to two SD (|Z-score| ≥ 1.96; P ≤ 0.05; Supplementary Fig. S1C) to the 104 022 digenic mutants of the genome integrity network to examine the predictive capability of GP and multiplicative model in identifying GIs. Of the 6980 statistically significant (P ≤ 0.05) GI pairs, 4502 had GI (ε) scores of −1.96 or lower indicating aggravating relationships, and 2478 with ε-scores of +1.96 or higher representing alleviating interactions (Supplementary Fig. S1C). Three-fourths (72%; 4998 of 6980) of the digenic mutant pairs were consistently captured by both GP and multiplicative models. Whereas for the rest of the gene pairs, GP approach has reliably classified 542 pairs as aggravating (category II) and 816 as alleviating (category IV), indicating that 601 pairs (category I) that have initially thought of as aggravating and 23 (category III) as alleviating by the multiplicative model were indeed inconsistent, based on the fitness of the double mutants relative to single mutants (Supplementary Fig. S1D; Supplementary Table S3). This trend persisted even when choosing different P-value-based cut-offs (data not shown), where most (95%; 6604) of the significant Z-transformed GI pairs in the 95th percentile (P ≤ 0.05) were found in nearly all (99%; 25 991 of 26 005) of the top 25% of the ranked gene pairs (Supplementary Fig. S1E). Overall, these results suggest that the conclusions of our findings remain consistent, whether the selection of GI pairs was based on top-ranking or P-value based cut-off.
Since probabilistic decision tree-based classifiers can reveal the characteristics of a model in terms of its GI predictive accuracy for gene pairs (Wong et al., 2004), we conducted an independent evaluation of the definitions based on training the decision tree using the 800-pair test dataset (classifying 200 as aggravating, 200 as alleviating and 400 as noninteracting). This trained classifier was then used to examine if GP and the multiplicative definition categorize gene pairs into appropriate interaction types, as expected. Based on the decision-tree classification score threshold of εGP ≤ −0.440 and εGP ≥ 0.110 (Supplementary Fig. S2A), GP achieved an accuracy (consistently classifying the GI type) of 99.63% (797 of 800 pairs) at an error (inconsistently classifying the GI type) rate of 0.375% (3 of 800 pairs) as determined by 5-fold cross-validation. Conversely, at the classification score threshold of εMulti ≤ −0.385 and εMulti > 0.126 (Supplementary Fig. S2A), the accuracy of the multiplicative definition was 96.9% (775 of 800) at an error rate of 3.125% (25 of 800; Fig. 2A). Further investigation on these 25 gene pairs (Supplementary Table S4) revealed that the multiplicative model misclassified 21 alleviating interactions as noninteracting, 2 noninteracting pairs as alleviating and 2 alleviating pairs as aggravating or noninteracting. In fact, some of the alleviating interactions among the 21 gene pairs are well supported by literature evidence. For example, the multiplicative definition identified the association between helicase II (uvrD) and homologous recombination (recA, recD) mutants as noninteracting, whereas GP classified these pairs as being alleviating, in agreement with the observation that UvrD and RecA act in the same pathway and function in concert with RecBCD-dependent recombination intermediates (Bidnenko et al., 2006). A similar high accuracy and negligible deviation of observed double-mutant fitness from expectation for the GP method was obtained in comparison to the multiplicative model, when the decision-tree classifier was used on 104 022 gene pairs (Supplementary Table S4). As with the decision tree approach (Fig. 2A), we also achieved a greater area under the receiver operating characteristic curve for the interacting (0.996) and noninteracting (0.998) test pairs predicted by GP model (versus 0.975 and 0.996 for the multiplicative model, respectively; Supplementary Table S4). While these differences appear modest, they correspond to hundreds of gene pairs that are differently classified and will have substantial impact on our biological interpretation of the identified pathways.
3.3 Validation of GI pairs identified by GP model
As error metrics, ε distribution, and cross-validation confirmed GP to be a better predictor than the multiplicative model for identifying GIs and functional relationships in E.coli, we next sought to systematically examine whether the aggravating (1614; category B) or alleviating (2459; category D) GIs identified under the GP model conform to the previous notion (Baryshnikova et al., 2010) regarding a common biological function. To do so, we used GSEA (Subramanian et al., 2005) to evaluate enriched functional annotations among interacting gene pairs. Using GSEA with the pre-ranked gene list of fitness data and all 3440 GO terms as input, we found enrichment for 9 significant (P ≤ 0.05; false-discovery rate ≤ 0.25) GO terms. Among these terms, one-quarter (25%, 999 of 4073) of the enriched (P = 1.04 × 10−278 by hypergeometric test) gene pairs defined as aggravating or alleviating under the GP definition shared a GO term (Fig. 2B;Supplementary Table S5). Conversely, only very few (0.07%, 3 of 4073) disagreeing aggravating (3481; category A) or alleviating (592; category C) GI pairs that were defined by GP as noninteracting (Fig. 2B) shared the same GO functional attribute. Instead, 48-fold underrepresentation for pairs with a shared annotation (P = 1.39 × 10−232 by hypergeometric test) than interacting pairs observed under the GP definition. While the assumption is that most noninteracting gene pairs should not have a close functional relationship (Mani et al., 2008), shared functionality observed for certain enriched noninteracting pairs is expected because the case study (Kumar et al., 2016) was focused on genome integrity rather than random pairs, and because a low level cross-talk is expected between processes due to indirect, common effects on fitness. In any case, since the enrichment for shared GO terms among the number of noninteracting pairs is several fold (220 times; 26.4% versus 0.12%; Fig. 2B) lower than among the number of interacting pairs under the GP definition, our results suggest that GP is a strong predictor of functional links, as indicated by aggravating or alleviating GIs.
Next, a detailed inspection of the pairs categorized as aggravating (1614; category B) and alleviating (2459; category D) by the GP model, and those pairs in disagreement with the multiplicative model (3481, category A; 592, category C), indicated that in comparison to interactions with annotated genes, over half (57%; 4673 of 8146) of the pairs had epistatic dependencies between unannotated E.coli genes (i.e. orphans or genes of unclear function) not previously linked to DNA repair, or between unannotated and known genomic integrity targets (Fig. 2C). To rigorously verify the reliability of the interactions defined using the GP model, we selected a subset of these gene pairs for independent experimental validation using the mini-array genetic crosses. Specifically, our selection of ‘agreement’ gene pairs entails noninteracting or aggravating interactions under GP and multiplicative models (e.g. ylcG with umuD/yhhZ/mraZ; yebV with fabZ/mukB/pagP/fliZ; yicR with recB). Whereas the ‘disagreement’ gene pairs include aggravating interactions under multiplicative definition, but noninteracting according to the GP model (e.g. yaiU with umuD/yhhZ/ mraZ; ftsH with fabZ/mukB/pagP/fliZ; category A in Supplementary Table S1), or pairs that were noninteracting in the multiplicative model, but alleviating under the GP definition (e.g., yicR with ftsKNW or recCE; category D in Supplementary Table S1). Notably, when GP and multiplicative definitions disagreed, GP predictions were experimentally verified (Fig. 2D). As an illustrative example, an uncharacterized gene, yicR displayed alleviating and aggravating GIs with homologous recombination genes (recBCE) under the GP definition, whereas the multiplicative model defined all these GIs as aggravating (Fig. 2D). To examine the biological implication of these GP or multiplicative assignments, we compared the sensitivity of the E.coli yicR-mutant strain to DNA damage induced by genotoxic agents (BLM, MMS) with the sensitivities of WT or recBCE single gene deletion strains.
BLM or MMS introduces clusters of lesions in DNA to form single- or double-strand breaks (SSBs, DSBs), or produces an assortment of N- and O-methylated bases in DNA by blocking the replication fork progression. In E.coli, when these genotoxins induce DNA damage, cell survival depends on rec-dependent homologous recombination and repair pathways (Knezevic-Vukcevic and Simic, 1991; Kosa et al., 2004; Nowosielska et al., 2006). It is known from previous evidence (Knezevic-Vukcevic and Simic, 1991; Nichols et al., 2011) that yicR and the recombination-deficient recBC-mutant strains, but not recE deficient strains, are sensitive to the cytotoxic effects of BLM, MMS (Fig. 2E) or other genotoxic agents that induce DSBs (Nichols et al., 2011). Consistent with this observation, in our experiments, strains lacking yicR showed hypersensitivity (Fig. 2E), but inactivation of both yicR and recC did not enhance the effect compared to the respective single mutants. Thus, the lack of synergy in yicR-recC double mutants suggests, consistent with the GP prediction, that yicR functions with recC. Conversely, the increased sensitivity of yicR-recB double mutants to BLM or MMS is consistent with the GP and multiplicative definitions that recB is absolutely required for the yicR phenotype. Whereas, when compared to yicR-recB/recC phenotypes, yicR recE double mutants were fully resistant like the recE single mutants, suggesting that yicR does not function redundantly with recE. This lack of additive effect is in agreement with the GP definition.
Recombination-deficient strains (e.g. rec mutants), sensitive to genotoxic agents, exhibit impaired cell division and chromosome segregation, forming aberrantly long, nonseptate, multi-nucleate filaments (Babu et al., 2011; Kumar et al., 2016). Therefore, we ascertained if the observed yicR-recB and yicR-recC phenotypes suggested by hypersensitivity to MMS, as well as by aggravating and alleviating GIs by GP, respectively, indicate involvement in chromosome segregation following DNA damage. We found that the double mutants of yicR with recB, but not with recC, exhibited longer (∼9.2 μm avg. cell length) filamentous cells compared to WT or single mutants (Fig. 2F) after MMS treatment. This suggests that YicR appears to function in cooperation with RecC, while functioning in parallel with RecB in DSB repair pathway, which is consistent with the GP and multiplicative definition. Additionally, our findings decisively indicate that, like yicR-recC or yicR-recE growth fitness, other gene pairs that were tested by genetic crosses (Fig. 2D) were in fact misclassified as aggravating interactions under the multiplicative model.
4 Conclusions
Global epistatic E.coli connectivity maps based on colony growth measurements, under standard laboratory growth conditions or in the presence of genotoxic or environmental stressors (Babu et al., 2011, 2014; Cote et al., 2016; Gagarinova et al., 2016; Kumar et al., 2016) have been generated using the multiplicative function, following the framework established in yeast (Baryshnikova et al., 2010; Collins et al., 2006, 2007). However, there was no comparable, rigorous evaluation performed on large-scale E.coli GI data to justify the use of the multiplicative model. As a result, it has not been clear if the multiplicative definition has had any ramifications on the interpretation of GIs generated in E.coli to date. Therefore, in the current study, using the growth fitness of viable digenic mutant combinations and their corresponding single-gene deletion mutants of functionally related protein-coding genes involved in genome integrity (Kumar et al., 2016), we provided predictions for pairs of genes with the interaction subtype for each of the selected machine learning and predefined models. This dataset was then employed to compare the underlying performance of the GIs between definitions, with a set of statistical and experimental measures, to reveal the practical differences among definitions. Notably, based on the error metrics and |ε-score| distributions of the training and genome integrity datasets, we found the GP definition to display lower RMSE and MAE values, with ε distributions centered around zero and more closely conforming to the ideal normal distribution (i.e. most gene pairs are noninteracting) than under other definitions that showed more bias and greater dispersion of ε-scores. Although the NN definition exhibited low CV on the noninteracting test gene pairs (Fig. 1A), as we have shown in Figure 1B, NN on the test pairs exhibited a negative shift in the normal distribution of ε-score (i.e. deviating from zero) compared to the distribution of GP that approximated a normal distribution that is tight and close to zero (i.e. with lowest bias and dispersion for ε scores). In any case, our rationale for choosing GP over NN definition is not solely based on the error metrics of noninteracting test dataset, but rather on our observations with other sets of experimental results (Fig. 1B;Supplementary Fig. S1A; and Table S2, sheets 4 and 5).
Other important observations came from the analyses was the gene pairs that were inconsistently classified by the GP and the multiplicative models. In the case of 3481 aggravating (category A) and 592 alleviating (category D) GIs defined under the multiplicative model, the double mutants and one of the two single-mutant fitness measurements appear to near one, or WT and not different from one another (Fig. 1D); yet, the multiplicative model has defined these pairs as aggravating or alleviating. In contrast, GP defined the majority of these pairs as noninteracting, which we confirmed to be consistent experimentally with the genetic crosses performed on more than a dozen gene pairs. Additional evidence from GSEA on these GP-classified noninteracting pairs revealed less functional enrichment, supporting the respective GP model assignments. Decision tree classifiers also further indicate that GP had the highest accuracy and lowest error in consistently classifying the test and genome integrity gene pairs to appropriate interaction type when compared to the multiplicative model. When we repeated the aforesaid analyses on an independent genome-wide GI study (Babu et al., 2014) that included genes broadly representing various bacterial processes, we found that GP similarly outperformed multiplicative and other definitions.
While through multiple lines of evidence we have demonstrated that GP is able to show comparable results with different E.coli datasets, for example, with functionally related (Kumar et al., 2016) or unrelated (Babu et al., 2014) groups of genes, we tested if this definition can likewise perform well in the budding yeast that examined 5.4 million digenic pairs for synthetic GIs (Costanzo et al., 2010). The examination of the |ε-score| distributions showed that, unlike in E.coli, the GP and multiplicative definitions in yeast displayed almost identical error metrics and |ε-score| distributions with lowest bias and dispersion, conforming to the expectation that most gene pairs are noninteracting (Supplementary Fig. S2B). This outcome raises the question of why the multiplicative model was not consistent in certain cases with the GP definition on E.coli GI datasets. We found that the discrepancy with the multiplicative definition occurred when the normalized colony sizes for one of the observed single mutants is much larger (Wx or Wy >1.0), causing the multiplicative model to consider those gene pairs as aggravating instead of alleviating or noninteracting. Hence, it can be less sensitive to defining alleviating or suppression phenotypes, which in fact is evident from one-tenth (13%; 311 of 2459) of the gene pairs (category D) that were classified as aggravating (Supplementary Table S1). The remaining (87%, 2148 of 2459) large proportion of gene pairs were misclassified as noninteracting because the multiplicative model cannot distinguish the small fitness differences observed between single and double mutants. While under certain circumstances the multiplicative model introduces disagreement with GP, for most (84%; 21 932 of 26 005) of the top-ranking gene pairs, we found both definitions to perform equally well for identifying epistatic relationships when the growth of the double mutants (better or worse) can be different from at least one single mutant with a moderate fitness defect or both single mutants with near WT fitness (Supplementary Fig. S2C). Taken together, we suggest those favoring the multiplicative model should exclude interactions, where the fitness of one of the single mutants is especially large, to place more confidence on biological conclusions.
To facilitate exploration, we have provided the source code used to generate the machine learning models in WEKA software environment, along with the supporting E.coli training dataset in WEKA’s native format (Supplementary Material). In summary, we conclude that GP is a viable alternative for future epistatic studies or those underway in unicellular (Peters et al., 2016; Shapiro et al., 2018) or multicellular organisms (Billmann et al., 2018) using the CRISPR genetic screens that are in quest for identifying functional relationships. As with any other high-throughput screening framework, large-scale genetic screens can be noisy; hence it is beneficial to employ GP and the multiplicative model or any other definition, as long as there is a general consensus between definitions in classifying the consistent GI types.
Acknowledgements
We thank Gabe Musso (Harvard Medical School, Massachusetts, USA), members of the Babu lab (James Vlasblom, Matthew Jessulat) and anonymous reviewers for their helpful discussions and valuable comments. We also thank Mohammed Taha Moutaoufik for assisting with cell morphology micrographs. A.K. and S.Z. designed and conducted the machine learning-based analyses. S.P. compiled the raw and normalized fitness data from the large-scale genetic screens, and performed GSEA and other computational analyses. A.H., A.G. and S.K. created Hfr gene deletion mutants and F− hypomorphic strains, and A.H. performed experimental validation, with feedback from K.A and M.B. A.K., S.Z. and M.B. wrote computational methods, and M.B wrote the article, with input from co-authors.
Funding
This work was supported by the National Sciences and Engineering Research Council of Canada [DG-05336 to S.Z., DG-20234 to M.B]. A.G. was supported by a Canadian Institutes of Health Research (CIHR) Postdoctoral Fellowship. S.Z. holds Canada Research Chair in Computational Leaning Theory, and M.B. is a CIHR New Investigator (MSH-130178).
Conflict of Interest: none declared.
References