The genome-scale interplay amongst xenogene silencing, stress response and chromosome architecture in Escherichia coli

The gene expression state of exponentially growing Escherichia coli cells is manifested by high expression of essential and growth-associated genes and low levels of stress-related and horizontally acquired genes. An important player in maintaining this homeostasis is the H-NS-StpA gene silencing system. A Δhns-stpA deletion mutant results in high expression of otherwise-silent horizontally acquired genes, many located in the terminus-half of the chromosome, and an indirect downregulation of many highly expressed genes. The Δhns-stpA double mutant displays slow growth. Using laboratory evolution we address the evolutionary strategies that E. coli would adopt to redress this gene expression imbalance. We show that two global gene regulatory mutations—(i) point mutations inactivating the stress-responsive sigma factor RpoS or σ38 and (ii) an amplification of ∼40% of the chromosome centred around the origin of replication—converge in partially reversing the global gene expression imbalance caused by Δhns-stpA. Transcriptome data of these mutants further show a three-way link amongst the global gene regulatory networks of H-NS and σ38, as well as chromosome architecture. Increasing gene expression around the terminus of replication results in a decrease in the expression of genes around the origin and vice versa; this appears to be a persistent phenomenon observed as an association across ∼300 publicly-available gene expression data sets for E. coli. These global suppressor effects are transient and rapidly give way to more specific mutations, whose roles in reversing the growth defect of H-NS mutations remain to be understood.

when the glycerol stock of a single Δhns-stpA colony is streaked on an LB-agar plate, and during the subsequent batch liquid culture used for nucleic acid extraction. At this stage, the rac prophage is not yet fully lost from the population: many reads still map to the prophage, with an ~3-fold difference in coverage between the rac locus and the flanking regions (Panel A above, which shows the read coverage of the rac prophage and flanking regions as a function of genome coordinate; black: parent, blue: HS100 population, red: HS250 population). That the rac prophage is not completely deleted from the parental population is further supported by our previously reported transcriptome of Δhns-stpA, which showed upregulation of certain rac prophage genes, including the gene for the toxin kilR, in the mutant when compared with the wildtype (Panel B above). Nevertheless, rac excision is presumably very common, as we noticed its deletion in the genome sequences of each of four single Δhns-stpA colonies from the streaked plate. This is consistent with a previous study (Hong et al. Microbial Biotechnology. 3: 344-356. 2010), which reported rapid excision of rac in Δhns and in an H-NS K57N mutant in which the oligomerisation of the protein was disrupted. In the HS100 populations, fewer reads map to the rac region, whereas hardly any reads align to this locus in the HS250 populations (Panel A).

Supplementary Figure 8
From the sequencing data for HS100, we infer the presence of an amplification of ~40% of the genome, centred around the origin of replication. In theory, there is a remote possibility that this could arise from a deletion of the remaining ~60% of the genome, around the terminus of replication (panel a in the figure above). This might be possible in cells with multiple nucleoids, with a small proportion of the genomic DNA molecules carrying such a deletion. Using the simulation approach described below, we estimated the proportion of nucleoids carrying an amplification or a deletion, for a given difference in coverage between the Ori-centred and the Ter-centred segments of the chromosomes.
What we describe below is a computer experiment simulating the expected outcome of sequencing a given population of genomic DNA molecules. This allows us to impose certain conditions (for example, the proportion of genomic DNA molecules with a duplication or a deletion) and to build expectations for the quantitative measures that would result from sequencing that population of DNA. These expected measures can then be compared with experimental data to draw conclusions.
We simulated random fragmentation of the chromosome under the two scenarios described above, as follows (panels b and c in the figure): a proportion p of the genomic DNA molecules either underwent a global duplication of the High Coverage region (duplication scenario) or underwent a deletion of the Low Coverage region (deletion scenario). For example, p = 0.1 in the duplication scenario implies that 90% of the genomic DNA molecules are wildtype and the remaining 10% carry the duplication.
Both scenarios, duplication and deletion, would carry a selective advantage in the sense that they would increase the expression of growth-associated genes in the Ori region relative to the Ter region. The duplication scenario does so through the higher copy number of the duplicated region, but it comes with a trade-off of reduced fitness because the chromosome replication time depends linearly on its length. In the deletion scenario, the effect on gene expression is mediated by the reduction in copy number of the genes in the deleted region, which encodes many horizontally acquired genes including eight of the nine cryptic prophages; these cells suffer from the deletion of multiple essential genes in the Ter region, and can possibly survive only through polyploidy.
The simulation first generates 8,000,000 random fragments from the population of genomic DNA molecules described by the two scenarios. This is similar to the fragmentation of genomic DNA that is performed before sequencing. For each fragment, the sequences of 100 bases at either end are written to the simulated sequencing data file, akin to paired-end sequencing. These data are processed in a manner similar to real sequencing data, except for quality control, as the simulated data are assumed to be 100% accurate. The sequencing coverage is plotted against chromosomal coordinates for various scenarios (varying p for duplication and deletion), and is shown in the figure below (panel a). The fold change in coverage between the high and the low coverage regions is predicted to depend only on p and on δ, the length of the higher coverage region as a fraction of the whole genome, in the two scenarios. Panel b illustrates the interpolation formulae for the coverage fold change as a function of p and δ. These formulae are useful for estimating the value of p from coverage data; an expression is given for the complete relation as well as for the first-order linear approximation in p.
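The fragment-sampling idea described above can be illustrated with a much-reduced toy version. The sketch below is not the pipeline used for the figures: it assumes a chromosome of about 4.6 Mb, places the High Coverage region at the start of a linearised coordinate system, draws fragments in proportion to DNA mass, and records only one start position per fragment instead of writing and mapping paired-end reads. The function name `simulate_fold_change` and all parameter defaults are illustrative choices.

```python
import random

GENOME_LEN = 4_600_000  # assumed approximate chromosome length, in bases


def simulate_fold_change(p, scenario, delta=0.4, n_fragments=200_000, seed=0):
    """Toy estimate of the High/Low coverage ratio for a mixed DNA population.

    p        -- proportion of molecules carrying the mutation (0 <= p < 1)
    scenario -- "duplication" (High region present twice) or
                "deletion" (Low region absent)
    delta    -- length of the High Coverage region as a fraction of the genome
    """
    rng = random.Random(seed)
    hi_len = int(delta * GENOME_LEN)
    lo_len = GENOME_LEN - hi_len

    if scenario == "duplication":
        mut_len = GENOME_LEN + hi_len  # one extra copy of the High region
    elif scenario == "deletion":
        mut_len = hi_len               # only the High region remains
    else:
        raise ValueError("scenario must be 'duplication' or 'deletion'")

    # Assumption: a fragment is drawn from a molecule type with probability
    # proportional to the total DNA mass contributed by that type.
    mass_mut = p * mut_len
    mass_wt = (1 - p) * GENOME_LEN
    prob_mut = mass_mut / (mass_mut + mass_wt)

    hi_reads = lo_reads = 0
    for _ in range(n_fragments):
        if rng.random() < prob_mut:
            pos = rng.randrange(mut_len)
            if scenario == "duplication" and pos >= hi_len:
                # Map the second High copy, and the Low region that follows
                # it, back onto reference coordinates.
                pos -= hi_len
        else:
            pos = rng.randrange(GENOME_LEN)

        if pos < hi_len:
            hi_reads += 1
        else:
            lo_reads += 1

    # Per-base coverage in each region, then their ratio.
    return (hi_reads / hi_len) / (lo_reads / lo_len)


if __name__ == "__main__":
    for scenario in ("duplication", "deletion"):
        for p in (0.1, 0.25, 0.5):
            fold = simulate_fold_change(p, scenario)
            print(f"{scenario:12s} p={p:.2f} fold change ~ {fold:.2f}")
```

In this toy model the ratio comes out close to 1 + p for the duplication and 1/(1 − p) for the deletion, independent of δ. It is meant only to show the mechanics of the computer experiment; it is not expected to reproduce the interpolation formulae of panel b, which come from the full simulation and are not restated in the text.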

Supplementary Figure 9
The two scatter plots on the left show the density of the randomized reads mapped to the E. coli K-12 chromosome for a chosen value of p. This is similar to the real experimental data shown in Figure 3 in the main text. The plots on the right show that the dependence of the mean coverage on the mutated fraction p differs between the duplication and deletion scenarios. Note that parts of this figure are zoomed-in versions of smaller images shown in Supplementary Figure 8. In the deletion scenario, the fold change between the two regions diverges more rapidly as a function of p than in the duplication scenario.
In the HS100 population, the fold change in coverage between the High and the Low coverage regions was measured at about 1.2-fold, which according to our model would correspond to one of the following two scenarios: (a) a duplication in ~25% of the population; (b) a deletion in ~7% of the population.
The number of HS100 single clones carrying the mutation (3 out of 16) is ~3 times more likely under the duplication scenario than under the deletion scenario (by random sampling from 100s of clones). A 1.8-fold change in coverage, as observed in the sequencing of a single colony, can be explained either by a deletion in ~25% of the population or by an amplification in ~80% of the population. The fact that the proportion of cells with a deletion that can explain our observations is very low also suggests that these cells would require an exceptionally high growth rate in order to produce an observable increase in population growth rate.
Together, these observations support the duplication model for the coverage data in HS100.
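The ~3-fold likelihood ratio quoted above for 3 mutant clones out of 16 can be checked with a short calculation. The sketch below assumes that clones are sampled independently from a large population, so that the count of mutant clones is binomial with success probability p = 0.25 (duplication scenario) or p = 0.07 (deletion scenario). The binomial model is our reading of "random sampling from 100s of clones" and may differ in detail from the calculation behind the quoted figure.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k mutant clones among n sampled clones."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_clones, n_mutant = 16, 3
p_duplication = 0.25  # proportion inferred under the duplication scenario
p_deletion = 0.07     # proportion inferred under the deletion scenario

lik_dup = binomial_pmf(n_mutant, n_clones, p_duplication)
lik_del = binomial_pmf(n_mutant, n_clones, p_deletion)
likelihood_ratio = lik_dup / lik_del

print(f"P(3 of 16 | duplication) = {lik_dup:.3f}")
print(f"P(3 of 16 | deletion)    = {lik_del:.3f}")
print(f"likelihood ratio         = {likelihood_ratio:.2f}")
```

Under these assumptions the ratio is a little below 3, in line with the ~3-fold figure in the text.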

Supplementary Figure 10
The above figure shows a scatter plot of the fold change in expression for the following comparisons: x axis, ΔrpoS-hns-stpA vs. Δhns-stpA; y axis, rpoS mut vs. Δhns-stpA. The plot indicates high agreement between the two comparisons, showing that much of the transcriptional effect seen in rpoS mut can be explained by an rpoS knockout. All fold changes are on the log2 scale.

Supplementary Figure 11
The above figure shows the gene expression properties of genes that are upregulated in ori 2 relative to Δhns-stpA. The left panel shows the wildtype mid-exponential phase expression levels of genes that are upregulated in ori 2, and of a control set of genes that do not change in expression in ori 2. The right panel shows the fold change in expression between Δhns-stpA and the wildtype for the same two sets of genes.
Control genes were defined as those whose fold change in expression between ori 2 and Δhns-stpA was between −0.5 and +0.5 on the log2 scale. This figure shows that genes that are upregulated in ori 2 relative to Δhns-stpA show higher-than-average expression levels in exponentially growing wildtype cells. Their expression is decreased in Δhns-stpA when compared to the wildtype.
Supplementary Figure 12
This figure shows the growth curves of wildtype E. coli grown in the spent media collected from mid-exponential phase cultures of Δhns-stpA (red) and wildtype (black). The blue curve represents the growth of wildtype E. coli in the Δhns-stpA spent medium supplemented with LB, and the green curve shows the growth of wildtype E. coli in the wildtype spent medium supplemented with LB.
In rapidly-dividing cells, transient amplification of rRNA is common. Further, duplication of large segments of the chromosome can be selected in low nutrient environments. Polymorphisms in rpoS are common in both laboratory and environmental / pathogenic isolates of E. coli. Studies in chemostats have also shown that σ38 inactivation leads to shorter doubling times in low nutrient environments, and that such environments select for loss-of-function mutations in rpoS. We tested whether the Δhns-stpA culture might in fact be experiencing a 'low-nutrient' state. If this were the case, spent medium growing the double mutant to a certain cell density might support less growth than one in which the wildtype has grown to a similar cell density. We tested the ability of spent media, prepared from the wildtype and Δhns-stpA cultures, to support growth of the wildtype E. coli in 96-well plates (see figure). As expected, spent medium from a wildtype culture supported growth, albeit obviously to lower levels than fresh LB medium. On the other hand, medium from Δhns-stpA cultures supported less growth, in terms of both growth rate and final cell density.
Further, the Δhns-stpA spent medium, when supplemented with LB constituents, supported growth comparable to that obtained from similar experiments with the wildtype supernatant. These results suggest that Δhns-stpA cultures might be experiencing a low-nutrient state, possibly because of sub-optimal diversion of nutrients towards futile processes. This experience of nutrient limitation might be a common selective force for σ38 inactivation and for duplication of the genomic domain around the origin of replication.