IHF and HU are two heterodimeric nucleoid-associated proteins (NAPs) that belong to the same protein family but interact differently with the DNA. IHF is a sequence-specific DNA-binding protein that bends the DNA by over 160°. HU is the most conserved NAP, which binds non-specifically to duplex DNA with a particular preference for targeting nicked and bent DNA. Despite their importance, the in vivo interactions of the two proteins with the DNA remain to be described at a high resolution and on a genome-wide scale. Further, the effects of these proteins on gene expression on a global scale remain contentious. Finally, the contrast between the functions of the homo- and heterodimeric forms of the proteins deserves further study. Here we present a genome-scale study of HU and IHF binding to the Escherichia coli K12 chromosome using ChIP-seq. We also perform microarray analysis of gene expression in single- and double-deletion mutants of each protein to identify their regulons. The sequence-specific binding profile of IHF encompasses ∼30% of all operons, though the expression of <10% of these is affected by its deletion, suggesting combinatorial control or a molecular backup. The binding profile for HU reflects relatively non-specific binding to the chromosome, albeit with a preference for A/T-rich DNA. The HU regulon comprises highly conserved genes including those that are essential and possibly supercoiling sensitive. Finally, by performing ChIP-seq experiments, where possible, of each subunit of IHF and HU in the absence of the other subunit, we define genome-wide maps of DNA binding of the proteins in their hetero- and homodimeric forms.
Nucleoid-associated proteins (NAPs) are considered to be global regulators of gene expression in bacteria. They alter the topology of bound DNA by bending, bridging or wrapping it, leading to multiple effects on the bacterial cell including transcriptional regulation ( 1 ). Studies of 12 types of NAPs in Escherichia coli showed that they are generally expressed at high levels, and differ from each other in their expression across the growth phase and the degree of sequence specificity ( 2 , 3 ). The global nature of the effects of NAPs on bacterial physiology has prompted several genome-scale studies of their binding and transcriptional effects in E. coli and Salmonella enterica ; these have sometimes led to intriguingly conflicting conclusions on the functions of NAPs, thus underscoring their complexity ( 4–10 ).
Two NAPs, IHF and HU, are composed of two homologous subunits each (IhfA and IhfB; HupA and HupB). They are both members of the DNABII family of DNA-binding proteins and are strikingly similar to each other in sequence and in their unique structural fold ( 11 ). However, the similarities end there: they differ in their sequence specificity, with IHF being sequence-specific and HU binding at low affinity along the chromosome ( 2 , 12 ) with some specificity toward gapped or nicked DNA ( 13–16 ). Whereas the ability of each subunit of HU to form homodimers and bind to the DNA in such a form is relatively well-established ( 17 , 18 ), such evidence is less clear for IHF ( 19 , 20 ). Moreover, the two proteins differ in the degree of conservation across bacteria: whereas at least one subunit of HU is found across most bacterial genomes making it the most conserved NAP, IHF has a more restricted occurrence. Their functions have been described to include regulation of transcription, replication and recombination via DNA binding ( 1 ) and extend to the control of translation initiation by HU via protein–RNA interactions ( 21 , 22 ).
Several molecular and genome-scale studies have investigated the role of IHF in transcriptional control. Notable among such studies are the description of its effects on the nir ( 23 ) and the fim ( 24 , 25 ) operons, wherein IHF represses the nir and activates the fim operon. Also remarkable is the role of IHF in helping the formation of activation loops at enhancer-dependent promoters ( 26 ). Compilation of results from molecular studies, performed under diverse conditions, by the curators of the RegulonDB database ( 27 ) identified over 150 genes as being regulated at the transcriptional level by IHF, with over two-thirds activated by IHF. A very early microarray study ( 28 ), primarily emphasizing technical aspects of data analysis, identified genes that are differentially regulated in a ΔihfA strain grown in MOPS minimal medium; however, it must be noted that the strain on which the experiment was performed (a derivative of K12 CP79) was different from that for which the microarray was designed (K12 MG1655). In Salmonella enterica Typhimurium , deletion of ihfA , of ihfB , and of both ihfA and ihfB each led to different effects on transcription during growth in rich LB medium, thus suggesting distinct binding tendencies of the IhfA2 and IhfB2 homodimers and the IhfAB heterodimer ( 29 ). The number of genes responding transcriptionally to Δihf is substantially higher in Salmonella than reported in E. coli ; these genes include virulence determinants in Salmonella . Finally, a genome-scale study of IHF binding to the E. coli genome using low-resolution microarrays showed a preference for the binding regions to be located in non-coding DNA ( 5 ).
Despite the near-universal conservation of HU in the bacterial kingdom, only recently have genome-scale studies been performed to investigate its effect on gene expression. This is in spite of molecular studies investigating its role in controlling gene expression at specific loci, most notably the stabilization of the repression loop at the gal promoter ( 30 ). One study performed clustering analysis of microarray data obtained for ΔhupA , ΔhupB and ΔhupA/ΔhupB (ΔhupAB) strains during exponential, transition and stationary phases of growth, thus identifying distinct HupA2, HupB2 and HupAB regulons comprising genes used in energy metabolism, SOS response and osmolarity and acidic stress response ( 31 ). In spite of the established effect of HU on the supercoiled state of the DNA, these authors found little association between genes comprising the HU regulon and those that respond to DNA supercoiling. Again, however, this experiment was performed on a strain of E. coli (C600) that differed from the one (MG1655) for which the microarray was designed. A more recent microarray study of the double-deletion strain showed that genomic loci encoding HU-responsive genes tend to display high gyrase binding and therefore supercoiling sensitivity ( 32 ). Finally, in Salmonella , distinct regulons were identified for the three dimeric forms of HU, such that dissimilar sets of genes were differentially expressed during different phases of growth ( 33 ). To our knowledge, though HU is a major NAP, no study has investigated its in vivo binding to the chromosome on a genomic scale.
Despite the above studies, the binding characteristics of the two proteins have not been described at a high resolution and on a genome-wide scale. Further, as evident from the conflicting results of previous studies, the effects of these proteins on gene expression on a global scale remain a contentious issue. Finally, the contrast between the functions of the homo- and heterodimeric conformations of these proteins remains poorly understood and deserves the attention of further study. Here, we present a genome-scale study of the binding characteristics of HU and IHF to the E. coli K12 chromosome at four different time-points during batch growth in LB, using chromatin-immunoprecipitation coupled to high-throughput sequencing (ChIP-seq). We also perform microarray analysis of gene expression in single- and double-deletion mutants of each protein, to identify their regulons. Finally, by performing ChIP-seq experiments where possible, of each subunit of IHF and HU in the absence of the other subunit, we define genome-wide maps of DNA binding of the proteins in their hetero- and homodimeric forms.
Strains and general growth conditions
The E. coli K-12 MG1655 bacterial strains used in this work are listed in Supplementary Table 1 . Luria–Bertani (0.5% NaCl) broth and agar (15 g l −1 ) were used for routine growth. Where needed, ampicillin, kanamycin and chloramphenicol were used at final concentrations of 100, 30 and 30 µg ml −1 , respectively.
Construction of E. coli MG1655 knock-outs and FLAG-tagged strains
Disruption of ihf and hup genes in the E. coli chromosome was achieved by the λ Red recombination system ( 34 ), as previously described by Baba et al. ( 35 ). Primers designed for this purpose are shown in Supplementary Table 2 . Sets of additional external primers were used to verify the correct integration of the PCR fragment by homologous recombination ( Supplementary Table 3 ). The cassette was then removed by FLP-mediated site-specific recombination. Double-deletion strains were made by P1 transduction ( 36 ).
The 3xFLAG epitope was added at the C terminus of the IhfA, IhfB, HupA and HupB proteins by a PCR-based method with plasmid pSUB11 as template ( 37 ). Primers used for introducing the 3xFLAG tag are shown in Supplementary Table 2 . The tagged construct was then introduced onto the chromosome of E. coli MG1655 using the λ Red recombinase system. At each stage, DNA and strain constructions were confirmed by PCR and/or sequencing. This approach resulted in the introduction of a kanamycin resistance cassette in the chromosome downstream of the tagged gene. The cassette was then removed by FLP-mediated site-specific recombination.
RNA extraction and microarrays
To prepare cells for RNA extraction, 100 ml of fresh LB was inoculated 1:200 from an overnight culture in a 250 ml flask and incubated with shaking at 180 rpm in a New Brunswick C76 waterbath at 37°C. Two biological replicates were performed for each strain and samples were taken at exponential, late exponential, early stationary and stationary phase. The cells were pelleted by centrifugation (10 000 g , 10 min, 4°C), washed in 1xPBS and pellets were snap-frozen and stored at −80°C until required. RNA was extracted using Trizol Reagent (Invitrogen) according to the manufacturer's protocol until the chloroform extraction step. The aqueous phase was then loaded onto mirVana™ miRNA Isolation kit (Ambion) columns and washed according to the manufacturer's protocol. Total RNA was eluted in 50 µl of RNase-free water. The concentration was then determined using a Nanodrop ND-1000 machine (NanoDrop Technologies), and RNA quality was tested by visualization on agarose gels and by Agilent 2100 Bioanalyser (Agilent Technologies, Palo Alto, CA, USA).
For the generation of fluorescence-labeled cDNA, we used the FairPlay III Microarray Labelling Kit (Stratagene). Briefly, 1 µg of total RNA was annealed to random primers, and cDNA was synthesized in a reverse transcription reaction with an amino allyl modified dUTP. The amino allyl labeled cDNA was then coupled to a Cy3 dye (GE Healthcare) containing a NHS-ester leaving group. The labeled cDNA was hybridized to the probe DNA on microarrays by incubating at 65°C for 16 h. The unhybridized labeled cDNA was removed and the hybridized labeled cDNA was visualized using an Agilent Microarray Scanner.
To measure the enrichment of the IhfA, IhfB, HupA, HupB or RNAP-binding targets in the immunoprecipitated DNA samples, real-time qPCR was performed using an MJ Mini thermal cycler (Bio-Rad). About 1 μl of IP or mock-IP DNA was used with primers specific to the promoter regions (primer sequences in Supplementary Table 3 ; results in Supplementary Table 4 ) and Quantitect SYBR Green (QIAGEN).
RT-PCR for validation
To validate the results of the microarray analysis, quantitative reverse–transcriptase PCR (qRT–PCR) was carried out using primers specific to the mRNA targets showing up- or down-regulation, and to control targets not showing differential expression (primer sequences in Supplementary Table 5 ; results in Supplementary Tables 6 and 7 ). RNA was extracted as described above from wild type, ΔihfA , ΔihfB , ΔihfAB , ΔhupA , ΔhupB and ΔhupAB cells and 30 ng total RNA was used with the Express One-Step SYBR GreenER kit (Invitrogen) according to the manufacturer's guidelines, using an MJ Mini thermal cycler (Bio-Rad).
Library construction and Solexa sequencing
Before and after library construction, the concentration of the immunoprecipitated DNA samples was measured using the Qubit HS DNA kit (Invitrogen). Library construction and sequencing were performed using the ChIP-Seq Sample Prep kit, Reagent Preparation kit and Cluster Station kit (Illumina). Samples were loaded at a concentration of 10 pM.
Public data sources
The E. coli K12 MG1655 genome was downloaded from the KEGG database and gene coordinate annotations from the Ecocyc 11.5 database ( 39 ). The literature-derived transcriptional regulatory network and a list of operons were sourced from the RegulonDB 6.2 database ( 27 ). The list of genes bound by IHF was obtained from Grainger et al . ( 5 ). ChIP-chip signals for DNA gyrase were obtained from Jeong et al . ( 40 ). Functional category annotation data for E. coli K12 MG1655 were obtained from the COG database. RNA-seq data were obtained from our previous publication ( 4 ).
Analysis of genomic data
Reads obtained from the Illumina Genome Analyzer were mapped to both strands of E. coli K12 MG1655 genome using BLAT ( 41 ), as described previously. Binding regions for IHF were calculated using the per-base read count distribution as performed earlier ( 4 ); in addition to a statistical enrichment (binomial test) in the ChIP signal over the mock-IP (as proposed by PeakSeq) ( 42 ), we imposed a further 1.5-fold increase in the absolute signal. We also used the Bioconductor package BayesPeak ( 43 , 44 ) to identify binding regions for IHF and HU.
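The two-part enrichment criterion for IHF can be illustrated with a minimal sketch. It assumes that the mock-IP counts have already been scaled to the ChIP library, that the binomial test uses a success probability of 0.5, and that the 1.5-fold criterion is applied to the ratio of ChIP to mock-IP counts. The function names and the significance cut-off `alpha` are hypothetical and are not taken from the pipeline used in this study.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def is_enriched(chip, mock, alpha=0.05, min_fold=1.5):
    """Call a region enriched only if it passes both filters.

    chip, mock: integer read counts in the region (mock assumed pre-scaled).
    alpha: illustrative significance cut-off (assumption, not from the text).
    min_fold: minimum ChIP/mock ratio (1.5 as stated in the text).
    """
    total = chip + mock
    if total == 0:
        return False
    # Under the null, a read is equally likely to come from ChIP or mock.
    p_value = binomial_upper_tail(chip, total)
    return p_value < alpha and chip >= min_fold * mock

if __name__ == "__main__":
    print(is_enriched(30, 10))  # strong excess over mock
    print(is_enriched(12, 10))  # fails the fold-change filter
```

Requiring both conditions guards against regions that are statistically significant only because of very high coverage while showing a small absolute excess over the control.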
For HU, we calculated two gene-level measures of binding signal: (i) the highest read count obtained between −150 and +20 of the ORF and (ii) the median read count across the ORF body; the two measures provide equivalent results. In addition, to identify regions of enriched signal for HU, we adapted a method used previously to analyse data from ChIP-chip experiments for nucleoporins in Drosophila melanogaster ( 45 ). This adapted method calculates differences in log (base 2)-transformed read count signals over 400 nt windows between the ChIP sample and the mock-IP sample, following normalization by DESeq ( 46 ). The left-hand side of this distribution, together with its mirror image around the mode, gives the null distribution. All data points over the 95th percentile of the null distribution were considered as representing significant binding in the sample.
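The construction of the mirrored null distribution can be sketched as follows. The input is taken to be the list of per-window log2 differences (ChIP minus mock-IP) after normalization. The histogram-based mode estimate, the bin width and the nearest-rank percentile are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def mirrored_null_threshold(diffs, percentile=95.0, bin_width=0.25):
    """Return (mode, threshold) from a null built by reflecting the left
    side of `diffs` around its mode.

    The mode is estimated as the centre of the most populated histogram
    bin (an assumption; a kernel density estimate would also work).
    """
    counts = {}
    for d in diffs:
        b = math.floor(d / bin_width)
        counts[b] = counts.get(b, 0) + 1
    mode_bin = max(counts, key=counts.get)
    mode = (mode_bin + 0.5) * bin_width

    # Values at or below the mode are assumed to be free of true binding.
    left = [d for d in diffs if d <= mode]
    # Reflect them around the mode to obtain a symmetric null distribution.
    null = sorted(left + [2 * mode - d for d in left])

    # Nearest-rank percentile of the null distribution.
    rank = max(1, math.ceil(percentile / 100 * len(null)))
    return mode, null[rank - 1]

if __name__ == "__main__":
    diffs = [-1.0, -0.5, -0.1, 0.0, 0.1, 0.1, 0.5, 1.0, 3.0, 4.0]
    mode, threshold = mirrored_null_threshold(diffs)
    significant = [d for d in diffs if d > threshold]
    print(mode, threshold, significant)
```

The rationale is that true binding can only inflate the right side of the distribution, so the left side is treated as a clean sample of background variation.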
Binding motifs were identified using the MEME software and subsequently, binding regions scanned for the occurrence of the motif using MAST ( 47 ). An operon was defined as bound by the protein of interest if at least 50 bp of the intergenic region upstream of the operon overlapped with a binding region. For long intergenic regions, only the first 400 bp immediately upstream of the operon were used.
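The rule for calling an operon bound can be written as a short sketch. It assumes 0-based, half-open coordinates, that the length of the upstream intergenic region is known, and that the 50 bp overlap must come from a single binding region. All function and parameter names are hypothetical.

```python
def upstream_window(operon, strand, intergenic_len, max_upstream=400):
    """Return the (start, end) of the upstream intergenic window.

    operon: (start, end), 0-based half-open, on the forward coordinate axis.
    Only the first `max_upstream` bp next to the operon are kept.
    """
    length = min(intergenic_len, max_upstream)
    start, end = operon
    if strand == "+":
        return start - length, start
    return end, end + length  # upstream of a minus-strand operon

def overlap(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def is_bound(operon, strand, intergenic_len, binding_regions, min_overlap=50):
    """True if any binding region covers >= min_overlap bp of the window."""
    window = upstream_window(operon, strand, intergenic_len)
    return any(overlap(window, r) >= min_overlap for r in binding_regions)

if __name__ == "__main__":
    # A 600 bp intergenic region is trimmed to the 400 bp window (600, 1000).
    print(is_bound((1000, 2000), "+", 600, [(550, 660)]))  # 60 bp overlap
    print(is_bound((1000, 2000), "+", 600, [(550, 640)]))  # 40 bp overlap
```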
Gene expression analyses were performed on a previously described custom-designed isothermal Agilent microarray platform, and analyzed as described earlier ( 4 ). Briefly, array data were background corrected using normexp ( 48 ) and normalization performed using VSN ( 49 ). Differential expression in the deletion strains compared with the wild-type was called at FDR-adjusted P -value of 0.05, and a fold change of at least two.
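The two thresholds used for calling differential expression can be illustrated as below. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment is an assumption, as is the treatment of a two-fold change as an absolute difference of at least 1 on the log2-like scale produced by VSN. Function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_differential(pvalues, log2_fold_changes, fdr=0.05, min_log2_fc=1.0):
    """Label each gene 'up', 'down' or None (mutant relative to wild type)."""
    calls = []
    for q, lfc in zip(bh_adjust(pvalues), log2_fold_changes):
        if q < fdr and abs(lfc) >= min_log2_fc:
            calls.append("up" if lfc > 0 else "down")
        else:
            calls.append(None)
    return calls

if __name__ == "__main__":
    pvals = [0.001, 0.02, 0.03, 0.5]
    lfcs = [2.0, -1.5, 0.5, 3.0]
    print(call_differential(pvals, lfcs))
```

Combining a significance filter with an effect-size filter excludes genes whose change is reproducible but small, as well as genes with large but noisy changes.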
All statistical tests were carried out using R .
DNA-binding properties of IHF and HU subunits
To study the binding characteristics of IHF and HU to the E. coli chromosome, we performed immunoprecipitation of each protein subunit—during mid-exponential, late-exponential, transition-to-stationary and stationary phases of growth—and sequenced the cross-linked DNA using an Illumina Genome Analyzer system (ChIP-seq). We also used control data from a mock-IP experiment (for mid-exponential phase) described in our previous study ( 4 ). We mapped the short sequence reads obtained from each sequencing experiment to the E. coli K12 MG1655 genome (KEGG ID: eco). For each sample, we then obtained a read count distribution, quantified by the number of reads that map to each base position on the chromosome.
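A per-base read count profile of this kind can be computed efficiently with a difference array, as in the following minimal sketch. It assumes mapped reads are given as 0-based, half-open (start, end) intervals and ignores reads that wrap around the origin of the circular chromosome. The function name is hypothetical.

```python
def per_base_counts(reads, genome_length):
    """Return the number of reads covering each base position.

    reads: iterable of (start, end) intervals, 0-based and half-open.
    """
    # +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    # A running sum of the difference array gives the coverage.
    counts = []
    running = 0
    for d in diff[:-1]:
        running += d
        counts.append(running)
    return counts

if __name__ == "__main__":
    print(per_base_counts([(0, 3), (2, 5)], 6))  # [1, 1, 2, 1, 1, 0]
```

The cost is linear in the number of reads plus the genome length, rather than in the total number of covered bases.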
We inspected the nature of the read count distributions by plotting their densities ( Figure 1 A). The distributions for the various IHF samples each had a heavy right tail corresponding to regions of specific binding. On the other hand, the distributions for HU were only slightly skewed to the right, and in this respect similar to that from the mock-IP experiment. Whereas the read counts obtained for IHF were only weakly correlated with the mock-IP control ( ρ = 0.12 for IhfA, mid-exponential phase sample; the weak correlation presumably arising from a systemic background), those for HU showed a stronger correlation with the mock-IP ( ρ = 0.47 for HupA, mid-exponential phase; Figure 1 A). Similarly, the distribution of mock-IP-subtracted signal for HU (following division of read counts by the total number of reads obtained in that sample, and log transformation) was centered around zero with a relatively weak right-sided tail ( Figure 2 A). On the other hand, this distribution for IHF was offset from zero, with a peak well below zero representing most of the genome with little or no binding, and values to the right corresponding to regions of enriched signal.
Despite the strong resemblance of the HU data to the mock-IP, our HU experiment is representative of the protein's DNA binding profiles for the following reasons: (i) there is a considerable right-sided tail to the mock-IP-subtracted HU ChIP-seq signal; (ii) the read count profile for each HU subunit is more correlated with that for the other subunit ( ρ = 0.83) than with that for the mock-IP ( ρ = 0.47 and 0.59 for HupA and HupB, respectively; Figure 1 B; Supplementary Figure S1 ); (iii) the profile at any given time-point is also more strongly correlated with that from the adjacent time-point than with the mock-IP profile ( ρ = 0.88 for HupA between exponential and late exponential phases; Figure 1 B and Supplementary Figure S2 ); (iv) ChIP experiments for HU are reproducibly successful, unlike that for the mock-IP which typically provides very low concentrations of DNA not always sufficient for a sequencing reaction. Taken together, these provide a genome-wide, high-resolution, in vivo validation of prior molecular data suggesting that IHF binds DNA in a sequence-specific manner whereas HU binds more uniformly.
The strongly right-tailed distribution for IHF allowed us to identify regions of enriched signal (binding regions) using a stringent version (Methods) of a procedure described earlier ( 4 ). Over 85% of the 1042 (1022) binding regions thus obtained for IhfA (and IhfB) overlap with those obtained using another published method, BayesPeak ( 43 , 44 ). We noted a similar agreement between our method and another previously used in our lab to detect binding regions from eukaryotic ChIP-chip/seq experiments ( 45 ). In general, the signal enrichment in these IHF-bound regions is significantly higher than that for another sequence-specific, yet promiscuous, NAP: FIS ( Figure 3 A). During exponential phase, IHF-bound regions (either subunit) cover 13% of the genome, including upstream regions of 443 operons (17%). Genes identified as bound by either subunit of IHF during the two exponential-phase time-points in our study cover 68% of those identified in an earlier publication using mid-resolution ChIP-chip microarrays ( 5 ). We also recovered the known binding motif for IHF from these data ( Figure 3 B). We detected 2999 and 3162 occurrences of this motif within the binding regions of IhfA and IhfB, respectively. Of these motifs, <10% are localized to regions upstream of predicted operons. This proportion is small compared to our previous data for Fis, for which over 20% of the binding regions fell upstream of operons. This is in line with the smaller number of bound operons (based on binding to upstream regions) per mega base pair of bound DNA for IHF when compared to Fis (approximately 720 operons per mega base pair of binding region for IHF, compared to approximately 1250 operons per mega base pair for Fis).
Because the binding profile from our HU ChIP-seq experiment shows a strong resemblance to that from the mock-IP with relatively weak signals ( Figure 2 A–C), we used two methods to characterize its binding. First, we obtained a HU occupancy measure for each gene in the genome, which was defined by the median of the read count distribution across the gene body. This value was then normalized by the corresponding value in the mock-IP data. This method is similar to that used to quantify nucleosome occupancies in eukaryotic studies ( 50 ). This normalized HU occupancy correlates positively with the A/T content of the bound DNA ( Figure 4 A). Second, for the exponential phase sample, we adapted a procedure used previously to investigate ChIP-chip data for nucleoporins in Drosophila ( 45 ), which also showed widespread but low levels of binding, to identify 1104 and 1179 regions of enriched signal for HupA and HupB, respectively, with excellent agreement between the binding regions for the two subunits (>90% of peaks in the smaller list overlap with those in the other list). In agreement with our observations using gene-based occupancy profiles, these binding regions have significantly higher A/T content than the genomic average ( Figure 4 B). However, motif identification was not reliable, as different motifs were identified as significant for HupA and HupB despite their binding regions overlapping strongly ( Figure 4 C). This suggests that slight variations in the exact positioning of the binding regions might affect motif identification. Nevertheless, the one common feature of the identified motifs is A/T richness, which is in agreement with the findings described above and with results from an earlier report of in vitro specificity of HU–DNA interactions ( 51 ). This partiality towards A/T-rich genomic regions may be in line with previous reports suggesting a preference for HU to bind to bent DNA ( 52 , 53 ).
In summary, HU binds largely in a non-specific fashion to the chromosome, with a particular preference toward targeting A/T-rich regions.
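The gene-level occupancy measure and the A/T content against which it was compared can be sketched as follows. The sketch assumes per-base read counts for the ChIP and mock-IP samples as lists, and gene coordinates that are 0-based and half-open. The pseudocount that avoids division by zero is an assumption and is not described in the text; function names are hypothetical.

```python
import statistics

def gene_occupancy(chip_counts, mock_counts, start, end, pseudocount=1.0):
    """Median ChIP read count over the gene body, normalized by the mock-IP.

    pseudocount: added to both medians to avoid division by zero
    (an assumption made for this illustration).
    """
    chip_median = statistics.median(chip_counts[start:end])
    mock_median = statistics.median(mock_counts[start:end])
    return (chip_median + pseudocount) / (mock_median + pseudocount)

def at_content(sequence):
    """Fraction of A and T bases in a DNA sequence."""
    seq = sequence.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

if __name__ == "__main__":
    chip = [0, 0, 9, 9, 9, 9, 0, 0]
    mock = [1, 1, 2, 2, 2, 2, 1, 1]
    print(gene_occupancy(chip, mock, 2, 6, pseudocount=0.0))  # 4.5
    print(at_content("AATTGC"))
```

Using the median rather than the mean keeps the measure robust to isolated spikes in read count within a gene.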
Finally, comparison of binding signals obtained from each subunit of the same protein indicates a high degree of correspondence between the two ( Figure 1 B). Notably for IHF, the proportion of the genome covered by the binding regions for IhfB (identified as described above) was considerably larger than that for IhfA in three of the four time-points (excepting mid-exponential phase). Binding regions for IhfB are generally longer (by 5–10% median; P < 10 −6 , Paired Wilcoxon test for all the above three time-points; Supplementary Figure S3 ) than the corresponding regions for IhfA, suggesting that many IhfB binding regions are extensions of IhfA binding regions. This might be in concordance with a previous report showing that IhfB homodimers are more likely to form than IhfA dimers ( 19 ), but that such dimers may not exist freely in solution ( 20 ). For both proteins, there is high correlation among the binding profiles across time-points during our batch culture ( Figure 1 B).
Effects of IHF and HU on global gene expression in E. coli
To investigate the effects of IHF and HU on gene expression in E. coli , we created single ( ΔihfA , ΔihfB , ΔhupA and ΔhupB ) and double-deletion ( ΔihfAB , ΔhupAB ) strains for the genes comprising the subunits of the two proteins ( Supplementary Figures S4 and S5 ). We then performed microarray experiments measuring transcript abundance in these strains during exponential and late-exponential, transition to stationary and stationary phases of growth and compared them to that in the time-matched wild-type cells.
Effect of IHF on E. coli gene expression
We observe differential expression of only a small number of genes in the ihf single deletions (97 for IhfA and 56 for IhfB across all four conditions). Though a significantly larger number of genes are differentially expressed in the ihfAB double deletion, the number is much smaller (477 across all four conditions) than what we previously observed ( 4 ) for other sequence-specific nucleoid proteins such as Fis (1104 genes adopting the same criteria for calling differential expression as for the IHF data) and H-NS (1987 genes). Most of these effects are seen during the two exponential phases with only approximately 50 genes being differentially expressed—compared with the wild-type—during the stationary phase. Across the conditions, almost equal numbers of genes are up- or downregulated in ΔihfAB ; however, over two-thirds of the genes that are differentially expressed during late-exponential phase are upregulated (70%). Among the genes upregulated in ΔihfAB , there is a statistical enrichment for genes involved in ‘energy production and conversion’, a property that is seen particularly in late-exponential phase; however, these do not show any strong representations of individual metabolic pathways.
There is very little overlap among the sets of genes differentially expressed across different time-points ( Supplementary Figure S4 ), despite the fact that the binding profile of IHF does not change significantly with growth phase. Further, similar to observations made earlier for Fis ( 4 , 6 ), there is very little correspondence between IHF binding and differential expression. Specific examples of IHF-bound genes that are differentially expressed in the double deletion include the fim operon, which is strongly downregulated in the deletion strain in exponential phase. This is in agreement with prior molecular studies which have implicated IHF in both phase-switching and gene expression control at the fim operon ( 25 ).
The lack of an observable global effect of IHF on the expression of genes bound by it might be explained by combinatorial regulation, i.e. the possible role of IHF as a facilitator of binding of other transcription factors to gene-upstream regions. For example, using ChIP-seq data previously generated in our lab, we find that there is a significant overlap between the genes bound by IHF and those by Fis (35% of genes bound in all conditions by IHF are also bound by Fis; P = 2 × 10 −5 , Fisher's exact test). A previous study has shown that IHF is the second-most prolific transcription factor in terms of the number of other transcription factors with which it shares target genes ( 54 ). A striking example of this is the observed binding of IHF to a significant proportion (40%; P = 4 × 10 −5 , Fisher exact test) of genes regulated by σ 54 , whose activation by AAA + ATPase transcription factors requires IHF-dependent DNA bending ( 55 ). The effect of such binding on gene expression might be highly specific to conditions, such as nitrogen limitation, not used in this study.
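The significance of an overlap between two gene sets, such as the IHF-bound and Fis-bound genes, can be assessed with a one-sided Fisher's exact test, which is equivalent to a hypergeometric upper tail. The sketch below is a minimal illustration. The text does not state whether a one- or two-sided test was used, so the one-sided form is an assumption, and the counts in the usage example are invented for illustration and are not data from this study.

```python
from math import comb

def overlap_enrichment_p(n_both, n_a, n_b, n_total):
    """One-sided Fisher's exact test for gene-set overlap.

    Returns P(X >= n_both), where X is the overlap expected when n_b genes
    are drawn at random from n_total genes, n_a of which belong to set A.
    """
    max_overlap = min(n_a, n_b)
    denominator = comb(n_total, n_b)
    numerator = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, max_overlap + 1)
    )
    return numerator / denominator

if __name__ == "__main__":
    # Hypothetical counts: 10 genes in total, 4 in set A, 5 in set B,
    # and all 4 genes of set A also present in set B.
    print(overlap_enrichment_p(4, 4, 5, 10))
```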
Effect of HU on E. coli gene expression
In contrast to IHF, mutants deficient in HU show large changes in gene expression; across the four conditions tested here, 1490 genes are up- or downregulated in either the single or the double mutants ( Supplementary Figure S5 ). The greatest effect is seen in the double mutant in which 1266 genes are differentially expressed when compared to the wild-type; 512 genes change in expression in Δ hupA whereas only 107 genes do so in Δ hupB . Overall, a majority of differentially expressed genes are upregulated in Δ hupAB (56%; P < 0.001, compared against random assignments of up and down regulation of genes) and Δ hupA (69%; P < 0.001), the two mutants that display global changes in gene expression. A statistically significant proportion of genes differentially expressed in Δ hupA also change in expression in Δ hupAB (43 and 54% of genes up- and downregulated in Δ hupA ; P < 10 −6 , Fisher's exact test; Figure 5 A); despite this, it must be noted that a significant component of each regulon is distinct from the other.
Genes that are upregulated in Δ hupAB show an enrichment for essentiality for growth in rich media ( P < 10 −6 for sets, Fisher's exact test; Figure 6 ); this is not true of genes differentially expressed in Δ hupA . We then analyzed the COG functional categories of genes that are differentially expressed in these mutants, and find that genes involved in translation and ribosome biogenesis are upregulated in the double mutant but not in Δ hupA (or in Δ hupB ). We also find that genes involved in motility are upregulated in both the mutants. Finally, since HU is the most conserved NAP in bacteria, we analyzed the degree to which its target genes in E. coli are conserved across prokaryotes. Genes that are upregulated in Δ hupAB tend to be highly conserved, whereas the same is not true of genes that are downregulated by Δ hupAB or those that change in expression in Δ hupA . The high degree of conservation observed for Δ hupAB targets is not merely due to the aforementioned enrichment of genes involved in translation. It had previously been observed that genes that are differentially expressed in Δ hupAB tend to be bound by DNA gyrase ( 32 ), and are supercoiling-sensitive; in our data, this trend is relatively weak in Δ hupAB , though statistically significant ( P < 10 −6 ; Mann–Whitney test), but absent in Δ hupA .
It has been shown previously that deletion of HU leads to an increase in the accessibility of DNA to the DNA relaxing activity of topoisomerase I ( 32 , 56 ). This is, at first glance, at odds with the observation that genes bound by the opposing enzyme, DNA gyrase, tend to be upregulated in the Δ hupAB mutant in the present work and in an earlier work by Muskhelishvili's group ( 32 ). The authors of the above paper showed that there is little change in the unconstrained supercoiling levels in the double mutant ( 32 ). They further hypothesized that the increased accessibility of topoisomerase I to the DNA in the HU double mutant might be compensated by higher local negative supercoiling introduced at the upregulated loci by greater DNA gyrase binding and higher levels of transcription. To test this hypothesis we classified all genes into four groups based on gyrase binding ( 40 ) and mid-exponential phase gene expression levels as measured using RNA-seq experiments in wild-type cells ( 4 ): (i) H E H G : high expression, high gyrase binding (high defined by the top third of the distribution); (ii) H E L G : high expression, low gyrase binding (low defined by the bottom third of the distribution); (iii) L E H G : low expression, high gyrase binding; (iv) L E L G : low expression, low gyrase binding. Though only ∼27% of all classifiable genes belong to H E H G , ∼54% of genes upregulated in Δ hupAB have high gyrase binding and high gene expression in wild-type cells ( P < 10 −6 ; Fisher's exact test; Supplementary Figure S6 ). This might indicate a possible role for increased local negative supercoiling, introduced by a combination of high transcription and DNA gyrase binding, in determining upregulation in the Δ hupAB mutant.
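The four-way classification can be made concrete with a short sketch. It assumes one expression value and one gyrase-binding value per gene, assigns "high" and "low" by rank to the top and bottom thirds of each distribution, and leaves genes in either middle third unclassified. Tie handling and the function names are assumptions made for illustration.

```python
def tertile_labels(values):
    """Label the bottom third 'L', the top third 'H' and the rest None."""
    n = len(values)
    third = n // 3
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for i in order[:third]:
        labels[i] = "L"
    for i in order[n - third:]:
        labels[i] = "H"
    return labels

def classify_genes(expression, gyrase_binding):
    """Return 'HEHG', 'HELG', 'LEHG', 'LELG' or None for each gene."""
    expr_labels = tertile_labels(expression)
    gyr_labels = tertile_labels(gyrase_binding)
    return [
        f"{e}E{g}G" if e is not None and g is not None else None
        for e, g in zip(expr_labels, gyr_labels)
    ]

if __name__ == "__main__":
    expression = [10, 9, 1, 2, 5, 6]
    gyrase = [8, 1, 9, 2, 5, 4]
    print(classify_genes(expression, gyrase))
```

Only genes that receive a label in both dimensions count as classifiable, which matches the use of "classifiable genes" as the denominator in the text.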
Analysis of expression patterns across different phases of growth reveals complex trends ( Figure 5 B). There is a progressive decrease in the number of genes that are differentially expressed in Δ hupA as the culture progresses through batch growth; however, there is only a slight overlap between the lists of differentially expressed genes in different time points. On the other hand, in Δ hupAB , many genes change in expression during late exponential and stationary phases, though significant effects could be seen during the other two time-points; again each phase of growth sees a largely distinct set of genes being differentially regulated. These are described below.
During exponential phase, similar numbers of genes are differentially expressed in Δ hupA and Δ hupAB , with a small though statistically significant overlap between them. Essential, conserved genes and those involved in translation and ribosome biogenesis are over-represented among genes upregulated in Δ hupAB but not in Δ hupA . Similar enrichments are seen in the larger set of genes that is upregulated in Δ hupAB during late-exponential phase; however, in contrast to the earlier time-point, almost all genes that are upregulated in Δ hupA are also upregulated in Δ hupAB . At the two exponential phase time-points, upregulated genes (67% of those differentially expressed in Δ hupAB ) far outnumber downregulated ones. Only a few genes change in expression in Δ hupB during this period. Though relatively few genes are differentially expressed during the transition to stationary phase, we note that there is a striking upregulation of various flagellar genes, involved in motility, in all three mutants at this time; we have validated several of these using RT–PCR ( Supplementary Table S6 ). This is at odds with previous observations of a HU mutant that is non-motile in E. coli K12 W3110 ( 57 ) because of reduced transcription of the flagellin gene. Swimming motility assays performed by us resulted in smaller swarm diameters for the various hup mutants, the double mutant in particular; however, all mutants were motile ( Supplementary Figure S7 ). Finally, during stationary phase, only Δ hupAB shows global changes in gene expression. Unlike at the earlier time-points, only a slight majority (55%) of the differentially expressed genes are upregulated, and among these translation-associated genes are statistically over-represented ( P < 10⁻⁶, Fisher's exact test followed by multiple-testing correction by FDR); there is no functional enrichment detectable among downregulated genes.
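The multiple-testing step mentioned above can be sketched as follows. The text does not name the FDR procedure that was used; the sketch assumes the Benjamini–Hochberg method, and the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(pvalues)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values never
    # increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical enrichment P-values for four functional categories.
raw = [0.01, 0.04, 0.03, 0.005]
for p_raw, p_adj in zip(raw, benjamini_hochberg(raw)):
    print(f"raw P = {p_raw:<6} adjusted P = {p_adj:.3f}")
```

A category would then be reported as enriched only if its adjusted value falls below the chosen FDR threshold.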
In summary, deletion of both hupA and hupB has a significantly greater impact on gene expression than deletion of either gene alone, with Δ hupA displaying greater gene expression changes than Δ hupB . Further, genes that are upregulated in Δ hupAB but not in Δ hupA tend to be statistically enriched in essential cell processes such as translation, and are more conserved in prokaryotes than expected by random chance. These results may be consistent with our observation that during growth, cell densities are lower (∼25% less than the wild-type) in the double deletion than in the wild-type or the single deletions; this possibly arises from a longer lag phase observed in the double deletion, although the growth rate of Δ hupAB during exponential phase does not seem to be different from that of the wild-type [ Supplementary Figure S8 ; also reported by ( 32 )].
Investigations of potential IHF and HU homodimers binding to the chromosome
Following from our observations of largely incongruent effects of single and double mutants of HU and IHF on E. coli gene expression, we performed ChIP-seq experiments on each subunit of the two proteins, in strains carrying deletions of the second subunit. For IHF, our experiments did not yield enough DNA to perform sequencing reactions; this is suggestive of very weak or no homodimer binding to the DNA, at least in the absence of the second subunit. For HU, we were able to obtain binding profiles for each subunit in the absence of the other, which were strikingly similar to those in the wild-type ( Supplementary Figure S9 ). This indicates that the homodimers bind to the chromosome in patterns similar to those of the heterodimer. This may be reflected in the fact that most bacterial genomes encode only one HU subunit, and is consistent with the observation that more genes change in expression in the double deletion than in the single mutants.
It has previously been suggested that HupA2 is the predominant form of HU in exponential phase, whereas the heterodimer takes over during later stages of growth ( 17 ). Western blots presented here show higher expression of HupA than HupB during exponential phase ( Supplementary Figure S10 ). Gene expression data described above show greater gene expression changes in Δ hupA than in Δ hupB , especially during exponential phase. Our ChIP-seq data, however, show similar binding profiles for both subunits of HU across all stages of growth; results reported in this section further suggest that the binding profile of HupB is similar between the wild-type and ΔhupA mutant. Though it is possible that there is a uniform reduction in binding signals for HupB in ΔhupA , which might account for gene expression changes observed in ΔhupA during exponential and late-exponential phases, there is little reorganization of HupB's binding profile.
IHF and HU are two nucleoid-associated proteins that belong to the same DNA binding protein family, but differ in their degree of sequence specificity. IHF, a sequence-specific DNA binding protein, has extreme effects on the topology of bound DNA, which it bends by ∼160° ( 58 ). HU, the most conserved NAP, binds more uniformly to the E. coli chromosome, with a preference for distorted DNA structures ( 13–16 , 52 , 53 ). Both proteins exist as heterodimers in E. coli . In this article, we report results from our genome-scale studies of the binding of IHF and HU to the chromosome of E. coli K12 MG1655, and of the effects of these proteins on gene expression at various time-points of batch culture, from growth to stasis.
IHF displays sequence-specific binding to the E. coli chromosome, with signal intensities significantly stronger than those observed for Fis, another sequence-specific NAP. The two subunits of IHF show similar binding profiles, indicative of preferential heterodimer formation. In the wild-type strain, IhfB binding regions cover more of the chromosome than those of IhfA (in three of the four conditions); this might be in line with a prior observation that IhfB homodimers form more readily than IhfA homodimers ( 19 ). However, our inability to recover enough DNA from ChIP experiments for IhfB in ΔihfA suggests that such homodimer formation may occur on the DNA ( 20 ), only in the presence of a nucleating heterodimer complex.
Across the four conditions tested, IHF binding regions target the upstream regions of over 30% of all predicted operons, indicative of a global role for the protein in regulating gene expression. However, only ∼10% of these change in expression when the genes coding for the two subunits of IHF are deleted. Additionally, compared to the effects of other sequence-specific NAPs such as Fis and H-NS ( 4 ), the overall effect of IHF on gene expression under the present conditions is less in terms of the number of genes that are differentially expressed in the deletion mutant(s). This might be linked to our observation that, in contrast to Fis where over 20% of binding motifs lie upstream of operons ( 4 ), only ∼10% of predicted IHF binding motifs are so positioned. We suggest that the minimal proximal effect of IHF on gene expression might be due to combinatorial regulation, i.e. the tendency of IHF to regulate genes jointly with other factors. Another possibility is that HU might compensate for the absence of IHF; this has been demonstrated for excisive recombination at specific sites ( 59 ), but remains to be investigated on a genomic scale in the context of transcription. In this context, it must be noted that IHF has important functions outside of transcriptional regulation such as recombination ( 60 ), which are not apparent in our transcriptome experiment; in fact the large majority of binding sites which are located in non-intergenic regions might have such functions.
In agreement with current knowledge, the binding profile for HU is reflective of relatively non-specific binding to the chromosome, albeit with a notable preference for A/T-rich DNA, in concordance with previous in vitro studies ( 51 ). It has been shown previously that the composition of the HU dimer varies across the phases of growth of E. coli , with HupA2 being the dominant form during exponential phase and the heterodimeric form dominating during later stages of growth ( 17 , 61 ). Our western blots do indicate higher levels of expression of HupA than HupB during exponential and late-exponential growth phases. However, the binding profile of each subunit of HU strongly correlates with that of the other subunit across the growth phases, including exponential growth; this might indicate that homo- and heterodimeric forms of HU could bind to the DNA interchangeably. Further, the binding profile of each subunit in the wild-type is similar to that in an otherwise isogenic strain that lacks the other subunit. It is possible that the binding of HupB to the chromosome is uniformly lower (across the genome) in ΔhupA than in the wild-type, thus accounting for gene expression changes seen in ΔhupA during exponential and late-exponential phases of growth. This interpretation may be in line with previous reports showing in vitro that HupB2 binds poorly to duplex DNA ( 18 ), although there is enough binding for us to recover DNA in our ChIP experiments. Nevertheless, our data do not suggest any large-scale reorganization of the HupB binding profile following hupA deletion.
In apparent conflict with the consistency that the two subunits show in their binding, they have substantially different effects on gene expression. Briefly, we observe that Δ hupAB has the greatest effects on gene expression, distantly followed by Δ hupA , with minimal effects seen in Δ hupB . Previous studies in E. coli C600 and Salmonella enterica have also observed discordance between the sets of genes differentially expressed in single and double deletions of HU subunits ( 11 , 33 ). The authors of the paper on E. coli C600 identified few genes as members of the HupB2 regulon, interpreting this as a possible consequence of the previously observed instability and low expression level of this form of the protein at 37°C, and of the inability of HupB2 to introduce negative supercoiling on relaxed DNA in the presence of topoisomerase I ( 11 ). A similar observation, that Δ hupB shows significantly smaller changes in gene expression than Δ hupA and Δ hupAB , was made at two of the three growth phase time-points tested in S. enterica ; however, the extent of differential expression was similar in Δ hupAB and Δ hupA across all time-points ( 33 ).
The aforementioned study on E. coli C600 showed that the HU regulon is composed of genes involved in energy metabolism, SOS response, and osmolarity and acid stress responses ( 31 ). In contrast to the conclusions of a later study ( 32 ), which investigated the transcriptome of only the double mutant, these authors did not find any supercoiling dependence in the expression of the members of the HU regulon. Here we observe that genes that are upregulated in Δ hupAB are statistically enriched for essential cellular functions such as translation and show statistically higher binding to DNA gyrase than other genes. Our analysis also agrees with a previous hypothesis that local negative supercoiling introduced by DNA gyrase and high transcription might compensate for the increased accessibility of DNA to topoisomerase I in the HU double mutant ( 32 ). Despite the fact that a significant proportion of genes that are upregulated in Δ hupA also change in expression in Δ hupAB , the above functional enrichments are not observed in Δ hupA . Similarly, and in line with the fact that HU is the most conserved NAP in bacteria, genes upregulated in Δ hupAB are more conserved across bacteria than other genes. This may also be reflected in the fact that the growth curve of Δ hupAB , but not that of Δ hupA or Δ hupB , differs from that of the wild-type. Moreover, many bacterial genomes encode only one subunit of HU; the fact that a second subunit is encoded in E. coli might in part build in some redundancy to this conserved regulatory system. However, it is remarkable that the subunit of HU that is more conserved across bacteria is HupB, which appears to be the minor player in gene expression control at least in E. coli and S. enterica both of which encode both subunits of this protein.
All sequence data have been deposited at NCBI SRA (Study accession SRP008538). All microarray data have been deposited at ArrayExpress (accession number E-MEXP 3461).
Supplementary Data are available at NAR Online: Supplementary Tables 1–7, Supplementary Figures 1–10, Supplementary Data set.
Girton College, University of Cambridge; Ramanujan Fellowship, Department of Science and Technology, Government of India SR/S2/RJN-49/2010 (to A.S.N.S.); Spanish Ministry of Science and Innovation (to A.I.P.); Biotechnology and Biological Sciences Research Council (BBSRC) grant ‘Genomic Analysis of Regulatory Networks for Bacterial Differentiation and Multicellular Behaviour’ BB/E011489/1 (to G.M.F.) and BB/E01075X/1 (to N.M.L.); Isaac Newton Trust (to G.M.F.); European Molecular Biology Laboratory (EMBL) (to N.M.L.). Funding for open access charge: European Molecular Biology Laboratory.
Conflict of interest statement . None declared.
The authors thank Vladimir Benes, David Ibberson and Sabine Schmidt at the Genomics Core Facility, EMBL-Heidelberg for the sequencing and microarray experiments. They thank the three anonymous referees for their critical comments and suggestions.