Eng-Guan Chua, Michael J. Wise, Yalda Khosravi, Shih-Wee Seow, Arlaine A. Amoyo, Sven Pettersson, Fanny Peters, Chin-Yen Tay, Timothy T. Perkins, Mun-Fai Loke, Barry J. Marshall, Jamuna Vadivelu, Quantum changes in Helicobacter pylori gene expression accompany host-adaptation, DNA Research, Volume 24, Issue 1, February 2017, Pages 37–49, https://doi.org/10.1093/dnares/dsw046
Abstract
Helicobacter pylori is a highly successful gastric pathogen. High genomic plasticity allows its adaptation to changing host environments. Complete genomes of H. pylori clinical isolate UM032 and its mice-adapted serial derivatives 298 and 299, generated using both PacBio RS and Illumina MiSeq sequencing technologies, were compared to identify novel elements responsible for host-adaptation. The acquisition of a jhp0562-like allele, which encodes a galactosyltransferase, was identified in the mice-adapted strains. Our analysis implies a new β-1,4-galactosyltransferase role for this enzyme, essential for Ley antigen expression. Intragenomic recombination between babA and babB genes was also observed. Further, we expanded the list of candidate genes whose expression patterns have been mediated by upstream homopolymer-length alterations to facilitate host adaptation. Importantly, a greater than four-fold reduction of mRNA levels was demonstrated in five genes. Among the down-regulated genes, three encode outer membrane proteins: BabA, BabB and HopD. As expected, a substantial reduction in BabA protein abundance was detected in mice-adapted strains 298 and 299 via Western analysis. Our results suggest that the expression of Ley antigen and reduced outer membrane protein expression may facilitate H. pylori colonisation of mouse gastric epithelium.
1. Introduction
Helicobacter pylori infects approximately half the world’s population. It is a Gram-negative microaerophilic pathogen that persistently colonises the human stomach. The bacterium is associated with chronic gastritis and peptic ulceration, and less commonly, with gastric adenocarcinoma and gastric mucosa-associated lymphoid tissue (MALT) lymphoma.1,2 Hence, the organism was classified as a class 1 carcinogen by World Health Organisation in 1994.3
Population analysis of H. pylori isolates from different geographical origins has revealed a high degree of genetic diversity within the species. Multi-locus sequence typing (MLST) analysis has shown that identical alleles are extremely rare in H. pylori even when sampling a large population of the same geographical origin.4 This indicates that H. pylori is able to adapt to a new or rapidly changing environment by undergoing continuous genetic modification. Greater understanding of these modifications and the mechanisms by which they arise will improve the knowledge base from which we may develop new therapeutic targets.
The high genetic variation in H. pylori appears to be generated through intragenic and intergenic recombinations, plus extensive mutations attributed to the lack of a nucleotide mismatch repair system, such as that provided in other bacteria by MutS, MutH and MutL.5 There are two families of MutS homologues: MutS1 and MutS2. The MutS1 family includes Escherichia coli MutS, which plays a role in post-replication mismatch repair. In the H. pylori genome, HP0621, a member of the MutS2 family, has been identified. However, rather than participating in mismatch repair, it suppresses homeologous recombination mediated by RecA.6,7 Compared to E. coli, H. pylori is also missing a number of elements responsible for direct repair of DNA damage such as Ada methyltransferase, AlkB oxidative demethylase and Phr/Spl photolyase, as well as several glycosylases and endonucleases involved in base excision repair including MutM, Tag, AlkA, Mpg, YgjF, Nei and Nfo.8 In addition, the presence of a DNA polymerase I that lacks proof-reading activity contributes to the generation of genomic plasticity in H. pylori.9
Rapid phenotype switching is another strategy that enables pathogens to survive in diverse hostile environments and to establish persistent infections in novel hosts. Phenotypic switching can be achieved by epigenetic events, such as DNA methylation and gene silencing mediated by non-coding RNAs. It can also occur via phase-variation-associated genetic modifications, such as site-specific recombination, inversion of promoter elements, insertional inactivation mediated by insertion sequences and slipped-strand mispairing of simple sequence repeats (SSRs) including homopolymeric tracts.10,11
Changes in homopolymeric tract length located in coding sequences are known to generate frameshift errors and hence result in expression of truncated and often non-functional proteins.12 While intragenic homopolymeric tract alterations turn protein expression on or off, intergenic variations may alter mRNA level to fine-tune protein expression. This has recently been shown in the adhesin-encoding gene sabA of H. pylori. Altering the length of a poly(T) tract found adjacent to its -35 promoter region changes the mRNA transcript and protein expression levels, as well as the binding to the sialyl Lex receptor.13
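The frameshift effect described above can be illustrated with a toy example. The sketch below is not taken from the study: the sequence is invented, and `toy_gene` and `first_stop_index` are hypothetical helper names. It shows how changing the length of an intragenic poly(C) tract by one nucleotide moves or removes the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_index(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

def toy_gene(tract_len: int) -> str:
    """Invented coding sequence: ATG, a poly(C) tract, then a fixed tail."""
    return "ATG" + "C" * tract_len + "GATAGCTTTTAA"

in_frame = first_stop_index(toy_gene(6))   # tract length keeps the frame
inserted = first_stop_index(toy_gene(7))   # one extra C shifts the frame
deleted = first_stop_index(toy_gene(5))    # one missing C shifts the frame

print(in_frame, inserted, deleted)
```

In this toy sequence the six-nucleotide tract leaves the terminal stop codon in frame, the seven-nucleotide tract exposes a premature stop, and the five-nucleotide tract reads through without meeting any stop codon.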
To date, relatively little is known about H. pylori genetic adaptation in response to selective pressures from its host. It is important to identify these adaptive mutations to provide deeper insights into the exact mechanisms underlying H. pylori infection. Although H. pylori is highly adapted to the human gastric mucosa environment, several animal models comprising mice, gerbils and non-human primates have been used to advance understanding of H. pylori colonisation and pathogenesis.14–16 Whilst experimental studies in humans would be most likely to reveal clinically relevant details, studies in animal models are ethically superior to human challenge experiments, and can still tell us much about the genomic changes associated with host adaptation.
Here, we report a follow-up investigation to our previously announced complete genomes of the H. pylori clinical strain UM032 and its mice-adapted derivatives designated 298 and 299. These were sequenced on the Pacific Biosciences RS sequencing technology using the C2 chemistry.17 PacBio RS sequencing technology produces extraordinarily long reads, thus increasing the likelihood of acquiring a complete genome sequence. As opposed to the current C5 chemistry, high error rates were observed in sequencing data generated using the earlier C2 chemistry.18 Therefore, for this study all three strains were also subjected to whole-genome sequencing using Illumina’s MiSeq platform. Hence, the comparative data presented here provide comprehensive information about the changes associated with host change in H. pylori. We reveal H. pylori genetic elements potentially required for host-adaptation and epigenetic regulation of gene expression mediated by homopolymeric tracts present both in the upstream regions of coding sequences, within or in close proximity to the promoter elements, and within the coding sequences.
2. Methods
2.1. Bacterial strains
The acquisition of H. pylori clinical strain UM032 and its mice-adapted derivatives, designated 298 and 299 respectively, was as previously described.17
2.2. Illumina library preparation and sequencing
Preparation of MiSeq library was performed using Illumina Nextera XT DNA sample preparation kit (Illumina, San Diego, CA, USA) as previously described with minor modifications.19 In brief, 1 ng of genomic DNA was fragmented in 5 µl of Amplicon Tagment Mix and 10 µl of Tagment DNA buffer. Tagmentation reaction was performed by incubation at 55 °C for 5 min followed by neutralisation with 5 µl of Neutralise Tagment Buffer for 5 min. Tagmented DNA (25 µl) was indexed in a 50 µl limited-cycle PCR (12 cycles) as outlined in the Nextera XT protocol and subsequently purified using 25 µl of AMPure XP beads (Beckman Coulter Inc, Australia). The fragment size distribution of the purified DNA was analysed utilising a 2100 Bioanalyser with a High Sensitivity DNA assay kit (Agilent Technologies, Santa Clara, CA). DNA libraries were adjusted to 2 nM, pooled in equal volumes and then denatured with 0.2 N NaOH according to the Nextera protocol. The libraries were sequenced using 2 × 300 paired-end protocols on an Illumina MiSeq instrument (MiSeq Reagent Kit v3 for 600 cycles).
2.3. Genome correction
The generated MiSeq data were used for correction of homopolynucleotide errors present in the previously published H. pylori UM032, 298 and 299 complete genomes generated using PacBio C2 chemistry. In brief, MiSeq raw reads were mapped against the complete genome using the Geneious R7 read mapper with the medium sensitivity option.20 Variants were subsequently called using the following parameters: minimum coverage = 10 and minimum variant frequency = 0.7. Reported errors in each reference genome sequence were corrected.
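The two thresholds above amount to a simple filter on each candidate variant. The following is a minimal sketch under stated assumptions: the `Variant` record, its field names and the example calls are hypothetical and do not reflect the Geneious data model or the study's data.

```python
from dataclasses import dataclass

MIN_COVERAGE = 10        # minimum coverage given in the text
MIN_VARIANT_FREQ = 0.7   # minimum variant frequency given in the text

@dataclass
class Variant:
    position: int        # reference position (hypothetical field)
    coverage: int        # reads covering the position
    variant_reads: int   # reads supporting the variant

    @property
    def frequency(self) -> float:
        # Fraction of covering reads that support the variant
        return self.variant_reads / self.coverage if self.coverage else 0.0

def passes_filter(v: Variant) -> bool:
    """Keep a variant only if it meets both thresholds."""
    return v.coverage >= MIN_COVERAGE and v.frequency >= MIN_VARIANT_FREQ

# Invented example calls
calls = [Variant(1200, 120, 110), Variant(5400, 8, 8), Variant(9100, 90, 40)]
kept = [v.position for v in calls if passes_filter(v)]
print(kept)
```

Only the first invented call survives: the second has too little coverage and the third is supported by too few reads.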
2.4. Genome annotation
Initially, the genome sequences were automatically annotated using the RAST (Rapid Annotation using Subsystem Technology) pipeline.21 In addition to Glimmer (version 3.02) used in the pipeline, additional ORF prediction was performed using GeneMarkS (version 4.7a) and Prodigal (version 2.60).22–24 By majority ruling, the predicted start codons of ORFs predicted from multiple annotation engines were manually curated to enhance start codon accuracy. Genome annotations were further compared against gene sequences from 45 completely sequenced publicly available H. pylori genomes to allow identification of pseudogenes. In our context, a pseudogene can be a non-functional DNA sequence with a frameshift or premature stop codon, a sequence homologous to a protein-coding gene but without ORF or a full-length reference gene broken into two or more adjacent ORFs. Our manually curated genome annotations are available in Supplementary Tables S1–S3.
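The majority rule for start codons can be sketched as a vote among the three gene finders. This is an illustration only: the curation in the study was manual, and the function name and coordinates below are hypothetical.

```python
from collections import Counter
from typing import Optional

def majority_start(predictions: dict[str, int]) -> Optional[int]:
    """Return the start coordinate chosen by more than half of the tools, else None."""
    if not predictions:
        return None
    counts = Counter(predictions.values())
    start, votes = counts.most_common(1)[0]
    return start if votes > len(predictions) / 2 else None

# Invented coordinates for a single ORF
agree = {"Glimmer": 1045, "GeneMarkS": 1045, "Prodigal": 1012}
disagree = {"Glimmer": 1045, "GeneMarkS": 1030, "Prodigal": 1012}

print(majority_start(agree))     # two of three tools agree
print(majority_start(disagree))  # no majority, so flag for manual inspection
```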
NCBI accession numbers of 45 public genomes used for genome annotation comparison are as follows: 35A (NC_017360), 51 (NC_017382), 52 (NC_017354), 83 (NC_017375), 908 (NC_017357), 2017 (NC_017374), 2018 (NC_017381), 26695 (NC_000915), Aklavik86 (NC_019563), Aklavik117 (NC_019560), B8 (NC_014256), B38 (NC_012973), BM012A (NC_022886), BM012S (NC_022911), Cuz20 (NC_017358), ELS37 (NC_017063), F16 (NC_017368), F30 (NC_017365), F32 (NC_017366), F57 (NC_017367), G27 (NC_011333), Gambia94/24 (NC_017371), HPAG1 (NC_008086), HUP-B14 (NC_017733), India7 (NC_017372), J99 (NC_000921), Lithuania75 (NC_017362), OK113 (NC_020508), OK310 (NC_020509), P12 (NC_011498), PeCan4 (NC_014555), PeCan18 (NC_017742), Puno120 (NC_017378), Puno135 (NC_017379), Sat464 (NC_017359), Shi112 (NC_017741), Shi169 (NC_017740), Shi417 (NC_017739), Shi470 (NC_010698), SJM180 (NC_014560), SNT49 (NC_017376), SouthAfrica7 (NC_017361), SouthAfrica20 (NC_022130), v225d (NC_017355) and XZ274 (NC_017926).
2.5. In silico genome analysis
The revised genome sequences of H. pylori UM032, 298 and 299 were compared pairwise using progressiveMauve (version 2.3.1), followed by variant calling using Geneious R7.25
2.6. COG functional analysis
COG categories of H. pylori UM032 genes with amino acid substitutions detected in the mice-adapted strains 298 and 299 were assigned using rpsblast.26
2.7. Western immunodetection
Whole cell lysates were prepared from bacterial cells standardised to OD600 of 10. For each blot, 10 µl of sample was used for SDS-PAGE, followed by transfer to an Immobilon®-P polyvinylidene difluoride membrane (Merck Millipore). Blood group antigen-binding adhesin A (BabA) was detected with a polyclonal antibody, followed by horseradish peroxidase (HRP)-conjugated goat anti-rabbit IgG antibodies (Santa Cruz Biotechnology), and visualised by chemiluminescent signal using Clarity™ Western ECL substrate (Bio-Rad) on a Fujifilm LAS-3000 imaging system. H. pylori UreB was detected using rabbit anti-urease B polyclonal antibodies purchased from Sigma Aldrich. H. pylori Le antigen phenotypes were determined using monoclonal antibodies (Abcam) to Lea, Leb, Lex, or Ley. Bound immunoglobulin M (IgM) or IgG antibodies were detected with HRP-conjugated goat anti-mouse antibodies (Sigma Aldrich) against IgM or IgG.
2.8. Quantitative real-time polymerase chain reaction (qRT-PCR)
Bacterial total RNA was isolated using NucleoSpin® RNA (Macherey-Nagel). After RNA quantification, cDNA was synthesized using 1 µg of RNA in a 20 µl reverse transcription reaction using QuantiTect Reverse Transcription Kit (Qiagen) according to manufacturer’s instructions. Real time PCR analysis was subsequently performed in 20 µl reactions in 96-well plates using cDNA from 20 ng RNA as template, LuminoCt® SYBR® Green qPCR ReadyMix™ (Sigma Aldrich) and the LightCycler 480 instrument (Roche Applied Science). Cycling conditions comprised an initial denaturation at 95 °C for 20 s, followed by 40 cycles of 95 °C for 3 s, 56 °C for 15 s and 60 °C for 30 s. Following each run a melting curve analysis was performed in which the reactions were heated slowly from 55 °C to 95 °C (0.1 °C/s). The relative expression of each target gene was calculated using the 2^(−ΔΔCT) method as described by Livak and Schmittgen,27 after normalisation against the gyrA reference gene. Primers used are listed in Supplementary Table S4.
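For readers unfamiliar with the Livak and Schmittgen calculation, a minimal sketch follows. The CT values are invented for illustration, and the choice of the parental strain as calibrator is an assumption, not something stated in the text; the function name is hypothetical.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """Fold change of a target gene by the 2^(-ddCT) method.

    dCT  = CT(target) - CT(reference gene), computed per strain
    ddCT = dCT(sample) - dCT(calibrator)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)

# Invented CT values: the target crosses threshold 3 cycles later in the sample,
# while the reference gene (gyrA in the text) is unchanged.
fold = relative_expression(ct_target_sample=23.0, ct_ref_sample=18.0,
                           ct_target_calibrator=20.0, ct_ref_calibrator=18.0)
print(fold)  # 0.125, i.e. an eight-fold reduction relative to the calibrator
```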
2.9. Statistical analysis
Paired Student’s t-test was used to test for differences in relative quantification of gene expression. P-values of less than 0.05 were considered statistically significant.
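As a reminder of what the paired test computes, the sketch below derives the paired t statistic using only the standard library. The numbers are invented, and which measurements were paired in the study is not stated here, so the variable names are assumptions. Converting the statistic to a P-value requires the t distribution with n − 1 degrees of freedom, which is left to a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x: list[float], y: list[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for the paired differences d = x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))

# Invented relative-expression values from three paired replicates
parent = [4.0, 5.0, 6.0]
adapted = [3.0, 3.0, 3.0]
t_stat = paired_t_statistic(parent, adapted)
print(round(t_stat, 4), "with", len(parent) - 1, "degrees of freedom")
```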
2.10. Genome sequence update
The updated complete genomes of UM032, 298 and 299 are available in GenBank under the original accession numbers: NC_021215.3 (GI:685455742), NC_021882.2 (GI:687961717) and NC_021216.3 (GI:685456130).
3. Results and discussion
3.1. Global pairwise alignment of previously published UM032, 298 and 299 genomes revealed sequence duplications and homopolymeric tract errors
3.1.1. Duplicate sequences

Whole-genome alignment of H. pylori strains UM032, 298 and 299 using Mauve indicates no genome shuffling. Insertions are visualised as gaps in the alignment.

Duplicate sequences in UM032, 298 and 299 complete genomes. The ' character indicates the erroneous overlaps at each end of the original published contig.
3.1.2. Homopolymeric tracts
By using the NCBI Prokaryotic Genome Automatic Annotation Pipeline (PGAAP), a total of 167 pseudogenes have previously been annotated in UM032 genome compared to 29 in 298 and 25 in 299.17 We performed BLASTN pairwise alignment of the pseudogenes in UM032 against their counterparts in 298 and 299, of which 146 were shown to have deletions associated with homopolymeric sequences in 298 and 299 (Supplementary Table S5), resulting in restoration of correct reading frame and thus the functional coding sequence (CDS). The presence of homopolymeric tracts in H. pylori is known to play an essential role in phase variation through slipped-strand mispairing, leading to differential antigenic expression that facilitates host-adaptation and persistent colonisation.13 However, it is intriguing to observe such overwhelming mutational bias towards deletion in both 298 and 299 genomes, implying that there is potential sequencing and/or de novo assembly error in the Hierarchical Genome Assembly Process (HGAP) workflow (http://pacbiodevnet.com/) available in Single-Molecule Real-Time (SMRT) Analysis v2.0. Sanger sequencing was performed to verify the length of the homopolymeric tract in which variation occurs in 7 randomly chosen UM032 pseudogenes and their functional counterparts. Homopolymer insertion errors detected in selected UM032 pseudogenes could be due to sequencing errors in the original genomes.
3.2. Whole genome resequencing using MiSeq
To remove homopolymer-length sequencing errors and duplications in the original sequences, all three genomes were resequenced using the MiSeq platform with at least 100× sequencing depth. The reads generated were subsequently mapped to the corresponding reference genome as described in Methods.
3.2.1. Below mean read depth across the duplicate sequences
Read depth analysis is a useful way to check for duplications. If sequencing reads are randomly distributed across a genome, the read depth at any location is expected to approximate the global average coverage. Average read depths of 135x, 122x and 174x were determined for UM032, 298 and 299, respectively. The erroneously duplicated sequences, however, had substantially lower read depth than both the global read depth and the closely matching upstream and downstream sequences, as summarised in Table 1, indicating they were misassembled and incorrectly present in two copies. The false duplication in each genome was therefore removed. The sequencing reads were then remapped to determine and compare the reads ratio (regional read depth divided by global read depth) of the previously duplicated sequences and their counterparts across all three genomes. These reads ratios are expected to be comparable with one another. As anticipated, removing the duplication restored the reads ratio of these now single-copy sequences, as shown in Table 2. It is also now clear that the apparent duplications are in fact near-identical overlaps at each end of the contig, which were not removed during sequence assembly when generating the circular consensus DNA sequence.
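The reads ratio defined above is a single division, sketched here with the AB-fragment depths and global depths reported in Table 2. The function and variable names are illustrative, not part of the original analysis.

```python
def reads_ratio(regional_depth: float, global_depth: float) -> float:
    """Regional read depth divided by global read depth."""
    if global_depth <= 0:
        raise ValueError("global read depth must be positive")
    return regional_depth / global_depth

# (AB-fragment read depth, global read depth) per strain, from Table 2
ab_fragments = {"UM032": (168, 135), "298": (146, 122), "299": (217, 174)}

ratios = {strain: round(reads_ratio(regional, overall), 2)
          for strain, (regional, overall) in ab_fragments.items()}
print(ratios)
```

The three ratios fall within a narrow band, which is the comparability across strains that the text uses as evidence that the fragments are now single-copy.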
Strain | Global read depth | Duplicated fragment* | Size (bp) | Read depth | Correct fragment* | Size (bp) | Read depth* |
---|---|---|---|---|---|---|---|
UM032 | 135x | A’ | 4,512 | 60 ± 38.4 | A | 4,500 | 103.5 ± 41.9 |
UM032 | 135x | B’ | 1,147 | 7.7 ± 12.3 | B | 1,139 | 178.5 ± 58.6 |
298 | 122x | C’ | 6,774 | 28.9 ± 29.1 | C | 6,744 | 73.3 ± 35.8 |
298 | 122x | D’ | 2,893 | 16.1 ± 17.8 | D | 2,879 | 120.6 ± 45.6 |
299 | 174x | E’ | 5,200 | 74.3 ± 28.4 | E | 5,200 | 82.9 ± 35.5 |
299 | 174x | F’ | 1,378 | 32.7 ± 22.6 | F | 1,378 | 81.1 ± 22.7 |
Please refer to Figure 2 for the position of each sequence fragment described in this table.
Read depth analysis of correct sequences against counterpart sequences following removal of duplications
Strain (global read depth) | AB fragments: read depth | AB fragments: read ratio | CD fragments: read depth | CD fragments: read ratio | EF fragments: read depth | EF fragments: read ratio |
---|---|---|---|---|---|---|
UM032 (135x) | 168x | 1.24 | 121x | 0.90 | 113x | 0.84 |
298 (122x) | 146x | 1.20 | 112x | 0.92 | 102x | 0.84 |
299 (174x) | 217x | 1.25 | 157x | 0.90 | 148x | 0.85 |

Read ratio = read depth / global read depth.
3.2.2. Variant calling in duplication-free reference genomes
As indicated in Supplementary Table S6, 245 variants were detected in UM032, of which 243 were deletions found within homopolymeric runs. Surprisingly, there were far fewer errors in the 298 and 299 genomes, with only seven and three variants detected, respectively. Nonetheless, the reported errors, including all additional nucleotides found in the homopolymeric tracts, were corrected and the genomes were reannotated as stated in Methods. Table 3 summarises the general features of all three genomes before and after sequence error correction and reannotation. It is important to highlight that a number of false pseudogene candidates have been removed, especially in UM032, in which the number has been reduced from 167 to 42, improving the reliability of comparative analysis.
General features of pre- and post-correction H. pylori UM032, 298 and 299 genomes
Feature | UM032 (pre) | UM032 (post) | 298 (pre) | 298 (post) | 299 (pre) | 299 (post) |
---|---|---|---|---|---|---|
Size (bp) | 1,599,441 | 1,593,537 | 1,604,216 | 1,594,544 | 1,601,149 | 1,594,569 |
GC content (%) | 38.8 | 38.8 | 38.8 | 38.8 | 38.8 | 38.8 |
CDSs | 1,415 | 1,458 | 1,553 | 1,456 | 1,576 | 1,457 |
Genes | 1,624 | 1,543 | 1,624 | 1,544 | 1,644 | 1,544 |
Pseudogenes | 167 | 43 | 29 | 46 | 25 | 45 |
rRNAs | 6 | 6 | 6 | 6 | 6 | 6 |
tRNAs | 36 | 36 | 36 | 36 | 37 | 36 |
3.3. Whole genome comparisons of revised H. pylori UM032, 298 and 299
The revised complete genome sequences of H. pylori clinical isolate UM032 and its mice-adapted derivatives 298 and 299 were compared against each other. A pairwise global alignment of the genome of UM032 with those of 298 and 299 revealed 99.9% identity at the nucleotide level. The 298 strain contained 89 variants relative to strain UM032, of which 32 were indels and the remaining 57 were substitution mutations including single-nucleotide polymorphisms (SNPs) (Supplementary Table S7). Nine indels affected ORFs. In addition to changes in tandem repeats or homopolymeric tracts and additions or deletions at the 5’ or 3’ ends of ORFs, an insertion introduced a complete ORF, annotated as UM298_1363, which encodes a lipopolysaccharide (LPS) biosynthesis protein. Among the 57 substitutions detected, 16 were intergenic and 41 were located in protein coding sequences. Only 12 of the latter resulted in amino acid changes; the remainder were synonymous substitutions. Relative to 298, only two tandem repeat insertions and one deletion within a homopolymeric tract were identified in 299, the latter restoring an open reading frame designated UM299_0755 (Supplementary Table S8). We further performed a COG (clusters of orthologous groups) analysis on genes with non-synonymous changes (Table 4). Genes without an inferred COG category were also analysed. Together, these analyses showed that genes encoding outer membrane and lipopolysaccharide biosynthesis proteins are enriched for substitution mutations.
COG analysis of UM032 genes with non-synonymous substitutions in its mice-adapted derivatives
COG | Function | Number | Locus tag |
---|---|---|---|
D | Cell cycle control, cell division, chromosome partitioning | 1 | UM032_0202 |
E | Amino acid transport and metabolism | 1 | UM032_0124 |
H | Coenzyme transport and metabolism | 1 | UM032_1143 |
L | Replication, recombination and repair | 1 | UM032_1365 |
M | Cell wall/membrane/envelope biogenesis | 2 | UM032_0124, UM032_1363 |
N | Cell motility | 1 | UM032_0447 |
T | Signal transduction mechanisms | 1 | UM032_0447 |
Taken together, our data demonstrate a burst of mutations when UM032 encounters a new host environment. The second round of mouse infection using the mice-adapted strain 298, however, resulted in only slight genomic alterations in the output strain 299, suggesting that the bacterium had undergone sufficient genomic adaptation to establish a chronic infection. It is important to mention that our findings contrast with the experimental outcome of Linz et al., who conducted a macaque infection study using the previously macaque-adapted H. pylori strain J166.28 In their study, J166 output strains accumulated up to 12 SNPs within 1 week post infection. Our output strain 299, however, showed no SNPs 2 weeks after the infection. This suggests that there was a lack of acute inflammatory response, perhaps because the gastric environments of mice vary less than those of individual macaques.
Of note, an amino acid change was detected in UM298_1343 relative to UM032_1343. This protein is known as the antigenic membrane-associated tumour necrosis factor α-inducing protein. The change replaces the isoleucine at position 111 with valine. Previous studies have shown that H. pylori TNFα-inducing protein is able to induce production of interleukin-1α, TNFα, IL-8, and macrophage inflammatory protein 1α in monocytes, as well as tumorigenesis in nude mice implanted with transfected Bhas 42 cells.29,30
3.4. Intragenomic recombination between babA and babB
Among the 65 nucleotide substitutions detected in 298 and 299 relative to UM032, 24 were observed within ORF UM032_1223, which was identified as the babA gene by BLASTN search against the Helicobacter group in the NCBI database. Such a high number of mutations could reflect strong selective pressure for rapid adjustment of BabA-mediated adhesion to facilitate adaptation to, and colonisation of, a different host gastric niche. However, these mutations were all synonymous substitutions. Furthermore, rather than being arbitrarily distributed across the 2226 bp gene, they were confined to the C-terminal region, prompting us to investigate whether the mutations could have arisen from a recombination event.
Paralogous to babA are both babB and babC genes. Due to their extensive sequence identity at 5’ and 3’ regions, intragenomic recombination can occur within the babABC family, resulting in phenotypic change of Lewis b antigen binding capacity.31–34 The bab genes reside in three different chromosomal locations, downstream of the hypD gene (locus A), the rpsR gene (locus B) and hp0318 in H. pylori 26695 (locus C), respectively.31,35 Nevertheless not all bab genes are present within each H. pylori strain. There are some strains which do not possess the babA or babC gene and some harbour two copies of babA or babB gene.36,37 There has also been an interesting observation by Kawai et al. that the babC locus, corresponding to hp0317 in H. pylori 26695, is empty in all the hpEastAsia strains as a result of reductive evolution in the outer membrane protein families.38

3.5. Mutations in LPS biosynthesis genes
Among the 89 total variants identified in 298 relative to UM032, 10 were located in two genes designated UM298_1086 and UM298_1395. There is no genotypic difference in these genes between strains 298 and 299. BLASTN comparison revealed that both genes encode a fucosyltransferase (FucT), which plays a substantial role in the synthesis of Lewis (Le) antigens. The former gene product exhibits 75.9% amino acid sequence identity to HP0651 (FutB), whilst the latter shares 78.3% amino acid identity with HP0379 (FutA). In H. pylori genomes, there are three phase-variable FucT-encoding genes, termed futA, futB and futC. FutA and FutB are paralogous and can fucosylate either Lec antigen (type I carbohydrate backbone) in an α-(1,4) linkage to generate Lea antigen, and/or N-acetyl-lactosamine (LacNAc) (type II carbohydrate backbone) in an α-(1,3) linkage to create Lex antigen.40–42 Both Lea and Lex antigens can be further fucosylated by FutC in an α-(1,2) manner to create difucosylated Leb or Ley (Fig. 5).43
In addition to the nucleotide variations found in FucT-encoding genes, a 1-kb insertion was identified upstream of the UM032_1363 counterpart in both the 298 and the 299 genomes, designated UM298_1364 and UM299_1364 respectively. UM032_1363 shares approximately 80% nucleotide sequence identity with HP0619 in H. pylori strain 26695, and jhp0563 in H. pylori strain J99. Jhp0563 is a β-(1,3)galT gene with a product essential for the expression of Lec.44 The upstream homologous glycosyltransferase-encoding gene, jhp0562, is however absent in several H. pylori strains including 26695.45 Owing to their high degree of shared nucleotide sequence identity, the jhp0562 and β-(1,3)galT genes have been shown to undergo intragenic recombination within a single strain to generate functional chimeric alleles. This explains why certain strains do not possess a jhp0562 allele.45 Interestingly, the 1-kb insertion has introduced a gene which is 90.7% similar to jhp0562 in the mice-derivative strains, designated 298_1363 and 299_1363 respectively. In this study, the mice were inoculated intragastrically with a pool consisting of 12 H. pylori clinical strains including UM032. The jhp0562-like allele was probably acquired by homologous recombination from one of the other strains inoculated.

Immunoblot analysis of H. pylori strains UM032, 298 and 299 whole cell lysates with anti-Ley and anti-Leb antibodies.

A schematic diagram of type I and type II Lewis antigen biosynthesis pathways in H. pylori strain UM032. GlcNAc, N-acetylglucosamine; LacNAc, N-acetyl-D-lactosamine.
The minute expression of Ley in strain UM032 might be due either to a low level of endogenous α-1,3-FucT activity or to a low LacNAc content and thus a reduced expression of Lex. In strain UM032, the futA gene (UM032_1394) is switched off because its intragenic poly(C) tract located near the ATG start codon lacks one nucleotide, resulting in out-of-frame translation. The futB gene, by contrast, is in-frame, yielding the correct full-length FucT enzyme in each strain. Given that FutA is inactive, the presence of both Leb and Ley in strain UM032 indicates that its FutB enzyme, encoded by UM032_1086, must possess both α-1,3 and α-1,4 activities. This is in agreement with our prediction by pairwise comparison that UM032_1086 displays extensive amino acid identity of 93.1% with a previously documented α-1,3/4-FucT found in H. pylori strain UA948 (GenBank accession number AF194963) (Table 5).40,46
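The on/off switching of a phase-variable gene by a one-nucleotide change in an intragenic homopolymeric tract can be sketched in a few lines of Python. The toy ORF, the tract lengths and the helper names `first_stop_codon` and `build_orf` are all hypothetical; they illustrate the frameshift mechanism only and do not reproduce the futA sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 1-based index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def build_orf(tract_length):
    """Assemble a toy ORF: start region, poly(C) tract, remaining codons."""
    return "ATGAAA" + "C" * tract_length + "ATGACCGGTAAATAA"


in_frame = build_orf(9)  # tract length that keeps downstream codons in frame
shifted = build_orf(8)   # one C lost from the tract

print(first_stop_codon(in_frame))  # stop at the intended end of the toy ORF
print(first_stop_codon(shifted))   # premature stop shortly after the tract
```

In the toy example the full-length reading frame ends at codon 10, whereas losing a single C exposes a stop codon at position 6, which is the sense in which a tract-length change switches the gene off.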
Upper triangle: distance. Lower triangle: % identity.

| Fucosylation | H. pylori FucT | NCTC11639 | UA948 | DSM6709 | UM032_1086 | UM298_1086 | UM298_1395 |
|---|---|---|---|---|---|---|---|
| α-1,3 | NCTC11639 | – | 0.22 | 0.14 | 0.22 | 0.22 | 0.2 |
| α-1,3 and α-1,4 | UA948 | 76.19 | – | 0.17 | 0.05 | 0.05 | 0.09 |
| α-1,4 | DSM6709 | 77.34 | 77.47 | – | 0.19 | 0.18 | 0.19 |
| α-1,3 and α-1,4 | UM032_1086* | 75.16 | 93.1 | 77.78 | – | 0.01 | 0.06 |
| α-1,3 and α-1,4 | UM298_1086* | 71.43 | 89.22 | 75.82 | 94.53 | – | 0.05 |
| α-1,3 and α-1,4 | UM298_1395* | 73.03 | 83.94 | 75.55 | 88.26 | 93.39 | – |
*The enzymatic activity is predicted based on amino acid sequence comparison.
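For readers unfamiliar with how a pairwise percent-identity value of the kind tabulated above is obtained, the following Python sketch shows one common convention: identical residues divided by the number of alignment columns in which neither sequence has a gap. This section does not state which convention or which distance measure the authors used, so the function `percent_identity` and the toy alignments are assumptions for illustration, and the sketch does not attempt to reproduce the distance values.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identical residues over columns where neither sequence is gapped.

    Both inputs must be rows of the same pairwise alignment, so they
    have equal length.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 8 columns, one mismatch (Y vs H).
print(percent_identity("MKTAYIAK", "MKTAHIAK"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which is why identity values from different tools are not always directly comparable.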

Protein structure modelling comparing both UA948FucT and UM032_1086. The catalytic sites accounting for interaction with the donor substrate, GDP-fucose, are highlighted.
An alternative hypothesis is that the amount of LacNAc is inherently low in strain UM032, permitting FutB to utilise Lec for Lea synthesis once the LacNAc reservoir is depleted. To enhance LacNAc production, and thus Ley expression, in strains 298 and 299, an additional copy of a glycosyltransferase gene must be present. This is consistent with the acquisition of a jhp0562-like allele in both strains. jhp0562 has been demonstrated in previous mutagenesis and complementation studies to be an essential component of both type I and type II Le synthesis pathways.45 Although the acquired jhp0562-like allele displays only 21% nucleotide sequence similarity to known β-(1,4)galT genes including jhp0765 and HP0826, our findings offer further support for the idea that its product has β-1,4-galactosyltransferase activity for the conversion of N-acetylglucosamine (GlcNAc) to LacNAc. Subsequently, the expression of full-length FutA in strains 298 and 299 provides additional α-1,3 enzymatic activity to produce Lex, which is immediately converted into Ley by FutC. FutA is expected to function similarly to FutB, as the two enzymes share approximately 93.4% of their amino acids.
The occurrence of genotypic and phenotypic changes in Le antigens in strain 298 relative to the input strain UM032, but not between strains 298 and 299, reflects an initial event of host-driven adaptation of Le antigen expression taking place in H. pylori upon its first encounter with a new host species. This would aid the bacterium in establishing long-term colonisation. The acquisition of the jhp0562-like allele further suggests that the product of this non-phase-variable allele may confer a competitive advantage among H. pylori strains.
3.6. Intergenic homopolymeric tract length alterations affect gene expression

Real-time quantitation of genes identified with modified intergenic homopolymer-length. Data are expressed as fold change relative to strain UM032. The symbol * indicates statistical significance where p<0.05.
| Product | UM032 gene | UM032 pseudo | 298 gene | 298 pseudo | 299 gene | 299 pseudo |
|---|---|---|---|---|---|---|
| Tetratricopeptide repeat family protein | UM032_0224 | No | UM298_0224 | Yes | UM298_0224 | Yes |
| Oligopeptide transport system permease protein OppC | UM032_0307 | No | UM298_0307 | Yes | UM299_0307 | Yes |
| Putative metal-dependent hydrolase fragment 1 | UM032_0607 | No | UM298_0607 | Yes | UM299_0607 | Yes |
| Hypothetical protein | UM032_0755 | No | UM298_0755 | Yes | UM299_0755 | No |
| α-(1,3)-fucosyltransferase | UM032_1394 | Yes | UM298_1395 | No | UM299_1395 | No |
Loci in strain UM032 with altered intergenic homopolymeric tracts relative to strains 298 and 299
Observed frequency, maximum length and minimum length refer to the tract length comparison in 49 H. pylori complete genomes excluding strains UM032, 298 and 299.

| Locus tag | Gene product | Length changes in output strains UM298 & UM299 | Position | Observed frequency | Maximum length (bp) | Minimum length (bp) |
|---|---|---|---|---|---|---|
| UM032_0025 | Hypothetical protein | (A)14→13 | ≪ −35 | 48/49 | 19 | 7 |
| UM032_0212 | Hypothetical protein | (A)15→16 | < −35 | 49/49 | 22 | 11 |
| UM032_0213 | CTP synthase | (T)15→16 | ≪ −35 | 49/49 | 22 | 11 |
| UM032_0547 | Putative endonuclease G | (A)16→15 | −35/−10 | 15/49 | 17 | 11 |
| UM032_0548 | Outer membrane protein HopD | (T)16→15 | < −35 | 49/49 | 21 | 8 |
| UM032_0781 | Biotin synthase | (G)12→13 | ≪ −35 | 46/49 | 14 | 7 |
| UM032_0908 | Outer membrane protein BabB | (T)14→13 | ≪ −35 | 27/49 | 22 | 7 |
| UM032_1223 | Outer membrane protein BabA | (A)12→13 | −35/−10 | 44/49 | 15 | 8 |
| UM032_1372 | Hypothetical protein | (T)12→10 | N/A* | 49/49 | 18 | 8 |
*This ORF is the last gene in an operon.
UM032_0213 mRNA levels remained unaffected by the length change in a poly(T) tract located 30 nucleotides upstream of the −35 element. However, UM032_0212 transcription was significantly down-regulated when the length of the poly(A) tract positioned 3 nucleotides upstream of the −35 element was altered. The latter result was consistent with the findings of Åberg et al. that variation in the length of homopolymeric tracts located adjacent to the −35 element modulates promoter activity by changing local DNA structure and thereby binding of the RNA polymerase.13 Similarly, transcriptional activity was also reduced in both outer membrane protein-encoding genes UM032_0548 and UM032_0908, as the poly(T) tract situated ∼20 nucleotides upstream of each −35 element was shortened by a single base pair in both strains 298 and 299. This indicates that regulation via variation in homopolymeric tract length is a fairly general mechanism in H. pylori.
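Tract-length differences of this kind can be located by scanning upstream sequences for single-base runs and comparing their lengths between strains. The Python sketch below is a minimal illustration under stated assumptions: the minimum run length of 8, the pairing of tracts by order of appearance, the function names and the toy promoter fragments are all hypothetical and are not the procedure used in this study.

```python
import re


def homopolymer_tracts(seq, min_len=8):
    """Return (start, base, length) for every single-base run >= min_len."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


def tract_changes(ref_seq, alt_seq, min_len=8):
    """Report length changes between tracts paired by order of appearance."""
    changes = []
    ref_tracts = homopolymer_tracts(ref_seq, min_len)
    alt_tracts = homopolymer_tracts(alt_seq, min_len)
    for (_, base_r, len_r), (_, base_a, len_a) in zip(ref_tracts, alt_tracts):
        if base_r == base_a and len_r != len_a:
            changes.append("(%s)%d->%d" % (base_r, len_r, len_a))
    return changes


# Hypothetical upstream fragments differing by one A in the tract.
ref_upstream = "GCAT" + "A" * 15 + "GCTTGACA"
alt_upstream = "GCAT" + "A" * 16 + "GCTTGACA"

print(homopolymer_tracts(ref_upstream))
print(tract_changes(ref_upstream, alt_upstream))
```

Pairing tracts by order only works when both fragments contain the same tracts in the same order; a real comparison would anchor each tract to aligned flanking sequence instead.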

Western immunodetection of BabA in H. pylori strains UM032, 298 and 299. G27 and 26695 served as the positive and negative controls, respectively, in this assay.

Western immunodetection of UreB in H. pylori strains UM032, 298 and 299.
It is unclear what functional roles biotin synthase and the hypothetical protein play in H. pylori host colonisation. Recently, it was reported that enterohemorrhagic Escherichia coli (EHEC) is able to regulate its adherence to intestinal epithelial cells by sensing the surrounding biotin level.50 Upon its arrival in the low-biotin large intestine, EHEC down-regulates the expression of its biotin protein ligase BirA to remove the repression on the global regulator Fur, thereby activating LEE (locus of enterocyte effacement) genes to promote bacterial adherence. In H. pylori, both BirA and Fur are also present (designated HP1140 and HP1027 in strain 26695, respectively).51 Nevertheless, it is uncertain whether BirA interacts with Fur in H. pylori in a manner similar to that in E. coli, as H. pylori BirA lacks the N-terminal winged helix-turn-helix regulatory domain. It is tempting to hypothesise that the down-regulated expression of biotin synthase could result in a low intracellular biotin concentration, thus allowing derepression of Fur by BirA, which in turn activates genes involved in motility and chemotaxis to facilitate host colonisation.52 This warrants further investigation. The down-regulation of outer membrane protein genes, especially babA, however, is thought to increase bacterial dispersion and colonisation by preventing autoaggregation, which might occur as a result of binding to LPS Leb antigen.
3.7. Concluding remarks
When H. pylori encounters a hostile foreign environment, the bacterium rapidly adapts to the new environment via a series of genomic alterations. Here, we demonstrated that a host change led to modification of the Lewis antigen profile of H. pylori lipopolysaccharide via acquisition of a jhp0562-like allele. In addition, expression levels of outer membrane proteins including BabA, BabB and HopD changed via altered homopolymeric tract lengths. These observations provide further evidence that rapid changes in the expression of membrane-associated proteins play a major role in the early adaptation of bacterial populations to an individual host, and that these components are among the key factors in H. pylori’s success as a pathogen.
Acknowledgements
This project was supported by the University of Malaya-Ministry of Education (UM-MoE) High Impact Research (HIR) grant (reference UM.C/625/1/HIR/MoE/CHAN13/3; Account No. H-50001-A000030), the National Health and Medical Research Council (grant no. 572723), the Vice Chancellor of the University of Western Australia, and the Western Australian Department of Commerce and Department of Health. SP was supported by SCELCE Microbiome Centre and grants from LKC School of Medicine, NTU University, Singapore. We thank Dr K. Mary Webberley for providing critical comments on this manuscript. We also thank Dr Hong Li (West China Hospital, Sichuan University) for providing blood group antigen-binding adhesin A (BabA) polyclonal antibody. We would also like to acknowledge Susana Wang, Primo Baybayan and Meredith Ashby of PacBio Biosciences (USA) and Siddarth Singh of PacBio Singapore for sequencing and assembling the original complete genomes of UM032, 298 and 299.
Conflict of interest
None declared.
Supplementary data
Supplementary data are available at www.dnaresearch.oxfordjournals.org.