Characterization of the Escherichia coli O59 and O155 O-antigen gene clusters: The atypical wzx genes are evolutionary related

O-antigens are highly polymorphic. The genes specifically involved in O-antigen synthesis are generally grouped together on the chromosome as a gene cluster. In Escherichia coli, the O-antigen gene clusters are characteristically located between the housekeeping genes galF and gnd. In this study, the O-antigen gene clusters of E. coli O59 and E. coli O155 were sequenced. The former was found to contain genes for GDP-mannose synthesis, glycosyltransferase genes and the O-antigen polymerase gene (wzy), while the latter contained only glycosyltransferase genes and wzy. O unit flippase genes (wzx) were found immediately downstream of the gnd gene, in the region between the gnd and hisI genes in these two strains. This atypical location of wzx has not been reported before, and furthermore these two genes complemented in trans despite the fact that different O-antigen structures are present in E. coli O59 and O155. A putative acetyltransferase gene was found downstream of wzx in both strains. Comparison of the region between gnd and hisI revealed that the wzx and acetyltransferase genes are closely related between E. coli O59 and O155, indicating that the two gene clusters arose recently from a common ancestor. This work provides further evidence for the O-antigen gene cluster having formed gradually, and selection pressure will eventually bring O-antigen genes into a single cluster. Genes specific for E. coli O59 and O155, respectively, were also identified.


Introduction
The O-antigen, which contains many repeats of an oligosaccharide unit (O unit), is part of the lipopolysaccharide (LPS) present on the surface of gram-negative bacteria. It is one of the most variable components of cells and it varies by the types of sugars present, the arrangement of the sugars and the linkages between the sugars within an O unit, as well as the linkage between the O units [1,4]. Based on the O-antigen variation, a bacterial species is divided into different O serotypes. In Escherichia coli (including Shigella), 186 O serotypes have been recognized [2]. The O-antigen is subject to intense selection by the host immune system, which may be the major factor for the maintenance of many different O-antigen forms [1,4].
There are three main classes of genes involved in O-antigen synthesis: (1) genes for the synthesis of nucleotide sugar precursors; (2) genes encoding glycosyltransferases for sequential transfer of sugars from their respective nucleotide precursors to the carrier lipid, undecaprenol phosphate (Und-P), to form O units; and (3) O unit processing genes encoding the O unit flippase (wzx) and the O-antigen polymerase (wzy) [1]. The role of Wzx is to translocate, or flip, the O units formed at the cytoplasmic face of the membrane to the periplasmic face. The O units are then polymerised by Wzy at the periplasmic face of the membrane to form a long-chain O-antigen. Both wzx and wzy are specific to individual O-antigens [3].
The genes specifically involved in O-antigen synthesis are generally grouped together on the chromosome as a gene cluster. In E. coli and Salmonella enterica, the O-antigen gene clusters lie between the galF and gnd genes [4]. The O-antigen genes are generally found very close to each other, often with overlapping reading frames, and are thought to be transcribed as a unit [5]. There are a few cases of one or more O-antigen genes being located outside the O-antigen gene cluster. For example, in E. coli O55, two of the colitose synthetic pathway genes, col1 and col2, were found outside the O-antigen gene cluster, downstream of the gnd gene, while the rest of the pathway genes were found within the O-antigen gene cluster between galF and gnd [6]. The location of col1 and col2 perhaps represents an intermediate stage, and selection pressure may be expected to relocate the two genes into one operon with the other O-antigen genes.
The structural variation of O-antigens is almost entirely dependent on the variation of the O-antigen gene clusters. Typically, O-antigen gene clusters have a GC content lower than the average level of the genome, indicating that the gene clusters have arisen from other species by lateral transfer [7,8]. Inter- and intraspecies lateral transfer of O-antigen genes appears to play an important role in expanding O-antigen polymorphism. Both the transfer of O-antigen genes for assembly of new O-antigen gene clusters and the transfer of entire O-antigen gene clusters between clones of a species by homologous recombination have been observed [7,9,10]. Interspecies transfer of the entire O-antigen gene cluster from Plesiomonas shigelloides to E. coli has also been reported [11,12].
In this study, the regions between galF and the his operon of the E. coli O59 and O155 chromosomes were sequenced; the wzy genes were identified functionally and the other genes were identified on the basis of homology. It was shown that the O59 and O155 antigen gene clusters are located between galF and gnd. However, the O unit flippase gene (wzx) and an acetyltransferase gene were found outside the main O-antigen gene clusters, located between the gnd gene and the his operon, in both strains. Both wzx genes were proven to be functional and able to trans-complement each other, despite the fact that this gene is usually highly specific to a particular O unit. This indicates that the E. coli O59 and O155 gene clusters were recently generated from a common ancestor, and that they are at an intermediate stage, before the wzx genes have adapted specifically to their respective O units and before the wzx and acetyltransferase genes have moved into the main cluster. Finally, by screening against all 186 Shigella and E. coli O serotypes, genes specific for each of E. coli O59 and O155 were identified.

Bacterial strains and plasmids
The bacterial strains and plasmids used in this study are listed in Table 1. Primers used in this study are listed in Table 2.

Construction of DNaseI shotgun bank
Chromosomal DNAs were prepared as previously described [13]. Long PCR was carried out using the Expand Long Template PCR system from Roche. Primers #1523 (5′-ATTGTGGCTGCAGGGATCAAAGAAAT-3′) and #1524 (5′-TAGTCGCGTGNGCCTGGATTAAGTTCGC-3′) [14] were used to amplify the region between galF and gnd, and the PCR cycles were as follows: denaturation at 94°C for 10 s, annealing at 60°C for 30 s and extension at 68°C for 15 min. Primers WL-359 and WL-360 were used to amplify the region between the gnd gene and the his operon, and the PCR cycles were as follows: denaturation at 94°C for 10 s, annealing at 58°C for 30 s and extension at 68°C for 10 min. To limit any possible PCR errors, 10 individual PCR products were pooled. The PCR products were subjected to DNaseI digestion and cloned into pGEM-T Easy to construct banks using the method described previously [15].
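Primer #1524 contains the degenerate base N, so it stands for a small family of oligonucleotides rather than a single sequence. The following is a minimal Python sketch, not part of the original protocol, that reports the length and possible GC range of the two long-PCR primers and enumerates the concrete sequences covered by N. The function names are illustrative, and the primer sequences are assumed to be those quoted above with no internal hyphens.

```python
# Primer sequences as given in the text, written without internal hyphens.
# N is the IUPAC code for "any base".
PRIMERS = {
    "#1523": "ATTGTGGCTGCAGGGATCAAAGAAAT",
    "#1524": "TAGTCGCGTGNGCCTGGATTAAGTTCGC",
}


def gc_range(seq):
    """Return (lowest, highest) possible GC fraction of a primer.

    Each N counts as non-GC for the lower bound and as GC for the upper one.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    n = seq.count("N")
    return gc / len(seq), (gc + n) / len(seq)


def expand_n(seq):
    """List every concrete sequence encoded by a primer containing N."""
    variants = [""]
    for base in seq.upper():
        choices = "ACGT" if base == "N" else base
        variants = [v + c for v in variants for c in choices]
    return variants


for name, seq in PRIMERS.items():
    low, high = gc_range(seq)
    print(name, len(seq), "nt", f"GC {low:.2f}-{high:.2f}",
          len(expand_n(seq)), "variant(s)")
```

A single N gives four variants, so the primer pool stays small enough for each variant to be present at a useful concentration.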

Sequencing and analysis
Sequencing was carried out using an ABI 3730 automated DNA sequencer. Sequence data were assembled using the Staden Package [16]. The program Artemis was used for gene annotation [17]. The program BLOCKMAKER was used for searching conserved motifs [18]. BLAST and PSI-BLAST [19] were used for searching databases including the GenBank database and the Pfam protein motif database [20]. Sequence alignment and comparison were performed using the program Clustal W [21].

Deletion of wzx, wzy and wclD genes from E. coli O59 and O155
The wzx, wzy and wclD genes were each replaced by the chloramphenicol acetyltransferase (CAT) gene using the RED recombination system of phage lambda [22,23]. The CAT gene was PCR-amplified from plasmid pKK232-8 using primers binding to the 5′ and 3′ ends of the gene, respectively, with each primer also carrying about 40 bp of sequence flanking the target gene. Primer pairs WL-995/996 and WL-991/992 were used for replacement of wzx, and WL-997/998 and WL-993/994 for replacement of wzy, in E. coli O59 and O155, respectively; WL-983/984 was used for replacement of wclD in E. coli O59. The PCR products were transformed into E. coli O59 and O155, respectively, carrying pKD20, and chloramphenicol-resistant transformants were selected after induction of the RED genes according to the protocol described by Datsenko and Wanner [22]. PCR using primers specific to the CAT gene and the flanking DNA of the target gene was carried out to confirm the replacement. Membrane preparation, SDS-PAGE and silver staining for visualization of the LPS were carried out as described previously [24].
To complement the deficient mutants, wzx, wzy and wclD genes from E. coli O59 and O155 were PCR amplified and cloned into the NcoI and BamHI or NcoI and EcoRI sites of pTRC99A to make plasmids pLW1011, pLW1012, pLW1211, pLW1212, pLW1213, and pLW1214 (see Table 1). Expression of the cloned genes was induced by 2.5 mM IPTG (isopropylthiogalactopyranoside).
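The gene replacement described in this section relies on primers that combine a homology arm of about 40 bp flanking the target gene with a stretch that primes on the CAT gene. The Python sketch below shows how such a primer pair could be assembled. It is an illustration under stated assumptions, not the authors' primer design: the function names, the 20-nt priming length and all input sequences are hypothetical placeholders, and the real WL primers are listed in Table 2.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def knockout_primers(upstream, downstream, cassette, arm=40, prime=20):
    """Build a forward and reverse primer for a gene replacement.

    upstream / downstream: top-strand sequence immediately before and
        after the target gene (hypothetical inputs).
    cassette: top-strand sequence of the marker gene to be inserted.
    arm: length of each homology arm (about 40 bp in the text).
    prime: length of the cassette-binding part (assumed value).
    """
    if len(upstream) < arm or len(downstream) < arm or len(cassette) < prime:
        raise ValueError("input sequences are too short")
    # Forward primer: upstream arm followed by the start of the cassette.
    fwd = upstream[-arm:] + cassette[:prime]
    # Reverse primer: reverse complement of (end of cassette + downstream arm).
    rev = revcomp(downstream[:arm]) + revcomp(cassette[-prime:])
    return fwd, rev
```

The resulting PCR product carries the cassette between the two arms, which is what allows the RED system to exchange it for the target gene by homologous recombination.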

Specificity assay by PCR
Chromosomal DNA was isolated from 186 E. coli (including Shigella) strains representing all the different O serotypes. The quality of the chromosomal DNA from each strain was examined by PCR amplification of the mdh gene (a house-keeping gene in E. coli coding for malate dehydrogenase) using primers described previously [25]. A total of 28 DNA pools were made, each containing between 6 and 10 strains [2]. Pools were screened using primers based on genes specific to E. coli O59 or O155, respectively. PCR was carried out in a total volume of 25 µl, of which 10 µl was run on an agarose gel to check for amplified DNA.
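The pooled screening logic can be sketched in a few lines of Python. This is only an illustration: the actual pool composition follows reference [2] and is not given here, so the round-robin split, the strain labels and the function names below are assumptions.

```python
def make_pools(strains, n_pools=28):
    """Distribute strains over n_pools pools in round-robin order."""
    pools = [[] for _ in range(n_pools)]
    for i, strain in enumerate(strains):
        pools[i % n_pools].append(strain)
    return pools


def candidates(pools, positive_pools):
    """Strains that must be retested one by one, given the indices of
    the pools that produced a PCR product."""
    return sorted(s for idx in positive_pools for s in pools[idx])


# Hypothetical labels standing in for the 186 reference strains.
strains = [f"strain_{i:03d}" for i in range(1, 187)]
pools = make_pools(strains)
print(len(pools), "pools, sizes", sorted({len(p) for p in pools}))
```

With 186 strains and 28 pools this split gives pools of 6 or 7 strains, which lies within the 6 to 10 range quoted above. A gene is specific to one serotype when only the pool containing that serotype yields a product.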

Nucleotide sequence accession number
The DNA sequences of the region between galF and hisI from E. coli O59 and O155 have been deposited in GenBank under the accession numbers AY654590 and AY657020, respectively.

Results and discussion
Sequencing the regions between galF and hisI from E. coli O59 and O155
Sequences of 16,573 and 12,755 bases between galF and hisI were obtained from E. coli O59 and O155, respectively. The former contained fourteen open reading frames (ORFs) including galF and hisI, and the latter contained twelve. All of the ORFs in the two sequences are transcribed from galF to hisI with the exception of gla, which is transcribed in the opposite direction (Fig. 1). ORFs were assigned functions based on knowledge of previously characterized O-antigen genes and homology comparisons using available databases (Tables 3 and 4).

The region between galF and gnd is the O-antigen gene cluster
The region between galF and gnd from E. coli O59 was found to contain two genes for the synthesis of GDP-mannose (manC and manB), three glycosyltransferase genes (wclA, wclB and wclC), and the O-antigen polymerase gene (wzy). The same region from E. coli O155 was found to contain three glycosyltransferase genes (wclE, wclF and wclG) and wzy. All of the genes are typical of those involved in the synthesis of O-antigen, and the region between galF and gnd is the normal location for the E. coli O-antigen gene cluster. The putative function of wzy was further confirmed by comparing the LPS phenotypes of the wild type and the mutant in which wzy was replaced by a CAT gene. While H1412, the wzy-deficient mutant strain of E. coli O59, produced only the lipid-A/core part of the LPS and one oligosaccharide unit, the wild-type E. coli O59 and the complemented strain (H1125) produced normal LPS (Fig. 2(b)). The same was found for the mutant and wild-type E. coli O155 (Fig. 2(a)). This confirms the designation of wzy, and shows that the regions between galF and gnd in E. coli O59 and O155 are O-antigen gene clusters. However, the O unit flippase gene is absent from this region in both strains.

wzx and acetyltransferase genes are located downstream of the main gene cluster in E. coli O59 and O155
The regions between gnd and hisI in both E. coli O59 and O155 were found to contain five ORFs. orf11, orf12 and orf13 of O59 and orf9, orf10 and orf11 of O155 showed a high level of similarity to ugd, gla and wzz, respectively (Tables 3 and 4). ugd and gla are house-keeping genes, encoding proteins involved in UDP-GlcA and UDP-GalA biosynthesis, respectively [5]. wzz encodes the O-antigen chain length determinant protein. ugd, gla and wzz are usually located in this region in other E. coli serotypes [27]. orf11, orf12 and orf13 of O59 and orf9, orf10 and orf11 of O155 were therefore named ugd, gla and wzz, respectively.
orf9 of E. coli O59 was predicted to encode a protein with 12 transmembrane segments that is related to the protein families Pfam062047 and COG2244, both of which include Wzx proteins. When Orf9 and the putative Wzx proteins of Nostoc punctiforme (GenBank entry ZP_00108476) and Yersinia enterocolitica (GenBank entry AAC60766) were analyzed using the BlockMaker program, six conserved motifs were revealed (of length 37, 9, 28, 49, 33 and 41 amino acids, respectively). The consensus sequences of those motifs were used to run the program PSI-BLAST to search the Genpept database. In addition to the Wzx proteins of N. punctiforme and Y. enterocolitica, other distantly related Wzx proteins were retrieved after three iterations (E value = 4 × 10⁻⁹). This indicates that orf9 is a putative wzx gene.
orf7 of O155 was also predicted to encode a protein with 12 transmembrane segments. Orf7 showed 73% identity to Orf9 of O59, and was also identified as a putative Wzx using the programs BlockMaker and PSI-BLAST as described above.
Orf10 of O59 and Orf8 of O155 share 64% identity. Both show 41% identity and 65% similarity to CysE, a serine acetyltransferase of Clostridium tetani E88 [28]. When searched against the Pfam and COG databases, Orf10 and Orf8 were found to belong to the COG1045 acetyltransferase family (E value = 2 × 10⁻¹²). We propose that orf10 of O59 and orf8 of O155 are putative acetyltransferase genes, and named both genes wclD.
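Identity percentages such as those quoted above for the Wzx and WclD pairs are derived from a pairwise alignment. The Python sketch below shows one common way to compute the figure. It assumes that gapped columns are excluded from the denominator, which is a choice made here for illustration and may differ from the convention used by the alignment program in this study. The toy sequences in the usage line are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of a pairwise alignment.

    Both rows must have the same length and use '-' for gaps.
    Columns containing a gap in either row are ignored (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# Invented toy alignment: 5 identical residues out of 6 ungapped columns.
print(round(percent_identity("MKT-LLA", "MRTALLA"), 1))
```

Because the denominator can also be defined as the full alignment length or the length of the shorter sequence, identity values from different tools are not always directly comparable.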
To confirm that the wzx and wclD genes are involved in O-antigen synthesis, we replaced wzx of both O59 and O155 and the wclD gene of O59 with CAT genes to make the mutant strains H1414, H1413, and H1056, respectively. While the wild-type and complemented strains produced normal LPS, the mutant strains produced only the lipid-A/core part of the LPS (Figs. 2(a)-(c)). We also showed that H1056 complemented with the wclD gene of O155 produced normal LPS (Fig. 2(b)), indicating that the wclD genes of O59 and O155 have the same function. These results indicate that the wzx and wclD genes are required for the synthesis of the O59 and O155 antigens.

wzx genes of E. coli O59 and O155 are evolutionarily related
Wzx flips the undecaprenol phosphate (Und-P)-linked O units across the bacterial inner membrane. As Und-P is also needed for the synthesis of cell wall peptidoglycan, teichoic acids and other carbohydrate polymers, accumulation of Und-P-linked O units on the cytoplasmic face of the inner membrane results in cell death [3]. Thus, it is no surprise that the wzx gene is located within the main cluster in the 70 or so known cases (http://www.microbio.usyd.edu.au/BPGD/default.htm). This is the first time that the wzx gene has been found outside of the main gene cluster.
The region between gnd and hisI in O59 and O155 contains the same genes in the same order, and the DNA identity between each pair of the genes is also high (Fig. 1). In comparison to gnd, which has a GC content of about 0.5, wzx and wclD, adjacent to gnd gene, have a lower GC content of about 0.30, and other genes in this region have a GC content of about 0.43. This suggests that the first two genes (wzx and wclD) were incorporated into their current position recently from the same ancestor or evolved one from the other.
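The GC comparison above rests on a simple per-gene calculation. A minimal Python sketch follows, with a hypothetical function name and invented toy sequences, since the gene sequences themselves are available only from the GenBank entries. Ambiguous bases are ignored here, which is an assumption of this sketch.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Invented toy sequences; real input would be the annotated gene sequences.
toy_genes = {
    "gene_a": "ATGGCGCTGACCGGCTAA",
    "gene_b": "ATGAAATTTATTAATTAA",
}
for name, seq in toy_genes.items():
    print(name, round(gc_content(seq), 2))
```

Applying the same function to each ORF in turn gives the kind of profile described in the text, in which a sharp drop in GC content marks genes likely to have been acquired from elsewhere.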
Plasmids pLW1011 and pLW1012, expressing the Wzx proteins of O59 and O155, respectively, restored O155 and O59 antigen synthesis in the wzx mutants to levels comparable to those of the parental strains (Fig. 2(c)). The fact that the two Wzx proteins are functionally interchangeable further supports the conclusion that the two genes evolved from the same origin. We then calculated the synonymous substitution rate (Ks) and nonsynonymous substitution rate (Ka) between the O59 and O155 genes using the program K-estimator (Fig. 1) [29]. Comparison between the two wzx genes gave a Ks of 1.2397, a Ka of 0.1685 and a Ka/Ks ratio of 0.14; the two wclD genes had a Ks of 1.1854, a Ka of 0.1873 and a Ka/Ks ratio of 0.16. Comparison of 67 house-keeping genes present in both E. coli K-12 and S. enterica serovar gave average values for Ks and Ka of 0.94 and 0.039, respectively, and an average Ka/Ks ratio of 0.04 [30]. The much higher Ka/Ks ratios of wzx and wclD in the two strains suggest that adaptive change occurred after divergence, allowing more efficient flipping or acetylation of the different O units. At this stage, it is not clear whether the wzx and wclD genes of the two strains evolved from a common ancestor or one from the other, but there is no doubt that they are at an intermediate stage and that the two genes will eventually be brought into the main gene cluster between galF and gnd.
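The Ka/Ks ratios quoted above follow directly from the reported Ka and Ks values, as the short Python check below illustrates. The dictionary layout and function name are illustrative only. For the house-keeping genes the text reports an average of per-gene ratios, so the ratio of the two averages computed here is only an approximation of that figure.

```python
# (Ka, Ks) values as reported in the text.
REPORTED = {
    "wzx (O59 vs O155)": (0.1685, 1.2397),
    "wclD (O59 vs O155)": (0.1873, 1.1854),
    "house-keeping genes (averages)": (0.039, 0.94),
}


def ka_ks(ka, ks):
    """Ratio of nonsynonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


for name, (ka, ks) in REPORTED.items():
    print(f"{name}: Ka/Ks = {ka_ks(ka, ks):.2f}")
```

The ratios for wzx and wclD come out roughly three to four times the house-keeping value, which is the contrast the argument for adaptive change relies on. All three ratios remain well below 1.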

Identification of genes specific for E. coli O59 and O155
Glycosyltransferase and O unit processing genes (wzx and wzy) are usually specific to individual O-antigens [15]. Four primer pairs based on wclA, wclB, wclC and wzy from E. coli O59, and two pairs based on wzy and wzx from E. coli O155, were designed and used to screen the DNA pools containing representatives of the 186 known O-antigen forms of E. coli and Shigella strains. Except for the pools containing E. coli O59 and O155, which gave the expected PCR products with the respective primer pairs, none of the pools produced the expected PCR products. Therefore, all of the genes tested were specific to E. coli O59 and O155, respectively. PCR-based assays using primers based on these O-antigen-specific genes may be developed for the rapid detection and identification of E. coli O59 and O155 strains, as well as for their DNA serotyping.

Conclusions
The O-antigen genes of E. coli O59 and O155 were found between galF and hisI, in regions containing twelve and ten genes, respectively. The designation of the wzy, wzx and wclD genes in E. coli O59 and O155 was further confirmed by mutation analysis. The wzx genes of the two strains were found downstream of gnd and complemented each other in trans. An acetyltransferase gene, wclD, is adjacent to wzx in both strains. Further comparison indicates that the wzx and wclD genes of the two strains evolved from the same origin. We suggest that the present position of the wzx and wclD genes in the two strains represents an intermediate stage, and that the genes are expected to relocate into the main O-antigen gene clusters under selection pressure. The genes specific for E. coli O59 and O155 were also identified, and can potentially be used for rapid identification of the two strains by PCR.