The generation of complete genome sequences provides a blueprint that facilitates the genetic characterization of pathogens and their hosts. The genome of Salmonella enterica serovar Typhi (S. Typhi) harbors ∼5 million base pairs encoding some 4000 genes, of which >200 are functionally inactive. Comparison of S. Typhi isolates from around the world indicates that they are highly related (clonal) and that they emerged from a single point of origin ∼30,000–50,000 years ago. Evidence suggests that, as well as undergoing gene degradation, S. Typhi has also recently acquired genes, such as those encoding the Vi antigen, by horizontal transfer events.
The species Salmonella enterica is made up of pathogenic bacteria that have the ability to infect a wide range of animals and cause a variety of disease syndromes. Since the early days of microbiology, Salmonella has fascinated researchers and clinicians alike, in part because of the antigenic diversity within the genus, which has led to the assignment of isolates to >2500 different serovars. Serologic typing has dominated the classification of Salmonella, and the methodologies have proved useful in both epidemiologic control and the clinical treatment of infections. Taxonomists have struggled to define Salmonella as a species group. We now work with the definition of the S. enterica species divided into several subspecies and many serovars. Salmonella bongori has been classified as a distinct species. The new science of genomics and the ability to sequence the entire genome of individual bacteria will allow us to improve our understanding of the organization and evolution of the species S. enterica and to look in detail at specialized members, such as S. enterica serovar Typhi (S. Typhi) [1, 2]. We should also recognize that species such as S. enterica are still evolving and that we are currently observing only a small snapshot of that evolution. Examination of the organization of genomes should allow us to gain a better understanding of the mechanisms by which species evolve and may also help us predict the type of organisms that might appear in the future.
Basic Features of the Salmonellae
Salmonella is a genus within the Enterobacteriaceae and, as such, falls into the general group of enteric bacteria that includes Escherichia coli and Shigella species. The basic life cycle of these microorganisms involves, allowing for a few exceptions, colonization of the lumen of the intestine of animals and transmission, via the external environment, between hosts. Most members of the Enterobacteriaceae are commensal microorganisms that exist as components of the normal intestinal microflora, although the pathogenic potential of some is enhanced. A comparison of the genomes of several sequenced enteric bacteria immediately highlights some important common traits. All have a single chromosome, normally 4.3–5.0 Mb in size [1, 3–8]. Different strains may also harbor extrachromosomal DNA in the form of plasmids. Plasmids often carry genes associated with virulence or antibiotic resistance and can be considered to be a rapidly evolving gene pool. Comparison of the chromosomes of different enteric bacteria identifies a common set of so-called “core genes” that are, in general, shared among enteric species [6, 9]. These core genes can be regarded as genes that perform “household” functions associated with the common shared lifestyle of intestinal colonization and transmission (environmental survival). A full definition of the core gene set is difficult, but the shared genome can be identified by comparing DNA sequences and examining shared gene function. Such core genes may play a role in central metabolism or polysaccharide biosynthesis or encode common structural proteins. Perhaps a more interesting feature, however, is that the core genes are mainly arranged in the same conserved order along the single chromosome, a characteristic referred to as “synteny.” Genomic synteny can be highlighted using a simple genome comparison tool, such as the one illustrated in figure 1.
The core genome may be compared to the “chassis” of a car, in that it is a conserved design feature that works successfully and aids the lifestyle of the enteric bacteria. If the DNA sequences of genes in the core genome of different enteric bacteria are compared, E. coli and S. enterica are found to differ by ∼10%, and Salmonella serovars within S. enterica differ by ∼1%. This 10% divergence between the core sequences of E. coli and S. enterica most likely represents evolutionary drift over the ∼100 million years since the 2 species separated from a common ancestor.
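The divergence figures quoted above are, in essence, the percentage of aligned positions at which two core-gene sequences differ. A minimal sketch of that calculation follows; it assumes the two sequences have already been aligned to equal length, and the function name and toy sequences are illustrative only and are not taken from the studies cited.

```python
def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Return the percentage of ungapped aligned positions that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    mismatches = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        # Skip alignment gaps so that indels are not counted as substitutions.
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


# Toy 20-bp fragments (hypothetical) that differ at 2 positions.
fragment_x = "ATGGCTAAAGGTCTGACCGA"
fragment_y = "ATGGCGAAAGGTCTTACCGA"
print(percent_divergence(fragment_x, fragment_y))  # 2 of 20 positions differ -> 10.0
```

In practice the comparison would be run over many aligned core genes and averaged, rather than over a single short fragment.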
The genomes of enteric bacteria are under intensive selective pressure because of factors that include competition within the normal flora, coping with fluctuating nutrient sources in the host and the environment, and pressure from the host immune system, to name but a few. Thus, we may expect the genomes of enteric bacteria to show signatures of these evolutionary pressures, and this appears to be the case. Scattered along the core genome are blocks of genes (or, in some cases, single genes or gene remnants) that have limited or no homology with the core genome. Furthermore, these genes can be unrelated or show significant divergence between different enteric species or even between members of the same species. These novel genes can, however, share some related features and common functions. For example, they might work together to enhance the virulence potential of a species. Such virulence-associated gene combinations are often referred to as “pathogenicity islands.” Examples include Salmonella pathogenicity islands 1 and 2 (SPI-1 and SPI-2, respectively), which contribute to invasiveness and the ability of Salmonella organisms to survive inside eukaryotic cells. These gene clusters often have GC content that differs from that of the core genome, suggesting that they have been acquired more recently and independently by a horizontal gene-transfer event. Indeed, these genes are frequently associated with genes from plasmids and phage and are sometimes integrated into redundant transfer RNA genes that can act as receptor sites for foreign DNA. Prophages or phage remnants are signature features of the diverse, noncore regions of the enteric bacterial genome. Bacteriophages have the ability to mobilize DNA and appear to be one of the critical drivers of diversity in these microorganisms. This is not necessarily surprising, because we can imagine frequent and multiple interactions between phage and bacteria in both the intestine and the environment.
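The observation that horizontally acquired clusters often differ in GC content from the core genome suggests a simple screen: slide a window along the chromosome and flag windows whose GC fraction departs from the genome-wide value. The sketch below is one way this could be done; the window size, step, tolerance, and toy sequence are all assumptions chosen for illustration, not parameters from the work described here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome: str, window: int = 1000, step: int = 500,
                        tolerance: float = 0.05):
    """Return (start, end, gc) for windows whose GC fraction differs from
    the whole-sequence GC fraction by more than `tolerance`."""
    baseline = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        fraction = gc_fraction(genome[start:start + window])
        if abs(fraction - baseline) > tolerance:
            flagged.append((start, start + window, fraction))
    return flagged


# Toy sequence (hypothetical): 50% GC "core" flanking a 100-bp AT-only "island".
toy_genome = "ATGC" * 100 + "ATTA" * 25 + "ATGC" * 100
print(atypical_gc_windows(toy_genome, window=100, step=100, tolerance=0.1))
```

A departure in GC content is only suggestive; as the text notes, association with phage or plasmid genes and insertion at transfer RNA genes provide supporting evidence.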
Features of the Genome of S. Typhi
Genome sequencing allows us to look in some detail at the genomes of individual species, subspecies, serovars, and even different isolates within the same serovar [1, 2, 6–9]. Thus, we can examine the genetic blueprint of such bacteria and make simple comparisons with the genomes of bacteria that share certain phenotypic characteristics (such as host restriction or biotype). We may also compare the genome of S. Typhi with the genomes of other bacteria that are limited in their pathogenic potential for particular hosts (i.e., host promiscuity or restriction). S. Typhi is a particular Salmonella serovar and is recognized as the cause of typhoid fever [13, 14]. S. Typhi isolates frequently share common antigenic determinants (e.g., O9, Hd, and Vi), and, indeed, serologic analysis is routinely used in the confirmation of the identity of clinical isolates. S. Typhi causes an invasive form of salmonellosis in humans that is characterized by a prolonged incubation period, fever, and systemic bacterial dissemination, which can be detected by isolation of S. Typhi from the blood and bone marrow. Two key phenotypic signatures of S. Typhi are the ability to express the Vi polysaccharide antigen at the surface of the bacteria and host restriction to humans. S. Typhi can infect some higher primates but is poorly infectious in mice and other mammals. The human host restriction phenotype of S. Typhi has inhibited direct studies of the pathogenicity of this serovar. Most of our understanding of the pathogenicity of S. Typhi has been gleaned from comparative studies with more promiscuous S. enterica serovars, such as S. enterica serovar Typhimurium (S. Typhimurium), in the mouse. At the time of writing, the complete DNA sequences of 2 different S. Typhi isolates are available in public databases [1, 2]. These genome sequences are of the classic S. Typhi type strain Ty2 (originally isolated in the early 1900s), frequently used in laboratory studies around the world, and S. Typhi CT18, a multidrug-resistant isolate collected in Vietnam in 1992. S. Typhi CT18 harbors 2 large plasmids, one of which encodes multidrug resistance, that are not found in Ty2.
A comparison of the chromosomes of S. Typhi CT18 and Ty2 demonstrates a remarkable degree of conservation. Almost all of the genes are shared between the 2 isolates. Some differences include an additional cluster of a few genes in Ty2 that might be a novel pathogenicity island and a P4-like phage determinant (N. R. Thomson, unpublished data). However, the overriding feature is conservation of sequence. This comparison immediately raises the question of how diverse the S. Typhi isolates found around the world are. One simple method to gauge bacterial diversity is to use multilocus sequence typing (MLST), an approach that compares the sequences of portions of 7 housekeeping genes from the chromosomal DNA of different isolates. MLST can be used in some species, such as S. enterica itself, to classify isolates into related MLST types or clusters. Interestingly, MLST sometimes groups isolates of a particular serovar into a single cluster, but this is not always the case, and some serovars may be polyphyletic, following different evolutionary lineages. However, MLST and other forms of analysis of S. Typhi isolates suggest a remarkable degree of conservation, with a very limited number of highly related MLST patterns [18–20]. These data in themselves (but also taken together with information gleaned from other, independent approaches) suggest that S. Typhi belongs to a single clonal type that has evolved from the same progenitor. Indeed, examination of the DNA sequence and the rate of change of single-nucleotide polymorphisms suggests that S. Typhi may be as young as 30,000 years old, indicating that its divergence occurred significantly later than the estimated divergence of S. enterica and E. coli. Thus, S. Typhi has had a limited time frame in which to accumulate diversity. The fact that S. Typhi isolates are so highly related means that it is often difficult to differentiate between isolates in a local area and even more difficult to define evolutionary lineages within the population. Genomic methodologies can be applied to help. Relatively simple methods, such as pulsed-field gel electrophoresis (PFGE), have proved valuable for distinguishing between isolates. This method has been aided by the unusual tendency of S. Typhi to undergo recombination between different ribosomal RNA operons, which rearranges blocks of the genome in a distinct manner. However, PFGE has limited potential for identifying evolutionary lineages at a global or local level. We await the development of new single-nucleotide polymorphism–based genomewide typing methods to address this need.
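The logic of MLST described above can be sketched in a few lines: each distinct sequence at a locus is given an allele number, the 7 allele numbers form a profile, and isolates with identical profiles fall into the same type. The example below is a simplified illustration under stated assumptions; the locus names (`locus1` to `locus7`), isolate names, and 4-bp fragments are placeholders, and real schemes use curated allele databases and much longer gene fragments.

```python
from collections import defaultdict

# Placeholder names for the 7 gene fragments; the text does not name the loci.
LOCI = ("locus1", "locus2", "locus3", "locus4", "locus5", "locus6", "locus7")


def assign_profiles(isolates):
    """Map each isolate name to a 7-tuple of allele numbers.

    `isolates` maps isolate name -> {locus name -> sequence fragment}.
    Allele numbers are assigned in order of first appearance at each locus.
    """
    allele_ids = {locus: {} for locus in LOCI}
    profiles = {}
    for name, fragments in isolates.items():
        profile = []
        for locus in LOCI:
            seen = allele_ids[locus]
            sequence = fragments[locus].upper()
            if sequence not in seen:
                seen[sequence] = len(seen) + 1
            profile.append(seen[sequence])
        profiles[name] = tuple(profile)
    return profiles


def group_by_profile(profiles):
    """Group isolate names that share an identical allelic profile."""
    groups = defaultdict(list)
    for name, profile in profiles.items():
        groups[profile].append(name)
    return dict(groups)


# Toy data (hypothetical): two identical isolates and one single-locus variant.
base = {locus: "ACGT" for locus in LOCI}
toy_isolates = {
    "isolate_A": base,
    "isolate_B": dict(base),
    "isolate_C": dict(base, locus3="ACGA"),
}
toy_profiles = assign_profiles(toy_isolates)
print(group_by_profile(toy_profiles))
```

With a highly clonal organism such as S. Typhi, most isolates would be expected to collapse into very few profiles, which is why the text looks to genomewide polymorphism-based methods for finer resolution.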
One of the main characteristics of most S. Typhi isolates is the ability to express the Vi polysaccharide capsule. Vi capsule production is associated with a set of genes that are situated within a novel gene island that has been designated as SPI-7 [23, 24]. This island is 134 kb in length and encodes a variety of putative virulence-associated gene clusters, including the Vi locus, a phage encoding the sopE effector protein of SPI-1, a type IV pilus, and a putative type IV secretion system. SPI-7 has many typical features that are associated with horizontally acquired DNA, and the structure is indicative of several independent integration events. There is also some evidence that SPI-7 may be able to act in a fashion similar to that of a conjugative transposon. Indeed, an SPI-7–like element that encodes Vi is present in some S. enterica serovar Paratyphi C and some S. enterica serovar Dublin isolates. The fact that the Vi genes are encoded on a mobile and potentially unstable genetic region is of interest because the Vi antigen is the target of a promising Vi-based typhoid vaccine. Surveillance methods will have to be introduced to ensure that no capsule replacement events occur as a result of selection driven by such Vi-based mass-vaccination campaigns.
Why Is S. Typhi Invasive and Host Restricted?
The genetic basis of the invasive nature and the human host–restricted phenotype of S. Typhi has fascinated researchers for many years. S. Typhi yielded few clues prior to genome sequencing. However, examination of the S. Typhi genome and comparison with other serovars, such as S. enterica serovar Paratyphi A, that also have the ability to cause invasive typhoid-like disease in humans has provided some intriguing clues. The S. Typhi CT18 genome was the first to be completely determined, and annotation of the gene repertoire of this isolate indicated that >200 of the genes present in the genome showed evidence of being disrupted or inactivated. We refer to such genes as “pseudogenes.” Pseudogenes have been described in human DNA and some fastidious bacteria but were not expected to be present in such high numbers in a free-living bacterium such as S. Typhi. Comparison with the genome of S. Typhimurium LT2 (the first publicly available S. Typhimurium genome) showed that the majority of these genes appeared to be intact and most likely fully functional in that serovar. Hence, for some reason, S. Typhi had accumulated mutations that inactivated ∼5% of its gene repertoire (>200 of some 4000 genes).
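The text does not describe how disrupted genes were recognized during annotation. As a rough illustration of the idea only, two common signatures of an inactivated coding sequence are an internal stop codon and a length that is not a multiple of 3 (consistent with a frameshift). The sketch below checks for both; the function names and toy sequences are hypothetical, and real pseudogene calls rest on comparison with intact orthologs and manual curation.

```python
# Stop codons of the standard bacterial genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_internal_stop(cds: str):
    """Return the 0-based codon index of the first stop codon that occurs
    before the final codon, or None if there is no such stop."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is excluded because a terminal stop is expected there.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def looks_disrupted(cds: str) -> bool:
    """Flag a coding sequence with a premature stop or a broken reading frame."""
    return len(cds) % 3 != 0 or first_internal_stop(cds) is not None


# Toy sequences (hypothetical).
intact = "ATGGCTAAAGGTTAA"      # ATG GCT AAA GGT TAA
nonsense = "ATGGCTTAAGGTTAA"    # premature TAA at the third codon
frameshift = "ATGGCAAAGGTTAA"   # one base deleted, length no longer a multiple of 3
for name, seq in [("intact", intact), ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(name, looks_disrupted(seq))
```

Applied across an annotated genome, a count of flagged genes divided by the total gene count gives the kind of proportion quoted above.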
Further analysis indicated that different S. Typhi isolates harbor an almost identical complement of pseudogenes with essentially the same mutations, again supporting the hypothesis that S. Typhi had evolved once, relatively recently, and had spread in the human population from that one source. If the pseudogenes are tabulated by related function, some notable gene types are found to have been inactivated in S. Typhi. Seven of the 12 fimbrial systems are inactivated, along with other putative fimbrial-like genes. Genes such as shdA and ratB that have been associated with intestinal persistence in S. Typhimurium have been inactivated. Several genes associated with known pathogenicity islands, such as SPI-1, SPI-2, SPI-3, SPI-4, and SPI-5, have also become inactivated. This suggests that S. Typhi has lost the ability to express several virulence-associated traits, a factor that could begin to explain the loss of host range of this serovar.
Interestingly, a number of the genes that are present as pseudogenes in S. Typhi are also pseudogenes in S. enterica serovar Paratyphi A, and some of these are inactivated by identical mutations, suggesting a common evolutionary origin. Loss of function of genes associated with intestinal attachment and persistence would fit into a lifestyle associated with invasion of systemic tissue and transmission by excretion via the gall bladder rather than luminal gut colonization. This trait may be apparent from data provided by human challenge studies (both virulent and live vaccine formulations), in which S. Typhi appears to be shed at a lower level after oral challenge, compared with less-invasive serovars, such as S. Typhimurium [14, 28]. Although this evidence is, on the whole, circumstantial, it does allow for the formulation of testable hypotheses by which to address these questions. More recently, studies of Vi antigen have indicated that Vi may have immunomodulatory activities that limit the ability of neutrophils to be recruited to the intestine during the early phases of S. Typhi infection, limiting diarrhea but promoting the invasive potential of the colonization process. Thus, a combination of acquisition and loss of virulence-associated traits may contribute to the pathogenic potential of S. Typhi.
The sequencing of the genomes of S. Typhi isolates and the ability to compare these genetic blueprints with those of related microorganisms has provided new insights into how this pathogen has evolved to cause invasive disease in humans. Proceeding from these leads, we can expect new tools to emerge that will improve our abilities to interrogate the structure and evolution of S. Typhi, and these developments are likely to be followed by improved diagnostic tools. We might also expect more efforts to be made to understand the basic pathogenic mechanisms employed by this pathogen in the human host.
Financial support. Wellcome Trust.
Supplement sponsorship. This article was published as part of a supplement entitled “Tribute to Ted Woodward,” sponsored by an unrestricted grant from Cubist Pharmaceuticals and a donation from John G. McCormick of McCormick & Company, Hunt Valley, Maryland.
Potential conflicts of interest. S.B. and G.D.: no conflicts.