Abstract

As part of the ongoing sequencing of the complete Salmonella typhimurium LT2 genome, a partly ordered set of 416 lambda clones has been developed, representing over 90% of the genome. The average insert size is 17 kb. Sequences were obtained from both ends of each clone in this set. A total of over 600 kb of sequence has been deposited in the genome survey sequence section of GenBank. This resource of clones is available from the Salmonella Genetic Stock Centre. A preliminary comparison with the Escherichia coli K12 genome indicates that many hundreds of insertion/deletion events, each encompassing more than one gene, are likely to distinguish these genomes. Fully 30% of the S. typhimurium sequences have no close homologs in the GenBank database.

Introduction

As part of a project to complete the genomic sequence of Salmonella typhimurium LT2, we have constructed a bacteriophage lambda library. Sequencing the ends of a sample of these lambda clones ensures the correct melding of the genomic sequence by confirming linkage over many kilobases, while also contributing to the complete sequence of the genome. The library is also a resource for closing gaps in the sequence.

The M13 clones used in the sequencing project will not be maintained after the end of the project. However, the lambda clones will be maintained as a permanent resource of clones from the genome. This manuscript describes the lambda resource, which is being made available prior to the completion of the sequencing project.

Materials and methods

Genomic DNA from Salmonella typhimurium LT2 strain AZ1516 was partially digested with Sau3A and the 15–20-kb size class was cloned in a lambda DASHII vector. A total of over 2000 clones were examined for overlap by previously described restriction mapping methods or by deriving radiolabeled riboprobes from one end of an insert in a clone and hybridizing this probe to an array of the clones [1]. The preparation of the library and the methods used to order clones are described in more detail elsewhere [1].

Phage were prepared using standard procedures [2]. DNA was purified from each phage using the Bio101 quick spin kit (www.bio101.com, La Jolla, CA). Five micrograms of each DNA preparation were sequenced from both ends on a Li-Cor sequencer (www.licor.com/bio/, Lincoln, NE) using two vector primers, labeled with different infrared fluors and located in the T3 and T7 promoters flanking the cloning site. Because the two fluors are detected separately, this strategy allowed the sequences from both ends of each clone to be obtained from four lanes on a sequencing gel (www.licor.com/bio/Posters/GenSeq97/GSAabs.htm).

Results and discussion

A total of 416 clones that had minimal or no overlap with one another were selected for sequencing. After taking overlap into account, these clones are estimated to represent well over 90% of the Salmonella genome. An average of about 900 bases of readable sequence was obtained from each successful sequencing reaction. Approximately 600 kb of sequence from 836 reads has been deposited in the genome survey sequence division of the GenBank database (www4.ncbi.nlm.nih.gov/dbGSS/index.html) under accession numbers AF003831–AF003833, AF029406–AF036003, AF075756–AF076018 and AF120033–AF120089.

The sequence data are also available on the sequencing project web site (http://genome.wustl.edu/gsc/bacterial/Salmonella.shtml), which hosts a Blast server (http://genome.wustl.edu/gsc/bacterial/bacterial_blast_server.html). This server searches the sequences presented here together with the melded M13 sequences from the ongoing sequencing project, currently amounting to over 3 Mb of sequence.

Each sequence from the lambda clones was compared to the complete Escherichia coli K12 genome [3] using BlastN [4] (http://www.ncbi.nlm.nih.gov/BLAST/). Homologous regions of Salmonella and E. coli are generally about 85% identical at the nucleotide level [5]. We therefore chose a BlastN probability threshold of P<e−50 to define putative orthologs, because hits at this level generally corresponded to more than 80% identity over at least 400 bases. These data are summarized in Table 1.
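The ortholog call above amounts to keeping each read's best E. coli hit and testing it against the E-value cut-off. The sketch below assumes the search is rerun with a modern BLAST+ `blastn` producing tabular output (`-outfmt 6`, twelve default columns), a format that postdates the original analysis. The read names and hit values in `EXAMPLE` are hypothetical; only the 1e−50 cut-off comes from the text.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Cut-off for a putative ortholog, as stated in the text (P < e-50).
ORTHOLOG_EVALUE = 1e-50


def best_hits(tabular_text):
    """Return the lowest-E-value hit for each query read."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        query = hit["qseqid"]
        if query not in best or hit["evalue"] < best[query]["evalue"]:
            best[query] = hit
    return best


def putative_orthologs(tabular_text, cutoff=ORTHOLOG_EVALUE):
    """Reads whose best hit passes the E-value cut-off."""
    return {q: h for q, h in best_hits(tabular_text).items()
            if h["evalue"] < cutoff}


# Hypothetical hits of two end reads against the E. coli K12 chromosome.
EXAMPLE = (
    "clone42_T3\tU00096\t86.50\t820\t110\t1\t1\t820\t100000\t100819\t1e-180\t700\n"
    "clone42_T7\tU00096\t78.00\t150\t33\t0\t10\t159\t2000000\t2000149\t2e-12\t60\n"
)

if __name__ == "__main__":
    for read, hit in putative_orthologs(EXAMPLE).items():
        print(read, hit["pident"], hit["length"], hit["evalue"])
```

Keeping only the best hit per read mirrors the per-end reporting in Table 1; the rough correspondence between the cut-off and >80% identity over ≥400 bases could be checked from the `pident` and `length` fields.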

Table 1. [Table image not reproduced]

Both ends of a clone showed high homology with the E. coli K12 genome in 222 cases. Among these clones, we determined the insert size for 106. Forty of these 106 inserts differed in size by more than 4000 bases from the corresponding apparently orthologous region in E. coli, suggesting a relatively large net insertion/deletion event in these clones (marked in bold in column E of Table 1). Nine more clones matched the E. coli K12 genome at both ends, but at widely separated positions in the E. coli genome (marked with an asterisk in column C of Table 1). Some of these clones may represent true rearrangements between the Salmonella and E. coli genomes, whereas others may reflect matches to paralogous sequences that are not adjacent in the E. coli genome. One hundred and twenty-nine clones matched E. coli K12 at only one end, and 65 clones matched E. coli at neither end.
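The clone categories in this paragraph can be expressed as a small decision procedure. This is a hedged sketch, assuming each clone end is summarized by the E. coli K12 coordinate of its best putative-ortholog hit (or `None`). The `CloneEnds` record and the 50-kb `MAX_COLINEAR_SPAN_BP` cut-off for "widely separated" positions are hypothetical, since the text does not say how that call was made; the 4000-base threshold is the one used in the text. Distances are measured around the circular chromosome so that clones spanning the coordinate origin are not misclassified.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate length of the E. coli K12 chromosome (about 4.64 Mb).
ECOLI_GENOME_BP = 4_640_000
# Net size difference treated as an insertion/deletion (from the text).
INDEL_BP = 4000
# Hypothetical cut-off for "widely separated" end positions (not in the text).
MAX_COLINEAR_SPAN_BP = 50_000


@dataclass
class CloneEnds:
    name: str
    t3_hit: Optional[int]            # E. coli coordinate hit by the T3 end, or None
    t7_hit: Optional[int]            # E. coli coordinate hit by the T7 end, or None
    insert_bp: Optional[int] = None  # measured insert size, if known


def circular_distance(a, b, genome_bp=ECOLI_GENOME_BP):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(a - b) % genome_bp
    return min(d, genome_bp - d)


def classify(clone):
    if clone.t3_hit is None and clone.t7_hit is None:
        return "neither end"
    if clone.t3_hit is None or clone.t7_hit is None:
        return "one end"
    # Approximate the matching E. coli region by the distance between end hits.
    span = circular_distance(clone.t3_hit, clone.t7_hit)
    if span > MAX_COLINEAR_SPAN_BP:
        return "both ends, divergent positions"   # asterisk, column C
    if clone.insert_bp is None:
        return "both ends, insert size unknown"
    if abs(clone.insert_bp - span) > INDEL_BP:
        return "both ends, net indel"             # bold, column E
    return "both ends, no large indel"
```

For example, a 17-kb clone whose ends hit positions 4,635,000 and 12,000 spans the origin and is treated as a 17-kb colinear region rather than a rearrangement.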

One hundred and fifty-eight of the 836 sequences were highly homologous or identical to sequences from various Salmonella strains already in the GenBank database (P<e−50 in BlastN), reflecting the amount of Salmonella sequence already available (marked in bold italics in columns F and G of Table 1). The 836 S. typhimurium sample sequences were also compared with the rest of the GenBank database, and a few sequences had their best homology to sequences from organisms other than E. coli K12 or Salmonella. Homologies with a significance of P<e−9 are indicated in Table 1, and further details of the genes involved are presented in Table 2. In many of these cases, one end of the clone closely matches E. coli K12 and the other end closely matches a different genome. For some clones, the best homolog in the database for one end is a bacteriophage or plasmid sequence. These sequences may come from previously unknown extrachromosomal phage or plasmids, but they are more likely to come from genes integrated in the LT2 genome (such as the Fels prophages [6, 7]) that are related to genes found on phage or plasmids in other bacteria.
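The two database comparisons in this paragraph use different cut-offs: 1e−50 for a match to known Salmonella sequence, and 1e−9 for reporting a better homolog in another organism. A minimal sketch of that bookkeeping, assuming each read's hits have already been reduced to hypothetical `(organism, evalue)` pairs with simple organism labels (real GenBank definition lines would need parsing first):

```python
# Hypothetical organism labels; real GenBank definitions vary.
ECOLI_K12 = "E. coli K12"
SALMONELLA_EVALUE = 1e-50  # "highly homologous or identical" (text)
REPORT_EVALUE = 1e-9       # significance reported for other organisms (text)


def matches_known_salmonella(hits):
    """True if any Salmonella entry matches below the 1e-50 cut-off."""
    return any(org.startswith("Salmonella") and evalue < SALMONELLA_EVALUE
               for org, evalue in hits)


def best_other_homolog(hits):
    """Organism of the best non-Salmonella hit if it beats E. coli K12
    and passes the reporting cut-off; otherwise None."""
    candidates = [(evalue, org) for org, evalue in hits
                  if not org.startswith("Salmonella")]
    if not candidates:
        return None
    best_evalue, best_org = min(candidates)
    if best_org == ECOLI_K12 or best_evalue >= REPORT_EVALUE:
        return None
    return best_org


if __name__ == "__main__":
    read = [(ECOLI_K12, 1e-20), ("Klebsiella pneumoniae", 1e-138),
            ("Salmonella typhimurium", 1e-30)]
    print(matches_known_salmonella(read), best_other_homolog(read))
```

For the example read this prints `False Klebsiella pneumoniae`: the Salmonella hit is too weak to count as a known match, and the Klebsiella hit beats E. coli K12 by a wide margin. The two checks are kept separate because a read can both match known Salmonella sequence and be scored against E. coli.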

Table 2. Novel S. typhimurium LT2 genome sequences with higher homology to sequences other than the E. coli K12 genome

Clone  Organism with best homology    Accession  Gene          BlastN E-value
T3 end
1049   K. oxytoca                     AF017781   ddrA, ddrB    0e+0
B235   Klebsiella pneumoniae          L41068     hpaA          1e−138
A350   E. coli plasmid R100-1         AF005044   traV          6e−88
968    E. coli                        M55249     retron Ec67   6e−50
A78    Klebsiella aerogenes           L01114     nac           2e−48
163    E. coli bacteriophage N15      AF064539   gp13          5e−34
270    K. pneumoniae                  U19581     ramA          6e−32
175    P. putida                      X58483     hutU          7e−29
A365   E. coli plasmid pRSD2          U82290     rafY          9e−15
1234   E. coli plasmid F              M59763     traG          5e−13
A173   Bacteriophage PA-2             J02580     RZ            6e−10
T7 end
716    Citrobacter freundii           D28594     hyaA          1e−149
560    E. coli bacteriophage P2       P25479     terminase     1e−91
B44    P. putida                      M35140     hutH          4e−83
741    E. coli hpa gene               Z37980     hpaG          1e−64
B220   E. coli prophage CP4-57        P32053     integrase     7e−63
178    Klebsiella sp.                 U32616     asst          1e−45
1248   E. coli F plasmid              M97768     gene 32       1e−32
419    E. coli bacteriophage lambda   J02459     vhsJ          4e−18

A BlastX score.

In 836 sequence reads from around the S. typhimurium genome, we detected 259 sequence reads that were not homologous to E. coli K12. This represents about 30% of the sequences. Thus, based on a genome size of about 5 Mb, it is estimated that there may be 1.5 Mb of non-homologous sequences present in S. typhimurium and absent in the E. coli K12 genome. In each case, such genes may have been introduced into Salmonella after divergence from the common ancestor with E. coli, or these genes may have been deleted in the E. coli lineage.
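The 259 figure is consistent with the clone counts given earlier: each clone matching E. coli K12 at only one end contributes one unmatched read, and each clone matching at neither end contributes two (129 + 2 × 65 = 259). A short sketch of the arithmetic and of the genome-wide extrapolation, which implicitly assumes the reads sample the roughly 5-Mb genome evenly:

```python
# Counts and genome size as stated in the text.
ONE_END_CLONES = 129      # clones matching E. coli K12 at one end only
NEITHER_END_CLONES = 65   # clones matching E. coli K12 at neither end
TOTAL_READS = 836
GENOME_BP = 5_000_000     # approximate S. typhimurium genome size


def nonhomologous_reads(one_end, neither_end):
    """One unmatched read per one-end clone, two per no-match clone."""
    return one_end + 2 * neither_end


def extrapolate_bp(unmatched_reads, total_reads, genome_bp):
    """Scale the unmatched fraction of reads up to the whole genome,
    assuming the reads sample the genome evenly."""
    return unmatched_reads / total_reads * genome_bp


unmatched = nonhomologous_reads(ONE_END_CLONES, NEITHER_END_CLONES)
fraction = unmatched / TOTAL_READS
unique_bp = extrapolate_bp(unmatched, TOTAL_READS, GENOME_BP)

if __name__ == "__main__":
    print(f"{unmatched} reads, {fraction:.1%}, about {unique_bp / 1e6:.2f} Mb")
```

This prints `259 reads, 31.0%, about 1.55 Mb`, in line with the rounded "about 30%" and "1.5 Mb" quoted above.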

The large number of S. typhimurium sequences that showed little or no homology with the E. coli K12 genome indicates that these two genomes are rather more different than the considerable concordance of their genetic maps might suggest [8]. DNA–DNA hybridization studies estimated the amount of non-homologous sequence at 30–40% of these genomes [8–10], which may more accurately reflect the proportion of these genomes that does not share homology. The proportion of non-homologous sequences in the sample presented here (30%) is similar to these DNA–DNA hybridization estimates, and also similar to the proportion of non-homologous sequences we obtained when we compared sample sequences from Salmonella typhi with the complete E. coli K12 genome (38%) [11]. The difference between the 30% and 38% estimates may be attributed to the different lengths of the sequence reads in the two studies, and to the different BlastN thresholds for scoring a homolog that these different read lengths required.

The number of insertion/deletion events that distinguish the S. typhimurium and E. coli genomes is likely to be very high. Of the 106 clones of known size that matched E. coli at both ends, 40 (38%) showed insertion/deletion events of over 4000 bases (Table 1, column E). These 106 clones represent about 1.8 Mb of the genome (106 × 17 kb), so by extrapolation there may be well over 100 insertion/deletion events of more than 4000 bases across the genome (40 × 5 Mb / 1.8 Mb ≈ 111). This estimate is similar to the one we obtained for the S. typhi versus E. coli comparison using a very different approach: the rate of detection of putative junctions between homologous and unique DNA in a set of sample sequences [11].
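The extrapolation in this paragraph is a simple proportion. A sketch reproducing it from the numbers given in the text; like the text, it assumes every sized clone carries the average 17-kb insert and that the sized clones are representative of the genome, and it counts each flagged clone as exactly one event:

```python
AVG_INSERT_KB = 17   # average lambda insert size
SIZED_CLONES = 106   # clones of known size matching E. coli at both ends
INDEL_CLONES = 40    # of those, clones with a net difference > 4000 bases
GENOME_KB = 5000     # approximate genome size


def extrapolate_events(flagged, sampled_kb, genome_kb):
    """Scale events seen in the sampled fraction up to the whole genome.
    Each flagged clone counts as exactly one event."""
    return flagged * genome_kb / sampled_kb


sampled_kb = SIZED_CLONES * AVG_INSERT_KB   # 1802 kb, i.e. about 1.8 Mb
events = extrapolate_events(INDEL_CLONES, sampled_kb, GENOME_KB)

if __name__ == "__main__":
    print(f"{INDEL_CLONES / SIZED_CLONES:.0%} of sized clones; "
          f"about {round(events)} events genome-wide")
```

This prints `38% of sized clones; about 111 events genome-wide`, matching the figures in the paragraph above.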

The sequences reported in this paper and the associated lambda clones have already proved useful as a source of DNA for complementation studies [12] and are a vital component for completing the Salmonella typhimurium LT2 genome sequence (http://genome.wustl.edu/gsc/bacterial/salmonella.shtml). We previously published an additional set of restriction-mapped clones covering the region corresponding to about 4 250 000 to 4 500 000 in E. coli [1]. End sequences from some of these clones have been obtained and are included in Table 1. The resource of over 2000 lambda clones has been deposited at the Salmonella stock center (www.ucalgary.ca/~kesander/intro.html).

Acknowledgements

This work was supported by United States National Institute of Allergy and Infectious Diseases grants AI-34829 and AI-43283. We thank Ken Sanderson, Rick Wilson, and the bioinformatics staff at the Genome Sequencing Center of Washington University, St. Louis, for many helpful discussions and for maintaining the web sites.

References

[1] Wong, K.K., Wong, R.M., Rudd, K.E. and McClelland, M. (1994) High-resolution restriction map for a 240-kilobase region spanning 91 to 96 minutes on the Salmonella typhimurium LT2 chromosome. J. Bacteriol. 176, 5729–5734.
[2] Sambrook, J., Fritsch, E.F. and Maniatis, T. (1998) Molecular Cloning, A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
[3] Blattner, F.R., Plunkett, G., Bloch, C.A., Perna, N.T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., Mayhew, G.F., Gregor, J., Davis, N.W., Kirkpatrick, H.A., Goeden, M.A., Rose, D.J., Mau, B. and Shao, Y. (1997) The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474.
[4] Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
[5] Sharp, P.M. (1991) Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J. Mol. Evol. 33, 23–33.
[6] Affolter, M., Parent-Vaugeois, C. and Anderson, A. (1983) Curing and induction of the Fels 1 and Fels 2 prophages in the Ames mutagen tester strains of Salmonella typhimurium. Mutat. Res. 110, 243–262.
[7] Wong, K.K. and McClelland, M. (1992) A BlnI restriction map of the Salmonella typhimurium LT2 genome. J. Bacteriol. 174, 1656–1661.
[8] Riley, M. and Sanderson, K.E. (1990) Comparative genetics of Escherichia coli and Salmonella typhimurium. In: The Bacterial Chromosome (Drlica, K. and Riley, M., Eds.), pp. 85–95. American Society for Microbiology, Washington, DC.
[9] Krawiec, S. and Riley, M. (1990) Organization of the bacterial chromosome. Microbiol. Rev. 54, 502–539.
[10] Brenner, D.J. (1984) Enterobacteriaceae. In: Bergey's Manual of Systematic Bacteriology (Krieg, N.R. and Holt, J.G., Eds.), pp. 408–420. Williams and Wilkins, Baltimore, MD.
[11] McClelland, M. and Wilson, R.K. (1998) Comparison of sample sequences of the Salmonella typhi genome to the sequence of the complete Escherichia coli K-12 genome. Infect. Immun. 66, 4305–4312.
[12] Wong, K.K., McClelland, M., Stillwell, L.C., Sisk, E.C., Thurston, S.J. and Saffer, J.D. (1998) Identification and sequence analysis of a 27-kilobase chromosomal fragment containing a Salmonella pathogenicity island located at 92 minutes on the chromosome map of Salmonella enterica serovar typhimurium LT2. Infect. Immun. 66, 3365–3371.

Author notes

1 Present address: Molecular Biosciences, Pacific Northwest National Laboratory, Richland, WA 99352, USA.