Abstract

The idea that Escherichia coli gained the lac operon via horizontal transfer, allowing it to invade a new niche and form a new species, has become a paradigmatic example of bacterial nonpathogenic adaptation and speciation catalyzed by horizontal transfer. Surprisingly, empirical evidence for this event is essentially nonexistent. To see whether horizontal transfer occurred, I compared a phylogeny of 14 Enterobacteriaceae based on two housekeeping genes to a phylogeny of a part of their lac operon. Although several species in this clade appear to have acquired some or all of the operon via horizontal transfer, there is no evidence of horizontal transfer into E. coli. It is not clear whether the horizontal transfer events for which there is evidence were adaptive because those species which have acquired the operon are not thought to live in high lactose environments. I propose that vertical transmission from the common ancestor of the Enterobacteriaceae, with subsequent loss of these genes in many species, can explain much of the patchy distribution of lactose use in this clade. Finally, I argue that we need new, well-supported examples of horizontal transfer spurring niche expansion and speciation, particularly in nonpathogenic cases, before we can accept claims that horizontal transfer is a hallmark of bacterial adaptation.

Introduction

If we are to understand horizontal transfer as a mechanism of bacterial adaptation, rather than only genome evolution, we need case studies of adaptive horizontal transfer events. A number of authors have proposed that the lac operon of Escherichia coli has been acquired by horizontal transfer (Starlinger 1977; Riley and Anilionis 1980; Buvinger et al. 1984; Ochman, Lawrence, and Groisman 2000; Lawrence 2001). Lactose use by E. coli could be a useful case study of adaptation via horizontal transfer because we understand a great deal about the genetics and biochemistry of the enzymes responsible for metabolism of this milk sugar. The lac operon codes for three enzymes: β-galactosidase, encoded by lacZ, which splits lactose into glucose and galactose (Miller and Reznikoff 1978); lactose permease, encoded by lacY, which transports lactose into the cytoplasm (Miller and Reznikoff 1978); and a transacetylase, encoded by lacA, which acetylates thiogalactosides and may serve as a detoxifying enzyme (Andrews and Lin 1976; D. Dykhuizen, personal communication). The genes for these three enzymes are physically adjacent on the chromosome and are transcribed as a single message. Located adjacent to lacZYA, but transcribed as a separate message, is the regulator of expression of the operon, lacI.

Lactose use by E. coli has become a paradigm of horizontal transfer leading to nonpathogenic adaptation and speciation because we also have an understanding of the ecology of E. coli (Lawrence and Roth 1996; Lawrence and Ochman 1998; Ochman, Lawrence, and Groisman 2000; Lawrence 2001). The importance of the supposed transfer event lies in the supposition that, for bacteria, metabolism is ecology (Lawrence 2001), in that the ability to degrade or synthesize specific compounds can limit the distribution and abundance of bacterial species. Ochman, Lawrence, and Groisman (2000) write that “E. coli, by acquiring the lac operon, gained the ability to use the milk sugar lactose as a carbon source and to explore a new niche, the mammalian colon, where it established a commensal relationship.” In contrast, a close relative of E. coli, Salmonella, cannot use lactose and lives in reptiles or as a pathogen in mammals. The hypothesis of Ochman, Lawrence, and Groisman (2000) links a single horizontal transfer event to the acquisition of novel metabolic properties, allowing the expansion of a niche and consequent speciation.

The reality of this transfer event should be questioned, however, as the data said to support E. coli's gain of lac are ambiguous. For example, the fact that S. enterica, the supposed sister group of E. coli, does not have a lac operon and does not ferment lactose (fig. 1) is consistent with a horizontal transfer event, but the pattern could also be due to loss of the lac operon in the Salmonella lineage. (It is also worth noting that lactose fermenting E. fergusonii, rather than any Salmonella sp., is E. coli's sister species [Lawrence, Ochman, and Hartl (1991)].)

FIG. 1.—

The phylogenetic relationships and lactose utilization phenotypes of six enteric bacteria. Redrawn from figure 1 in Ochman, Lawrence, and Groisman (2000).

An early mention of horizontal transfer of the lac operon was made by Starlinger (1977), who noted: “The finding of two copies of IS3 in inverted order on both sides of the lac operon in E. coli might indicate that the lac operon was inserted into the chromosome of an ancestor by a transposition event …” (p. 116). A few years later, Riley and Anilionis (1980) used quantitative Southern blots to compare the similarity of regions of the E. coli genome to several other species of enteric bacteria. These authors found that the lac region of enteric genomes did not have the same pattern of binding as the other genomic regions they tested, and they argued that lac had therefore evolved via horizontal transfer. This DNA hybridization approach, which attempts to use degree of similarity as a proxy for relatedness, has the same problem as assessing relatedness by BLAST search: it can be led astray by heterogeneity in the rates of molecular evolution in different lineages (Eisen 1998; Koski and Golding 2001).

The horizontal transfer of lac was revisited with DNA sequence data by Buvinger et al. (1984), who sequenced 1,430 bp 3′ of the lac operon and found a terminal repeat similar to those in the transposon Tn5, suggestive of a past horizontal transfer. This 9-bp motif may be the signature of a transposon, but no other such repeat has been located 5′ of the lac operon. The occurrence of one 9-bp repeat is at best marginal evidence that there was ever a transposon located in this region of the E. coli genome, let alone that a transposon brought with it the lac operon. In their survey of the complete genome of E. coli K-12, Lawrence and Ochman (1998) classified the lac operon as horizontally transferred but did not indicate the basis on which they made that determination.

The acquisition of the lac operon by E. coli could be an excellent case study of adaptive horizontal transfer, not only because we understand details of the molecular genetics and physiology of the operon but also because we can use this knowledge to make predictions about the ecological and evolutionary impacts of the acquisition of the operon. Given the tenuous nature of the evidence for this event, I attempted to rigorously test whether E. coli gained the lac operon via horizontal transfer. I used a phylogenetic approach, as methods using genome sequence information like G + C content or codon bias have been criticized for their poor performance (Guindon and Perriere 2001; Koski, Morton, and Golding 2001). Phylogenetic methods have recently been used successfully to verify the vertical transmission of a group of shared genes in the Enterobacteriaceae (Daubin, Moran, and Ochman 2003; Lerat, Daubin, and Moran 2003). In this article, I compare the phylogenies inferred from part of the lac operon of 14 Enterobacteriaceae to the phylogeny inferred from two housekeeping genes, which are likely to be consistent markers of organismal phylogeny. If a horizontal transfer event has occurred, the phylogeny of lac will differ from the phylogeny of most of the genome. The entire operon is over 7000 bp long, so only pieces of lacZ and lacY were sequenced.

Materials and Methods

Strains

The 14 strains used in this study, listed in table 1, all contain at least part of the lac operon. Although initially included, strains of E. blattae and Serratia odorifera were ultimately excluded from the analysis because no region of lac could be amplified from them.

Table 1

Species Used in the Study


| Species^a | Strain | Has lacY? | Source |
| --- | --- | --- | --- |
| Serratia sp.? | MF 426 | Yes | Isolated by M. Feldgarden |
| Serratia sp.? | MF 416 | Yes | Isolated by M. Feldgarden |
| Enterobacter cloacae | E482^b | No | J. Lawrence |
| Enterobacter cloacae | 10.8–42^c | No | A. Bronikowski |
| Escherichia vulneris (type I) | ATCC 29943^b | No | J. Lawrence |
| Escherichia vulneris (type II) | ATCC 33821^b | No | J. Lawrence |
| Escherichia hermannii | ATCC 33650^b | No | J. Lawrence |
| Citrobacter freundii | OS60^b | Yes | J. Lawrence |
| Citrobacter freundii | MF 466 | Yes | Isolated by M. Feldgarden |
| Salmonella group IIIb | CDC 156–87^d | Yes | Salmonella Genetic Stock Centre |
| Escherichia fergusonii | ATCC 35469^b | No | J. Lawrence |
| Escherichia coli | MG1655^e | Yes | Complete genome sequence |
| Yersinia pestis | KIM^f | No | Complete genome sequence |
| Klebsiella pneumoniae | MGH 78578^g | Yes | Complete genome sequence |

^a Species with a question mark were identified on the basis of the similarities of their sequences of ompA, gap, and gyrB to sequences in GenBank.

Sequencing

A species phylogeny was necessary to test the horizontal transfer of the lac operon. The relationships of the strains in this study were inferred with ompA and gap; these genes have been used successfully for systematic studies in this group (Lawrence, Ochman, and Hartl 1991). Sequences of gap and ompA were already available for those species in table 1 which were provided by J. Lawrence. The polymerase chain reaction (PCR) primers previously described by Lawrence, Ochman and Hartl (1991) were used to sequence gap and ompA from the remaining species. The PCR conditions for ompA were 2 min denaturation at 94°C, followed by 35 cycles of 94°C for 30 s, 40°C for 60 s, and 72°C for 45 s, and concluded with 10 min of extension at 72°C. When products of several sizes occurred, a stab of the band of proper size was taken, melted in water, and used as template for another reaction. The conditions for gap were 2 min denaturation at 94°C, followed by 35 cycles of 94°C for 10 s, 50°C for 30 s, and 72°C for 30 s, and concluded with 10 min of extension at 72°C. Both gap and ompA were sequenced directly. The ompA data set was 692 bp, and the gap data set was 849 bp. I also concatenated the two sequences, for a data set of 1541 bp.
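The two cycling programs described above can be written down as data, which makes them easy to check against the text. The following is a minimal Python sketch; the names `PcrProgram` and `total_seconds` are illustrative assumptions and do not correspond to any thermocycler software, and the total counts only programmed hold times (ramping between temperatures is ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    """A simple PCR program; temperatures in degrees C, times in seconds."""
    initial_denature_s: int
    cycles: int
    steps: tuple  # one (temperature_C, seconds) pair per step within a cycle
    final_extension_s: int

    def total_seconds(self) -> int:
        """Sum of all programmed hold times (ramp times are not modeled)."""
        per_cycle = sum(seconds for _, seconds in self.steps)
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extension_s


# Values transcribed from the conditions given in the text.
OMPA = PcrProgram(120, 35, ((94, 30), (40, 60), (72, 45)), 600)
GAP = PcrProgram(120, 35, ((94, 10), (50, 30), (72, 30)), 600)

for name, program in (("ompA", OMPA), ("gap", GAP)):
    minutes = program.total_seconds() / 60
    print(f"{name}: {minutes:.1f} min of programmed hold time")
```

Keeping the program as a frozen dataclass means the ompA and gap conditions differ only in their arguments, which mirrors how the two protocols are described.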

To amplify lacZ and lacY, I used a two-part strategy. First, I designed primers from a published sequence to amplify pieces of lacZ and lacY. Although I amplified lacZ for all of the strains used in this study, I was not able to amplify lacY from six of them, as noted in table 1. I could not amplify a continuous lacZY region from any strain with these primers, presumably because the primers' imperfect matches prevented the 3.5-kb reaction. To amplify this large region from those species from which I amplified both lacZ and lacY, I designed species-specific primers from the lacZ and lacY sequences. All primers are listed in item 1 of the Supplementary Material online.

The reaction conditions for lacZ and lacY were 1 U Taq polymerase, 20 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, each primer at 1 mM, and each nucleotide at 0.2 mM. The primers to amplify approximately 1.2 kb of lacZ were lacZ2 and either lacZ6 or lacZ7. The reaction conditions were initial denaturation of 2 min at 94°C, followed by 35 cycles of 94°C for 30 s, 50°C for 60 s, and 72°C for 45 s, and concluded with 10 min of extension at 72°C. To amplify the small piece of lacY, primers lacY2 and lacY5 were used. The reaction conditions were as for lacZ above, but with annealing at 45°C. These PCR products were then directly sequenced. To amplify the 3.5-kb continuous fragment of lacZY, 1.25 U of the proofreading polymerase Platinum Pfx (Invitrogen), 2× amplification buffer, 1 mM MgSO4, 0.2 mM of each nucleotide, and each primer at 1 mM were used. The reaction conditions were 5 min of denaturation at 94°C, followed by 35 cycles of 94°C for 15 s, 50°C for 30 s, and 68°C for 3 min, finishing with extension at 68°C for 10 min. The reactions gave products of several sizes, so the proper sized band was gel purified and ligated into plasmid pCR (Invitrogen). Because I never observed polymorphisms in the directly sequenced lacZ and lacY PCR products, and because the other bands were at least 50% larger or smaller than the expected size, I assumed that the other bands were spurious products rather than duplicated copies of lacZY which varied in length from the published sequence. Sequencing of both PCR products and plasmids was performed by the Stony Brook University DNA sequencing facility. The GenBank accession numbers for the sequences are AY743917–AY743920 and AY746943–AY746962.

Phylogenetic Analysis

The sequences were aligned using Clustal X 1.82 (Thompson et al. 1997) with default settings, and further refined by eye to preserve codon boundaries. For the lacZY data set, the region between the stop codon of lacZ and the start codon of lacY was excluded, as the length of this spacer ranged from 52 bp in E. coli to 210 bp in Serratia sp. MF 416 and could not be aligned unambiguously. PAUP* 4.0b10 (Swofford 2003) was used to infer the phylogeny for each data set. For each data set, Modeltest 3.4 (Posada and Crandall 1998) was used to select the appropriate model of molecular evolution. Modeltest implements both the Akaike Information Criterion and the hierarchical likelihood ratio test for model selection. When the two criteria differed in the models chosen, I selected the model with fewer parameters (which was always the model chosen by the hierarchical likelihood ratio test). I used this model to find the maximum likelihood (ML) tree, and the Neighbor-Joining (NJ) tree from the ML distances. I also built trees using unweighted maximum parsimony (MP). For all data sets, a heuristic search of 1000 bootstrap replicates was used to assess uncertainty in the branching patterns. I rooted the ompA, gap, and lacZ trees with Yersinia pestis. Because lacY is not present in the Y. pestis genome, I used Serratia sp. MF 416 as the outgroup for the lacZY tree, as this species was the outgroup to the other species in the lacZ tree. Rooting trees is tenuous when horizontal transfer may have occurred, as an outgroup species may not have the most distantly related copy of a gene. Concordance between the trees suggests that the rooting has not been affected by horizontal transfer.
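The model-selection rule described above (compare the two criteria and, where they disagree, keep the model with fewer parameters) can be illustrated with a short Python sketch. This is not Modeltest's implementation: the class and function names are illustrative, the log-likelihoods are hypothetical values invented for the example, and the choice attributed to the hierarchical likelihood ratio test is simply assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FittedModel:
    name: str
    n_params: int   # free substitution-model parameters (branch lengths excluded)
    log_lik: float  # maximized log-likelihood (hypothetical values below)


def aic(model: FittedModel) -> float:
    """Akaike Information Criterion, 2k - 2 lnL; lower is better."""
    return 2 * model.n_params - 2 * model.log_lik


def lrt_statistic(simple: FittedModel, complex_: FittedModel) -> float:
    """Likelihood ratio statistic for a pair of nested models."""
    return 2 * (complex_.log_lik - simple.log_lik)


def resolve(choice_aic: FittedModel, choice_hlrt: FittedModel) -> FittedModel:
    """When the two criteria disagree, keep the model with fewer parameters."""
    return min((choice_aic, choice_hlrt), key=lambda m: m.n_params)


# Hypothetical fits, for illustration only.
candidates = [
    FittedModel("HKY+G", 5, -4010.2),
    FittedModel("GTR+G", 9, -4004.9),
    FittedModel("GTR+I+G", 10, -4003.8),
]

best_by_aic = min(candidates, key=aic)
assumed_hlrt_choice = candidates[1]  # assumption: the hLRT stopped at GTR+G
selected = resolve(best_by_aic, assumed_hlrt_choice)

print("AIC prefers:", best_by_aic.name)
print("LRT statistic, GTR+G vs GTR+I+G:", round(lrt_statistic(candidates[1], candidates[2]), 3))
print("Selected:", selected.name)
```

In this toy case the AIC favors the most complex model by a small margin, while the likelihood ratio statistic for adding the extra parameter is small, so the simpler model is retained.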

To test for the significance of differences in likelihoods between trees, I used the PAUP* 4.0b10 implementation of the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999), with the RELL approximation with 1000 bootstrap replicates. This test places a confidence interval on the likelihood of the ML tree, taking into account the multiple comparisons inherent in comparing a ML tree to other phylogenetic trees. Likelihoods of reversible models (all of those considered here) are calculated on unrooted trees, so the SH test is unaffected by the choice of outgroup. Testing for the support of individual differences was a three-step process of generating a constrained tree with regard to the taxa of interest, and comparing it to the unconstrained tree. First, a constraint was made forcing the taxa of interest to have the sister relationship inferred from the phylogeny of housekeeping genes. Second, the most likely tree, consistent with the constraint, was found for the data set being tested. Finally, all constrained trees were tested simultaneously against the ML tree. This procedure tests the null hypothesis that the placement of a branch in the ML tree is not actually different from the placement in the species phylogeny. If there is no true difference, then constraining the branch to its location in the species tree will not significantly lower the likelihood of the phylogeny.
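For readers unfamiliar with the procedure, the logic of the SH test with RELL resampling can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the PAUP* implementation: it takes per-site log-likelihoods that have already been computed for each candidate tree, the function name `sh_test_rell` is hypothetical, and the input values are simulated rather than taken from this study.

```python
import random


def sh_test_rell(site_lnl, n_boot=1000, seed=0):
    """site_lnl[t][s] is the log-likelihood of site s under tree t.

    Returns one P value per tree for the null hypothesis that the tree
    is as good as the best tree in the candidate set.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])

    totals = [sum(row) for row in site_lnl]
    best = max(totals)
    observed = [best - total for total in totals]  # delta lnL for each tree

    # RELL: resample sites with replacement and re-sum the fixed per-site
    # values, without re-optimizing any parameters.
    boot = []
    for _ in range(n_boot):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        boot.append([sum(row[s] for s in sample) for row in site_lnl])

    # Centre each tree's bootstrap distribution so that all trees are
    # equally good on average, as the null hypothesis requires.
    means = [sum(rep[t] for rep in boot) / n_boot for t in range(n_trees)]
    centred = [[rep[t] - means[t] for t in range(n_trees)] for rep in boot]

    p_values = []
    for t in range(n_trees):
        hits = sum(1 for rep in centred if max(rep) - rep[t] >= observed[t])
        p_values.append(hits / n_boot)
    return p_values


# Simulated per-site log-likelihoods for three candidate trees.
sim = random.Random(1)
tree_a = [sim.gauss(-2.0, 0.5) for _ in range(200)]
tree_b = [x - 0.5 + sim.gauss(0.0, 0.1) for x in tree_a]  # clearly worse
tree_c = [x + sim.gauss(0.0, 0.1) for x in tree_a]        # about as good
p_values = sh_test_rell([tree_a, tree_b, tree_c])
print(p_values)
```

Taking the maximum over all candidate trees in every replicate is what accounts for the multiple comparisons, and it is also the reason the test tends to be conservative when many trees are compared.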

Results

Employing a phylogenetic approach to the identification of horizontally transferred genes requires comparing the phylogeny of the gene of interest to the phylogeny of the species. In this study, the species phylogeny was inferred from ompA and gap. The ML phylogenies of gap and ompA are nearly identical, differing only in the placement of E. hermannii and E. vulneris type II (fig. 2). The branches that differ have very weak bootstrap support in the gap tree, suggesting that there may not be actual differences between the phylogenies of these two genes. The MP and NJ bootstrap consensus trees of ompA and gap (as well as those of lac) did not differ in topology from their respective ML trees, although they were sometimes less resolved (data not shown); therefore only the ML trees are presented here. A tree based on the concatenation of ompA and gap has the same topology as the ompA tree (data not shown). To directly test whether the phylogenies of ompA and gap were significantly different from each other, I employed the SH test (Shimodaira and Hasegawa 1999). The two trees do not have significantly different likelihoods on the gap data set (P = 0.098), but when tested on the ompA data, the ompA tree was significantly more likely than the gap tree (P = 0.009). This suggests that the ompA tree can explain both data sets, and it was used as the best estimate of the true phylogeny of the strains. To conserve statistical power, I did not include further tests based on the gap tree.

FIG. 2.—

The ML trees inferred from the (A) ompA and (B) gap data sets. Numbers on branches are bootstrap proportions. Gray branches indicate taxa whose placement differs between the two trees.

The ML phylogeny of lacZ was inferred using a data set of 1289 bp from the 14 strains used in this study (fig. 3A). The phylogeny has 15 operational taxonomic units (OTUs) because the Klebsiella pneumoniae genome has two copies of lacZ. One copy is 97% similar to that of E. coli K-12 and is likely a recent transfer from E. coli into the genome of this strain of K. pneumoniae. This level of divergence is smaller than the difference between the two Citrobacter freundii sequences (91% similar) and is similar to the divergence in lac of the sequenced E. coli genomes (data not shown). In addition to this clear horizontal transfer event, there are a number of differences between the lacZ phylogeny and both the ompA and the gap phylogenies. Both E. vulneris type II and Serratia sp. MF 426 group with the C. freundii strains, and E. hermannii groups with E. vulneris type I and K. pneumoniae. Finally, E. coli, E. fergusonii, and Salmonella group IIIb are paraphyletic in the lacZ tree, rather than forming a clade as in ompA. The branches which differ in placement between the ompA and lacZ phylogenies are in gray on the lacZ tree, and they represent potential horizontal transfer events. Alternatively, they could differ in placement as a result of undetected paralogy or lineage sorting. I never observed double peaks in the sequenced PCR products of lacZ or lacY, suggesting that there are not multiple copies of lacZY in the genomes of these strains. Lineage sorting is also an unlikely mechanism to explain topological differences, as it should produce small rearrangements rather than the distant rearrangements seen here. For these reasons, I consider the differences to be potential horizontal transfer events.
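Similarity figures such as the 97% and 91% quoted above are typically computed from aligned sequences. The text does not state exactly how they were calculated, so the following Python sketch shows one common definition (percent identity over ungapped alignment columns) as an assumption; the function name and the toy sequences are illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped columns at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences, not data from this study.
identity = percent_identity("ATGACC-TGA", "ATGACCATGG")
print(f"{identity:.1f}% identical")
```

Other conventions, such as counting gap columns as mismatches, give different values, so the definition matters when comparing figures across studies.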

FIG. 3.—

The ML tree inferred from (A) the 1289-bp lacZ data set; and (B) the 3480-bp lacZY data set. Numbers on branches are bootstrap proportions. Gray branches indicate taxa whose placement differs between that tree and the ompA tree. Daggers in (A) indicate those taxa from which lacY was not amplified.

A summary of the results of SH tests of differences between phylogenies is presented in table 2. The ompA phylogeny (fig. 2A) is significantly less likely than the lacZ phylogeny (fig. 3A) for the lacZ data set (P < 0.001). A significant difference between the lacZ and ompA phylogenies does not imply that all of the differences are themselves significant, only that the total topologies differ. To test for the support of individual differences, I found the most likely tree constrained to have the sister relationship indicated by the ompA tree. For example, to test the significance of the difference in position of Serratia sp. MF 426, I found the ML tree which placed Serratia sp. MF 416 and Serratia sp. MF 426 together. The direction of transfer is not immediately obvious for some transfer events. For example, on the one hand, lacZ may have been transferred from the ancestor of K. pneumoniae and E. vulneris type I into E. hermannii, or it may have moved from the E. hermannii lineage into the common ancestor of K. pneumoniae and E. vulneris type I. I considered both of these hypotheses in the SH test. On the other hand, assuming Serratia sp. MF 426 or E. vulneris type II to have a vertically transmitted copy required multiple transfers into other taxa, so I only considered these species as the recipients of transfer events on the basis of parsimony. All of the constrained trees were tested against the most likely tree with the SH test. The constrained trees for Serratia sp. MF 426 (P = 0.036), E. vulneris type II (P < 0.001), E. hermannii (P < 0.001), and K. pneumoniae/E. vulneris type I (P < 0.001) were all significantly less likely than the unconstrained tree (i.e., fig. 3A). This reduction in the likelihood of the tree indicates that the position of each of these branches in the lacZ tree is supported over its position in the ompA tree. The positions of these OTUs are consistent with horizontal transfer.

Table 2

Summary of Shimodaira-Hasegawa Tests of Potential Horizontal Transfer Events


| Test of | Data Set | Δ lnL | P Value |
| --- | --- | --- | --- |
| lacZ vs. | | | |
| ompA^a | lacZ | 196.95329 | P < 0.001 |
| Serratia sp. MF 426^b | lacZ | 50.08307 | P = 0.036 |
| E. hermannii^b | lacZ | 132.82133 | P < 0.001 |
| K. pneumoniae/E. vulneris type I^b | lacZ | 64.25714 | P < 0.001 |
| E. vulneris type II^b | lacZ | 119.69421 | P < 0.001 |
| E. coli, E. fergusonii, Salmonella group IIIb monophyletic | lacZ | 7.99068 | P = 0.681 |
| lacZY vs. | | | |
| ompA^a | lacZY | 21.72474 | P = 0.02 |
| Serratia sp. MF 426^b | lacZY | 15.71165 | P = 0.071 |
| E. coli, Salmonella group IIIb monophyletic | lacZY | 8.88639 | P = 0.212 |


NOTE.—Significant P values are consistent with horizontal transfer.

^a This test compares the likelihoods of the ompA and ML trees for the specified data set.

^b This test compares the likelihood of the unconstrained tree for the specified data set against the tree with the species listed constrained to the sister relationship in the ompA tree.

The impetus for this project was the supposition that E. coli gained the lac operon via horizontal transfer, and the most likely phylogeny of the lacZ data does show that the placement of E. coli, E. fergusonii, and Salmonella is not as expected if lacZ had been vertically transmitted. However, the ML tree is not significantly more likely than a tree with these three taxa constrained to their relationships in the ompA tree (P = 0.681). This means that the placement of these OTUs is consistent with vertical transmission.

Lack of support for the paraphyly of these taxa could reflect that they have actually evolved vertically, or it could be that the paraphyly is correct but the test failed to reject the null hypothesis (that is, type II error). The SH test is known to be quite conservative, so I attempted to bring more data to bear on this question. For those species from which part of lacY was amplified, the most likely tree for the 3480-bp data set is shown in figure 3B. As in the previous analysis, Serratia sp. MF 426 groups with C. freundii rather than with the other Serratia sp., and E. coli and Salmonella are paraphyletic. (lacY was not amplified from E. fergusonii, so this species is not in this analysis.) In the same fashion as the lacZ analysis, I found the ML tree which placed Serratia sp. MF 416 and Serratia sp. MF 426 together, and the ML tree placing E. coli and Salmonella sp. group IIIb together. I then compared these two trees and the ompA tree to the unconstrained tree (fig. 3B) with the SH test. Although there are significant differences between the ompA tree and the unconstrained tree (P = 0.02), neither the placement of MF 426 (P = 0.071) nor that of E. coli and Salmonella (P = 0.212) is significantly different from placement in the ML tree.

Discussion

Horizontal Transfer

There are four major differences between the phylogenies of lacZ and ompA: Serratia sp. MF 426, E. vulneris type II, and either E. hermannii or K. pneumoniae and E. vulneris type I are in different positions on the lacZ tree, and E. coli, E. fergusonii, and Salmonella group IIIb are paraphyletic rather than monophyletic, as inferred from ompA. Perversely, the only one of these differences that is not statistically supported, the non-monophyly of E. coli, E. fergusonii, and Salmonella group IIIb, was the event which was a priori supposed to have occurred. There is no phylogenetic support for the horizontal transfer of the lac operon into E. coli. It is possible, however, that the rest of the operon has a more complex history than that of lacZY; this should be investigated in future work.

Analysis of the longer lacZY sequences from a subset of species reveals a pattern similar to the lacZ data, which were collected from more species. The lacZY tree differs significantly from the ompA phylogeny, although neither the placement of Salmonella sp. group IIIb nor that of Serratia sp. MF 426 was individually significant. The lack of significance of either individual transfer is somewhat surprising, as the lacZY data set contained over 2000 more base pairs than the lacZ data, even though it contained fewer species. A better understanding of how the number of taxa and the length of sequences affect the SH test would help direct the design of future studies.

Neither the lacZ nor the lacZY analysis supports the idea that E. coli gained the operon through horizontal transfer. This is a setback for studies of the role of horizontal transfer in niche expansion and speciation. Plausible hypotheses existed regarding the adaptive nature of a gain of the lac operon by E. coli, but we know little about the ecology of the species that have actually gained lac via horizontal transfer. These strains might not even use their lac gene products to metabolize lactose. For example, Serratia sp. MF 426 was isolated from a turtle (M. Feldgarden, pers. comm.), an animal that neither produces nor consumes lactose. E. vulneris and E. hermannii have predominantly been isolated from wounds (Brenner et al. 1982a, 1982b), which are not high lactose environments. K. pneumoniae is found in the intestinal tract of mammals, but it is also found in soil and water (Grimont, Grimont, and Richard 1992). Glycerol-galactoside (another β-galactoside) is found in chloroplasts, and it may be a major substrate for the lac operon of many species (Egel 1979; Boos 1982). Without knowledge of the ecology of most Enterobacteriaceae, we do not know what role the lac operon may have played in adaptation of these other species.

Gene Loss

If Salmonella form a clade, and strains in group IIIb retain an ancestral copy (as shown here), then most Salmonella must have lost the operon. The loss of the lac operon in much of the Salmonella lineage may be part of a common pattern. lacZ could not be amplified from E. blattae and S. odorifera when they were screened for this study. Likewise, β-galactosidase activity (the product of lacZ) is a very labile character in the Enterobacteriaceae (Holt and Kreig 1984, p. 414), potentially reflecting multiple losses of this gene. Only a subset of the strains in this study also have lacY, and this gene was not found in strains from which I could not amplify lacZ. Although the presence or absence of a PCR product is not a definitive test of the presence or absence of a gene, these data suggest that there are three basic lac genotypes of Enterobacteriaceae: those with both lacZ and lacY, those with only lacZ, and those with neither.

A similar pattern exists in the four named species of Shigella, which are actually strains of E. coli that are pathogenic specialists. Different authors have contended that the clones either originated independently (Pupo, Lan, and Reeves 2000) or via a single event (Escobar-Paramo et al. 2003), yet all of the clones have converged on similar lac phenotypes. All Shigella use lactose poorly, but different species seem to have a different genetic basis for this trait. S. flexneri and S. boydii have neither lacZ nor lacY, S. dysenteriae has lacZ but not lacY, and S. sonnei has both genes, but lacY is nonfunctional (Ito et al. 1991). It appears the loss of lacY function has played a major role in the convergent evolution of poor lactose fermentation in Shigella. This might be due to genetic drift of nonfunctional lacY alleles. It is also possible that there is a selective benefit to deleting lacY in some environments. The mechanistic cause of this selection is unclear, but it is not simply the energetic cost of production: LacZ is three times as large as LacY. If there is an energetic cost of production of these proteins, it must be accompanied by selection to retain LacZ.

The Dynamics of Deletion and Horizontal Acquisition

Recent work has shown that rates of homologous horizontal transfer are very low, particularly in the Enterobacteriaceae (Daubin, Moran, and Ochman 2003; Lerat, Daubin, and Moran 2003). If the horizontal transfer events found in this study occurred by homologous replacement, then lac would exhibit an extremely rapid rate of transfer. The loss of the operon in most Salmonella and Shigella species suggests, instead, a more dynamic process of horizontal transfer. I suggest that some lineages lose the operon through selection for loss or through drift. Descendants can regain the operon via illegitimate recombination, and these individuals may persist if ecological pressures favor the ability to metabolize β-galactosides. For example, the lineage leading to Serratia sp. MF 426 may have lost the operon and regained it from a C. freundii-like species, rather than having a vertically transmitted copy replaced by a horizontally transferred one. It might be possible to test the hypothesis of deletion and illegitimate recombination by adding species to the phylogeny which lack the lac operon, and reconstructing the character state of the lac operon at each node as vertically inherited, deleted, or horizontally acquired. If these predictions are correct, Serratia sp. MF 426 should be found to nest within a group of Serratia strains which lack the operon, rather than those, like Serratia sp. MF 416, which seem to have retained the operon through vertical transmission.
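The character-state reconstruction proposed above could take many forms. As one simplified illustration, the Python sketch below applies Fitch parsimony to a two-state character (operon present or absent) on a rooted binary tree. Everything specific here is an assumption made for the example: the tree shape is hypothetical, the taxa `SerX` and `SerY` are invented placeholders for Serratia strains lacking the operon, and plain Fitch parsimony weights gains and losses equally and cannot by itself distinguish a horizontally acquired copy from a vertically inherited one.

```python
def fitch(tree, states):
    """Fitch parsimony downpass for one character on a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping each leaf name to its observed state
    Returns (set of most parsimonious states at this node,
             minimum number of state changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical tree in which MF 426 nests among strains that lack the operon.
tree = ((("MF426", "SerX"), "SerY"), "MF416")
states = {
    "MF426": "present",
    "SerX": "absent",
    "SerY": "absent",
    "MF416": "present",
}

root_states, n_changes = fitch(tree, states)
print(sorted(root_states), n_changes)
```

On this toy tree at least two changes are required, which is compatible with either two independent losses or one loss followed by a regain; separating those alternatives would need sequence evidence of the kind presented for lacZ.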

Caveats

The results of this phylogenetic analysis show that there are significant differences between the phylogenies of lacZ and those of the housekeeping genes ompA and gap. I have interpreted this divergence as potential horizontal transfer, and used the SH test to determine which transfers are supported by the data. Thus, the conclusions of this article are contingent on ML in general, and particularly on the SH test. If the model of evolution employed in this study is not a reasonable characterization of the actual evolutionary process, then the SH test will have given incorrect P values. The best fit model of evolution was frequently the most complex model I examined (GTR + I + Γ), and it is possible that an even more complex model would have been justified by the data. The effect of model misspecification on the SH test seems to be poorly understood; simulation studies of this problem might prove enlightening. Model misspecification is a potential problem in any statistical analysis, and I have taken steps to avoid this problem by statistically justifying the model of evolution, rather than choosing it arbitrarily.
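As a hedged illustration of what it means for the best-fit model to sit at the edge of the candidate set, the sketch below ranks four substitution models by the Akaike information criterion, AIC = 2k - 2 ln L. The log-likelihood values are invented placeholders, not results from this study, and AIC is used here only as one common criterion; the sketch does not claim to reproduce the model selection procedure actually applied. The parameter counts k cover substitution-model parameters only (branch lengths are excluded).

```python
# Rank candidate substitution models by AIC and flag a boundary best fit.
# The log-likelihoods are invented placeholders used purely for illustration.

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# model name -> (hypothetical ln L, number of free substitution-model parameters)
models = {
    "JC69": (-5210.4, 0),      # equal rates, equal base frequencies
    "HKY85": (-5102.7, 4),     # ts/tv ratio + 3 free base frequencies
    "GTR": (-5090.1, 8),       # 5 relative rates + 3 free base frequencies
    "GTR+I+G": (-5041.3, 10),  # GTR + invariant sites + gamma shape
}

scores = {name: aic(lnl, k) for name, (lnl, k) in models.items()}
best = min(scores, key=scores.get)
most_complex = max(models, key=lambda name: models[name][1])

# If the winner is also the richest model tested, the candidate set gives
# no information about whether a still richer model would fit better.
best_is_at_boundary = best == most_complex

for name in sorted(scores, key=scores.get):
    print(f"{name:8s} AIC = {scores[name]:.1f}")
print("Best model is the most complex candidate:", best_is_at_boundary)
```

When the flag is true, as with these placeholder numbers, the comparison cannot rule out that a more parameter-rich model would be preferred, which is the concern raised above.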

A major conclusion of this article is that there is no evidence of horizontal transfer of the lac operon into E. coli. While I found a difference in the placements of E. coli, E. fergusonii, and Salmonella group IIIb, this difference was not statistically significant. As in any statistical analysis, the lack of significance may reflect type II error, the failure to detect a true difference because the test lacks power. The SH test did detect several significant differences between the trees, however, so this method can detect horizontal transfer events. Given the absence of other evidence of the horizontal transfer of the lac operon into E. coli, the null hypothesis of vertical transmission should not be rejected. A power analysis of the SH test would help future workers to interpret the implications of negative results.
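The kind of power analysis suggested above can be sketched by simulation. The Python below is a deliberately simplified stand-in: it uses a one-sided, two-tree bootstrap test on per-site log-likelihood differences (in the style of a Kishino-Hasegawa test), not the SH test itself, and it draws the site differences from a normal distribution rather than from sequences evolved on a tree. The function names, sample sizes, and effect sizes are all assumptions made for illustration.

```python
import random

# Simulation-based power estimate for a simplified two-tree resampling test.
# This is NOT the SH test; site differences are drawn from a normal
# distribution purely as an illustrative assumption.


def bootstrap_p_value(site_diffs, rng, n_boot=200):
    """One-sided bootstrap P value for the summed per-site lnL difference."""
    n = len(site_diffs)
    observed = sum(site_diffs)
    mean = observed / n
    centered = [d - mean for d in site_diffs]  # impose the null of no difference
    hits = 0
    for _ in range(n_boot):
        if sum(rng.choices(centered, k=n)) >= observed:
            hits += 1
    return hits / n_boot


def estimate_power(true_mean_diff, n_sites=100, n_sims=100, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(true_mean_diff, 1.0) for _ in range(n_sites)]
        if bootstrap_p_value(diffs, rng) < alpha:
            rejections += 1
    return rejections / n_sims


power_null = estimate_power(0.0)     # no true difference: the false-positive rate
power_weak = estimate_power(0.05)    # small true difference per site
power_strong = estimate_power(0.5)   # large true difference per site

print("Rejection rate with no true difference:", power_null)
print("Estimated power, weak signal:", power_weak)
print("Estimated power, strong signal:", power_strong)
```

A realistic analysis would instead simulate sequence alignments on trees that differ by a known transfer event and apply the SH test to each replicate, but the logic is the same: the rejection rate under a known true difference estimates how often a negative result would be a type II error.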

Conclusions

To understand the role of horizontal gene transfer in nonpathogenic adaptation, we need cases that link the functional characteristics of the acquired genes to knowledge about the ecology of those species which have acquired those genes. The lac operon of E. coli could have played this role, as its function is well characterized, and there are hypotheses about how E. coli uses it to relate to the environment. Unfortunately, there is no evidence that E. coli gained the lac operon via horizontal transfer. Instead, the operon appears to have been transmitted vertically in many Enterobacteriaceae, including E. coli, to have been lost in at least a few lineages, and to have been gained via horizontal transfer in others. The lac operon can still play an important role in studies of the role that gene loss plays in adaptation, but to connect horizontal transfer to nonpathogenic adaptation, we need well-supported cases involving ecologically understood traits.

Laura Katz, Associate Editor

I thank J. Lawrence and A. Bronikowski for some of the strains used in this study, T. Engstrom for extensive conversations about the SH test, and D. Brisson, D. Dykhuizen, M. Feldgarden, R. Geeta, L. Katz, M. Last, T. Merritt, S. Smith, L. Weintraub, and three anonymous reviewers for many helpful comments on the manuscript. Financial support was provided by National Institutes of Health grants GM6073102 and GM6380001 to D.D., by a Stony Brook Graduate Council Fellowship, and by a National Science Foundation Pre-Doctoral Fellowship. This is contribution 1132 in Ecology and Evolution from Stony Brook University.

References

Andrews, K. J., and E. C. Lin. 1976. Thiogalactoside transacetylase of the lactose operon as an enzyme for detoxification. J. Bacteriol. 128:510–513.
Blattner, F. R., G. Plunkett, 3rd, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew et al. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474.
Boos, W. 1982. Synthesis of (2R)-glycerol-o-beta-D-galactopyranoside by beta-galactosidase. Methods Enzymol. 89(Pt D):59–64.
Boyd, E. F., F. S. Wang, T. S. Whittam, and R. K. Selander. 1996. Molecular genetic relationships of the salmonellae. Appl. Environ. Microbiol. 62:804–808.
Brenner, D. J., B. R. Davis, A. G. Steigerwalt, C. F. Riddle, A. C. McWhorter, S. D. Allen, J. J. Farmer, 3rd, Y. Saitoh, and G. R. Fanning. 1982a. Atypical biogroups of Escherichia coli found in clinical specimens and description of Escherichia hermannii sp. nov. J. Clin. Microbiol. 15:703–713.
Brenner, D. J., A. C. McWhorter, J. K. Knutson, and A. G. Steigerwalt. 1982b. Escherichia vulneris: a new species of Enterobacteriaceae associated with human wounds. J. Clin. Microbiol. 15:1133–1140.
Bronikowski, A. M., A. F. Bennett, and R. E. Lenski. 2001. Evolutionary adaptation to temperature. VIII. Effects of temperature on growth rate in natural isolates of Escherichia coli and Salmonella enterica from different thermal environments. Evolution 55:33–40.
Buvinger, W. E., K. A. Lampel, R. J. Bojanowski, and M. Riley. 1984. Location and analysis of nucleotide sequences at one end of a putative lac transposon in the Escherichia coli chromosome. J. Bacteriol. 159:618–623.
Daubin, V., N. A. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial genomes. Science 301:829–832.
Deng, W., V. Burland, G. Plunkett, 3rd, A. Boutin, G. F. Mayhew, P. Liss, N. T. Perna, D. J. Rose, B. Mau, S. Zhou, D. C. et al. 2002. Genome sequence of Yersinia pestis KIM. J. Bacteriol. 184:4601–4611.
Egel, R. 1979. The lac-operon for lactose degradation, or rather for the utilization of galactosylglycerols from galactolipids? J. Theor. Biol. 79:117–119.
Eisen, J. A. 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163–167.
Escobar-Paramo, P., C. Giudicelli, C. Parsot, and E. Denamur. 2003. The evolutionary history of Shigella and enteroinvasive Escherichia coli revised. J. Mol. Evol. 57:140–148.
Grimont, F., P. A. D. Grimont, and C. Richard. 1992. The genus Klebsiella. Pp. 2775–2796 in A. Balows, H. G. Truper, M. Dworkin, W. Harder, and K.-H. Schleifer, eds. The prokaryotes. Springer-Verlag, New York.
Guindon, S., and G. Perriere. 2001. Intragenomic base content variation is a potential source of biases when searching for horizontally transferred genes. Mol. Biol. Evol. 18:1838–1840.
Holt, J. G., and N. R. Kreig. 1984. Bergey's manual of systematic bacteriology. Lippincott, Williams & Wilkins, Baltimore.
Ito, H., N. Kido, Y. Arakawa, M. Ohta, T. Sugiyama, and N. Kato. 1991. Possible mechanisms underlying the slow lactose fermentation phenotype in Shigella spp. Appl. Environ. Microbiol. 57:2912–2917.
Koski, L. B., and G. B. Golding. 2001. The closest BLAST hit is often not the nearest neighbor. J. Mol. Evol. 52:540–542.
Koski, L. B., R. A. Morton, and G. B. Golding. 2001. Codon bias and base composition are poor indicators of horizontally transferred genes. Mol. Biol. Evol. 18:404–412.
Lawrence, J. G. 2001. Catalyzing bacterial speciation: correlating lateral transfer with genetic headroom. Syst. Biol. 50:479–496.
Lawrence, J. G., and H. Ochman. 1998. Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. USA 95:9413–9417.
Lawrence, J. G., H. Ochman, and D. L. Hartl. 1991. Molecular and evolutionary relationships among enteric bacteria. J. Gen. Microbiol. 137:1911–1921.
Lawrence, J. G., and J. R. Roth. 1996. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics 143:1843–1860.
Lerat, E., V. Daubin, and N. A. Moran. 2003. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-proteobacteria. PLoS Biol. 1:E19.
Miller, J. H., and W. S. Reznikoff. 1978. The operon. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299–304.
Posada, D., and K. A. Crandall. 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817–818.
Pupo, G. M., R. Lan, and P. R. Reeves. 2000. Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc. Natl. Acad. Sci. USA 97:10567–10572.
Riley, M., and A. Anilionis. 1980. Conservation and variation of nucleotide sequences within related bacterial genomes: enterobacteria. J. Bacteriol. 143:366–376.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.
Starlinger, P. 1977. DNA rearrangements in procaryotes. Annu. Rev. Genet. 11:103–126.
Swofford, D. L. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Mass.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.