Abstract

Horizontal gene transfer (HGT) is a prevalent and highly important phenomenon in microbial species evolution. One of the important challenges in HGT research is to better understand the factors that determine the tendency of genes to be successfully transferred and retained in evolution (i.e., transferability). It was previously observed that the transferability of genes depends on the cellular process in which they are involved: genes involved in transcription or translation are less likely to be transferred than metabolic genes. It was further shown that gene connectivity in the protein–protein interaction network affects HGT. These two factors were shown to be correlated, and their influence on HGT is collectively termed the “complexity hypothesis.” In this study, we used a stochastic mapping method utilizing advanced likelihood-based evolutionary models to quantify gene family acquisition events by HGT. We applied our methodology to an extensive across-species genome-wide dataset that enabled us to estimate the overall extent of transfer events in evolution and to study the trends and barriers to gene transferability. Focusing on the biological function and the connectivity of genes, we obtained novel insights regarding the “complexity hypothesis.” Specifically, we aimed to disentangle the relationships between protein connectivity, cellular function, and transferability and to quantify the relative contribution of each of these factors in determining transferability. We show that the biological function of a gene family is an insignificant factor in the determination of transferability when proteins with similar levels of connectivity are compared. In contrast, we found that connectivity is an important and statistically significant factor in determining transferability when proteins with a similar function are compared.

Introduction

Comparative genomics studies have revealed vast and surprising variability in gene content even among closely related species (Berg and Kurland 2002; Mira et al. 2002; Konstantinidis and Tiedje 2004; Koonin and Wolf 2008). The dynamics of genome remodeling include drastic genome erosion by gene losses (Moran et al. 2009) and acquisition of novel genetic material by gene gains through horizontal gene transfer (HGT) (Syvanen 1994; Hacker and Carniel 2001). A pivotal role for HGT was demonstrated in the adaptation of organisms to new ecological niches (Gogarten and Townsend 2005), acquisition of novel functions (Pennisi 2004; Gogarten and Townsend 2005), expansion of metabolic networks (Pal et al. 2005), and speciation (Lawrence 1999). The transfer of genes among bacteria also bears significant medical implications, as the emergence of new virulent strains as well as their resistance to antibiotics is mainly attributed to HGT (Holden et al. 2004; Gal-Mor and Finlay 2006). Thus, studying HGT dynamics and the factors that determine gene transferability is important for evolutionary, ecological, and molecular biology studies.

Although most genes are susceptible to HGT (Sorek et al. 2007), it is well established that the tendency to undergo HGT is highly variable among genes (Nakamura et al. 2004; Cohen et al. 2008; Hao and Golding 2008). Over a decade ago, it was suggested that the biological process in which a gene is involved strongly affects its transferability. It was shown that informational genes are less transferable than operational genes. Later, it was additionally shown that the number of protein–protein interactions (PPIs) is an important factor in determining transferability. The dependency of transferability on these two factors, the biological process and the network connectivity, is now collectively referred to as the “complexity hypothesis” (Rivera et al. 1998; Doolittle 1999; Jain et al. 1999, 2002; Sicheritz-Ponten and Andersson 2001; Gogarten et al. 2002; Brown 2003; Wellner et al. 2007; Lercher and Pal 2008).

Since it was suggested, the complexity hypothesis has been at the center of the discussion regarding gene transferability: the hypothesis was extended (Aris-Brosou 2005) and received support from both bioinformatic analyses (Lercher and Pal 2008) and experimental studies (Wellner and Gophna 2008). Nevertheless, it was also debated and criticized (Brochier et al. 2000; Nesbo et al. 2001).

Testing the validity of the complexity hypothesis requires accurate inference of HGT events. There are three widely used computational approaches to detect HGT, each tailored toward the detection of only a subset of all transfer events. The first detects genes with phylogenetic incongruence as compared with the inferred ribosomal trees or trees that are supposed to represent the organismal evolutionary history. This approach is only suitable for relatively widespread genes with “not too much or too little” sequence divergence (e.g., Graybeal 1994). The second detects genes that are significantly different from the rest of the genome in some compositional attributes such as G+C content or codon usage. This approach can only detect recent transfer events due to sequence amelioration (Koski et al. 2001; Wang 2001). The third uses a presence-absence matrix of gene families across multiple genomes (phyletic pattern) to detect acquisition events of gene families along the assumed phylogeny. Although this approach is suitable for the detection of both recent and ancient events of all gene families, it is only capable of detecting transfer events that resulted in the acquisition of the first copy of a particular gene family. For example, this approach ignores xenologous gene replacements or HGT events that result in additional paralogs. This subset of transfer events may be only a fraction of all HGTs, but it is of particular evolutionary significance, as the acquisition of a novel gene family increases the proteomic repertoire of the recipient and holds the greatest potential for functional innovations and adaptations.

HGT has classically been inferred from phyletic patterns based on the parsimony criterion (Yang 1996; Mirkin et al. 2003; Cordero et al. 2008; Lercher and Pal 2008). Recently, more statistically robust models for phyletic pattern analysis were developed in which the dynamics of gain and loss of gene families are modeled as a stochastic process (Hao and Golding 2006). Several improvements to such evolutionary models were developed (Cohen et al. 2008; Hao and Golding 2008; Spencer and Sangaralingam 2009; Cohen and Pupko 2010), and we have utilized these maximum-likelihood models to develop a methodology to accurately detect branch-specific HGT events (Cohen and Pupko 2010).

Here, we apply our HGT detection methodology to characterize the factors that determine transferability in genome-wide data. Specifically, we test the complexity hypothesis and disentangle cellular function and the number of protein interactions as factors that determine transferability. Notably, in this manuscript, for brevity, we use the general term HGT in lieu of the more accurate expression, gene family acquisition by HGT. Thus, our conclusions regarding the complexity hypothesis are limited to this type of HGT event.

Methods

Phyletic Pattern and Phylogeny

The presence-absence matrix of gene families was extracted from the Clusters of Orthologous Groups of proteins (COG) database (Tatusov et al. 2003), which contains 4,873 gene families across 66 species (50 bacteria, 13 archaea, and 3 fungi). In this research, we focused on HGT inference from the set of 50 bacterial genomes. Previous research has shown that gain and loss dynamics differ between parasitic bacteria and free-living organisms (Spencer and Sangaralingam 2009). Because the models used here do not allow branch-specific changes in the evolutionary process, we removed the 12 known parasitic bacteria from our analysis (Mycoplasma pulmonis, Mycop. pneumoniae, Mycop. genitalium, Ureaplasma urealyticum, Buchnera sp. APS, Rickettsia prowazekii, R. conorii, Treponema pallidum, Borrelia burgdorferi, Chlamydia trachomatis, Chlamydophila pneumoniae, and Mycobacterium leprae), retaining 38 bacterial genomes. The COG definition of a gene family requires its presence in at least three genomes. Therefore, after excluding species from the original COG database, we retained only gene families that are present in at least three genomes within our data set. After applying this filtering criterion, 3,915 gene families were retained.
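The filtering step described above can be sketched as follows. This is a minimal illustration, assuming the phyletic pattern is held as a mapping from a gene family identifier to the set of species in which the family is present. The function name `filter_phyletic_pattern`, the data layout, and the toy identifiers are hypothetical and are not taken from the COG database or from the original analysis.

```python
def filter_phyletic_pattern(pattern, excluded_species, min_genomes=3):
    """Drop excluded species, then keep families present in >= min_genomes genomes.

    pattern: dict mapping gene-family id -> set of species names (presence).
    excluded_species: iterable of species names to remove (e.g., parasites).
    """
    excluded = set(excluded_species)
    filtered = {}
    for family, species in pattern.items():
        remaining = set(species) - excluded
        if len(remaining) >= min_genomes:
            filtered[family] = remaining
    return filtered


# Toy data (hypothetical identifiers, for illustration only).
toy_pattern = {
    "famA": {"sp1", "sp2", "sp3", "parasite1"},
    "famB": {"sp1", "parasite1", "parasite2"},
    "famC": {"sp1", "sp2", "parasite1"},
}
kept = filter_phyletic_pattern(toy_pattern, ["parasite1", "parasite2"])
print(sorted(kept))  # only famA is still present in three genomes
```

In this toy example, famB and famC each satisfy the three-genome requirement before the exclusion but not after it, which is why the criterion has to be re-applied once species are removed.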

The analysis was based on the assumed tree topology of Ciccarelli et al. (2006), which was constructed from a set of “core” genes that are assumed to be resistant to gene transfer. As a control, the analysis was repeated with a topology constructed based on ribosomal RNA (rRNA) sequences (Yarza et al. 2008). In both cases, branch lengths are re-estimated from the phyletic pattern using the evolutionary models.

Inference of HGT Events Using Stochastic Mapping

The gain and loss dynamics were modeled using the gain loss mixture model (Cohen and Pupko 2010), in which variability in the gain and loss rates is allowed among gene families. Based on the evolutionary model and the assumed phylogeny, the stochastic mapping approach (Minin and Suchard 2008) allows for the inference of gain and loss events for each gene family along each branch (Cohen and Pupko 2010). This methodology allows for the computation of both the expected number of events and the probability of occurrence of gain and loss events.

The overall tendency of a gene family to undergo acquisition by HGT is measured by the posterior expectation of the number of gain events over all branches. We classify gene families as either transferable or not. Transferable gene families are those for which there is a high probability of HGT events during their evolution. To be conservative, we demand a gain event (HGT) in at least two branches, as described previously (Cohen and Pupko 2010). The transferability cutoff value is determined by limiting the number of false-positive predictions of gain events to 5% based on simulations. In this study, the transferability cutoff corresponds to a posterior probability of 0.25 for a gain event. Notably, the cutoff values that result in 5% false-positive predictions may vary with respect to simulation assumptions (Cohen and Pupko 2010). Thus, in this study, a relatively strict cutoff value was used, which may result in fewer than 5% false positives under realistic assumptions. Moreover, the computations were repeated with both stricter and more permissive cutoff values.

We classify all gain events as either ancient or recent. Recent gains are those that are mapped to external branches (i.e., branches leading to extant organisms). Gain events mapped to all other branches (i.e., internal branches) are considered ancient.
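The classification rules above can be illustrated with a short sketch. It assumes that stochastic mapping has already produced, for one gene family, a posterior probability of a gain event on each branch. The function name `classify_family`, the branch labels, and the use of "greater than or equal to" at the cutoff are assumptions made for illustration; only the 0.25 cutoff and the two-branch requirement come from the text.

```python
def classify_family(gain_prob_by_branch, external_branches,
                    cutoff=0.25, min_branches=2):
    """Classify one gene family from per-branch posterior gain probabilities.

    Returns (is_transferable, recent_branches, ancient_branches).
    """
    external = set(external_branches)
    # Branches whose posterior gain probability reaches the cutoff
    # (assumption: the boundary value itself counts as a gain).
    gained = [b for b, p in gain_prob_by_branch.items() if p >= cutoff]
    # Gains on external branches are recent; all others are ancient.
    recent = sorted(b for b in gained if b in external)
    ancient = sorted(b for b in gained if b not in external)
    return len(gained) >= min_branches, recent, ancient


# Toy example with hypothetical branch labels and probabilities.
probs = {"leaf_A": 0.40, "leaf_B": 0.10, "internal_1": 0.30, "internal_2": 0.05}
result = classify_family(probs, external_branches={"leaf_A", "leaf_B"})
print(result)
```

Here the family has inferred gains on two branches, one external and one internal, so it is classified as transferable with one recent and one ancient gain.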

Network and Protein Interactions

The PPI network and the number of interactions for each gene family were extracted from the STRING database version 8.3 (Jensen et al. 2009). This comprehensive PPI network is based on known interactions from several databases covering several model organisms (Salwinski et al. 2004; Alfarano et al. 2005; Joshi-Tope et al. 2005; Chatr-aryamontri et al. 2007; Kerrien et al. 2007; Vastrik et al. 2007; Breitkreutz et al. 2008; Kanehisa et al. 2008) and is augmented by methods that accurately predict interactions (Harrington et al. 2008; Skrabanek et al. 2008). As a control, the PPI network of the Database of Interacting Proteins (DIP) (Xenarios et al. 2000; Salwinski et al. 2004) version June 2010 was used. Unlike STRING, this network only considers experimentally validated interactions (i.e., it does not include predicted interactions).

Interactions in the STRING data set are given a confidence score based on benchmarking with manually curated interaction maps (Kanehisa et al. 2008), in which for each pair of gene families the interaction confidence is denoted by a value in the range 0–1,000. Protein families in which all interactions have a zero confidence score may reflect lack of data rather than genuine non-interacting protein families. These were excluded from the analysis, resulting in 2,442 gene families. Notably, in our analyses, we only consider interactions with a confidence score above a certain threshold. For example, a protein family having one reported interaction with a confidence score of 400 will be analyzed as having a single interaction when the threshold is 150 and zero interactions when the threshold is 700.
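The thresholding example above can be written as a small sketch. The function name `count_interactions` and the list-of-scores representation are hypothetical; the scores and thresholds are those of the example in the text.

```python
def count_interactions(confidence_scores, threshold):
    """Count interactions whose confidence score is above the threshold."""
    return sum(1 for score in confidence_scores if score > threshold)


# One reported interaction with a confidence score of 400.
scores = [400]
print(count_interactions(scores, 150))  # 1
print(count_interactions(scores, 700))  # 0
```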

Functional Categories

The biological process in which each gene family is involved (functional category) is extracted from the COG database, in which there are 25 specific categories grouped into four meta-categories. We limited our analysis to functional categories with at least three gene families. Thus, we retain the four meta-categories (Information storage and processing, Cellular processes and signaling, Metabolism, and Poorly characterized) and 20 specific categories (Translation, ribosomal structure, and biogenesis; Transcription; Replication, recombination, and repair; Cell cycle control, cell division, and chromosome partitioning; Defense mechanisms; Signal transduction mechanisms; Cell wall/membrane/envelope biogenesis; Cell motility; Intracellular trafficking, secretion, and vesicular transport; Posttranslational modification, protein turnover, and chaperones; Energy production and conversion; Carbohydrate transport and metabolism; Amino acid transport and metabolism; Nucleotide transport and metabolism; Coenzyme transport and metabolism; Lipid transport and metabolism; Inorganic ion transport and metabolism; Secondary metabolites biosynthesis, transport, and catabolism; General function prediction only; and Function unknown). In the analysis of functional categories, 158 gene families have more than one functional category label. These gene families were included independently in each functional category analysis.

Statistical Analysis of Function and Transferability Association

To test for association between a functional category and transferability, we computed the ratio of the fraction of transferable genes in this category to the fraction of transferable genes not in this category. We term this ratio “relative transferability,” which is equivalent to the commonly used term “relative risk.” A relative transferability significantly higher than one suggests a higher propensity for a gene family to be transferred when included in this functional category compared with all other functional categories. Statistical significance is determined using Fisher's exact test. The classification of a gene family as transferable is based on stochastic mapping (see above).
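As a worked illustration of the relative transferability ratio and its significance test, the sketch below uses a 2×2 table of counts: a and b are the transferable and non-transferable families in the category, and c and d are the corresponding counts outside it. The function names and toy counts are hypothetical. The two-sided P value follows the common convention of summing the probabilities of all tables that are no more probable than the observed one; whether the original analysis used this exact convention is an assumption.

```python
from math import comb


def relative_transferability(a, b, c, d):
    """Ratio of transferable fractions: in-category (a of a+b) vs. rest (c of c+d)."""
    return (a / (a + b)) / (c / (c + d))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x transferable families in the category.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Toy counts: 3 of 10 families transferable in the category, 12 of 20 outside it.
rt = relative_transferability(3, 7, 12, 8)
p_value = fisher_exact_two_sided(3, 7, 12, 8)
print(rt, p_value)
```

A relative transferability of 0.5 in this toy table means that families in the category are half as likely to be classified as transferable as families outside it.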

We additionally compute the relative transferability while accounting for variable levels of connectivity. Specifically, we treated the data as stratified by the number of PPIs. We thus computed the relative transferability in each functional category accounting for this stratification using the Mantel–Haenszel test. Gene families were stratified into 45 levels of connectivity, in which each stratum has at least ten gene families. Notably, similar results were obtained when the data were stratified into 94 levels of connectivity (at least three gene families in each stratum) or into seven levels of connectivity (at least 100 gene families in each stratum). This indicates that the results are highly robust to the choice of stratification resolution (data not shown).
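One common way to pool a relative risk across strata is the Mantel–Haenszel risk ratio estimator, sketched below. The text does not specify which estimator or software was used, so this is an illustration of the general technique rather than the authors' exact computation. The function name and the toy counts are hypothetical.

```python
def mantel_haenszel_relative_risk(strata):
    """Pooled risk ratio across strata (Mantel-Haenszel estimator).

    strata: iterable of (a, b, c, d) tuples, one per connectivity stratum:
      a = transferable families in the category, b = non-transferable in it,
      c = transferable families outside it,     d = non-transferable outside it.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # An empty stratum carries no information.
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
    # Raises ZeroDivisionError if no stratum has transferable families
    # outside the category; that case is not handled in this sketch.
    return numerator / denominator


# Toy example: two connectivity strata, each with a within-stratum ratio of 0.5.
toy_strata = [(2, 8, 4, 6), (1, 4, 6, 9)]
pooled = mantel_haenszel_relative_risk(toy_strata)
print(pooled)
```

With a single stratum the estimator reduces to the unstratified relative transferability, which makes it easy to compare the stratified and unstratified values for the same functional category.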

The stratification of the gene families according to their connectivity was done as follows. All gene families are sorted according to their connectivity (number of PPIs). The first stratum comprises the group of gene families with the lowest number of interactions. We incrementally add gene families with the next lowest levels of connectivity to this stratum until at least ten gene families are included. All gene families with exactly the same level of connectivity are added to the same stratum, even if its size increases above ten. Once a stratum is defined, we build the next stratum in the same way.
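The stratification procedure above can be sketched as follows. The function name `stratify_by_connectivity` and the dictionary input are hypothetical. The text does not say how a final group smaller than the minimum size is handled; merging it into the last stratum is an assumption of this sketch.

```python
from itertools import groupby


def stratify_by_connectivity(connectivity, min_size=10):
    """Group gene families into strata of increasing connectivity.

    connectivity: dict mapping gene-family id -> number of PPIs.
    Families sharing a connectivity level are never split across strata.
    """
    ordered = sorted(connectivity.items(), key=lambda item: item[1])
    strata, current = [], []
    for _, group in groupby(ordered, key=lambda item: item[1]):
        # Add the whole connectivity level at once, even if this overshoots min_size.
        current.extend(family for family, _ in group)
        if len(current) >= min_size:
            strata.append(current)
            current = []
    if current:
        # Assumption: leftover families (fewer than min_size) join the last stratum.
        if strata:
            strata[-1].extend(current)
        else:
            strata.append(current)
    return strata


# Toy example with a minimum stratum size of three (the text uses ten).
toy = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 2, "g": 5, "h": 7, "i": 9}
toy_strata = stratify_by_connectivity(toy, min_size=3)
print(toy_strata)
```

In the toy example the first stratum holds five families rather than three, because the three families with one interaction are added together.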

Results and Discussion

High Number of Protein Interactions Acts As Barrier to HGT

Gain and loss dynamics of gene families were studied using the gain loss mixture model (Cohen and Pupko 2010). The ML estimate for each of the model parameters is given in supplementary table S1, Supplementary Material online. This model was used to infer gain (HGT) events for each gene family and for each branch using stochastic mapping. It was previously shown that the number of protein interactions (connectivity) is associated with gene family transferability by comparing transferability of genes with low versus high connectivity levels (Wellner et al. 2007; Davids and Zhang 2008; Lercher and Pal 2008). To gain further insight regarding this “connectivity barrier” (i.e., protein interactions that hinder or reduce transferability), instead of using such a binning approach, we directly computed the correlation between connectivity and transferability (fig. 1). Our analysis indicates that more HGT events are expected for protein families with low connectivity levels (Spearman coefficient computed for all gene families R = −0.422, P value <8.18 × 10−105, table 1).

FIG. 1.

HGT as a function of PPIs. Each dot represents one gene family. The X axis is the overall number of PPIs of the gene family with other gene families. The Y axis corresponds to the posterior number of HGT events.

Each interaction is given a confidence value (see Methods). We performed the same computation with different cutoff levels for the inclusion of interactions. In all cases, the negative correlation was highly significant (supplementary table S2A, Supplementary Material online). Notably, increasing the threshold reduces the number of included interactions (overall number of network interactions 62,642 and 5,861 for the lowest and highest confidence threshold, respectively) and the corresponding R coefficients (R = −0.422 and −0.298 for the lowest and the highest confidence threshold, respectively). Although the increase in the confidence cutoff may have reduced the number of false interactions, the decrease in the correlation strength suggests that using a stringent cutoff value significantly reduced the number of true interactions as well. Notably, we found very similar correlation levels between HGT and connectivity with exclusively low and mid confidence levels. These results (supplementary table S2B, Supplementary Material online) suggest that with the STRING confidence score method, even interactions with low confidence contribute to the connectivity barrier.

It may be claimed that the connectivity barrier is mainly the result of the most extreme cases in the connectivity spectrum, that is, that the majority of the signal arises from isolated gene families and hub gene families. We therefore repeated the correlation analysis after removing gene families with fewer than one or more than 50 interactions. This analysis shows that connectivity is also informative with respect to HGT for intermediate levels of interactions (R = −0.362, P value <9.2 × 10−66).
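The correlation analysis in this section, including the restriction to intermediate connectivity levels, can be sketched with a small self-contained implementation of Spearman's coefficient (the Pearson correlation of average ranks). The function names and toy values are hypothetical, and the P value computation is omitted.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors.

    Not defined (division by zero) when either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def spearman_in_range(ppi, hgt, low=1, high=50):
    """Correlation restricted to families with low <= PPI count <= high."""
    pairs = [(p, h) for p, h in zip(ppi, hgt) if low <= p <= high]
    return spearman([p for p, _ in pairs], [h for _, h in pairs])


# Toy data: PPI counts and expected numbers of HGT events (hypothetical values).
toy_ppi = [0, 1, 5, 20, 50, 300]
toy_hgt = [9.0, 3.0, 2.0, 1.0, 0.5, 0.0]
r_mid = spearman_in_range(toy_ppi, toy_hgt)
print(r_mid)
```

Because the coefficient depends only on ranks, it is not driven by the magnitude of extreme hub values; the range filter addresses the separate question of whether the signal persists once isolated and hub families are removed altogether.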

The Biological Function As a Factor Determining HGT Extent

We and others have previously shown that the biological function of a gene family is important in determining its propensity to undergo HGT (Rivera et al. 1998; Nakamura et al. 2004; Merkl 2006; Choi and Kim 2007; Hao and Golding 2008; Kanhere and Vingron 2009; Cohen and Pupko 2010). Here, we show that the mean number of HGT events varies dramatically among functional categories (table 2, HGT columns). In agreement with previous studies, the lowest HGT levels are observed for the informational genes (involved in transcription and translation), where the most pronounced trend was found in genes associated with the ribosome and related to translation (COG functional category: “translation, ribosomal structure, and biogenesis”). The average expected number of HGT events per gene family in this category was below 0.47, substantially lower than the 1.73 events per gene family, which is the average over all gene families. Statistically significant differences were found among the 20 specific functional categories and also among the four “meta-categories” (P values <1.7 × 10−40 and 4.52 × 10−34, Kruskal–Wallis test, respectively).

We further studied the association between functional category and the propensity for HGT by computing the relative transferability factor of each functional category (see Methods for more details). In table 3, we summarize the relative transferability of all functional categories and find that several functional categories have relative transferability that is significantly different from one. In agreement with the lower computed average HGT, the relative transferability value of the function “translation, ribosomal structure, and biogenesis” is 0.276, which is highly significant even after correction for multiple testing (P value <3.55 × 10−6, Fisher's exact test).

The classification of a gene family as transferable depends on an estimated “transferability cutoff” (see Methods). To verify that the obtained results are robust in this respect, we performed additional computations with both stricter and more permissive cutoffs. Changing the cutoff substantially changes the estimated overall percentage of transferable genes, from 32.31% to 23.91% and 42.51% for the stricter and more permissive cutoffs, respectively. However, the relative transferability factors of the various functional categories were very similar (supplementary tables S3A and Supplementary Data, Supplementary Material online).

Disentangling Biological Function and Connectivity in Determining HGT Frequency

The complexity hypothesis relates two biological factors to the tendency of gene families to undergo HGT: the connectivity and the function (or biological process). Our results above show that HGT is influenced by each of these factors when analyzed separately. However, the various functional categories vary in terms of their connectivity. In table 2, we show the average connectivity of each functional category. Substantial differences are observed among the functional categories, with averages ranging from as high as 84.51 for the “translation, ribosomal structure, and biogenesis” category to only 7.11 for the “function unknown” category. This observed difference among categories is statistically significant in the comparison of both the specific categories and the meta-categories (P values <7.77 × 10−60 and 5.48 × 10−60, Kruskal–Wallis test, respectively). Given this strong association between functional category and connectivity, the dependence of HGT on function may be a side effect of this variance in connectivity. Alternatively, it is possible that the observed effect of connectivity on HGT propensity is a by-product of the differences in functionality. Here, we test for the effect of each of these factors while controlling for the effect of the other.

We computed the correlation between connectivity and transferability for each functional group separately. Our results show that the connectivity barrier holds even when the functional category factor is accounted for. Specifically, for the vast majority of functional categories, a significant negative correlation was observed between connectivity and transferability (table 1). However, the impact of the connectivity barrier was different across functional groups. We found that connectivity was the most influential in informational genes with Spearman's coefficient of −0.518, while in both metabolic and cellular groups, the coefficients were lower: −0.39 and −0.353, respectively. The lowest correlation between connectivity and transferability was found for the “poorly characterized” meta-category and for the “function unknown” category with Spearman's coefficients of −0.244 (P value 2.35 × 10−09) and −0.152 (P value 0.0096), respectively. The only two functional categories in which the correlation was not significant after correction for multiple testing are “cell motility” and “lipid transport and metabolism”.

Table 1.

The Spearman's Correlation (R) between Connectivity and Transferability Computed Separately for Various Functional Categories.

| Functional Category | R | P | Number of Gene Families |
| --- | --- | --- | --- |
| All | −0.422 | 8.18 × 10−105 | 2,442 |
| Information storage and processing | −0.518 | 1.08 × 10−26 | 382 |
| Cellular processes and signaling | −0.353 | 5.95 × 10−15 | 487 |
| Metabolism | −0.39 | 2.94 × 10−37 | 1,017 |
| Poorly characterized | −0.244 | 2.35 × 10−09 | 626 |
| Translation, ribosomal structure, and biogenesis | −0.213 | 0.0112 | 150 |
| Transcription | −0.349 | 4.78 × 10−04 | 105 |
| Replication, recombination, and repair | −0.444 | 1.98 × 10−07 | 133 |
| Cell cycle control, cell division, and chromosome partitioning | −0.464 | 0.00727 | 35 |
| Defense mechanisms | −0.474 | 0.00723 | 34 |
| Signal transduction mechanisms | −0.235 | 0.0265 | 93 |
| Cell wall/membrane/envelope biogenesis | −0.289 | 9.72 × 10−04 | 138 |
| Cell motility | −0.255 | 0.0577 | 57 |
| Intracellular trafficking, secretion, and vesicular transport | −0.331 | 0.0105 | 63 |
| Posttranslational modification, protein turnover, and chaperones | −0.411 | 1.03 × 10−05 | 115 |
| Energy production and conversion | −0.407 | 2.50 × 10−08 | 185 |
| Carbohydrate transport and metabolism | −0.461 | 2.35 × 10−09 | 162 |
| Amino acid transport and metabolism | −0.387 | 6.65 × 10−09 | 223 |
| Nucleotide transport and metabolism | −0.471 | 1.79 × 10−05 | 81 |
| Coenzyme transport and metabolism | −0.513 | 5.47 × 10−10 | 139 |
| Lipid transport and metabolism | −0.207 | 0.0831 | 71 |
| Inorganic ion transport and metabolism | −0.174 | 0.0359 | 151 |
| Secondary metabolites biosynthesis, transport, and catabolism | −0.329 | 0.0203 | 52 |
| General function prediction only | −0.299 | 1.57 × 10−07 | 314 |
| Function unknown | −0.152 | 0.00968 | 312 |

Note.—The data were partitioned into groups of gene families based on the COG functions. The P values were corrected for multiple testing using false discovery rate method (Benjamini and Hochberg 1995).

Table 2.

Connectivity and HGT Propensity of All Gene Families and Specific Functional Categories.

| Functional Category | Mean HGT | SE HGT | Mean PPI | SE PPI |
| --- | --- | --- | --- | --- |
| All | 1.731 | 0.04 | 25.65 | 1.02 |
| Information storage and processing | 1.188 | 0.1 | 56.5 | 3.89 |
| Cellular processes and signaling | 1.54 | 0.08 | 26.45 | 2.94 |
| Metabolism | 1.781 | 0.06 | 21.16 | 0.97 |
| Poorly characterized | 2.107 | 0.08 | 16.75 | 2.2 |
| Translation, ribosomal structure, and biogenesis | 0.469 | 0.08 | 84.51 | 6.22 |
| Transcription | 1.374 | 0.17 | 43.35 | 9.45 |
| Replication, recombination, and repair | 1.856 | 0.21 | 47.03 | 7.56 |
| Cell cycle control, cell division, and chromosome partitioning | 1.704 | 0.35 | 16.8 | 3.62 |
| Defense mechanisms | 2.49 | 0.47 | 15.94 | 5.35 |
| Signal transduction mechanisms | 1.559 | 0.16 | 33.35 | 9.14 |
| Cell wall/membrane/envelope biogenesis | 1.671 | 0.16 | 16.97 | 2.49 |
| Cell motility | 1.008 | 0.16 | 15.12 | 2.62 |
| Intracellular trafficking, secretion, and vesicular transport | 0.961 | 0.14 | 14.19 | 2.54 |
| Posttranslational modification, protein turnover, and chaperones | 1.394 | 0.16 | 45.41 | 8.98 |
| Energy production and conversion | 1.952 | 0.14 | 24.41 | 2.42 |
| Carbohydrate transport and metabolism | 2.309 | 0.17 | 19.87 | 2.21 |
| Amino acid transport and metabolism | 1.625 | 0.12 | 23.91 | 2.34 |
| Nucleotide transport and metabolism | 1.336 | 0.2 | 33.69 | 5.66 |
| Coenzyme transport and metabolism | 1.337 | 0.15 | 17.39 | 1.79 |
| Lipid transport and metabolism | 1.287 | 0.18 | 31.99 | 4.04 |
| Inorganic ion transport and metabolism | 1.889 | 0.16 | 15.61 | 2.37 |
| Secondary metabolites biosynthesis, transport, and catabolism | 2.137 | 0.29 | 18.08 | 4.67 |
| General function prediction only | 1.998 | 0.11 | 26.33 | 4.28 |
| Function unknown | 2.217 | 0.11 | 7.112 | 0.67 |

Note.—The PPI (connectivity) and HGT values are computed for each functional category and for all gene families as reference. SE, standard error.

Table 3.

The Relative Transferability of Gene Families in Each Functional Category.

| Functional Category | Relative Transferability | P value | Relative Transferability (MH) | P value (MH) |
| --- | --- | --- | --- | --- |
| Information storage and processing | 0.608 | 0.00104 | 0.774 | 0.344 |
| Cellular processes and signaling | 0.866 | 0.408 | 0.874 | 0.617 |
| Metabolism | 1.078 | 0.614 | 1.138 | 0.498 |
| Poorly characterized | 1.314 | 0.0216 | 1.072 | 0.971 |
| Translation, ribosomal structure, and biogenesis | 0.276 | 2.84 × 10^−6 | 0.418 | 0.0864 |
| Transcription | 0.668 | 0.318 | 0.73 | 0.617 |
| Replication, recombination, and repair | 1.05 | 0.9 | 1.241 | 0.617 |
| Cell cycle control, cell division, and chromosome partitioning | 0.883 | 0.903 | 0.899 | 0.977 |
| Defense mechanisms | 1.094 | 0.88 | 0.955 | 0.977 |
| Signal transduction mechanisms | 0.861 | 0.782 | 0.905 | 0.977 |
| Cell wall/membrane/envelope biogenesis | 1.13 | 0.726 | 1.116 | 0.977 |
| Cell motility | 0.321 | 0.0263 | 0.314 | 0.0909 |
| Intracellular trafficking, secretion, and vesicular transport | 0.633 | 0.408 | 0.564 | 0.349 |
| Posttranslational modification, protein turnover, and chaperones | 0.8 | 0.596 | 0.899 | 0.977 |
| Energy production and conversion | 1.206 | 0.408 | 1.33 | 0.344 |
| Carbohydrate transport and metabolism | 1.37 | 0.156 | 1.364 | 0.344 |
| Amino acid transport and metabolism | 0.984 | 0.943 | 1.067 | 0.977 |
| Nucleotide transport and metabolism | 0.836 | 0.782 | 0.962 | 0.977 |
| Coenzyme transport and metabolism | 0.769 | 0.408 | 0.79 | 0.617 |
| Lipid transport and metabolism | 0.78 | 0.614 | 0.928 | 0.977 |
| Inorganic ion transport and metabolism | 1.027 | 0.903 | 0.98 | 0.977 |
| Secondary metabolites biosynthesis, transport, and catabolism | 1.134 | 0.853 | 1.073 | 0.977 |
| General function prediction only | 1.18 | 0.408 | 1.059 | 0.977 |
| Function unknown | 1.334 | 0.0587 | 1.057 | 0.977 |

NOTE.—Relative transferability refers to the fraction of transferable gene families within each functional category divided by the fraction of transferable gene families among other gene families. This computation is repeated, once when all connectivity levels are aggregated and once when accounting for connectivity stratification using Mantel–Haenszel (MH) test. The P values were corrected for multiple testing using the false discovery rate method (Benjamini and Hochberg 1995).
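
As a minimal illustration of the definition in the note, the unstratified relative transferability is a ratio of two fractions. The function name `relative_transferability` and the counts below are hypothetical and are not data from this study.

```python
def relative_transferability(transferable_in, total_in,
                             transferable_out, total_out):
    """Fraction of transferable families inside a category divided by
    the fraction of transferable families among all other families."""
    fraction_in = transferable_in / total_in
    fraction_out = transferable_out / total_out
    return fraction_in / fraction_out


# Hypothetical counts: 50 of 200 families in the category are transferable,
# compared with 500 of 1000 families outside the category.
ratio = relative_transferability(50, 200, 500, 1000)
print(ratio)
```

A value below one indicates that families in the category are transferred less often than other families, and a value above one indicates the opposite.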

We next tested whether biological function is a determining factor for transferability when controlling for the connectivity level. We thus computed the relative transferability in each functional category while accounting for different levels of connectivity using the Mantel–Haenszel test (see Methods). Our results show that when controlling for connectivity, the impact of functional category on transferability diminishes drastically and is no longer significant for any of the functional categories (table 3). For example, when accounting for connectivity levels, the relative transferability of informational genes rises from 0.61 to 0.82. Similarly, for poorly characterized genes, the relative transferability decreases from 1.31 to 1.03. Importantly, using the Mantel–Haenszel test, after correction for multiple testing, none of the functional categories is found to have a relative transferability that is significantly different from one. The only exception is the functional category “translation, ribosomal structure, and biogenesis,” in which the relative transferability is significantly lower than one when the more permissive criterion for transferability is used (supplementary table S3B, Supplementary Material online). This result is not surprising because these gene families are known to be among the so-called “core” of the genome, which is highly resistant to HGT (e.g., Ciccarelli et al. 2006; Sorek et al. 2007). To conclude, these results demonstrate that when the connectivity level is taken into account, the functional category is not a significant factor in determining the propensity of gene families to undergo HGT events.
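
The stratified computation can be sketched as follows, assuming that gene families are binned by connectivity level and that each bin yields a 2 × 2 table of transferable and non-transferable families inside and outside a functional category. The sketch uses the Mantel–Haenszel pooled risk ratio and the Cochran–Mantel–Haenszel chi-square statistic without continuity correction; whether these match the exact estimator described in Methods is an assumption, and the function name and all counts are hypothetical.

```python
import math


def mantel_haenszel(strata):
    """strata: list of (a, b, c, d) counts, one tuple per connectivity bin.

    a = transferable families inside the category
    b = non-transferable families inside the category
    c = transferable families outside the category
    d = non-transferable families outside the category
    """
    num = den = 0.0    # numerator and denominator of the pooled risk ratio
    diff = var = 0.0   # sum of observed-minus-expected a, and its variance
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * (c + d) / n
        den += c * (a + b) / n
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    ratio = num / den
    chi2 = diff * diff / var                   # no continuity correction
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square tail, 1 d.f.
    return ratio, chi2, p_value


# Hypothetical counts (a, b, c, d) for two connectivity bins.
strata = [
    (10, 10, 200, 200),  # low connectivity: 50% transferable in both groups
    (10, 90, 10, 90),    # high connectivity: 10% transferable in both groups
]

# Crude ratio that ignores connectivity (all bins pooled).
a, b, c, d = (sum(col) for col in zip(*strata))
crude = (a / (a + b)) / (c / (c + d))

ratio, chi2, p_value = mantel_haenszel(strata)
print(f"crude ratio = {crude:.3f}, MH ratio = {ratio:.3f}, P = {p_value:.3f}")
```

In this toy example the category is enriched in highly connected families, so the crude ratio is about 0.40 even though, within each connectivity bin, the category is exactly as transferable as the other families and the stratified ratio equals one. This is the kind of confounding that stratification is meant to remove.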

The Connectivity Barrier Holds Both for Recent and Ancient Acquisitions

The stochastic mapping methodology infers branch-specific gain events. We tested whether the connectivity barrier exists for both recent and ancient transfers by partitioning the branches of the tree into two groups, recent and ancient. Figure 2 depicts the phylogeny used in this research with branches color-coded as either recent or ancient. Our results show that this is indeed the case: connectivity is a strong predictor of transferability for both recent and ancient HGT events, with Spearman's coefficients of −0.39 and −0.43 and P values of 6.01 × 10^−89 and 2.9 × 10^−108, respectively. Notably, protein interaction data were derived from contemporary experimental observations, and thus, our observations show that current information regarding connectivity is highly informative for ancient HGT events. These results may be explained by a slow evolutionary rate of the PPI network, that is, the connectivity of gene families in current microbes closely resembles that of hypothetical ancestral lineages. This interpretation is in agreement with the findings of Lercher and Pal (2008).
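
The comparison of recent and ancient transfers rests on a rank correlation between connectivity and the number of inferred gains in each group of branches. A minimal sketch of Spearman's coefficient is given below, assuming one connectivity value and one expected gain count per gene family for each branch group. The function names and all values are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical per-family values.
connectivity = [2, 5, 9, 14, 30, 80]
recent_gains = [3.1, 2.4, 2.6, 1.0, 0.7, 0.2]
ancient_gains = [2.0, 2.2, 1.1, 0.9, 0.3, 0.1]

print("recent :", round(spearman(connectivity, recent_gains), 3))
print("ancient:", round(spearman(connectivity, ancient_gains), 3))
```

A negative coefficient in both groups corresponds to the pattern reported above, in which highly connected gene families are acquired less often on recent and ancient branches alike.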

FIG. 2.

The phylogeny with branches color-coded as recent or ancient. The phylogenetic tree used in this research. Recent branches are colored red and ancient branches are colored gray.

Controls and Additional Tests

The above results were validated with respect to several assumptions. First, we inferred HGT events assuming the tree topology of Ciccarelli et al. (2006). The results obtained were qualitatively the same when all computations were repeated assuming the rRNA tree (Yarza et al. 2008), with detailed results in supplementary tables S4 and Supplementary Data, Supplementary Material online. The Ciccarelli tree was chosen as the main reference because it obtained a higher maximum log-likelihood value than the rRNA tree (supplementary table S1, Supplementary Material online). Second, connectivity was inferred based on the STRING database (Jensen et al. 2009). The conclusions were essentially the same when interactions were extracted from the DIP database instead (Salwinski et al. 2004), with detailed results in supplementary table S2, Supplementary Material online.

Conclusions

Since it was suggested, the complexity hypothesis has been debated: it was shown that for cases of homologous gene acquisition, the complexity barrier may be low (Wellner et al. 2007; Omer et al. 2010). However, here, we demonstrate that gene family acquisition apparently has very different evolutionary characteristics and involves a substantial complexity barrier that is not restricted to particular protein functions. Our results are based on robust statistical models and methodologies and on a large corpus of phyletic data, which are radically different from those that were available when the complexity hypothesis was first suggested. Using these data and methods, we were able to quantify the extent to which HGT of gene families is determined by the functional category and the number of protein–protein connections that characterize them. When assessing barriers to HGT and the importance of these factors in determining transferability, we found that high connectivity hinders HGT events. Finally, we demonstrated that the functional category of a gene family is an insignificant factor in determining HGT once the connectivity level is controlled for.

This study focused on the elucidation of factors that determine HGT. We note that an interesting direction for future research is to apply the methodology presented here to quantify and characterize gene family loss dynamics, that is, to elucidate the factors that determine the propensity for gene family loss (dispensability). The importance of gene loss in shaping microbial genomes in evolution has been studied and quantified both computationally (Charlebois and Doolittle 2004; Csuros and Miklos 2006; Marri et al. 2006; Borenstein et al. 2007; Wapinski et al. 2007) and experimentally (Moran et al. 2009), and both gene function and network connectivity have been suggested to play an important role (Krylov et al. 2003; Pal et al. 2006; Wolf et al. 2006; Ochman et al. 2007; Yosef et al. 2009). Notably, because gene loss is known to be much more common in parasitic bacteria, models that account for a covarion-like type of evolution with regard to gain and loss parameters (heterotachy) should be more suitable for analyzing gene loss dynamics. An important step forward in this direction is the recent work of Spencer and Sangaralingam (2009), which clearly shows that a covarion-type model of evolution can better capture gene gain and loss dynamics when reductive evolution in some lineages is evident.

Another interesting direction for future research is to build evolutionary models that explicitly consider the association between connectivity and the gain (and loss) rates. Such models are becoming more and more interesting as the volume of microbial genomic data accumulates and the knowledge regarding PPI becomes more accurate.

Acknowledgments

We thank Daniel Yekutieli for his help with the statistical analysis. We thank Matthew Spencer for reviewing this paper and for providing helpful criticism and suggestions that significantly improved this manuscript. We thank Nimrod Rubinstein for critically reading the manuscript. T.P. is supported by a grant from the Israel Science Foundation (878/09) and by the National Evolutionary Synthesis Center (NESCent), National Science Foundation #EF-0905606. O.C. is a fellow of the Edmond J. Safra program in bioinformatics.

References

Alfarano C, Andrade CE, Anthony K, et al. (75 co-authors). 2005. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 33:D418-D424.
Aris-Brosou S. 2005. Determinants of adaptive evolution at the molecular level: the extended complexity hypothesis. Mol Biol Evol. 22:200-209.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological). 57:289-300.
Berg OG, Kurland CG. 2002. Evolution of microbial genomes: sequence acquisition and loss. Mol Biol Evol. 19:2265-2276.
Borenstein E, Shlomi T, Ruppin E, Sharan R. 2007. Gene loss rate: a probabilistic measure for the conservation of eukaryotic genes. Nucleic Acids Res. 35:e7.
Breitkreutz BJ, Stark C, Reguly T, et al. (12 co-authors). 2008. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 36:D637-D640.
Brochier C, Philippe H, Moreira D. 2000. The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome. Trends Genet. 16:529-533.
Brown JR. 2003. Ancient horizontal gene transfer. Nat Rev Genet. 4:121-132.
Charlebois RL, Doolittle WF. 2004. Computing prokaryotic gene ubiquity: rescuing the core from extinction. Genome Res. 14:2469-2477.
Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G. 2007. MINT: the Molecular INTeraction database. Nucleic Acids Res. 35:D572-D574.
Choi IG, Kim SH. 2007. Global extent of horizontal gene transfer. Proc Natl Acad Sci U S A. 104:4489-4494.
Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. 2006. Toward automatic reconstruction of a highly resolved tree of life. Science. 311:1283-1287.
Cohen O, Pupko T. 2010. Inference and characterization of horizontally transferred gene families using stochastic mapping. Mol Biol Evol. 27:703-713.
Cohen O, Rubinstein ND, Stern A, Gophna U, Pupko T. 2008. A likelihood framework to analyse phyletic patterns. Philos Trans R Soc Lond B Biol Sci. 363:3903-3911.
Cordero OX, Snel B, Hogeweg P. 2008. Coevolution of gene families in prokaryotes. Genome Res. 18:462-468.
Csuros M, Miklos I. 2006. A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer. Lect Notes Comput Sci. 3909:206-220.
Davids W, Zhang Z. 2008. The impact of horizontal gene transfer in shaping operons and protein interaction networks—direct evidence of preferential attachment. BMC Evol Biol. 8:23.
Doolittle WF. 1999. Lateral genomics. Trends Cell Biol. 9:M5-M8.
Gal-Mor O, Finlay BB. 2006. Pathogenicity islands: a molecular toolbox for bacterial virulence. Cell Microbiol. 8:1707-1719.
Gogarten JP, Doolittle WF, Lawrence JG. 2002. Prokaryotic evolution in light of gene transfer. Mol Biol Evol. 19:2226-2238.
Gogarten JP, Townsend JP. 2005. Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol. 3:679-687.
Graybeal A. 1994. Evaluating the phylogenetic utility of genes: a search for genes informative about deep divergences among vertebrates. Syst Biol. 43:174-193.
Hacker J, Carniel E. 2001. Ecological fitness, genomic islands and bacterial pathogenicity. A Darwinian view of the evolution of microbes. EMBO Rep. 2:376-381.
Hao W, Golding GB. 2006. The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res. 16:636-643.
Hao W, Golding GB. 2008. Uncovering rate variation of lateral gene transfer during bacterial genome evolution. BMC Genomics. 9:235.
Harrington ED, Jensen LJ, Bork P. 2008. Predicting biological networks from genomic data. FEBS Lett. 582:1251-1258.
Holden MT, Feil EJ, Lindsay JA, et al. (45 co-authors). 2004. Complete genomes of two clinical Staphylococcus aureus strains: evidence for the rapid evolution of virulence and drug resistance. Proc Natl Acad Sci U S A. 101:9786-9791.
Jain R, Rivera MC, Lake JA. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc Natl Acad Sci U S A. 96:3801-3806.
Jain R, Rivera MC, Moore JE, Lake JA. 2002. Horizontal gene transfer in microbial genome evolution. Theor Popul Biol. 61:489-495.
Jensen LJ, Kuhn M, Stark M, et al. (12 co-authors). 2009. STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 37:D412-D416.
Joshi-Tope G, Gillespie M, Vastrik I, et al. (13 co-authors). 2005. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33:D428-D432.
Kanehisa M, Araki M, Goto S, et al. (11 co-authors). 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36:D480-D484.
Kanhere A, Vingron M. 2009. Horizontal gene transfers in prokaryotes show differential preferences for metabolic and translational genes. BMC Evol Biol. 9:9.
Kerrien S, Alam-Faruque Y, Aranda B, et al. (24 co-authors). 2007. IntAct—open source resource for molecular interaction data. Nucleic Acids Res. 35:D561-D565.
Konstantinidis KT, Tiedje JM. 2004. Trends between gene content and genome size in prokaryotic species with larger genomes. Proc Natl Acad Sci U S A. 101:3160-3165.
Koonin EV, Wolf YI. 2008. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 36:6688-6719.
Koski LB, Morton RA, Golding GB. 2001. Codon bias and base composition are poor indicators of horizontally transferred genes. Mol Biol Evol. 18:404-412.
Krylov DM, Wolf YI, Rogozin IB, Koonin EV. 2003. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 13:2229-2235.
Lawrence JG. 1999. Gene transfer, speciation, and the evolution of bacterial genomes. Curr Opin Microbiol. 2:519-523.
Lercher MJ, Pal C. 2008. Integration of horizontally transferred genes into regulatory interaction networks takes many million years. Mol Biol Evol. 25:559.
Marri PR, Hao W, Golding GB. 2006. Gene gain and gene loss in Streptococcus: is it driven by habitat? Mol Biol Evol. 23:2379-2391.
Merkl R. 2006. A comparative categorization of protein function encoded in bacterial or archeal genomic islands. J Mol Evol. 62:1-14.
Minin VN, Suchard MA. 2008. Counting labeled transitions in continuous-time Markov models of evolution. J Math Biol. 56:391-412.
Mira A, Klasson L, Andersson SG. 2002. Microbial genome evolution: sources of variability. Curr Opin Microbiol. 5:506-512.
Mirkin BG, Fenner TI, Galperin MY, Koonin EV. 2003. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol Biol. 3:2.
Moran NA, McLaughlin HJ, Sorek R. 2009. The dynamics and time scale of ongoing genomic erosion in symbiotic bacteria. Science. 323:379-382.
Nakamura Y, Itoh T, Matsuda H, Gojobori T. 2004. Biased biological functions of horizontally transferred genes in prokaryotic genomes. Nat Genet. 36:760-766.
Nesbo CL, Boucher Y, Doolittle WF. 2001. Defining the core of nontransferable prokaryotic genes: the euryarchaeal core. J Mol Evol. 53:340-350.
Ochman H, Liu R, Rocha EP. 2007. Erosion of interaction networks in reduced and degraded genomes. J Exp Zool B Mol Dev Evol. 308:97-103.
Omer S, Kovacs A, Mazor Y, Gophna U. 2010. Integration of a foreign gene into a native complex does not impair fitness in an experimental model of lateral gene transfer. Mol Biol Evol. 27(11):2441-2445.
Pal C, Papp B, Lercher MJ. 2005. Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Nat Genet. 37:1372-1375.
Pal C, Papp B, Lercher MJ, Csermely P, Oliver SG, Hurst LD. 2006. Chance and necessity in the evolution of minimal metabolic networks. Nature. 440:667-670.
Pennisi E. 2004. Microbiology. Researchers trade insights about gene swapping. Science. 305:334-335.
Rivera MC, Jain R, Moore JE, Lake JA. 1998. Genomic evidence for two functionally distinct gene classes. Proc Natl Acad Sci U S A. 95:6239-6244.
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D. 2004. The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 32:D449-D451.
Sicheritz-Ponten T, Andersson SG. 2001. A phylogenomic approach to microbial evolution. Nucleic Acids Res. 29:545-552.
Skrabanek L, Saini HK, Bader GD, Enright AJ. 2008. Computational prediction of protein-protein interactions. Mol Biotechnol. 38:1-17.
Sorek R, Zhu Y, Creevey CJ, Francino MP, Bork P, Rubin EM. 2007. Genome-wide experimental determination of barriers to horizontal gene transfer. Science. 318:1449-1452.
Spencer M, Sangaralingam A. 2009. A phylogenetic mixture model for gene family loss in parasitic bacteria. Mol Biol Evol. 26:1901-1908.
Syvanen M. 1994. Horizontal gene transfer: evidence and possible consequences. Annu Rev Genet. 28:237-261.
Tatusov RL, Fedorova ND, Jackson JD, et al. (17 co-authors). 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 4:41.
Vastrik I, D'Eustachio P, Schmidt E, et al. (13 co-authors). 2007. Reactome: a knowledge base of biologic pathways and processes. Genome Biol. 8:R39.
Wang B. 2001. Limitations of compositional approach to identifying horizontally transferred genes. J Mol Evol. 53:244-250.
Wapinski I, Pfeffer A, Friedman N, Regev A. 2007. Natural history and evolutionary principles of gene duplication in fungi. Nature. 449:54-61.
Wellner A, Gophna U. 2008. Neutrality of foreign complex subunits in an experimental model of lateral gene transfer. Mol Biol Evol. 25:1835-1840.
Wellner A, Lurie MN, Gophna U. 2007. Complexity, connectivity, and duplicability as barriers to lateral gene transfer. Genome Biol. 8:R156.
Wolf YI, Carmel L, Koonin EV. 2006. Unifying measures of gene function and evolution. Proc Biol Sci. 273:1507-1515.
Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D. 2000. DIP: the database of interacting proteins. Nucleic Acids Res. 28:289-291.
Yang Z. 1996. Phylogenetic analysis using parsimony and likelihood methods. J Mol Evol. 42:294-307.
Yarza P, Richter M, Peplies J, Euzeby J, Amann R, Schleifer KH, Ludwig W, Glockner FO, Rossello-Mora R. 2008. The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains. Syst Appl Microbiol. 31:241-250.
Yosef N, Kupiec M, Ruppin E, Sharan R. 2009. A complex-centric view of protein network evolution. Nucleic Acids Res. 37:e88.

Author notes

Associate editor: Andrew Roger

Supplementary data