Investigating Additive and Replacing Horizontal Gene Transfers Using Phylogenies and Whole Genomes

Abstract

Horizontal gene transfer (HGT) is fundamental to microbial evolution and adaptation. When a gene is horizontally transferred, it may either add itself as a new gene to the recipient genome (possibly displacing nonhomologous genes) or replace an existing homologous gene. Currently, studies do not usually distinguish between “additive” and “replacing” HGTs, and their relative frequencies, integration mechanisms, and specific roles in microbial evolution are poorly understood. In this work, we develop a novel computational framework for large-scale classification of HGTs as either additive or replacing. Our framework leverages recently developed phylogenetic approaches for HGT detection and classifies HGTs inferred between terminal edges based on gene orderings along genomes and phylogenetic relationships between the microbial species under consideration. The resulting method, called DART, is highly customizable and scalable and can classify a large fraction of inferred HGTs with high confidence and statistical support. Our application of DART to a large dataset of thousands of gene families from 103 Aeromonas genomes provides insights into the relative frequencies, functional biases, and integration mechanisms of additive and replacing HGTs. Among other results, we find that (i) the relative frequency of additive HGT increases with increasing phylogenetic distance, (ii) replacing HGT dominates at shorter phylogenetic distances, (iii) additive and replacing HGTs have strikingly different functional profiles, (iv) homologous recombination in flanking regions of a novel gene may be a frequent integration mechanism for additive HGT, and (v) phages and mobile genetic elements likely play an important role in facilitating additive HGT.

Figure S1. Intra-species additive HGTs. Each edge connects two Aeromonas genomes from the same species and corresponds to inferred intra-species additive HGTs between those two genomes. Edges are colored according to the color of the donor genome (the color for each genome is shown on the associated segment in the inner ring). The tip of an edge at the donor end is colored according to the recipient genome's color. The thickness of an edge corresponds to the number of additive HGTs for that donor-recipient pair, as quantified by the numbers around each segment in the inner ring. For each genome, both incoming (where that genome serves as recipient) and outgoing (where that genome serves as donor) edges are shown. The outer ring shows three stacked columns for each genome: the inner column shows the color distribution of recipients for that genome's outgoing edges, the middle column shows the color distribution of donors for its incoming edges, and the outer column shows the combined color distribution for both incoming and outgoing edges.
Figure S2. Inter-species additive HGTs. Each edge connects two Aeromonas genomes from different species and corresponds to inferred inter-species additive HGTs between those two genomes. Edges are colored according to the color of the donor genome (the color for each genome is shown on the associated segment in the inner ring). The tip of an edge at the donor end is colored according to the recipient genome's color. The thickness of an edge corresponds to the number of additive HGTs for that donor-recipient pair, as quantified by the numbers around each segment in the inner ring. For each genome, both incoming (where that genome serves as recipient) and outgoing (where that genome serves as donor) edges are shown. The outer ring shows three stacked columns for each genome: the inner column shows the color distribution of recipients for that genome's outgoing edges, the middle column shows the color distribution of donors for its incoming edges, and the outer column shows the combined color distribution for both incoming and outgoing edges. Individual edges can only be discerned when the figure is magnified on screen.
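The quantities drawn in Figures S1 and S2 (edge thickness and the three outer-ring columns) reduce to simple counts over classified HGTs. The sketch below is illustrative only and is not the code used to draw the figures. It assumes each additive HGT is available as a hypothetical (donor, recipient) pair of genome IDs, that `species` and `color` are hypothetical lookup tables, and that every HGT contributes one count to the outer-ring distributions.

```python
from collections import Counter


def split_by_species(hgts, species):
    """Split (donor, recipient) pairs into intra-species (Fig. S1) and inter-species (Fig. S2) lists."""
    intra = [(d, r) for d, r in hgts if species[d] == species[r]]
    inter = [(d, r) for d, r in hgts if species[d] != species[r]]
    return intra, inter


def edge_weights(hgts):
    """Count additive HGTs per directed donor-recipient pair (drawn as edge thickness)."""
    return Counter(hgts)


def ring_columns(hgts, color):
    """Per-genome color distributions for the three stacked outer-ring columns.

    Assumption: each HGT adds one count, so pairs with more HGTs weigh more.
    """
    outgoing = {}  # inner column: recipient colors for the genome's outgoing edges
    incoming = {}  # middle column: donor colors for the genome's incoming edges
    for donor, recipient in hgts:
        outgoing.setdefault(donor, Counter())[color[recipient]] += 1
        incoming.setdefault(recipient, Counter())[color[donor]] += 1
    genomes = set(outgoing) | set(incoming)
    # Outer column: combined incoming and outgoing distribution for each genome.
    combined = {
        g: outgoing.get(g, Counter()) + incoming.get(g, Counter()) for g in genomes
    }
    return outgoing, incoming, combined
```

Keeping the per-pair counts directed (donor first) matches the captions, which distinguish incoming from outgoing edges for every genome.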

Figure S3. Fraction of unfiltered additive HGTs by phylogenetic distance. The plot shows the fraction of HGTs classified as additive for donor-recipient pairs separated by different ranges of phylogenetic distance. Results are shown for the combined set of full, unfiltered inter- and intra-species HGTs classified as additive or replacing using default parameters. The phylogenetic distance between a donor-recipient pair is the patristic distance (with branch lengths in substitutions per site) between the two corresponding terminal taxa on the species tree.
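The patristic distance and the per-bin fraction plotted in Figure S3 can be sketched as follows. This is a minimal sketch, assuming the species tree is stored as a hypothetical `parent` mapping from each node to a (parent, branch length) tuple, that each HGT is a hypothetical (donor, recipient, label) triple, and that the bin edges are chosen by the user; labels other than additive or replacing (for example, ambiguous) are skipped.

```python
import bisect


def distances_to_ancestors(node, parent):
    """Map the node and each of its ancestors to their distance from the node."""
    dist = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        dist[node] = total
    return dist


def patristic_distance(a, b, parent):
    """Sum of branch lengths on the path between taxa a and b (via their MRCA)."""
    up_a = distances_to_ancestors(a, parent)
    total, node = 0.0, b
    while node not in up_a:  # climb from b until reaching an ancestor of a (the MRCA)
        node, length = parent[node]
        total += length
    return total + up_a[node]


def additive_fraction_by_distance(hgts, parent, bin_edges):
    """Fraction of additive calls among additive + replacing calls per half-open bin."""
    n_bins = len(bin_edges) - 1
    additive, total = [0] * n_bins, [0] * n_bins
    for donor, recipient, label in hgts:
        if label not in ("additive", "replacing"):
            continue
        dist = patristic_distance(donor, recipient, parent)
        i = bisect.bisect_right(bin_edges, dist) - 1
        if 0 <= i < n_bins:
            total[i] += 1
            if label == "additive":
                additive[i] += 1
    # None marks a bin with no classified HGTs.
    return [a / t if t else None for a, t in zip(additive, total)]
```

For example, with `parent = {"A": ("X", 0.01), "B": ("X", 0.03), "X": ("root", 0.02), "C": ("root", 0.3)}`, the distance between A and B is 0.04 (0.01 + 0.03 through their common ancestor X).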

Figure S4. Functional analysis of additive and replacing HGTs. The figure shows distributions of COG categories with known functions for (i) all genes from all genomes, (ii) all HGTs classified as additive, and (iii) all HGTs classified as replacing. Only HGTs present in the filtered classification results were used. Each letter corresponds to a COG functional category as shown in Supplemental Table S1. COG functional categories "Z", "Y", "W", and "R" are not shown since no gene in any of the Aeromonas genomes belonged to those categories. Only HGTs that could be assigned to a COG category with known function were considered in this analysis; that is, HGTs assigned to category "S" or not assignable to any COG category were excluded from this plot.
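Each distribution in Figures S4 to S6 is a normalized count of COG letters over one gene set (all genes, additive HGTs, or replacing HGTs). A minimal sketch, assuming each gene carries exactly one COG letter with "#" marking unassigned genes; genes with multi-letter COG assignments would need a convention that these captions do not specify.

```python
from collections import Counter


def cog_distribution(categories, exclude=("S", "#")):
    """Fraction of genes per COG category after dropping excluded categories.

    categories: one COG letter per gene ("#" = no COG assignment).
    Figure S4 drops "S" and "#"; Figures S5 and S6 keep them (exclude=()).
    """
    kept = [c for c in categories if c not in exclude]
    if not kept:
        return {}
    counts = Counter(kept)
    return {c: counts[c] / len(kept) for c in sorted(counts)}
```

The function would be called once per gene set, so the three bars for a given letter are directly comparable fractions.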

Figure S5. Functional analysis of intra-species additive and replacing HGTs. The figure shows distributions of COG functional categories for (i) all genes from all genomes, (ii) all intra-species HGTs classified as additive, and (iii) all intra-species HGTs classified as replacing. Only HGTs present in the filtered classification results were used. Each letter corresponds to a COG functional category as shown in Supplemental Table S1. COG functional categories "Z", "Y", "W", and "R" are not shown since no gene in any of the Aeromonas genomes belonged to those categories. COG functional category "S" corresponds to genes whose functions are unknown, while category "#" corresponds to genes that could not be assigned to any COG functional category.

Figure S6. Functional analysis of inter-species additive and replacing HGTs. The figure shows distributions of COG functional categories for (i) all genes from all genomes, (ii) all inter-species HGTs classified as additive, and (iii) all inter-species HGTs classified as replacing. Only HGTs present in the filtered classification results were used. Each letter corresponds to a COG functional category as shown in Supplemental Table S1. COG functional categories "Z", "Y", "W", and "R" are not shown since no gene in any of the Aeromonas genomes belonged to those categories. COG functional category "S" corresponds to genes whose functions are unknown, while category "#" corresponds to genes that could not be assigned to any COG functional category.

Figure S7. Dot plots of the cHG 18292 transfer. Pairwise comparisons of the donor (top row and left column), the recipient (bottom row and right column), and all three neighboring genomes for the transfer of cHG 18292 from Aeromonas veronii F247 to Aeromonas veronii CIP107763. Note the gap along the alignment when CIP107763 is compared against its neighbors (the site of transfer) and the nearby region of repetitive DNA (a possible hairpin loop site).

Figure S8.

Figure S11. Gene plot of the cHG 21480 transfer. Annotations shown are derived from Prokka. Coding direction is shown by arrowheads, and scale bars indicate the relative position in nucleotides along the associated fragment. The region of transfer is located between the gene annotated as bifunctional purine biosynthesis protein on the left and the gene encoding the oxygen-independent coproporphyrinogen-III oxidase-like protein on the right. Abbreviations: dITP/XTP PP, dITP/XTP pyrophosphatase; Purine Biosynth, bifunctional purine biosynthesis protein; HTH-like Regulator, HTH-type transcriptional regulator; DHFR, dihydrofolate reductase; RescF, alternative ribosome-rescue factor A; O-Ind.C-III Ox., oxygen-independent coproporphyrinogen-III oxidase-like protein.

Figure S12.

Figure S13. The 103-genome Aeromonas species tree with branch lengths and bootstrap support values. The tree was inferred using the 16-housekeeping-gene multi-locus sequence analysis (MLSA) scheme previously established for Aeromonas by Colston et al. (2014). Bootstrap support values were computed from 100 replicates using the rapid bootstrap algorithm implemented in RAxML.

Table S3. Results of the statistical analysis for inter-species additive HGTs classified by DART. Results are shown for 36 combinations of DART parameter settings, with the highlighted cell representing the default parameter settings. Each cell shows three comma-separated values: the percentages of inter-species additive, replacing, and ambiguous HGTs, respectively, inferred by DART in the randomization analysis. The first value in each cell represents the estimated false-positive rate for inter-species additive HGTs classified by DART on the Aeromonas dataset. All results are averaged across 100 randomized runs.
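How the randomized datasets are generated is not described in this caption, so the sketch below covers only the summarization step that produces one table cell. It assumes, hypothetically, that the DART labels for each randomized run are available as a list of strings and that per-run percentages (rather than pooled counts) are averaged. The additive percentage presumably serves as the false-positive-rate estimate because additive calls on randomized data are treated as spurious.

```python
from statistics import mean

# Order of the three comma-separated values in each Table S3 cell.
LABELS = ("additive", "replacing", "ambiguous")


def randomization_cell(runs):
    """Average per-run percentages of each label across randomized runs.

    runs: one list of labels per randomized run (hypothetical input format).
    Returns a string formatted like a table cell, e.g. "12.5, 50.0, 37.5".
    """
    means = [
        mean(100 * run.count(label) / len(run) for run in runs) for label in LABELS
    ]
    return ", ".join(f"{m:.1f}" for m in means)
```

With 100 randomized runs, `runs` would hold 100 label lists, and one call would be made per parameter combination (36 in total).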

Table S7. Genomic context conservation results for unfiltered additive and replacing HGTs. For each category (row) of additive and replacing HGTs, the table reports the percentage of HGTs that (i) have the same two flanking genes in both donor and recipient genomes, (ii) have at least one of the two flanking genes in common between donor and recipient, and (iii) have neither of the two flanking genes in common between donor and recipient.
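The three Table S7 columns can be computed by intersecting the flanking genes on each side of a transfer. A minimal sketch, assuming flanking genes are identified by hypothetical gene-family IDs (so homologous flanks compare equal), that a missing flank (for example at a contig end) is recorded as None, and that flanks are compared as unordered sets so an inverted region still counts as conserved. Read literally, column (ii) ("at least one") includes the HGTs counted in column (i), so columns (ii) and (iii) sum to 100%.

```python
def flank_conservation(hgts):
    """Percentages for Table S7 columns (i), (ii), and (iii).

    hgts: list of (donor_flanks, recipient_flanks), each a (left, right)
    pair of gene-family IDs, with None where no flanking gene exists.
    """
    if not hgts:
        return (0.0, 0.0, 0.0)
    same_two = at_least_one = neither = 0
    for donor_flanks, recipient_flanks in hgts:
        donor_set = {f for f in donor_flanks if f is not None}
        recipient_set = {f for f in recipient_flanks if f is not None}
        shared = donor_set & recipient_set
        if len(shared) == 2:
            same_two += 1
        if shared:
            at_least_one += 1  # column (ii) also counts the HGTs in column (i)
        else:
            neither += 1
    n = len(hgts)
    return tuple(100 * x / n for x in (same_two, at_least_one, neither))
```

If column (ii) was instead meant as "exactly one", the second counter would be incremented only when `len(shared) == 1`.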