Roswitha Schmickl, Sarah Marburger, Sian Bray, Levi Yant, Hybrids and horizontal transfer: introgression allows adaptive allele discovery, Journal of Experimental Botany, Volume 68, Issue 20, 28 November 2017, Pages 5453–5470, https://doi.org/10.1093/jxb/erx297
Abstract
Evolution has devised countless remarkable solutions to diverse challenges. Understanding the mechanistic basis of these solutions provides insights into how biological systems can be subtly tweaked without maladaptive consequences. The knowledge gained from illuminating these mechanisms is as important to our understanding of fundamental evolutionary processes as it is to our hopes of developing truly rational plant breeding and synthetic biology. In particular, modern population genomic approaches are proving very powerful in detecting candidate alleles that mediate consequential adaptations and that can be tested functionally. Especially striking are signals gained from contexts involving genetic transfers between populations, closely related species, or indeed between kingdoms. Here we discuss two major classes of these scenarios, adaptive introgression and horizontal gene flow, illustrating discoveries made across kingdoms.
Introduction
Whether it is gradual or sudden, all organisms face change. Adaptive responses are therefore required for survival, especially in species that cannot migrate. The footprints of these responses are found throughout the genome, serving as powerful signals that tell us how populations have overcome hazards, both biotic and abiotic. The allelic changes at loci mediating adaptive changes are coming to light in rapidly increasing numbers of studies, and thanks to ongoing developments in population genomics, descriptions of these loci appear in remarkably high resolution. As a result, there is now very good evidence that diverse sources of genetic variation underlie important phenotypic changes in wild populations. Among these, introgression is emerging as a widespread fundamental evolutionary force. The term ‘introgressive hybridization’, hereafter referred to as ‘introgression’, was introduced by Anderson and Hubricht (1938). They referred to the introduction of syntenic nucleotide variation by recombination from a donor species into the genome of a recipient species, usually by means of hybridization and backcrossing. We will use the terms ‘introgression’ and ‘gene flow’ as synonyms, including in cases where the units that exchange variants are populations of the same species. Largely context-dependent, introgression is influenced by an array of ecological factors that control the degree of contact between species. These biotic or abiotic factors drive selection for or against hybrid genotypes and can lead to complex patterns of genetic admixture (Hand et al., 2015). This selection has important consequences: adaptive introgression commonly results in local adaptation to particular geographically distributed conditions and/or speciation (Dobzhansky, 1937; Mayr, 1942; Coyne and Orr, 2004).
Here we discuss the three dominant sources of genetic variation and their relative contributions to adaptation. We show how genomic approaches are revolutionizing the discovery of adaptive alleles involved in natural solutions to diverse challenges. We highlight how the studies of introgression and speciation are linked, with speciation commonly occurring in the face of gene flow. Speciation with gene flow can produce ‘genomic islands of divergence’ that are resistant to gene flow, a hallmark genomic architecture that facilitates the discovery of adaptive alleles. We argue that the merger of hybrid zone analysis and whole genome population resequencing is a powerful new tool to detect variants mediating diverse adaptations. We provide an overview of important approaches for identifying introgressed alleles that can be applied by researchers working on virtually any system and that have the potential to unambiguously identify strong candidates for adaptively introgressed alleles. Finally, we highlight horizontal gene transfer (HGT), a special case of adaptive introgression between species with established reproductive boundaries.
Maladaptive introgression
Before discussing introgression generally and adaptive introgression in particular, it is important to note that not all introgression is adaptive; indeed, introgression is a powerful force and can be strongly disruptive, particularly as the result of human activity. Introgression even has the potential to hinder conservation attempts (Allendorf et al., 2001; Edmands, 2007). Due to a shortage of conspecific mates (Vaz Pinto et al., 2016), introgression may drive rare species to extinction by genetic swamping (Wolf et al., 2001; Gómez et al., 2015; Todesco et al., 2016). Conservation attempts based on the translocation of species to reserves outside native ranges can result in introgression and inadvertent admixture that damages the biodiversity of the protected species (e.g. antelope; van Wyk et al., 2017). In addition, introgression between crops or domesticated animals and wild relatives can alter fitness-related traits, such as disease resistance and growth, although every case is unique. Recent examples include introgression from domesticated dogs into wolves (Anderson et al., 2009), between farmed and native salmonids (Glover et al., 2013; Ozerov et al., 2016; Karlsson et al., 2016), from wild pigs into domesticated pigs (Ai et al., 2015), from wildcats into domesticated cats (Ottoni et al., 2017), between maize and teosinte (Hufford et al., 2012; Hufford et al., 2013), from domesticated rice into wild rice (Wang et al., 2017), and between genetically modified plants and their wild relatives (den Nijs et al., 2004). Introgression between native and introduced, often invasive, species has also been reported, with examples including mussels (Saarman and Pogson, 2015), salamander (Fitzpatrick et al., 2010; Wilcox et al., 2015), trout (Hohenlohe et al., 2013; Muhlfeld et al., 2014; Kovach et al., 2015; Kovach et al., 2016) and Ulmus trees (Zalapa et al., 2009). 
Genomes may be resistant to the introgression of invading alleles when selection favours the native allele, as shown in, for example, trout (Kovach et al., 2015; Kovach et al., 2016) and Arabidopsis thaliana (Lee et al., 2017). Interestingly, adaptive introgression can also occur differentially from one subgenome of an allopolyploid, for example from wheat into Aegilops (Parisod et al., 2013).
Introgression as an engine of adaptive genetic variation
There are three primary sources of genetic variation: (1) pre-existing or ‘standing’ variants, which are the variants already present in a population, (2) new mutations, and (3) introgression (reviewed in Olson-Manning et al., 2012; Hedrick, 2013). Although introgression has traditionally been seen as maladaptive, a growing literature demonstrates the widespread occurrence of adaptive introgression (reviewed in Mallet et al., 2016). Introgression has been implicated as a powerful adaptive force in a wide array of taxa, including, for example, the malaria-transmitting mosquito Anopheles (Clarkson et al., 2014; Fontaine et al., 2015), Heliconius butterflies (Zhang et al., 2016), mice (Song et al., 2011), humans (Racimo et al., 2017), Arabidopsis (Arnold et al., 2016), sunflowers (Whitney et al., 2015), and monkeyflowers (Stankowski and Streisfeld, 2015). However, the relative importance of introgression, de novo mutation, and standing variation is far from resolved. Barton (2001) concluded that adaptive variation engendered by mutation is likely to exceed that brought about by introgression. This is all the more likely if effective population sizes are large, because the probability that a favourable mutation arises is a function of effective population size. For example, many pests and weeds have tremendous effective population sizes, and there is strong evidence that cases of escape from chemical insecticides and herbicides have originated many times from independent novel mutations, for example in Drosophila (Karasov et al., 2010) and in weeds (Délye et al., 2013). However, this is not always the case: in Anopheles, traits enhancing vectorial capacity, including the knockdown resistance mediated by a specific single nucleotide polymorphism, were transferred between two hybridizing species (Weill et al., 2000).
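The dependence of mutational supply on population size can be made concrete with a minimal sketch. It assumes a diploid population and a Poisson-distributed number of new mutations; the function name, the per-genome beneficial mutation rate `mu_b`, and the example values are illustrative assumptions and are not taken from the studies cited above.

```python
import math


def prob_new_beneficial_mutation(n_e: float, mu_b: float, generations: int = 1) -> float:
    """Probability that at least one beneficial mutation arises.

    Assumes a diploid population of effective size n_e, a beneficial
    mutation rate mu_b per haploid genome per generation, and a
    Poisson-distributed number of new mutations (all assumptions).
    """
    expected_mutations = 2.0 * n_e * mu_b * generations
    return 1.0 - math.exp(-expected_mutations)


# Hypothetical comparison of a small and a very large population.
for n_e in (1e3, 1e5, 1e7, 1e9):
    p = prob_new_beneficial_mutation(n_e, mu_b=1e-9)
    print(f"Ne = {n_e:.0e}: P(at least one new beneficial mutation per generation) = {p:.6f}")
```

Under these assumptions the probability rises steeply with effective population size, which is the intuition behind the expectation that very large pest and weed populations can rely on recurrent new mutations. The sketch ignores the subsequent chance that a new mutation is lost by drift.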
Anderson (1949) suggested that introgressed variation should have a higher initial frequency than new adaptive mutations and a lower initial frequency than standing variation. But if introgression is recurrent and results in fit progeny, early frequencies could be much higher, exceeding the adaptive potential of standing variation. Further, the impact of single introgressed variants on the genome can be sizable, typically causing multiple changes within a gene and at times affecting several gene-coding loci. Striking examples involve the transfer of entire complex adaptations via cassettes of multiple linked mutations, such as those in loci that control wing colour patterns for both mimicry and mate recognition in Heliconius butterflies (The Heliconius Genome Consortium, 2012). Such an advantage can also be found in some cases of adaptive standing variation (Bastide et al., 2016) and occasionally for de novo adaptive mutations (Karasov et al., 2010). In addition, introgressed alleles feature the benefit that they are likely to have been pre-tested by selection in a usually closely related donor species and would therefore be less likely to be deleterious than random mutations (Hedrick, 2013). Indeed, introgressed variants that initially have no strong advantage or disadvantage can accumulate in the genome as cryptic variation, which then serves as the raw material for selection when conditions change (reviewed in Paaby and Rockman, 2014).
The ability to introgress is dependent on the degree of divergence, as introgression between more divergent species is usually impeded by pre- and post-zygotic reproductive barriers. It has recently been suggested that polyploidy can occasionally rescue introgression between otherwise reproductively isolated species. For example, polyploidization re-established normal endosperm cellularization and enabled unidirectional interploidal introgression and bidirectional introgression between tetraploids of Arabidopsis arenosa and Arabidopsis lyrata (Lafon-Placette et al., 2017). A case of introgression between the autotetraploid cytotypes of these species reports the exchange of candidate alleles for mediating adaptation to highly challenging serpentine soils (Arnold et al., 2016). Interploidal introgression is generally assumed to be unidirectional, from diploid to polyploid (Stebbins, 1971), although there is evidence that it can also occur in the reverse direction (Ramsey and Schemske, 1998). There have been very few genomic studies on interploidal introgression and nearly all reports detail gene flow from diploids to tetraploids, for example in Betula (Zohren et al., 2016). Another study in Miscanthus favoured the same unidirectionality of gene flow, although there was some evidence for occasional gene flow from polyploids to diploids (Clark et al., 2015). Because genomic studies are scarce, one is left with literature based on only handfuls of molecular markers, the vast majority of which support gene flow from diploids to polyploids, such as in Senecio (Kim et al., 2008; Chapman and Abbott, 2010), Epidendrum (Pinheiro et al., 2010), and Capsella (Han et al., 2015). Apart from Kim et al. (2008) and Chapman and Abbott (2010), who found evidence of introgression of fitness-related genes, it is unclear whether interploidal introgression frequently leads to the transfer of adaptive alleles.
Hybridization, particularly in cases where hybridization leads to allopolyploidy, and introgression can also introduce variation at the structural level leading to genome rearrangements and novel gene regulatory pathways through the reactivation of dormant transposable elements (TEs) (reviewed in Fontdevila, 2005). For instance, introgression between cultivated and wild rice led to changes in transcription levels and DNA methylation patterns in TE-rich genomic regions (Liu et al., 2004). Introgression between bread wheat and tall wheatgrass led to various genetic and epigenetic changes, including deletions, differences in gene expression and TE reactivation (Liu et al., 2015). Similar changes have been identified in introgression lines between cauliflower (Brassica oleracea) and black mustard (Brassica nigra), creating variation that might prove useful for trait selection in breeding (Wang et al., 2016).
Introgression aids adaptive allele discovery in the genomic era
The widespread occurrence of introgression suggests promise in mining for adaptive alleles in natural populations that have overcome identifiable biotic or abiotic challenges. In parallel, there is a growing interest in how knowledge obtained from more broadly observing natural solutions devised by evolution might inform breeding efforts. Mining for introgressed alleles between natural populations that are native to contrasting environments offers the possibility to identify candidate alleles that mediate definable adaptations. Indeed, it has long been recognized that, especially where fitness-related traits and genotypes show clinal variation, gene-environment associations can provide a window on the mechanisms of natural selection (Endler, 1977). Such information is virtually impossible to discover with experimental crosses, although evolve and resequence experiments can be informative (reviewed in Long et al., 2015). Mining for adaptive alleles in natural populations with population genomics, termed ‘reverse ecology’ (Li et al., 2008), complements and in many ways surpasses candidate gene-based approaches. By providing a genome-wide view of the divergence landscape, population genomic studies can overcome key limitations of candidate-based approaches; for instance, they can detect polygenic adaptation provided markers are dense, that is, when genomes are resequenced rather than sampled by restriction site associated DNA sequencing. Such mining for co-evolved alleles involved in a trait is facilitated if frequencies of these alleles follow a clinal pattern in a hybrid zone that may coincide with an environmental gradient.
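To illustrate what a clinal pattern of allele frequencies looks like quantitatively, the sketch below evaluates the sigmoid (tanh) cline that is commonly used in hybrid zone analysis, in which the cline width is the inverse of the maximum slope at the cline centre. The function name, parameter names, and transect values are hypothetical and serve only as an example; this is not the specific method of any study cited here.

```python
import math


def cline_frequency(x: float, centre: float, width: float) -> float:
    """Expected allele frequency at position x along a transect.

    Sigmoid cline: frequency goes from 0 to 1 across the hybrid zone,
    equals 0.5 at the centre, and has maximum slope 1 / width there.
    """
    return 0.5 * (1.0 + math.tanh(2.0 * (x - centre) / width))


# Hypothetical transect (km) across a hybrid zone centred at 50 km.
transect_km = [0, 20, 40, 50, 60, 80, 100]
for width_km in (10.0, 60.0):
    freqs = [round(cline_frequency(x, centre=50.0, width=width_km), 3) for x in transect_km]
    print(f"width = {width_km} km: {freqs}")
```

In such a framework, a locus whose fitted cline is much narrower than that of most of the genome is a candidate for being under selection across the gradient, whereas wide clines are expected for loci that introgress freely.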
Genome-wide data from populations that are characterized by a history of extensive gene flow can provide detailed insights into the genetic basis of adaptive divergence. Usually in such cases only a few loci under selection rise above the neutral background that is homogenized by gene flow. Here we focus on these population genomic studies, because they can leverage sufficient resolution and statistical power to detect non-random patterns of introgression. We note, however, that crucial insights into the evolutionary role of adaptive introgression were made before the advent of high-throughput sequencing and population genomic datasets (reviewed in Arnold and Martin, 2009). Indeed, population genomics and comparative phylogenomics are young fields and sometimes issues with data quality or analytical rigor can raise concerns that render results ambiguous (Brower, 2013; Wen et al., 2016). Nevertheless, with well-designed sampling and increasingly sophisticated analysis, evidence of genomic heterogeneity in patterns of gene flow has been accumulating at all levels from weakly (Arnold et al., 2016) to highly differentiated (Pardo-Diaz et al., 2012) species. Understanding these population- and genome-wide patterns of introgression in species with diverse adaptations has broad potential to contribute to our understanding of the fundamental mechanisms of adaptation.
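One widely used test for non-random patterns of introgression, listed among the methods in Table 1, is the D statistic (the ABBA-BABA test). The following is a minimal sketch under stated assumptions: a four-taxon topology (((P1, P2), P3), outgroup), per-site derived-allele frequencies as input, and hypothetical example values. It omits the block jackknife that is normally used to assess significance.

```python
import math
from typing import Sequence


def patterson_d(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], p_out: Sequence[float]) -> float:
    """D statistic from per-site derived-allele frequencies.

    ABBA sites: derived allele shared by P2 and P3 but not P1.
    BABA sites: derived allele shared by P1 and P3 but not P2.
    Returns NaN when no informative sites are present.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, f_out in zip(p1, p2, p3, p_out):
        abba += (1.0 - f1) * f2 * f3 * (1.0 - f_out)
        baba += f1 * (1.0 - f2) * f3 * (1.0 - f_out)
    total = abba + baba
    if total == 0.0:
        return math.nan
    return (abba - baba) / total


# Hypothetical derived-allele frequencies at four sites.
p1 = [0.0, 0.1, 0.0, 0.2]
p2 = [0.9, 0.8, 1.0, 0.1]
p3 = [1.0, 0.9, 0.8, 0.9]
p_out = [0.0, 0.0, 0.0, 0.0]
d_value = patterson_d(p1, p2, p3, p_out)
print(f"D = {d_value:.3f}")
```

Without gene flow, incomplete lineage sorting is expected to produce ABBA and BABA patterns in equal numbers, so D is expected to be close to zero. A clearly positive value indicates excess allele sharing between P2 and P3, and a clearly negative value indicates excess sharing between P1 and P3.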
The genomic architecture of introgression
The study of introgression in natural populations is deeply connected with the study of speciation. It is increasingly recognized that speciation frequently occurs in the face of gene flow, either through continual gene flow or phases of secondary contact in species that have already established partial reproductive isolation, hereafter referred to as parapatrically isolated species (PaIS) (Nosil, 2008; Feder et al., 2012; Abbott et al., 2013; Harrison and Larson, 2014; Mallet et al., 2016; Shapiro et al., 2016; Arnold, 2016). One iconic example is the sunflower Helianthus annuus subsp. texanus, which originated as a result of introgression and shows different adaptations compared with its parents, mainly abiotic tolerance traits (Whitney et al., 2010; Whitney et al., 2015). It may be difficult to discriminate ongoing divergence from secondary contact in PaIS (Endler, 1977; Harrison, 2012; Gompert and Buerkle, 2016), especially in cases where species have recently diverged. Both scenarios of introgression offer potential insights into the mechanisms of adaptive introgression, although major findings have been made in cases of secondary contact (Table 1), as speciation-with-gene-flow studies usually aim to identify loci involved in the establishment and maintenance of reproductive isolation and not loci involved in local adaptation. Strong evidence for secondary contact in major systems that allow the study of adaptive introgression comes from clear estimates of the time of secondary contact as well as the delineation of clearly defined hybrid zones. Both the time of species divergence and secondary contact are relatively young in these systems (Table 1). Except for the hybridization between Heliconius erato and H. melpomene (Martin et al., 2013; Kozak et al., 2015) and Saccharomyces cerevisiae and S. uvarum (Kellis et al., 2003), all other study systems show divergence times for the introgressing species of usually much less than 2 mya and times of secondary contact between a few hundred and a few thousand years before the present time.
Hybrid partners | Genomic data types | Methods applied | Adaptive traits or genes influenced | Start of divergence of the hybridizing taxa | Estimated time of secondary contact | Presence of hybrid zone(s) |
---|---|---|---|---|---|---|
Arabidopsis | | | | | | |
Between Arabidopsis arenosa and A. lyrata (Schmickl and Koch, 2011; Arnold et al., 2016) | Whole genome sequencing (Arnold et al., 2016) | Fastsimcoal2, fD statistic, HybridCheck; genome scan | Genes controlling specific ion homeostasis-related and drought adaptation traits (Arnold et al., 2016) | ~1.63 mya (Hohmann et al., 2015) | | Eastern Austrian Forealps (Schmickl and Koch, 2011) |
Sunflowers | | | | | | |
From Helianthus debilis subsp. cucumerifolius into H. annuus subsp. annuus (Whitney et al., 2015) | SNP dataset (QTL mapping) | | Primarily abiotic tolerance traits allowing rapid growth and reproduction before summer heat and drought, but also traits affecting herbivore resistance | Divergence between the ancestors of H. debilis and H. annuus ~2 mya (Renaut et al., 2013) | Maximum ~18,000 ya (Scascitelli et al., 2010) | South-central USA |
Senecio | | | | | | |
From diploid Senecio squalidus into tetraploid S. vulgaris (Kim et al., 2008) | [No large genome-wide dataset] Sanger sequencing, RNA in situ hybridization | | Genes RAY1 and RAY2 that control ray florets in radiate flower heads, which affects attractiveness to pollinators and outcrossing rate | S. squalidus is a homoploid hybrid with an origin less than ~150,000 ya (Filatov et al., 2016) | Maximum ~300 ya (S. squalidus was introduced to the Oxford Botanic Garden around 300 ya) | United Kingdom |
Poplars | | | | | | |
From Populus balsamifera into P. trichocarpa (Suarez-Gonzalez et al., 2016) | Whole genome sequencing, RNAseq | Genome scan | Numerous genes, particularly in a telomeric region (PRR5 and COMT1); related to response to far red light, RNA processing and ATPase activity | ~76,000 ya (Levsen et al., 2012) | | Northwestern North America |
Mussels | | | | | | |
From Mytilus edulis into M. galloprovincialis (Fraïsse et al., 2014; Fraïsse et al., 2016) | AFLP and few other markers, Sanger sequencing (Fraïsse et al., 2014); target enrichment (Fraïsse et al., 2016) | Genome scan | Actin gene mac-1 (Fraïsse et al., 2014; but not Fraïsse et al., 2016) | ~2.5 mya (Roux et al., 2014) | ~0.7 mya (Roux et al., 2014) | In Europe from the Mediterranean Sea to the North Sea |
Drosophila | | | | | | |
From Drosophila simulans into D. sechellia (Garrigan et al., 2012; Brand et al., 2013) | Garrigan et al., 2012: whole genome sequencing; Brand et al., 2013: Sanger sequencing of the candidate region identified by Garrigan et al., 2012 | Genome scan (Garrigan et al., 2012) | A 15 kb region (Garrigan et al., 2012); the target of selection appears to be a regulatory sequence; evidence for a trans-specific selective sweep (Brand et al., 2013), i.e., the 15 kb region harbours a polymorphism that is adaptive in both species | ~200,000–300,000 ya (Garrigan et al., 2012) | 2,600–18,600 ya | Seychelles |
From D. yakuba into D. santomea (Llopart et al., 2014; Beck et al., 2015) | Llopart et al., 2014: Sanger sequencing and high-throughput sequencing of the mitochondrial genome; Beck et al., 2015: Sanger sequencing | | Potential cytonuclear co-introgression of genes involved in the oxidative phosphorylation pathway; strong signal of introgression in the three nuclear genes composing subunit V of the cytochrome c oxidase complex (Beck et al., 2015) | ~400,000 ya (Llopart et al., 2002) | ~14,000 ya (complete replacement of the mitochondrial genome) and ~2,500 ya (Llopart et al., 2014) | São Tomé (invasion by D. yakuba) |
Anopheles | | | | | | |
Between Anopheles arabiensis and the ancestor of A. gambiae and A. coluzzii (Fontaine et al., 2015) | Whole genome sequencing | D and DFOIL statistics, window-based gene tree distribution across chromosomes | Insecticide-resistance alleles might have introgressed into A. arabiensis with the 2La inversion introgressed from the ancestor of A. gambiae and A. coluzzii | Divergence between A. gambiae and A. coluzzii from its ancestor ~2 mya | ~0.5 mya | Central Africa |
From A. gambiae into A. coluzzii (Norris et al., 2015) | SNP genotyping, whole genome sequencing | | Traits enhancing vectorial capacity: A. coluzzii inherited the entire A. gambiae-associated 2L divergence island including the knockdown resistance SNP L1014F in the kdr gene (L1014F has been present in A. gambiae populations for at least 20 years) | | Periods of isolation interrupted by hybridization; recent introgression reported in 2002 and 2006 (Lee et al., 2013b) | Selinkenyi, Mali |
Heliconius butterflies (other studies that focused on speciation with ongoing gene flow and adaptive introgression: e.g. Pardo-Diaz et al., 2012; The Heliconius Genome Consortium, 2012; Nadeau et al., 2013; Kronforst et al., 2013; Martin et al., 2013) | | | | | | |
Between subspecies of Heliconius erato and H. melpomene (Nadeau et al., 2014) | RAD-seq | Structure; genome scan | Loci that control wing color patterns (for both mimicry and mate recognition); the aim of this study was to identify novel loci underlying phenotypic variation by means of association mapping in hybrid zones | ~11.8 mya (Kozak et al., 2015) | Several time points post speciation (Martin et al., 2013) | Parallel zones in the Andes |
Between H. besckei, H. numata and H. numata subsp. nanna (Zhang et al., 2016) | Whole genome sequencing | D and fD statistics | Genomic region upstream of the gene optix (known to control red wing patterning) and 39 additional genomic regions | Divergence between H. besckei and H. numata ~1.93 mya | | East-central South America |
Darwin’s finches | | | | | | |
Species from different Galápagos islands (Lamichhaney et al., 2015) | Whole genome sequencing | PSMC, NeighborNet, D statistic; genome scan | ALX1 gene encodes a transcription factor controlling craniofacial development, in particular beak shape | ~0.5 mya | Several time points post speciation | Galápagos islands |
Mice | | | | | | |
From Mus spretus into M. musculus subsp. domesticus (Song et al., 2011) | [No large genome-wide dataset] Sanger sequencing | | >10 Mb region including the molecular target of anticoagulants vkorc1; M. spretus-specific alleles can cause resistance to rodenticides | ~1.5–3 mya | | Africa and Europe |
Between M. musculus subsp. domesticus and M. musculus subsp. musculus (Staubach et al., 2012) | SNP genotyping | Genome scan | No general pattern of types of genes subject to introgression | 300,000–500,000 ya (Guénet and Bonhomme, 2003) | Maximum ~3,000 ya (Cucchi et al., 2005) | Eastern and southern Germany, western Czech Republic |
Humans (other studies identified genomic regions with evidence of adaptive introgression between archaic and modern humans; from Neanderthals into modern humans: e.g., Vernot and Akey, 2014; Sankararaman et al., 2014; from Denisovans into Tibetans: e.g., Jeong et al., 2014; Huerta-Sánchez et al., 2014) | | | | | | |
From Neanderthals and Denisovans into modern humans (Racimo et al., 2017) | Whole genome sequencing; reevaluation of published data | D and fD statistics, RD, U and Q95 statistics, Haplostrips | Lipid metabolism, pigmentation and innate immunity | Divergence between the Neanderthal/Denisovan lineage and modern humans 170,000–700,000 ya (Meyer et al., 2012) | Between Neanderthals and modern humans 47,000–65,000 ya (Sankararaman et al., 2012) | Eurasia |
Yeast | | | | | | |
Between Saccharomyces cerevisiae and S. uvarum (Dunn et al., 2013) | Whole genome sequencing, colony-PCR and qRT-PCR | | MEP2 fusion gene encodes a high-affinity ammonium permease, which may be adaptive under nitrogen-poor environments with ammonium as only nitrogen source | ~20 mya (Kellis et al., 2003) | De novo creation of hybrid followed by continuous ammonium limitation | Experimental populations |
Albugo candida | | | | | | |
Between the ancestors of three races; an adaptive radiation in progress (McMullan et al., 2015) | Whole genome sequencing, RNAseq | HybridCheck | Effector alleles that may result in a fitness advantage on most potential hosts, enabling host jumps | | Multiple time points; oldest event ~200,000 ya, most recent event ~2,200 ya | Infection with virulent A. candida suppresses host immunity enabling co-colonization by otherwise non-virulent races |
Hybrid partners . | Genomic data types . | Methods applied . | Adaptive traits or genes influenced . | Start of divergence of the hybridizing taxa . | Estimated time of secondary contact . | Presence of hybrid zone(s) . |
---|---|---|---|---|---|---|
Arabidopsis | ||||||
Between Arabidopsis arenosa and A. lyrata (Schmickl and Koch, 2011; Arnold et al., 2016) | Whole genome sequencing (Arnold et al., 2016) | Fastsimcoal2, fD statistic, HybridCheck; genome scan | Genes controlling specific ion homeostasis-related and drought adaptation traits (Arnold et al., 2016) | ~1.63 mya (Hohmann et al., 2015) | Eastern Austrian Forealps (Schmickl and Koch, 2011) | |
Sunflowers | ||||||
From Helianthus debilis subsp. cucumerifolius into H. annuus subsp. annuus (Whitney et al., 2015) | SNP dataset (QTL mapping) | Primarily abiotic tolerance traits allowing rapid growth and reproduction before summer heat and drought, but also traits affecting herbivore resistance | Divergence between the ancestors of H. debilis and H. annuus ~2 mya (Renaut et al., 2013) | Maximum ~18,000 ya (Scascitelli et al., 2010) | South-central USA | |
Senecio | ||||||
From diploid Senecio squalidus into tetraploid S. vulgaris (Kim et al., 2008) | [No large genome- wide dataset] Sanger sequencing, RNA in situ hybridization | Genes RAY1 and RAY2 that control ray florets in radiate flower heads, which effects attractiveness to pollinators and outcrossing rate | S. squalidus is a homoploid hybrid with an origin less than ~150,000 ya (Filatov et al., 2016) | Maximum ~300 ya (S. squalidus was introduced to the Oxford Botanic Garden around 300 ya) | United Kingdom | |
Poplars | ||||||
From Populus balsamifera into P. trichocarpa (Suarez- Gonzalez et al., 2016) | Whole genome sequencing, RNAseq | Genome scan | Numerous genes, particularly in a telomeric region (PRR5 and COMT1); related to response to far red light, RNA processing and ATPase activity | ~76,000 ya (Levsen et al., 2012) | Northwestern North America | |
Mussels | ||||||
From Mytilus edulis into M. galloprovincialis (Fraïsse et al., 2014; Fraïsse et al., 2016) | AFLP and few other markers, Sanger sequencing (Fraïsse et al., 2014); target enrichment (Fraïsse et al., 2016) | Genome scan | Actin gene mac-1 (Fraïsse et al., 2014; but not Fraïsse et al., 2016) | ~2.5 mya (Roux et al., 2014) | ~0.7 mya (Roux et al., 2014) | In Europe from the Mediterranean Sea to the North Sea |
Drosophila | ||||||
From Drosophila simulans into D. sechellia (Garrigan et al., 2012; Brand et al., 2013) | Garrigan et al., 2012: whole genome sequencing; Brand et al., 2013: Sanger sequencing of the candidate region identified by Garrigan et al., 2012 | Genome scan (Garrigan et al., 2012) | A 15 kb region (Garrigan et al., 2012) – the target of selection appears to be a regulatory sequence; evidence for a trans-specific selective sweep (Brand et al., 2013), i.e., the 15 kb region harbours a polymorphism that is adaptive in both species | ~200,000–300,000 ya (Garrigan et al., 2012) | 2,600–18,600 ya | Seychelles |
From D. yakuba into D. santomea (Llopart et al., 2014; Beck et al., 2015) | Llopart et al., 2014: Sanger sequencing and high-throughput sequencing of the mitochondrial genome; Beck et al., 2015: Sanger sequencing | Potential cytonuclear co-introgression of genes involved in the oxidative phosphorylation pathway; strong signal of introgression in the three nuclear genes composing subunit V of the cytochrome c oxidase complex (Beck et al., 2015) | ~400,000 ya (Llopart et al., 2002) | ~14,000 ya (complete replacement of the mitochondrial genome) and ~2,500 years ago (Llopart et al., 2014) | São Tomé (invasion by D. yakuba) | |
Anopheles | ||||||
Between Anopheles arabiensis and the ancestor of A. gambiae and A. coluzzii (Fontaine et al., 2015) | Whole genome sequencing | D and DFOIL statistics, window-based gene tree distribution across chromosomes | Insecticide-resistance alleles might have introgressed into A. arabiensis with the 2La inversion introgressed from the ancestor of A. gambiae and A. coluzzii | Divergence between A. gambiae and A. coluzzii from its ancestor ~2 mya | ~0.5 mya | Central Africa |
From A. gambiae into A. coluzzii (Norris et al., 2015) | SNP genotyping, whole genome sequencing | Traits enhancing vectorial capacity: A. coluzzii inherited the entire A. gambiae- associated 2L divergence island including the knockdown resistance SNP L1014F in the kdr gene (L1014F has been present in A. gambiae populations for at least 20 years) | Periods of isolation interrupted by hybridization; recent introgression reported in 2002 and 2006 (Lee et al., 2013b ) | Selinkenyi, Mali | ||
Heliconius butterflies Other studies that focused on speciation with ongoing gene flow and adaptive introgression: e.g. Pardo-Diaz et al., 2012; The Heliconius Genome Consortium, 2012; Nadeau et al., 2013; Kronforst et al., 2013; Martin et al., 2013) | ||||||
Between subspecies of Heliconius erato and H. melpomene (Nadeau et al., 2014) | RAD-seq | Structure; genome scan | Loci that control wing color patterns (for both mimicry and mate recognition); the aim of this study was to identify novel loci underlying phenotypic variation by means of association mapping in hybrid zones | ~11.8 mya (Kozak et al., 2015) | Several time points post speciation (Martin et al., 2013) | Parallel zones in the Andes |
Between H. beschkei, H. numata and H. numata subsp. nanna (Zhang et al., 2016) | Whole genome sequencing | D and fD statistics | Genomic region upstream of the gene optix (known to control red wing patterning) and 39 additional genomic regions | Divergence between H. beschkei and H. numata ~1.93 mya | East-central South America | |
Darwin’s finches | ||||||
Species from different Galápagos islands (Lamichhaney et al., 2015) | Whole genome sequencing | PSMC, NeighborNet, D statistic; genome scan | ALX1 gene encodes a transcription factor controlling craniofacial development, in particular beak shape | ~0.5 mya | Several time points post speciation | Galápagos islands |
Mice | ||||||
From Mus spretus into M. musculus subsp. domesticus (Song et al., 2011) | [No large genome-wide dataset] Sanger sequencing | >10 Mb region including the molecular target of anticoagulants vkorc1; M. spretus-specific alleles can cause resistance to rodenticides | ~1.5–3 mya | Africa and Europe | ||
Between M. musculus subsp. domesticus and M. musculus subsp. musculus (Staubach et al., 2012) | SNP genotyping | Genome scan | No general pattern of types of genes subject to introgression | 300,000–500,000 ya (Guénet and Bonhomme, 2003) | Maximum ~3,000 ya (Cucchi et al., 2005) | Eastern and southern Germany, western Czech Republic |
Humans Other studies identified genomic regions with evidence of adaptive introgression between archaic and modern humans (from Neanderthals into modern humans: e.g., Vernot and Akey, 2014; Sankararaman et al., 2014 / from Denisovans into Tibetans: e.g., Jeong et al., 2014; Huerta-Sánchez et al., 2014) | ||||||
From Neanderthals and Denisovans into modern humans (Racimo et al., 2017) | Whole genome sequencing; reevaluation of published data | D and fD statistics, RD, U and Q95 statistics, Haplostrips | Lipid metabolism, pigmentation and innate immunity | Divergence between the Neanderthal/Denisovan lineage and modern humans 170,000–700,000 ya (Meyer et al., 2012) | Between Neanderthals and modern humans 47,000–65,000 ya (Sankararaman et al., 2012) | Eurasia |
Yeast | ||||||
Between Saccharomyces cerevisiae and S. uvarum (Dunn et al., 2013) | Whole genome sequencing, colony-PCR and qRT-PCR | MEP2 fusion gene encodes a high-affinity ammonium permease, which may be adaptive in nitrogen-poor environments with ammonium as the only nitrogen source | ~20 mya (Kellis et al., 2003) | de novo creation of hybrid followed by continuous ammonium limitation | Experimental populations | |
Albugo candida | ||||||
Between the ancestors of three races; an adaptive radiation in progress (McMullan et al., 2015) | Whole genome sequencing, RNAseq | HybridCheck | Effector alleles that may result in a fitness advantage on most potential hosts, enabling host jumps | Multiple time points; oldest event ~200,000 ya, most recent event ~2,200 ya | Infection with virulent A. candida suppresses host immunity enabling co-colonization by otherwise non-virulent races |
Introgression between divergent populations or closely related species does not generally homogenize divergence levels across the entire genome (Turner et al., 2005; Coleman et al., 2006; Harr, 2006; Michel et al., 2010; Hohenlohe et al., 2012; Via, 2012; Renaut et al., 2013; Malinsky et al., 2015; Marques et al., 2016; reviewed in Wolf and Ellegren, 2017). Genomic islands of divergence (GIsD), also known as genomic islands of speciation, manifest as regions of elevated divergence in a ‘sea’ of neutral, non-differentiated background that has been homogenized by a history of gene flow. These GIsD are typically attributed to loci under divergent selection contributing to adaptation to the local environment or the establishment of reproductive isolation relatively independent of the external environment (Wu and Ting, 2004; Orr et al., 2004; Rieseberg and Blackman, 2010; Nosil and Schluter, 2011). One example is the adaptation to different foraging behaviours in Darwin’s finches, which might have largely been driven by the ALX1 gene that encodes a transcription factor affecting craniofacial development. Variation in this gene is strongly associated with beak shape diversity across Darwin’s finches and the medium ground finch (Lamichhaney et al., 2015). GIsD have been reported in many cases of recent species divergence with ongoing gene flow (Martin et al., 2013; Jónsson et al., 2014; Supple et al., 2015; Rougeux et al., 2016; Royer et al., 2016; Morales et al., 2017; Kumar et al., 2017). The region of divergence typically extends away from the selected locus due to physical linkage, allowing neutral polymorphisms to hitchhike along with a selected polymorphism, namely divergence hitchhiking, and become part of the GIsD. In many cases, this can obscure signals of selection and lead to ambiguity over the exact genetic loci under selection. 
It is therefore advisable to incorporate linked selection as a null model for the identification of genomic regions exhibiting pronounced differentiation (Burri et al., 2015).
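The idea of islands of elevated divergence in a homogenized background can be illustrated with a minimal windowed differentiation scan. The sketch below is not the method of any study cited here: it assumes biallelic sites with known allele frequencies in two populations, uses a simple heterozygosity-based FST, and the function names and the quantile cut-off are hypothetical choices made for illustration only. It also ignores linked selection and recombination rate variation, which, as noted above, should be accounted for before interpreting any peak.

```python
def fst_terms(p1, p2):
    """Numerator and denominator of a simple two-population FST at one site.

    p1 and p2 are frequencies of the same allele in each population.
    Assumes equal weighting of the two populations (illustrative only).
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                        # pooled heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return h_total - h_within, h_total


def windowed_fst(freqs1, freqs2, window):
    """Ratio-of-sums FST in consecutive, non-overlapping windows of sites."""
    values = []
    for start in range(0, len(freqs1), window):
        terms = [fst_terms(a, b)
                 for a, b in zip(freqs1[start:start + window],
                                 freqs2[start:start + window])]
        num = sum(t[0] for t in terms)
        den = sum(t[1] for t in terms)
        values.append(num / den if den > 0 else 0.0)  # monomorphic window -> 0
    return values


def outlier_windows(values, quantile=0.99):
    """Indices of windows at or above the chosen empirical quantile."""
    ranked = sorted(values)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return [i for i, v in enumerate(values) if v >= cutoff]


# Toy data: a homogenized background with one fully differentiated stretch.
pop1 = [0.5] * 4 + [1.0] * 2 + [0.5] * 4
pop2 = [0.5] * 4 + [0.0] * 2 + [0.5] * 4
islands = outlier_windows(windowed_fst(pop1, pop2, window=2), quantile=0.8)
```

An empirical quantile only ranks windows against each other, so the windows it returns are candidates rather than evidence of selection.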
Low recombination rates in some genomic regions can also create GIsD. Regions of suppressed recombination are frequently pericentromeric and often involve structural changes such as inversions that can physically block recombination. They are thought to sometimes play a role in adaptive genomic divergence (Rieseberg, 2001; Noor et al., 2001; Feder and Nosil, 2009), for example in the case of the 2L inversion divergence island, which was introgressed from Anopheles gambiae into A. coluzzii and harbours a suite of insecticide-resistance alleles (Lee et al., 2013a; Norris et al., 2015). However, it is questionable whether most regions of very low recombination contain loci under divergent selection or were simply established through the stochastic effects of genome structure and genetic drift (Turner and Hahn, 2010). If the amount of gene flow is very low, for example due to geographic isolation, then large blocks of pronounced genomic differentiation may arise from genetic drift. This could be further accentuated by variable mutation rates and low recombination rates (Noor and Bennett, 2009; White et al., 2010; Cruickshank and Hahn, 2014).
Traits impacted by adaptive introgression
The traits impacted by introgression are as diverse as the organisms that have been studied (Table 1). In plants, a broad array of abiotic tolerance traits has been affected, such as ion homeostasis and drought adaptation (Arabidopsis arenosa, Arnold et al., 2016; sunflowers, Whitney et al., 2015), as have traits that control biotic interactions between plants and herbivores (sunflowers, Whitney et al., 2015) or pollinators (Senecio, RAY1 and RAY2 genes, Kim et al., 2008), and others (poplars, PRR5 and COMT1 genes, Suarez-Gonzalez et al., 2016). In animals, clear examples of traits influenced by adaptive introgression include insecticide resistance in Anopheles, involving the single nucleotide polymorphism L1014F in the gene kdr (Fontaine et al., 2015; Norris et al., 2015), and rodenticide resistance in mice, involving the vkorc1 gene (Song et al., 2011). Other introgressed alleles of potentially adaptive value involve the actin gene mac-1 in mussels (Fraïsse et al., 2014), several loci that control wing colour patterns for both mimicry and mate recognition, amongst them the optix locus, in Heliconius butterflies (Nadeau et al., 2014; Zhang et al., 2016), the beak shape-associated locus ALX1 in Darwin’s finches (Lamichhaney et al., 2015), and variants that control lipid metabolism, pigmentation and innate immunity in humans (Racimo et al., 2017), as well as loci of so far uncharacterized adaptive value in Drosophila (Garrigan et al., 2012; Brand et al., 2013) and mice (Staubach et al., 2012). An interesting case of introgression between Drosophila species was reported to be trans-specific, meaning that the introgressed 15 kb region harbours a mutation that appears adaptive in both species despite their different genomic backgrounds and ecological requirements (Brand et al., 2013). In addition, cytonuclear co-introgression has been shown for some Drosophila species (Beck et al., 2015). 
In the fungal genus Saccharomyces, experimental hybridisation followed by ammonium limitation resulted in the origin of an interspecific MEP2 fusion gene, which encodes a high-affinity ammonium permease that may be adaptive in nitrogen-poor environments with ammonium as the only nitrogen source (Dunn et al., 2013).
Adaptive allele mining in hybrid zones
In secondary contact scenarios, PaIS come into contact following a period of reproductive isolation. This process is often induced by geographic and environmental factors, such as habitat shifts due to climate change, for example oscillations during the Pleistocene and Anthropocene, or catastrophic human-mediated events such as fires, flooding, road construction or the introduction of alien species (Crispo et al., 2011). Such secondary contact between divergent but interfertile populations frequently results in hybrid zones where the otherwise geographically distinct distribution ranges of the two species overlap, permitting the production of offspring of mixed ancestry (Barton and Hewitt, 1985; Barton and Hewitt, 1989; Harrison, 1990). In the fungus-like pathogen Albugo candida, a spectacular example of adaptive introgression after secondary contact has recently been reported, in which infection of a host with virulent A. candida suppresses host immunity, enabling co-colonization by otherwise non-virulent races (McMullan et al., 2015). This creates hybrid zones allowing sexual reproduction and gene flow between isolated races; effector alleles are transferred that exhibit a fitness advantage on many potential hosts, facilitating host jumps. Once a new hybrid race has become established, it rapidly reproduces asexually without continued exchange with other races.
Hybrid zones represent natural laboratories (Hewitt, 1988) for the study of introgression dynamics as well as their evolutionary significance. They offer venues to observe allele flow between partly isolated populations and allow hybrid fitness to be estimated in nature. As described above, alleles under divergent selection are relatively inhibited from introgressing between populations. This barrier to the exchange of certain alleles generates an allele frequency gradient, or cline, across the hybrid zone, which partly blocks the homogenization of the hybridizing populations. Such a cline might be detected as a phenotypic transition from one parental species to the other, which should coincide with the underlying allelic cline. Hybrid zones can be understood as natural mapping experiments to determine the alleles responsible for complex traits that differentiate parental populations (Buerkle and Lexer, 2008; Crawford and Nielsen, 2013). Over generations, recombination breaks down linkage, allowing alleles at discrete loci to be associated with phenotypes and environmental variables (Crawford and Nielsen, 2013). This is especially clear in large hybrid populations with varying levels of admixture that are established along environmental gradients, such as in spruce (Hamilton et al., 2013; Hamilton et al., 2015). Substantial levels of clinal variation and allele-environment associations with climatic variables such as temperature and precipitation were found in hybrid zones of Sitka, white, and Engelmann spruce (Hamilton et al., 2015), which suggests that species integrity is maintained through exogenous selection in parental habitats and that hybridisation might facilitate fine-scale adaptation of the species along environmental gradients.
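The notion of an allele frequency cline across a hybrid zone can be made concrete with a small sketch. It assumes a symmetric sigmoid (tanh) cline along a one-dimensional transect, a common simplification that is an assumption here rather than a model taken from the studies cited above. The function names, the grid-search fitting, and the toy transect are hypothetical; real analyses would use likelihood-based fitting and account for sampling error.

```python
import math


def cline(x, centre, width):
    """Expected allele frequency at position x under a sigmoid cline.

    `width` is defined as the inverse of the maximum slope, which occurs
    at `centre`, where the expected frequency is 0.5.
    """
    return 0.5 * (1.0 + math.tanh(2.0 * (x - centre) / width))


def fit_cline(positions, freqs, centres, widths):
    """Least-squares grid search over candidate centres and widths."""
    best = None
    for c in centres:
        for w in widths:
            sse = sum((cline(x, c, w) - f) ** 2
                      for x, f in zip(positions, freqs))
            if best is None or sse < best[0]:
                best = (sse, c, w)
    return best[1], best[2]


# Toy transect (arbitrary distance units) sampled every 10 units.
transect = list(range(0, 101, 10))
observed = [cline(x, 50.0, 20.0) for x in transect]
centre, width = fit_cline(transect, observed,
                          centres=[float(c) for c in range(0, 101, 5)],
                          widths=[5.0, 10.0, 20.0, 40.0])
```

Under this parameterization, loci under divergent selection would be expected to show narrower clines than loosely linked neutral loci, so comparing fitted widths across loci is one simple way to rank candidates.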
However, genotype-phenotype-environment associations along clines have rarely been addressed using dense genome-wide markers. Highlights include work in mice (Turner and Harr, 2014; Pallares et al., 2014; Pallares et al., 2016), for which genes involved in craniofacial shape variation were found that act in a polygenic manner (Pallares et al., 2016). The Heliconius and Helianthus systems are other examples of association mapping across hybrid zones, although Helianthus showed spurious associations in early generation hybrids due to linkage disequilibrium (Rieseberg and Buerkle, 2002). Heliconius in contrast is a relatively old hybrid zone and close to linkage equilibrium, with phenotypic variation largely controlled by major effect loci, for which smaller sample sizes are sufficient (Nadeau et al., 2014). With emerging probabilistic frameworks to infer allele frequencies in low coverage sequencing data it should be possible in the future to also address minor effect loci, for which larger sample sizes are required, even in polyploids (Blischak et al., 2017).
Selective landscapes vary widely in different types of hybrid zones. If selection acts on genotypes of mixed ancestry independent of spatial variation in selection pressures, the hybrid zone can be considered a ‘tension zone’ (Barton and Hewitt, 1985; Barton and Hewitt, 1989). In a tension zone selection is usually endogenous, for example the hybrid zone between two Senecio species on Mount Etna (Brennan et al., 2009), with occasional exceptions such as exogenous frequency-dependent selection by predatory birds in Heliconius butterflies (Mallet et al., 1990). Alternatively, Endler (1977) postulated spatially varying exogenous selection along a geographic-environmental gradient. According to another model, the bounded hybrid superiority model (Moore, 1977), selection in intermediate habitats can favour individuals of mixed ancestry, which seems to be the case for the hybrid zone between Sitka and white spruce (Hamilton et al., 2013). In mosaic hybrid zones, parental populations are distributed in a patchy landscape. Examples of such hybrid zones are frequently found in parasites that rely on patchily distributed hosts, for example in Albugo candida (McMullan et al., 2015), but also in non-parasitic species such as mussels (Fraïsse et al., 2014), which may reflect a common scenario of complex environmental mosaics in nature (Harrison, 1986). In a tension zone, with endogenous selection on the hybrids, the hybrid zone can move freely if there are asymmetries in selection, dispersal, or population density and will finally arrest at a geographic barrier or in an area of low population density (Hewitt, 1975; Barton, 1979; Barton and Hewitt, 1985). Hybrid zones might move in response to climate change or co-varying environmental factors (Taylor et al., 2015).
Population genomic approaches to infer introgression in natural populations
A convergence of recent work on new methods of refined demographic inference and on methods to describe the genomic architecture of introgression has great potential for the discovery of introgressed alleles (Table 2). Demographic inference methods evaluate population structure and provide a good estimate of admixture between populations. They also estimate fine-scale demographic histories, including fluctuations in effective population size over evolutionary time, partly based on rare polymorphisms, or they infer parameter values, such as population split times and migration rates, and test hypotheses by comparison with alternative demographic scenarios. The genomic architecture of introgression is observed using metrics that determine the level of allele sharing and linkage disequilibrium, along with visualization methods, including topology weighting of trees across the genome (Table 2). Genome scans search for regions exhibiting unusual levels of divergence among populations or species, which represent candidate selected alleles (Nielsen et al., 2005; Yant et al., 2013; Lotterhos and Whitlock, 2015; Jensen et al., 2016). There are two main approaches to identify candidate loci: taking outliers of differentiation metrics or performing explicit hypothesis testing to determine whether the value is significantly greater than expected by chance or under neutrality. Genome scans can be confounded by other modes of selection and by effects of demography and are therefore best interpreted as hypothesis generators, providing candidate loci and processes to be tested in downstream functional studies, where possible. It may also be more confidently concluded that introgression is adaptive if a combination of methods, including descriptors of the genomic architecture of introgression and appropriately chosen selection metrics, converge on particular loci. 
In some cases, compelling a priori evidence that the trait is under similar directional selection in both species is also available. Combining methods that assess the genomic architecture of introgression with population genomic metrics proved successful when mining for adaptively introgressed candidate alleles mediating adaptation to serpentine soils in Arabidopsis (Arnold et al., 2016). In Heliconius butterflies there was a priori evidence that wing colour patterns involved in mimicry and mate recognition are under directional selection in the hybridising species, and it was then shown that the introgressed alleles are responsible for this trait (Pardo-Diaz et al., 2012).
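The outlier approach to genome scans mentioned above can be illustrated with a minimal sketch. It assumes that a differentiation metric (for example FST) has already been computed per genomic window; the function names and the 99th-percentile default are illustrative assumptions and are not taken from any of the cited studies or tools.

```python
from typing import List, Sequence


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Return the q-th empirical quantile using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def outlier_windows(values: Sequence[float], q: float = 0.99) -> List[int]:
    """Return indices of windows whose metric is at or above the q-th quantile.

    Hypothetical helper: the cutoff is an arbitrary choice, and windows flagged
    here are candidates only, not evidence of selection on their own.
    """
    threshold = empirical_quantile(values, q)
    return [i for i, value in enumerate(values) if value >= threshold]


if __name__ == "__main__":
    # Toy per-window differentiation values (made up for illustration).
    windows = [0.0, 0.1, 0.2, 0.3, 1.0]
    print(outlier_windows(windows, q=0.75))
```

Because an empirical cutoff always returns some windows, even under neutrality, such a list is best read as a set of hypotheses to be checked against demographic expectations and functional data.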
Analysis type | Software | Description | References
---|---|---|---
Demographic inference: population structure | adegenet | Evaluation of population clustering using principal component and discriminant analysis | Jombart, 2008; Jombart et al., 2010
Demographic inference: population structure | STRUCTURE; fastSTRUCTURE; fineSTRUCTURE; ADMIXTURE; SpaceMix | Estimation of (fine-scale) population structure, taking admixture into account | Pritchard et al., 2000; Hubisz et al., 2009; Raj et al., 2014; Lawson et al., 2012; Alexander et al., 2009; Bradburd et al., 2016
Demographic inference: demographic history | IM; IMa2 | Fitting an isolation with migration model to haplotype data from two populations (IM) or up to ten populations (IMa2) | Hey and Nielsen, 2007; Hey, 2010a; Hey, 2010b
Demographic inference: demographic history | dadi (diffusion approximations for demographic inference) | Inference of the demographic history of multiple populations from SNP frequency data | Gutenkunst et al., 2009
Demographic inference: demographic history | PSMC (pairwise sequentially Markovian coalescent model); MSMC (multiple sequentially Markovian coalescent model) | Inference of fluctuations in effective population size over time from a single genome sequence (PSMC) or multiple genome sequences (MSMC) | Li and Durbin, 2011; Schiffels and Durbin, 2014
Demographic inference: demographic history | simcoal2; fastsimcoal2 | Inference of parameter values, such as population split times and migration rates, and testing of hypotheses to compare to alternative, neutral demographic scenarios | Laval and Excoffier, 2004; Excoffier et al., 2013
Demographic inference: demographic history | Rarecoal | Inference of population history and fine-scale ancestry from rare variants | Schiffels et al., 2016
Demographic inference: network and tree methods | NeighbourNet (SplitsTree4) | Visualization of reticulate relationships in the form of splits networks | Huson and Bryant, 2006
Demographic inference: network and tree methods | TreeMix | Inference of patterns of population splits and admixture | Pickrell and Pritchard, 2012
Demographic inference: network and tree methods | PhyloNet | Coalescent-based species tree and evolutionary network method; testing the number of introgression events | Than et al., 2008
Genomic architecture of introgression: introgression detection | D statistic; f statistic | Detection of SNP window-based or genome-wide evidence of shared alleles (ABBA-BABA test) in a four-taxon case | Comparison of two genomes: Green et al., 2010; Durand et al., 2011. Comparison of two populations: Kronforst et al., 2013; Smith and Kronforst, 2013
Genomic architecture of introgression: introgression detection | DFOIL statistic | D statistic for a symmetric five-taxon phylogeny; determination of the directionality of introgression | Pease and Hahn, 2015
Genomic architecture of introgression: introgression detection | fD statistic | Refinement of the f statistic (Green et al., 2010) that is less sensitive to differences in diversity along the genome | Martin et al., 2015
Genomic architecture of introgression: introgression detection | RD, U, Q95 statistics | Identification of genomic windows that are likely to have undergone adaptive introgression | Racimo et al., 2017
Genomic architecture of introgression: methods with additional visualization aspect | HybridCheck | Detection of introgressed genomic blocks and visualization of the heterogeneous, mosaic-like genome structure; dating of introgressed blocks | Ward and van Oosterhout, 2016
Genomic architecture of introgression: methods with additional visualization aspect | Haplostrips | Plot of haplotype structure at candidate regions for adaptive introgression | Marnetto et al., in prep. (cited in Racimo et al., 2017)
Genomic architecture of introgression: methods with additional visualization aspect | Twisst | Topology weighting of SNP window-based trees across the genome | Martin and Van Belleghem, 2017
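The ABBA-BABA test listed in the table can be sketched in a few lines. The example below assumes a four-taxon topology (((P1, P2), P3), O) and computes the D statistic from derived-allele frequencies, following the general form D = (ABBA − BABA) / (ABBA + BABA) described by Green et al. (2010) and Durand et al. (2011). The function name and input layout are illustrative assumptions; significance testing, which is usually done by block resampling along the genome, is deliberately left out.

```python
from typing import Iterable, Tuple

# Derived-allele frequencies at one site in P1, P2, P3 and the outgroup O.
Site = Tuple[float, float, float, float]


def patterson_d(sites: Iterable[Site]) -> float:
    """Compute the D statistic from per-site derived-allele frequencies.

    Hypothetical helper for illustration. A positive value indicates an excess
    of allele sharing between P2 and P3, a negative value an excess between
    P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, outgroup in sites:
        # Expected weight of the ABBA pattern (derived allele in P2 and P3).
        abba += (1.0 - p1) * p2 * p3 * (1.0 - outgroup)
        # Expected weight of the BABA pattern (derived allele in P1 and P3).
        baba += p1 * (1.0 - p2) * p3 * (1.0 - outgroup)
    total = abba + baba
    if total == 0.0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


if __name__ == "__main__":
    # Toy data: three ABBA sites and one BABA site (made up for illustration).
    toy_sites = [(0.0, 1.0, 1.0, 0.0)] * 3 + [(1.0, 0.0, 1.0, 0.0)]
    print(patterson_d(toy_sites))
```

With single genomes the frequencies reduce to 0 or 1 and the sums become simple site-pattern counts; with population samples, fractional frequencies weight each site by the probability of observing each pattern.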
Horizontal gene transfer events as a source of adaptive novelty
Alleles conferring an adaptive advantage may also be exchanged between reproductively isolated species via HGT. In prokaryotes HGT between distantly related species is well established as a source of novelty and a key driver of adaptation, most notably in the acquisition of antibiotic resistance and genes conferring pathogenicity (Gillings, 2017). Similarly, HGT has been shown to be prevalent among single-celled eukaryotes (Keeling and Palmer, 2008; Andersson, 2009). This penchant for HGT in prokaryotes has an ongoing effect on multicellular eukaryotes; HGT between the mitochondria of distantly related plant species is rampant (Won and Renner, 2003; Bergthorsson et al., 2003; Mower et al., 2010; Rice et al., 2013). HGT in the plastid genome seems relatively infrequent, although there are cases (Rice et al., 2006; Park et al., 2007). This may be because plant mitochondria have a mechanism for the active uptake of DNA and frequently fuse (Richardson and Palmer, 2007; Rice et al., 2013), while plastids lack this tendency. HGT into the nuclear genomes of multicellular eukaryotes is less frequent. For any foreign DNA to be heritable it must be integrated into the germline, which is separated from the somatic cells and often protected from the environment by elaborate structures. Despite this, HGT does occur in the nuclear genomes of multicellular eukaryotes: fungi (Ambrose et al., 2014), arthropods (Wybouw et al., 2016), nematodes (Danchin et al., 2010), mosquitos (Klasson et al., 2009), fish (Sun et al., 2015), sea anemones (Starcevic et al., 2008) and a broad range of plants (Bock, 2010) have all been shown to contain nuclear genes of HGT origin from diverse sources.
Many possible mechanisms for genetic transfers by HGT have been suggested. Lifestyle traits that allow intimate contact between unrelated species may increase the likelihood of HGT. For example, close contact between a host and its parasite, which can involve the exchange of macromolecules including mRNAs (Kim et al., 2014), presents an opportunity for genetic exchange (Yoshida et al., 2010; Xi et al., 2012; Zhang et al., 2013; Zhang et al., 2014; Davis and Xi, 2015; Yang et al., 2016), as does a reproductive cycle in which components of the germline are more exposed to the environment or even free living, such as those of bryophytes, lycophytes, and ferns (Li et al., 2014). It has also been demonstrated that rare cases of grafting between unrelated species can result in the transformation of cells at the graft site (Bergthorsson et al., 2003; Stegemann, 2009; Stegemann et al., 2012). There are also many vectors that may transport genetic material between species, for example bacteria, viruses or mobile genetic elements. It has been established that pathogenic bacteria can transform eukaryotic host cells through the injection of proteins and/or genetic material (Lacroix and Citovsky, 2016). HGT of selfish genetic elements, such as transposons, is rampant and could mediate the movement of host DNA. Indeed, a study of group I introns in angiosperm mitochondria found 32 separate instances of HGT into plants (Cho et al., 1998). Another study showed that 65% of the plant genomes analyzed contained at least one instance of HGT of a long terminal repeat retrotransposon (El Baidouri et al., 2014). This likely has an important evolutionary function for selfish genetic elements, allowing them to escape resistant host genomes that effectively silence them. 
Finally, despite the complexity involved in shielding gametes from the environment in seed plants, it has been suggested that exposure to foreign pollen could result in small windows of opportunity for illegitimate pollination and HGT into the germline (Keeling and Palmer, 2008; Bock, 2010; Christin et al., 2012).
A key outstanding question, however, is what adaptive impact, if any, these HGT events have. Arguably genes that do not confer an adaptive benefit are expected to decay by neutral drift, eventually to be relegated to pseudogene status and then lost altogether (Keeling and Palmer, 2008; Soucy, 2015). Many factors could render a gene acquired by HGT from a divergent organism useless, or even deleterious. A novel genetic background could abrogate interactions essential for gene function, while incompatible and divergent codon biases, transcriptional elements or intron/exon splice sites could also inactivate foreign genes. Indeed, the fate of many genes acquired by HGT in eukaryotes is decay (Bergthorsson et al., 2003; Mower et al., 2010; Rice et al., 2013; Mahelka et al., 2017). However, a large body of evidence suggests that this is not always the case (Emiliani et al., 2009; Danchin et al., 2010; Yue et al., 2012; Christin et al., 2012; Acuña et al., 2012; Zhang et al., 2013; Yang et al., 2013; Li et al., 2014; Ambrose et al., 2014; Prentice et al., 2015; Sun et al., 2015; Yin et al., 2016). The transcriptional and developmental regulation of horizontally transferred loci, alongside evidence for purifying or positive selection, suggests that many such events are adaptive.
Phylogenetic analysis is the gold standard for HGT inference (Keeling and Palmer, 2008; Brock, 2009; Soucy, 2015). It relies on detecting incongruence between individual locus trees and species phylogenies. Often these cases are obvious. However, alternative evolutionary processes, as well as sampling and analytic errors, can also produce incongruent gene trees, and these must be considered in each case. Potential confounding causes include gene duplication and subsequent loss, inadequate taxonomic sampling, historical allopolyploidization and long branch attraction (Keeling and Palmer, 2008; Brock, 2009; Soucy, 2015). Further, the possibility exists of intracellular gene transfer from the mitochondrial and plastid genomes; thus, genes of alpha-proteobacterial or cyanobacterial origin should be excluded from analysis (Huang and Gogarten, 2008; Yang et al., 2013). However, diverse evidence classes can corroborate candidate HGT events, such as codon usage differences, intron structure or GC content.
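One of the corroborating signals named above, GC content, can be sketched as a simple comparison between a candidate gene and a background set of genes from the same genome. The sketch below is a minimal illustration only: the function names, the use of a z-score, and the toy sequences are assumptions made for this example and are not drawn from any of the cited studies.

```python
from statistics import mean, stdev
from typing import Sequence


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in bases) / len(bases)


def gc_z_score(candidate: str, background: Sequence[str]) -> float:
    """Return how many standard deviations the candidate's GC content lies
    from the mean GC content of the background genes.

    Hypothetical helper: a large absolute value is a compositional hint that
    a gene may be foreign, not proof of HGT.
    """
    values = [gc_content(gene) for gene in background]
    if len(values) < 2:
        raise ValueError("need at least two background genes")
    spread = stdev(values)
    if spread == 0.0:
        raise ValueError("background GC content has zero variance")
    return (gc_content(candidate) - mean(values)) / spread


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    background_genes = ["AATT", "ATGC", "GGCC"]
    print(gc_z_score("GGCC", background_genes))
```

Compositional differences of this kind are supporting evidence at best. They complement, rather than replace, the phylogenetic test for incongruence between gene trees and species trees.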
HGT and the emergence of land plants
Strong evidence implicates ancient HGT as a key source of adaptive novelty when the pioneering ancestor of green plants adapted to terrestrial environments (Huang and Gogarten, 2008; Emiliani et al., 2009; Yue et al., 2012; Yue et al., 2013; Yang et al., 2013). This lineage experienced an array of novel abiotic and biotic challenges upon colonisation of land, including desiccation, UV irradiation, and microbial attack. Analysis of the moss Physcomitrella patens identified 39 gene families that had been acquired by HGT from prokaryotes, fungi or viruses after the split between plants and green algae, 35 of which were shared with seed plants. These loci are involved in a broad range of plant-specific processes including biosynthesis, defence, stress tolerance, vascular development, and seed germination (Yue et al., 2012; Yue et al., 2013). In contrast to seed plants, the gametophytes and zygotes of mosses are more exposed to the environment, presenting an opportunity for the integration of foreign DNA. Another example is the phenylpropanoid pathway that produces compounds such as lignin and flavonoids, critical components in plant structure and defence against microbes and UV (Emiliani et al., 2009). The common ancestor of land plants acquired the enzyme phenylalanine ammonia lyase, which performs the first critical step in this pathway, via HGT from what was ultimately a bacterial source (Emiliani et al., 2009). A further example is represented by the L-Ala-D/L-Glu epimerases (AEEs), which are ubiquitous in land plants and were initially acquired by HGT from prokaryotes (Yang et al., 2013). The fixation of AEEs in land plants was driven by positive selection and was specific to land plants; they have not been found in any other eukaryotes including the extant progenitors of land plants, red and green algae (Yang et al., 2013). 
Acquisition of genes by HGT prior to the split between red algae and green plants is also an important source of genes involved in the functionality of plastids (Huang and Gogarten, 2008).
HGT mediating pathogen resistance and environmental adaptation
HGT is also a source of novelty that is exploited in pathogen resistance. For example, a gene important for virus resistance in domestic tomato is derived from the fusion of two genes, both of which were acquired by HGT ultimately from bacteria, although one of the genes appears to have passed through a fungus (Yang et al., 2016). The obligate parasite Phelipanche aegyptiaca acquired an albumin 1 gene, known to function as a storage protein and insect toxin, by HGT from legumes (Zhang et al., 2013). Structural predictions, the conservation of key functional residues, and evidence of purifying selection all suggest that the gene serves an adaptive role. This HGT event was likely the result of historical host-parasite interactions. Another example is the fungus Epichloë, an intracellular plant symbiont, which has repeatedly conferred insect resistance on its hosts through an insect toxin that the fungus acquired via HGT from bacteria (Ambrose et al., 2014).
The reverse is also true; HGT in plants, fungi, and insects has allowed enhanced pathogenic exploitation of plants. In the fungus responsible for apple canker, Valsa mali, multiple genes acquired by HGT from bacteria and fungi were implicated in pathogenicity with putative roles in the avoidance of host immune responses and the degradation of host tissues (Yin et al., 2016). There is evidence that the coffee berry borer beetle Hypothenemus hampei adapted to its role as a pathogen of coffee beans by the acquisition of the HhMAN1 locus (Acuña et al., 2012). The encoded enzyme is capable of breaking down galactomannan, the main storage polysaccharide in coffee beans, and likely allows the beetles to exploit coffee beans as a food source. HGT has allowed plant-parasitic root knot nematodes to acquire from bacteria a repertoire of plant cell wall degradation enzymes that facilitate parasitism (Danchin et al., 2010). Finally, a study of HGT in multiple parasitic plant lineages showed that genes acquired from the host via HGT are not only evolving under purifying or positive selection but are most likely to be expressed in the haustorium, the interface between host and parasite, implying they may have an adaptive role in host-parasite interactions (Yang et al., 2016).
HGT has also been implicated in mediating repeated adaptation to stringent environmental conditions. C4 photosynthesis is a more efficient photosynthetic pathway in hot arid conditions that has evolved multiple times from C3 progenitors. In the Alloteropsis grasses, two key C4 photosynthesis proteins, phosphoenolpyruvate carboxylase and phosphoenolpyruvate carboxykinase, have been acquired by HGT at least four times from distantly related plants in close ecological contact with the grasses (Christin et al., 2012). These pre-adapted C4 genes likely replaced their suboptimal homologues, rapidly optimizing the C4 pathway in the recipient. A metabolic enzyme with a key role in glucose metabolism that was acquired by plant-to-plant HGT has been shown to be associated with fine-scale biotic and abiotic environmental differences in the grass Festuca ovina (Prentice et al., 2015).
A particularly striking example of adaptation mediated by HGT involves the photoreceptor neochrome, a chimeric photoreceptor originating from a gene fusion that is thought to play a pivotal role in the enhanced phototropic response of ferns, a key adaptive trait that allowed their diversification following the advent of angiosperms (Kawai et al., 2003; Schneider et al., 2004; Kanegae et al., 2006; Schuettpelz and Pryer, 2009). This protein, previously thought to have arisen independently in ferns and algae, was initially acquired by the fern lineage through HGT from hornworts followed by subsequent inter-fern HGT (Li et al., 2014). As discussed above, both ferns and hornworts have lifestyle traits that likely make them more susceptible to HGT. Finally, there is evidence of HGT in commercially important species: domesticated sweet potato contains four transcribed Agrobacterium genes (Kyndt et al., 2015) and the silkworm Bombyx mori contains 10 expressed genes from bacterial sources, with putative functions in disease resistance and metabolism (Zhu et al., 2011).
Thus, HGT has had a profound impact on the genesis of many important adaptive plant traits such as the acquisition of endosymbionts, the origin of C4 photosynthesis, and the emergence of terrestrial plants. Compared with prokaryotes, HGT in eukaryotes is considered rare but the impact of such HGT events on the evolutionary trajectories of their recipients can be large. In this era of high-throughput sequencing technologies and especially whole genome sequencing, cases of HGT in multicellular eukaryotes are increasingly likely to be identified.
Near-term perspectives on adaptive allele mining using adaptive introgression
When phenotype-driven, allele mining for introgressed loci is a powerful tool for the identification of strong candidate alleles underlying particular adaptations. Indeed, we are at a watershed moment in the history of these approaches. There stands behind us a rich history, with long established model systems poised to be married to modern population genomics. It is now possible to detect loci under divergent selection, to test if the selected alleles have been introgressed and to associate these candidate alleles with phenotypes in ultra-high genomic resolution. In cases where these studies focus on clear phenotypes, it is obvious that such mixtures of approaches will engender rapid developments in understanding the mechanisms of adaptation (Box 1). Further, they stand to reveal the historic and geographic context of adaptive introgression mediating complex traits in the rich complexity presented by nature.
Abbreviations
- GIsD: genomic islands of divergence
- HGT: horizontal gene transfer
- PaIS: parapatrically isolated species
- TEs: transposable elements
Acknowledgements
LY acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 679056) and BBSRC Institute Strategic Programme Grant (BB/J004588/1). RS acknowledges funding from the Czech Science Foundation (GAČR), grant number 16-15134Y. The authors thank Patrick Monnahan, Jordan Koch, and Pirita Paajanen for discussion.
References