Abstract

Crop domestication and the subsequent expansion of crops have long been thought of as a linear process from a wild ancestor to a domesticate. However, evidence of gene flow from locally adapted wild relatives that provided adaptive alleles into crops has been identified in multiple species. Yet, little is known about the evolutionary consequences of gene flow during domestication and the interaction of gene flow and genetic load in crop populations. We study the pseudo-cereal grain amaranth that has been domesticated three times in different geographic regions of the Americas. We quantify the amount and distribution of gene flow and genetic load along the genome of the three grain amaranth species and their two wild relatives. Our results show ample gene flow between crop species and between crops and their wild relatives. Gene flow from wild relatives decreased genetic load in the three crop species. This suggests that wild relatives could provide evolutionary rescue by replacing deleterious alleles in crops. We assess experimental hybrids between the three crop species and found genetic incompatibilities between one Central American grain amaranth and the other two crop species. These incompatibilities might have created recent reproductive barriers and maintained species integrity today. Together, our results show that gene flow played an important role in the domestication and expansion of grain amaranth, despite genetic species barriers. The domestication of plants was likely not linear and created a genomic mosaic by multiple contributors with varying fitness effects for today’s crops.

Introduction

Evolution and speciation has long been viewed as a linear process with a single ancestor giving rise to one or more derived species. Genomic data of large samples have revealed ancestry of different species within modern populations in a number of species (Harris and Nielsen 2016; Niu et al. 2019; Chomicki et al. 2020; Kozak et al. 2021; Orlando et al. 2021; Lv et al. 2022). Particularly in plants where reproductive barriers are often weak, the potential for the exchange of genetic material between related species is rather high (Ostevik et al. 2016; Osuna-Mascaró et al. 2023; Sheidai and Koohdar 2023). Yet, the observation of exchanging genetic material between species is often seen as an exception and as a minor contribution to the genomic composition of a species.

Gene flow describes the process of exchanging genetic information between populations or even species (Rieseberg and Burke 2001). Although interbreeding between populations might be frequent, the manifestation of gene flow between locally adapted populations or even species is thought to be rare as it would decrease fitness. Nevertheless, beneficial gene flow has been shown to be an important source of variation for local adaptation (Crispo 2008; Sexton et al. 2011; Ellstrand 2014; Tigano and Friesen 2016; López-Goldar and Agrawal 2021). At least partial fertility of hybrids is required, and compatibility between donor and recipient determines the intensity of gene flow (Aguillon et al. 2022). In the course of speciation, the ability to form viable hybrids can be lost, and reproductive barriers that prevent gene flow can evolve. Hence, hybrid incompatibility can hinder gene flow and lead to reproductive isolation. Yet, reproductive isolation in plants is often incomplete or is circumvented by intermediate populations, allowing for gene flow even between different species.

The domestication of crops and animals led to a major transition in human lifestyle and had a profound impact on the genetic makeup of the domesticates (Doebley 2006). Crop domestication can be seen as rapid evolution, often leading to speciation. Even more than speciation in general, crop domestication has long been described as a linear process starting from one wild species evolving through strong directional selection into a domesticate. However, this view has been challenged in recent years, where gene flow from wild relatives have been documented in a number of crops, including maize (Ross-Ibarra et al. 2009), rice (Yang et al. 2012), barley (Civáň et al. 2021), sorghum (Sagnard et al. 2011), tomato (Razifard et al. 2020), Brassica (Saban et al. 2023), and others (Luo et al. 2007; Liu et al. 2019; Page et al. 2019; Ding et al. 2022). Reproductive isolation of the crop from its wild relatives would ensure the maintenance of domestication traits, hence, the success of domestication (Dempewolf et al. 2012). Gene flow from wild relatives would have led to a reduction of domestication-related phenotypic changes. Early generations of crop-wild hybrids would be rather unfit as wild plants or crops, as their adaptive traits strongly differ (Janzen et al. 2019; Stetter 2020). Yet, gene flow from wild relatives could have increased the genetic variation in early crops, which could have been beneficial to increase adaptive potential (Smith et al. 2019). In addition, gene flow from locally adapted wild relatives has been shown to have provided alleles that allowed the crop population to establish in novel environments (Van Heerwaarden et al. 2011; Hufford et al. 2013; Wang et al. 2021).

Plant domestication has likely been driven by demographic changes and directional selection. Domestication bottlenecks reduced the effective population size, leading to an accumulation of mildly deleterious alleles in the population (Gaut et al. 2018). Selection on major effect domestication genes might have allowed hitchhiking of linked mildly deleterious alleles (Sedivy et al. 2017). Together these effects increased genetic load—the accumulation of deleterious alleles—in the crop population (Bertorelle et al. 2022). Several studies have assessed the accumulation of genetic load in domesticates compared with their wild relatives, for example, rice (Lu et al. 2006; Xu et al. 2006; Nabholz et al. 2014), maize (Gaut et al. 2015; Rodgers-Melnick et al. 2015), and soybean (Kim et al. 2021). Whereas an accumulation of genetic load has been detected in many domesticated species, no increase has been detected in sorghum, potentially due to the transition to selfing in the crop (Lozano et al. 2021). Gene flow from wild relatives with larger effective population size into crop populations might have reduced genetic load in crops (Stetter 2020). In sorghum, gene flow between early landraces resulted in decreased genetic load across landraces, but no variation in genetic load was observed among landraces with or without gene flow (Smith et al. 2019; Lozano et al. 2021). However, such evolutionary rescue by gene flow from wild relatives into domesticates has received little attention. We study the effects of gene flow on genetic load in a three times domesticated crop and its wild relatives to understand the evolutionary role of gene flow and genetic load during crop domestication.

Grain amaranth is a nutritious pseudo-cereal from the Americas that has been domesticated three times from one ancestral species (A. hybridus). Two grain amaranths were domesticated in Central America (A. cruentus, A. hypochondriacus) and one in South America (A. caudatus) (Sauer 1967). The taxonomic complexity of the Amaranthus genus led to different domestication scenarios for the crop (Sauer 1967; Kietlinski et al. 2014; Stetter et al. 2017). Population genetic and genome-wide selection signals suggest subpopulations of A. hybridus as ancestors for all three grain amaranths (Stetter et al. 2020; fig. 1A and supplementary fig. 1, Supplementary Material online). In South America, the closely related A. quitensis was potentially involved in the domestication process of A. caudatus and genome-wide signals of gene flow between grain amaranths and their wild relatives have been detected previously (Kietlinski et al. 2014; Stetter et al. 2020). Whereas the three grain amaranth species have been cultivated as a crop in different regions of America for thousands of years, all three lack key domestication traits (Stetter et al. 2017). A potential reason for the lack of domestication traits might be continuous gene flow from wild relatives that prevented domestication traits to fix (Stetter 2020). Understanding the underlying genomic signatures of gene flow and selection could improve the understanding of the evolutionary history of crops (Meyer et al. 2012).

In this study, we use population-wide whole-genome sequencing data and reveal the mosaic signature of gene flow along the genome of domesticated amaranths. We found strong signals of gene flow even between the three geographically isolated domesticated species. Besides post-domestication gene flow between crops, we also observed high levels of genetic exchange between the South American wild species and the local crop. A hybridization experiment between three crops indicates genetic incompatibility between A. cruentus and the other two grain amaranths. This reproductive barrier might be a contributing factor to the different rates of gene flow observed between species despite their geographic distances. Gene flow from the wild ancestor A. hybridus into the domesticated amaranths reduced genetic load in the crops, but only a few positively selected regions were exchanged through gene flow between crop species. Gene flow might be an important source of genetic variation for crops, not only to provide adaptive alleles but also to reduce genetic load and allow further selection.

Results

Strong Post-Domestication Exchange between Crops and between Crops and Their Wild Relatives

Gene flow likely played an important role during the domestication of different crops (Janzen et al. 2019). In grain amaranth, genome-wide signals of gene flow between species have been previously reported (Stetter et al. 2017, 2020). In order to quantify gene flow among different domesticated and wild populations, we measured ancient admixture using the D-statistic (ABBA-BABA) for all possible tree topologies using ANGSD (Korneliussen et al. 2014) in whole-genome sequencing data of six population samples of domesticated (caudatus: A. caudatus; cruentus: A. cruentus; hypochondriacus: A. hypochondriacus); and wild amaranth (hybridus_CA: A. hybridus from Central America; hybridus_SA: A. hybridus from South America and quitensis: A. quitensis). We identified gene flow between crop species and between crops and their wild relatives. We found ample gene flow even between geographically distant crop species (fig. 1B). The strongest signal was identified for the Central American crop hypochondriacus and the South American caudatus. This signal was robust even when changing the third species in the test (fig. 1B and supplementary table S1, Supplementary Material online). The test also identified gene flow between the South American crop species caudatus and the second Central American grain amaranth cruentus. However, the strength of gene flow between them was lower. This is also shown when the three grain species were tested in the same tree, where a significant level of gene flow between caudatus and hypochondriacus was identified (D=0.22; fig. 1B, second row), showing that the signal of gene flow between the two species is higher than the shared variation of the three crops. As both Central American grain amaranths were domesticated from Central American hybridus and the exact subpopulations of the ancestor that gave rise to each crop remain unknown, we could not test directly for gene flow between the two Central American grain species. The high level of exchange of genetic material between species was also shown by tree topology tests using Twisst (Martin and Belleghem 2017). Although the expected tree topology along the genome, with the two Central American crops being closest, was the most common, the other two alternative topologies were only slightly less abundant (fig. 1C). Altogether, more gene flow was observed between two allopatric crop species, caudatus and hypochondriacus, less gene flow was observed with cruentus.

Genome-wide signals of gene flow. (A) Schematic history of amaranth domestication. Amaranth has likely been domesticated three times independently from different subpopulations of A. hybridus (Central America (hybridus_CA) and South America(hybridus_SA)). A. quitensis is speculated to be an intermediate population between A. hybridus_SA and A. caudatus. Colors are consistent with the legend in B. (B) Gene flow between amaranth populations (exchanging pairs highlighted in yellow). The D-value indicates the strength of gene flow. Only significant signals of gene flow are shown. (C) Genome-wide summary of tree topologies along the genome inferred by Twisst. The proportion of each of the three topologies observed along the genome is shown in bars.
Fig. 1.

Genome-wide signals of gene flow. (A) Schematic history of amaranth domestication. Amaranth has likely been domesticated three times independently from different subpopulations of A. hybridus (Central America (hybridus_CA) and South America(hybridus_SA)). A. quitensis is speculated to be an intermediate population between A. hybridus_SA and A. caudatus. Colors are consistent with the legend in B. (B) Gene flow between amaranth populations (exchanging pairs highlighted in yellow). The D-value indicates the strength of gene flow. Only significant signals of gene flow are shown. (C) Genome-wide summary of tree topologies along the genome inferred by Twisst. The proportion of each of the three topologies observed along the genome is shown in bars.

We tested whether isolation with gene flow between hypochondriacus and caudatus was a better fitting model than a simple split without further gene flow by simulating demographic histories using Fastsimcoal2 (Excoffier et al. 2013). We investigated three alternative models: population split without gene flow, population split with one-time gene flow and population split with continuous gene flow (supplementary fig. S2, Supplementary Material online). The model for a population split with continuous gene flow was obtained as the best model (supplementary table S2, Supplementary Material online), suggesting that this is the most suitable scenario (supplementary fig. S2, Supplementary Material online). In addition, this model predicts population splits estimates that coincide with previous observations (Stetter et al. 2020).

In addition to gene flow between crop species, we examined gene flow signals between the crops and their wild relatives. We found the strongest signal of gene flow between the sympatric South American grain crop caudatus and the local wild quitensis with a D-value of 0.71. When testing a scenario where we assumed quitensis as the ancestor of caudatus, which had been suggested previously (Sauer 1967), significant gene flow from South American hybridus was detected (supplementary figs. S3 and S4, Supplementary Material online), indicating the intermediate role of A. quitensis between wild and domesticated species.

Fine-scale Gene Flow Reveals Diverse Local Ancestry of Grain Amaranths

The genome-wide and population-wide gene flow analysis already showed the complex pattern of exchange of genetic material between the Amaranthus species. We found evidence of gene flow for distant and closely related species. To understand the species complex as a whole, we inferred the local ancestry (Lawson et al. 2012) along the genome of each individual using finestrucuture v4.1 (Lawson et al. 2012) and summarized them by population (fig. 2). We observed that the Central American grain species cruentus and hypochondriacus had less admixed backgrounds but shared ancestry tracks depending on the different individuals. Both had the highest donated portion from the wild hybridus_CA (fig. 2). Hypochondriacus had a heterogeneous contribution from the other species to different individuals (although with proportions less than 0.016 total), whereas cruentus had a more homogeneous contribution among individuals in the population. In South America, the overall pattern observed with population-wide introgression tests was also confirmed by the individual-based test (supplementary figs. S3 and S4, Supplementary Material online). The South American populations caudatus and quitensis shared large amounts of ancestry, but this strongly varied between the individuals, between 10.8% and 18.5% of quitensis ancestry in caudatus individuals (fig. 2A), which also varied along the genome (fig. 2B).

Variable ancestry across individuals and along the genome. (A) Contribution of donor populations to individual recipients. Values within boxplots represent contributions by different donor populations to each individual of the recipient population. The Y-axis scale differs between plots. The schematic geographic range of populations. (B) Population scale ancestry proportion along genomic positions of Scaffold 4. The proportion of the most likely donor population at a given SNP across all individuals in the recipient population. Each plot represents a recipient population, and colors represent donor populations. Exemplary scaffold, all scaffolds in supplementary fig. S5, Supplementary Material online. Donor colors according to fig. 1.
Fig. 2.

Variable ancestry across individuals and along the genome. (A) Contribution of donor populations to individual recipients. Values within boxplots represent contributions by different donor populations to each individual of the recipient population. The Y-axis scale differs between plots. The schematic geographic range of populations. (B) Population scale ancestry proportion along genomic positions of Scaffold 4. The proportion of the most likely donor population at a given SNP across all individuals in the recipient population. Each plot represents a recipient population, and colors represent donor populations. Exemplary scaffold, all scaffolds in supplementary fig. S5, Supplementary Material online. Donor colors according to fig. 1.

We further wanted to understand whether gene flow is variable along the genome. Comparing ancestry signals along the genome of the different species showed that even the same region can have multiple donors in a species (fig. 2B and supplementary fig. S5, Supplementary Material online). Using D-value in genomic windows, we scanned the genomes for gene flow signals in the trees with significant genome-wide signals. The previous comparison between crops showed the presence of gene flow between hypochondriacus and caudatus and between cruentus and caudatus (fig. 1). In the local scan, we observed similar signals as for global gene flow analysis; stronger gene flow between caudatus and hypochondriacus (D>0.992 for top 1% windows) than between caudatus and cruentus (D<0.887 for bottom 1% windows) (fig. 3). Windows representing significant gene flow from cruentus or hypochondriacus with caudatus did not overlap, suggesting that gene flow occurred independently between species.

Fine-scale gene flow along the genome between domesticated amaranth populations. We used the D-value to calculate gene flow along the genome in 1000 SNPs windows. Each dot represents the D-value for a tree comparing A. caudatus with A. hypochondriacus or A. cruentus. Positive values are indicative of gene flow between A. caudatus and A. hypochondriacus and negative values between A. caudatus and A. cruentus. The top 0.1 selective sweep regions detected in A. caudatus. Orange dots denote overlaps between selective sweep and top gene flow signal.
Fig. 3.

Fine-scale gene flow along the genome between domesticated amaranth populations. We used the D-value to calculate gene flow along the genome in 1000 SNPs windows. Each dot represents the D-value for a tree comparing A. caudatus with A. hypochondriacus or A. cruentus. Positive values are indicative of gene flow between A. caudatus and A. hypochondriacus and negative values between A. caudatus and A. cruentus. The top 0.1 selective sweep regions detected in A. caudatus. Orange dots denote overlaps between selective sweep and top gene flow signal.

To understand the potential reason for the observed high levels of gene flow between grain amaranth species, we combined gene flow signals along the genome with selection scan results (Gonçalves-Dias and Stetter 2021). Despite the genome-wide distribution of gene flow between crop species, only a few selective sweeps overlapped with outlier windows of gene flow between caudatus and the other two crop species. We found 13 overlapping windows in total, eight in regions of gene flow between hypochondriacus and caudatus and five between caudatus and cruentus. Despite the relatively low total number, the overlap was higher than expected by chance (p=0.02), suggesting beneficial gene flow between geographically distant crop relatives.

Introgression from Wild Ancestor Mitigates Increased Genetic Load in Domesticated Grain Amaranths

In many crop species, a reduction in overall genetic diversity between wild relatives and the crop has been observed. This has been associated with population bottlenecks and directional selection during domestication (Gaut et al. 2018). Increased genetic drift and hitchhiking with selected alleles can lead to a higher genetic load in the domesticated species (Lu et al. 2006; Wang et al. 2017). We calculated GERP scores for the A. hypochondriacus reference genome from whole-genome alignments with 15 diverse plant species of different relatedness as a proxy for deleterious alleles (supplementary fig. S6, Supplementary Material online). We observed that two domesticated species (caudatus and cruentus) had a significantly higher total genetic load than their wild ancestor hybridus. The third crop species, hypochondriacus, had a higher load than hybridus_SA but lower than hybridus_CA (fig. 4A). This pattern remained even when using the A. cruentus reference genome (Ma et al. 2021) to calculate GERP scores, showing that the difference is likely not the result of reference bias when calculating GERP scores (supplementary fig. S7, Supplementary Material online). The South American relative quitensis showed as high total load as the domesticated species. Partitioning of total genetic load from fixed and segregating sites showed that the domesticated species had a high fixed load but a lower segregating load than their wild ancestor (fig. 4B and C). Quitensis also showed high fixed load and low segregating load, in agreement with the small effective population size and low genetic diversity documented previously (Stetter et al. 2020).

Genetic load in domesticated and wild amaranth. (A–C) Genetic load was calculated as the sum of GERP scores for the derived allele per individual. (A) Total genetic load per individual in domesticated and wild populations. Different letters above the box plot represent significant differences. (B) Segregating genetic load in each population per individual and (C) Fixed genetic load within each population. (D–E) Accumulation of genetic load in selective sweep regions and control regions (rest of the genome). (D) Mean load of sweep/non-sweep region (including all sites in region); (E) Mean effect (GERP score) of deleterious allele in region (only deleterious sites). Asterisks above box plot represent the significance level. (F) Relative introgressed genetic load; load in introgressed regions relative to introgression received from the donor. A value greater than one represents increased load through introgression, whereas a value lower than one shows a reduction in load through introgression compared with the amount of introgression by the donor. A value of 1 shows, the expectation of equal load and introgression proportion (denoted by red dotted line). The asterisks above the box plot represent the significance level for one-sample t-test. (*P-value<0.05, **P-value<0.01, ***P-value<0.001)
Fig. 4.

Genetic load in domesticated and wild amaranth. (AC) Genetic load was calculated as the sum of GERP scores for the derived allele per individual. (A) Total genetic load per individual in domesticated and wild populations. Different letters above the box plot represent significant differences. (B) Segregating genetic load in each population per individual and (C) Fixed genetic load within each population. (D–E) Accumulation of genetic load in selective sweep regions and control regions (rest of the genome). (D) Mean load of sweep/non-sweep region (including all sites in region); (E) Mean effect (GERP score) of deleterious allele in region (only deleterious sites). Asterisks above box plot represent the significance level. (F) Relative introgressed genetic load; load in introgressed regions relative to introgression received from the donor. A value greater than one represents increased load through introgression, whereas a value lower than one shows a reduction in load through introgression compared with the amount of introgression by the donor. A value of 1 shows, the expectation of equal load and introgression proportion (denoted by red dotted line). The asterisks above the box plot represent the significance level for one-sample t-test. (*P-value<0.05, **P-value<0.01, ***P-value<0.001)

To investigate if hitchhiking of deleterious alleles with selected loci led to the overall increase in genetic load in the domesticated species, we compared genetic load within selective sweep regions (Gonçalves-Dias and Stetter 2021) with that of random non-sweep regions. We observed that the mean load per site in sweep regions was significantly lower than in control regions for all domesticates (fig. 4D). However, the mean GERP score per deleterious site in the sweep regions showed higher values than deleterious sites in control regions, suggesting a hitchhiking effect (fig. 4E). This suggests that strongly deleterious alleles might accumulate within selective sweep regions due to hitchhiking, but mildly deleterious alleles fix or increase in frequency due to increased genetic drift.

The abundant gene flow between grain amaranths and their wild relatives is expected to impact the patterns of genetic load. To evaluate the potential effect of gene flow on the accumulation of deleterious alleles, we measured the total load accumulated within introgressed regions from other species into a recipient species as the proportion of introgressed load received per individual. The expected value if introgressed regions carry the same amount of load as random regions would be one, whereas the value is higher than one for deleterious introgression and less than one for beneficial introgression. The analysis showed that introgressed regions from A. hybridus into the domesticated reduced genetic load, whereas introgression between crops donated higher load in the recipient population (fig. 4F). Introgression from domesticated donors into the wild species resulted in a higher genetic load. These results might suggest that gene flow from populations with higher effective population sizes (wild relatives) could provide evolutionary rescue for the smaller populations (crops).

Hybrid Incompatibilities between Grain Amaranths

All gene flow begins with the inter-mating between individuals of different populations. To further understand the process of gene flow and differences in observed levels of gene flow, we crossed multiple inbred lines of the three crop species. All three amaranth species are mostly selfing, but outcrossing is possible and occurs in the field. We selected three accessions from each crop species and crossed them within and between species, including selfings. All crosses produced viable F1 seeds. Selfings and intra-specific F1 plants grew without complications and set seeds (fig. 5, supplementary table S3, Supplementary Material online). The inter-specific F1 plants of the 5 combinations between A. caudatus and A. hypochondriacus developed healthy and fertile plants. For crosses between A. cruentus and A. hypochondriacus, one of the three combinations led to a lethal phenotype, with unhealthy seedlings that did not survive the juvenile stage (fig. 5). Yet, both parents of this cross produced healthy and fertile offspring with other crossing partners, suggesting that incompatibility is the result of a specific allelic combination rather than an individual unfit parent (supplementary table S3, Supplementary Material online). All of the six combinations between A. caudatus and A. cruentus resulted in the lethal phenotype, suggesting a genetic incompatibility between these species. The difference in hybrid compatibility between grain amaranth species likely contributed to the observed patterns of gene flow.

Proportion of lethal F1 phenotypes. The lower triangle shows the number of lethal combinations between accessions out of the total number of combinations. We considered a cross as “lethal” when all F1 seedlings died within 20 days after planting. The upper triangle shows example images of phenotypes of inter-specific combinations. cau: A. caudatus, cru: A. cruentus, hyp: A. hypochondriacus.
Fig. 5.

Proportion of lethal F1 phenotypes. The lower triangle shows the number of lethal combinations between accessions out of the total number of combinations. We considered a cross as “lethal” when all F1 seedlings died within 20 days after planting. The upper triangle shows example images of phenotypes of inter-specific combinations. cau: A. caudatus, cru: A. cruentus, hyp: A. hypochondriacus.

Discussion

Crop populations that we observe today are the result of different evolutionary processes. Whereas selection and demographic changes have been extensively studied, gene flow and its role in the fitness of crops has only received attention in recent years. This is partially due to technical advances in plant genomics, but might also have resulted from the conceptual assumption of a linear process from one wild ancestor. The complex makeup of modern grain amaranth shows that gene flow between crop populations, even over long geographic distances, was prevalent (fig. 1). Gene flow between crop lineages of species that were domesticated multiple times has also contributed to diversity in rice (Yang et al. 2012), tomato (Razifard et al. 2020) and common bean (Rendón-Anaya et al. 2017). Not only does such gene flow between closely related crop populations occur, it is also heterogeneous between individuals and along the genome (fig. 2).

A potential reason for lower genetic exchange between specific pairs of grain amaranth could have been the reported difference in chromosome number between A. cruentus with 17 chromosomes in comparison to 16 chromosomes in the other 4 species (EJ and Poggio 1994; Ma et al. 2021). Yet, this would only lead to infertile F1 plants, rather than necrotic, non-viable plants that die in the seedling stage. Instead, A. cruentus formed fertile hybrids with two out of three A. hypochondriacus accessions, suggesting incompatibility not because of difference in chromosome numbers but rather a genetic incompatibility (fig. 5). Our crossing experiment revealed differential genetic incompatibility between grain amaranth species, consistent with previous observations on inter-specific hybrid necrosis (Gupta and Gudu 1991). A potential one-locus underdominance model of hybrid incompatibility would require strong genetic drift in both populations (Wu and Ting 2004), which could be the result of previously demonstrated domestication bottlenecks in grain amaranth (Stetter et al. 2020). The observed reproductive barrier could also be the result of a Dobzhansky–Muller incompatibility (Muller 1942), including more than one locus. Given the large geographic distance between incompatible crop species (A. caudatus in South America and A. cruentus in Mesoamerica) the reproductive barriers likely evolved through neutral processes rather than selection against gene flow. The incomplete barrier between A. cruentus and A. hypochondriacus might allow further insights into the progression of reproductive isolation during crop domestication (Tenaillon et al. 2023). The genetic mechanism for incompatibility warrants further investigation, as this has practical implications for potential hybrid breeding using different crop species as heterotic pools. The complete compatibility between A. hypochondriacus and A. caudatus is reflected in higher gene flow signals than between these species and A. cruentus (figs. 1 and 5). Therefore, A. hypochondriacus and A. caudatus would be the most promising heterotic pools for future amaranth breeding.

Given the high prevalence of incompatibility between A. cruentus and A. caudatus, gene flow might have occurred early during the 8,000 year-long domestication history if strong reproductive barriers only developed later in the process. If the hybrid incompatibility arose at the advent of amaranth domestication, it can be expected that gene flow between A. cruentus and A. caudatus was likely beneficial, given the high fitness disadvantage of F1 hybrids (Janzen et al. 2019; Aguillon et al. 2022). We found a low but significant number of introgressed selective sweeps that might represent such beneficial gene flow between amaranth crop species. Currently, there are only a few domestication-related QTL known in grain amaranth that would allow linking introgressed regions to phenotypic changes during domestication. The previously reported QTL for the seed color change during amaranth domestication did not show signals of introgression consistent with the previously reported repeated selection for the trait in the three crop species (Stetter et al. 2020). More quantitative genetic and functional analyses could reveal additional QTLs that could indicate the adaptive potential of introgressed regions between grain amaranths.

Adaptive gene flow from wild relatives into crops has previously been associated with environmental adaptation in crops. For instance, in maize, the introgression of the wild relative Zea mays spp. mexicana has been associated with the adaptation of maize to highland conditions and colder climates (Wang et al. 2017). Recent work even suggests a prevalent role of Zea mays spp. mexicana in the domestication of maize (Yang et al. 2023). Although we cannot associate gene flow from wild relatives with positive selection, we found decreased genetic load in regions that were introgressed from wild relatives (fig. 4). This could be due to higher effective population size and higher genetic diversity of wild relatives (Stetter et al. 2020). The wild ancestor A. hybridus showed lower genetic load than the domesticates (fig. 4), which might be the result of less demographic change during the recent past (Stetter et al. 2020). Post-domestication gene flow between crops and their wild ancestor could consequently reduce the frequency of deleterious alleles. A similar correlation of genetic load and gene flow from a wild relative has also been shown in maize and sunflower, where gene flow regions from the wild relative showed reduced genetic load (Wang et al. 2017; Huang et al. 2023). Similarly, work in humans has shown that gene flow from a relative with a small population size (Neanderthal) into a population with a larger population size (modern humans) led to increased genetic load (Harris and Nielsen 2016), as we observe for gene flow from domesticated amaranths into A. hybridus. The accumulation of genetic load in populations with small effective population sizes can even lead to the extinction of populations or species as a whole (Rogers and Slatkin 2017). Hence, gene flow from relatives with large population size not only provides adaptive variation but can also lead to the evolutionary rescue of the small population (Carlson et al. 2014). Despite the amelioration of genetic load through gene flow with wild relatives, crop-wild hybrids are expected to perform poorly as crops. Gene flow, therefore, needs to have an overall beneficial effect that is higher than the destruction of domestication traits (Stetter 2020). This might be particularly possible in crops like grain amaranth where the domestication syndrome is only weakly pronounced (Stetter et al. 2020).

For crops that maintain high gene flow with their relatives and are phenotypically less differentiated from wild plants, as is the case for grain amaranth, the borders between wild and domesticate might be fluid. Strong gene flow between wild and domesticated crops might also allow the domesticate to return to the wild and be viable without human intervention again. Such feralization has been observed for a number of plant and animal species (Gering et al. 2019). We found particularly strong signals of gene flow between grain amaranth and its wild relatives in South America (fig. 1B). The close relationship between wild species and crop in South America has led to different hypotheses for the domestication of A. caudatus suggesting A. quitensis as potential wild ancestor (Sauer 1967). Although this cannot be completely ruled out, previous work using genome-wide makers data suggested A. hybridus as ancestor for all three grain amaranths (Kietlinski et al. 2014; Stetter et al. 2020). The high and genome-wide equally distributed signal of gene flow between A. caudatus and A. quitensis (figs. 1 and 2) together with the low population size and signs of a strong population bottleneck in A. quitensis (Stetter et al. 2020) might indicate a feralized status of this species. The clarification of the status of A. quitensis will need further work with multiple populations of this species, local crop, and wild relatives.

Overall, we show that the relationships between species are beyond linear, with exchanges between populations despite large geographic distances and the reintroduction of genetic material that was potentially lost during speciation. Even with observed genetic incompatibilities and high genetic differentiation between the crop species, we found strong signals of gene flow between grain amaranths and between the crops and their wild relatives. Recurrent gene flow from the wild relative into the crops might have allowed evolutionary rescue, counteracting the loss of diversity, but likely hindered the fixation of domestication traits leading to the incomplete domestication syndrome observed today for grain amaranth.

Materials and Methods

We studied whole-genome resequencing data of 108 domesticated and wild amaranth accessions. The raw reads are available from European Nucleotide Archive (project numbers PRJEB30531) (Stetter et al. 2020). The accessions included the three domesticated amaranth; 33 A. caudatus L. (caudatus), 21 A. cruentus L. (cruentus), and 21 A. hypochondriacus L. (hypochondriacus); as well as 5 wild A. hybridus L. from Central America (hybridus_CA), 4 A. hybridus L. from South America (hybridus_SA), and 24 A. quitensis Kunth. (quitensis) (supplementary table S5, Supplementary Material online). A. tuberculatus was used as outgroup for the study (ERR3220318), (Kreiner et al. 2019). Raw reads were aligned to amaranth reference genome (Lightfoot et al. 2017) using bwa-mem2 (v 2.2.1) (Vasimuddin et al. 2019).

Variant Calling

For variant calling, we utilized ANGSD (v.0921) (Korneliussen et al. 2014), with -refA. hypochondriacus V2.1 reference genome (Lightfoot et al. 2017) - doCounts 1, doGeno 3dovcf 1, gl 2, dopost 2, domajorminor 1, and domaf 1. We filtered for missing data and mapping quality using minInd 73 (max 30 missing data), minQ 20, minMapQ 30, only_proper_pairs 1, trim 0, SNP_pval 1e-6, setMaxDepthInd 150, and setminDepth 73. The resulting VCF file was phased using Beagle (v 5.2) (Browning et al. 2021) using default parameters. For linkage disequilibrium (LD) pruning, we used Plink (v 1.9) (Purcell et al. 2007) using windows of 50 kb with 5 kb steps and a r2 threshold of 0.3. The resulting VCF file had a total of 13,330,082 sites.

Gene Flow Analysis

We inferred gene flow between populations using D-statistic implemented in ANGSD (v.0921) (Korneliussen et al. 2014). We inferred population-wide statistics with the abbababba function for calculations of D per individual and abbababba2 for calculations between populations. For both tests, A. tuberculatus was used as outgroup (H4). Only trees with a significant Z-score (absolute value above 3) were included in the results. For fine-scale analysis along the genome, we employed Dsuite (Malinsky et al. 2021). We used the function Dinvestigate in windows of 100 SNPs to calculate D between trios along the genome. We also utilized the function Dtrios to verify the concordance of the global genome with the results obtained from ANGSD. To overlap regions with gene flow between crop species with selection signals, windows with significant signals of gene flow (1% outlier values) were overlapped with selective sweeps signals in the recipient population. Selective sweeps were previously identified in Gonçalves-Dias and Stetter (2021). The overlaps were tested for significance using a hypergeometric test (pyhper function in R 4.2).

Topology Inference

We used Twisst (Martin and Belleghem 2017) to infer the topology of each trio along the genome in windows of 100 SNPs, utilizing A. tuberculatus as an outgroup. For each window, a topology is assigned and a summary of the proportional windows in which each topology appeared is then obtained. This inference allows a blind observation of the relationship between species. In the case, where topologies that differ from a neutral expectation are present in high proportions suggests gene flow between species. We inferred the topology for trios, between which a putative gene flow signal was identified.

Local Ancestry Inference

We inferred local ancestry for each individual using finestructure v4.1 (Lawson et al. 2012). We used a uniform recombination rate generated with the perl script makeuniformrecfile.pl. The program was run using parameter -f 0 0 which considers the populations and iterates through all individuals. For each individual, all five species (including individuals of the same species) were used as donors, which allowed to differentiate ancestry from all the species. We further assigned the most likely donor for each genomic region of an individual. The donor for each position per individual was assigned based on the most likely donor population with a likelihood larger than 0.5. Sites that could not meet these likelihood thresholds were called “ambiguous.” Using these thresholds, the proportion of donated region per individual was calculated. In addition, the proportion along the genome within each population was summarized.

Demographic Modeling

To estimate whether the identified scenario of gene flow fits best to our data we used simulations using Fastsimcoal2 (Excoffier et al. 2013). The joint site frequency spectrum (SFS) was generated using non-coding SNPs having no missing value in any of the individuals and a minimum coverage of five reads using a python program easySFS (https://github.com/isaacovercast/easySFS). First, the program was run on preview mode (–preview) to identify the true sample size and the best sample size selected was used for the projection (-proj) to generate the joint SFS. Three different models namely, two-population split, two-population split with one migration event and two-population split with continuous gene flow (supplementary fig. S2, Supplementary Material online) were applied to the observed joint SFS. The models were compared using Akaike’s Information Criterion (AIC). The best parameter estimate was calculated based on 100 independent runs with 200,000 coalescent simulations and 40 cycles of likelihood maximization algorithm. The 95 percent confidence interval of each parameter was estimated based on 50 non-parametric bootstrapping datasets. Each of the 50 bootstrapped datasets was run 50 times to estimate the best run. These 50 best run parameters were then used to estimate the confidence interval using the boot package in R (Canty and Ripley 2017).

Genetic Load

We used Genomic Evolutionary Rate Profiling (GERP) (Davydov et al. 2010) scores to account for the effect of deleterious alleles at each site. We aligned 15 repeat-masked genomes of angiosperm species spanning a large taxonomic range to the reference genome of A. hypochondriacus (v2.1) (Lightfoot et al. 2017). We followed the pipeline of Wu et al. (2022). Briefly, we aligned genomes of 16 divergent species: Beta vulgaris EL10 1.0, Brachypodium distachyon v2.1, Chenopodium quinoa v1.0, Glycine max Wm82.a4.v1, Helianthus annuus r1.2, Medicago truncatula Mt4.0v1, Mimulus guttatus v2.0, Oryza sativa v7.0, Phaseolus vulgaris v2.1, Populus trichocarpa v4.1, Setaria viridis v2.1, Solanum lycopersicum ITAG4.0, Sorghum bicolor v3.1.1, Spinacia oleracea (Monoe Viroflay), and Vitis vinifera v2.1 from phytozome (Goodstein et al. 2012) to the A. hypochondriacus reference genome using the LAST aligner (Kiełbasa et al. 2011). The tree topology for the species was extracted from the NCBI-phylogeny using the ete3 toolkit (v3.1.2) (Huerta-Cepas et al. 2016). Phylofit from phast package (Siepel et al. 2005) was used to calculate the branch length of the tree along with four-fold degenerate sites from A. hypochondriacus reference annotation file to generate a neutral model. All pairwise alignments were merged using ROAST (https://github.com/multiz/multiz) (Blanchette et al. 2004). GERP++ (Davydov et al. 2010) was used to calculate the GERP scores for each site using gerpcol with -j option that projects out the reference genome to avoid bias in the calculation. All sites with negative GERP values were set to 0 as negative values are not informative and misleading. The ancestral allele for each site was defined on the basis of three outgroup species closest to Amaranthus in the phylogenetic tree, that is, Beta vulgaris, Chenopodium quinoa, and Spinacia oleracea. Variants among these three species were called from the roast multiple alignment file (maf) using mafFilter (version 1.3.1) (Dutheil et al. 2014). Sites not covered by any of the three outgroup species were removed. For the remaining sites, the major allele among the three species was called the ancestral allele. In case of discrepancy for a majority rule, the allele for Spinacia oleracea was called as the ancestral allele.

From a total of 13.2 million SNPs, we extracted 1,429,744 sites for which the GERP score and the ancestral alleles could be assigned. We then polarized the SNPs and removed sites where neither the reference nor the alternate allele matched the ancestral one. This yielded a total of 1,145,566 SNPs that were used for genetic load calculation. The GERP score from sites having derived alleles was then summed to calculate the total genetic load using an additive model. The genetic load was calculated for each accession individually. The significance of variation for the load in each population was then analyzed using an ANOVA followed by TukeyHSD in R (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/TukeyHSD).

To account for the reference bias, the GERP score was also calculated using A. cruentus genome (Ma et al. 2021) as reference, using the same method as described above. The variants file of individuals that used A. hypochondriacus as reference was lifted to the A. cruentus genome using liftover module of mafFilter (version 1.3.1) (Dutheil et al. 2014). The individual genetic load was then calculated using GERP score from A. cruentus genome for each lifted site as described above.

To overlap genetic load and selective sweeps, the sweep regions for each of the populations were overlapped with the sites with GERP score. Variants from those regions were extracted for the individuals of the respective populations. Any other regions of the genome that were not under the sweep region were considered as control region. To account for the difference in the sizes of the sweep and control regions, we divided the total load by the total number of sites used in the analysis. Differences in genetic load between sweep and non-sweep regions were then tested using a t-test for each population.

In order to calculate the introgressed load, GERP scores were summed for sites having the derived allele donated by a different population. The introgressed load was expressed as a ratio of load contributed by the introgressed species to the percentage of introgression sites. A value greater than one predicts the contribution of a higher load due to gene flow. The significance of deviation from the expected value was analyzed using one-sample t-test against the null-expectation of equal contribution of 1.

Experimental Hybridization between Grain Amaranth Species

We selected genetically and morphologically defined accessions for each of the three crop species (A. caudatus, A. cruentus, and A. hypochondriacus) to assess their cross-compatibility. We crossed 9 parental accessions (three per species) to create 25 inter- and intra-specific combinations and examined multiple crosses per combination (supplementary table S3, Supplementary Material online). The parental lines were previously selfed for at least three generations to ensure homozygosity. As the three grain amaranths are mostly selfing, we hand-emasculated the female parent and bagged parents together. Successful crossing was ensured by PCR using diverging primer pairs (supplementary table S4, Supplementary Material online). To determine the hybrid survival rate, the hybrids were grown alongside their parental accessions in a greenhouse in Cologne (Germany) under long day conditions (16h light, 8h dark) at 25C. We evaluated the survival of hybrid plants from at least three offsprings per cross. We considered a cross as “lethal” when all F1 seedlings died within 20 days after planting.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Acknowledgments

We acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2048/1 – Project ID 390686111 and grant STE 2654/5 to MGS by the DFG.

Author Contributions

M.G.S. conceived the study. J.G.D. processed the data and conducted genome-wide analysis of gene flow and local ancestry inferences. A.S. performed genetic load analysis and demographic modeling. C.G. performed experimental crosses and executed the incompatibility study together with J.G.D. J.G.D., A.S., and C.G. prepared figures and tables. M.G.S., J.G.D., and A.S. wrote the manuscript. All authors discussed the results, edited and approved the manuscript.

Data Availability

Genomic data is available through Stetter et al. (2020) and the associated ENA project. All scripts used in the analysis are available on https://github.com/cropevolution/GeneFlowLoadRescue.

Conflict of interest statement: The authors declare that they have no competing interests.

References

Aguillon
SM
,
Dodge
TO
,
Preising
GA
,
Schumer
M
.
2022
.
Introgression
.
Curr Biol
.
32
:
R865
R868
.

Bertorelle
G
,
Raffini
F
,
Bosse
M
,
Bortoluzzi
C
,
Iannucci
A
,
Trucchi
E
,
Morales
HE
,
van Oosterhout
C
.
2022
.
Genetic load: genomic estimates and applications in non-model animals
.
Nat Rev Genet
.
23
:
492
503
.

Blanchette
M
,
Kent
W.
,
Riemer
C
,
Elnitski
L
,
Smit
A.
,
Roskin
KM
,
Baertsch
R
,
Rosenbloom
K
,
Clawson
H
,
Green
ED
, et al.
2004
.
Aligning multiple genomic sequences with the threaded blockset aligner
.
Genome Res
.
14
:
708
715
.

Browning
BL
,
Tian
X
,
Zhou
Y
,
Browning
SR
.
2021
.
Fast two-stage phasing of large-scale sequence data
.
Am J Hum Genet
.
108
:
1880
1890
.

Canty
A
,
Ripley
B
.
2017
.
Package ‘boot’. Bootstrap Functions. CRAN R Proj
.

Carlson
SM
,
Cunningham
CJ
,
Westley
PA
.
2014
.
Evolutionary rescue in a changing world
.
Trends Ecol Evol
.
29
:
521
530
.

Chomicki
G
,
Schaefer
H
,
Renner
SS
.
2020
.
Origin and domestication of Cucurbitaceae crops: insights from phylogenies, genomics and archaeology
.
New Phytol
.
226
:
1240
1255
.

Civáň
P
,
Drosou
K
,
rmisen-Gimenez
D
,
uchemin
W
,
Salse
J
,
Brown
TA
.
2021
Episodes of gene flow and selection during the evolutionary history of domesticated barley
.
BMC Genom
22
:
1
17
.

Crispo
E
.
2008
.
Modifying effects of phenotypic plasticity on interactions among natural selection, adaptation and gene flow
.
J Evol Biol
.
21
:
1460
1469
.

Davydov
EV
,
Goode
DL
,
Sirota
M
,
Cooper
GM
,
Sidow
A
,
Batzoglou
S
,
Wasserman
WW
.
2010
.
Identifying a high fraction of the human genome to be under selective constraint using GERP++
.
PLoS Comput Biol
.
6
:
e1001025
.

Dempewolf
H
,
Hodgins
KA
,
Rummell
SE
,
Ellstrand
NC
,
Rieseberg
LH
.
2012
.
Reproductive isolation during domestication
.
Plant Cell
.
24
:
2710
2717
.

Ding
Y-M
,
Cao
Y
,
Zhang
W-P
,
Chen
J
,
Liu
J
,
Li
P
,
Renner
SS
,
Zhang
D-Y
,
Bai
W-N
.
2022
.
Population-genomic analyses reveal bottlenecks and asymmetric introgression from Persian into iron walnut during domestication
.
Genome Biol
.
23
:
1
18
.

Doebley
J
.
2006
.
Unfallen grains: how ancient farmers turned weeds into crops
.
Science
312
:
1318
1319
.

Dutheil
JY
,
Gaillard
S
,
Stukenbrock
EH
.
2014
.
Maffilter: a highly flexible and extensible multiple genome alignment files processor
.
BMC Genom
.
15
:
1
10
.

EJ
G
,
Poggio
L
.
1994
.
Karyological studies in grain amaranths
.
Cytologia
59
:
25
30
.

Ellstrand
NC
.
2014
.
Is gene flow the most important evolutionary force in plants?
Am J Bot
.
101
:
737
753
.

Excoffier
L
,
Dupanloup
I
,
Huerta-Sánchez
E
,
Sousa
VC
,
Foll
M
.
2013
.
Robust demographic inference from genomic and SNP data
.
PLoS Genet
.
9
:
e1003905
.

Gaut
BS
,
Díez
CM
,
Morrell
PL
.
2015
.
Genomics and the contrasting dynamics of annual and perennial domestication
.
Trends Genet
.
31
:
709
719
.

Gaut
BS
,
Seymour
DK
,
Liu
Q
,
Zhou
Y
.
2018
.
Demography and its effects on genomic variation in crop domestication
.
Nat Plants
.
4
:
512
520
.

Gering
E
,
Incorvaia
D
,
Henriksen
R
,
Conner
J
,
Getty
T
,
Wright
D
.
2019
.
Getting back to nature: feralization in animals and plants
.
Trends Ecol Evol
.
34
:
1137
1151
.

Gonçalves-Dias
J
,
Stetter
MG
.
2021
.
PopAmaranth: a population genetic genome browser for grain amaranths and their wild relatives
.
G3
.
11
(
7
):
jkab103
.

Goodstein
DM
,
Shu
S
,
Howson
R
,
Neupane
R
,
Hayes
RD
,
Fazo
J
,
Mitros
T
,
Dirks
W
,
Hellsten
U
,
Putnam
N
, et al.
2012
.
Phytozome: a comparative platform for green plant genomics
.
Nucleic Acids Res
.
40
:
D1178
D1186
.

Gupta
V
,
Gudu
S
.
1991
.
Interspecific hybrids and possible phylogenetic relations in grain amaranths
.
Euphytica
.
52
:
33
38
.

Harris
K
,
Nielsen
R
.
2016
.
The genetic cost of Neanderthal introgression
.
Genetics
203
:
881
891
.

Huang
K
,
Jahani
M
,
Gouzy
J
,
Legendre
A
,
Carrere
S
,
Lázaro-Guevara
JM
,
González Segovia
EG
,
Todesco
M
,
Mayjonade
B
,
Rodde
N
, et al.
2023
.
The genomics of linkage drag in inbred lines of sunflower
.
Proc Natl Acad Sci U S A
.
120
:e2205783119.

Huerta-Cepas
J
,
Serra
F
,
Bork
P
.
2016
.
ETE 3: reconstruction, analysis, and visualization of phylogenomic data
.
Mol Biol Evol
.
33
:
1635
1638
.

Hufford
MB
,
Lubinksy
P
,
Pyhäjärvi
T
,
Devengenzo
MT
,
Ellstrand
NC
,
Ross-Ibarra
J
.
2013
.
The genomic signature of crop-wild introgression in maize
.
PLoS Genet
.
9
:
e1003477
.

Janzen
GM
,
Wang
L
,
Hufford
MB
.
2019
.
The extent of adaptive wild introgression in crops
.
New Phytol
.
221
:
1279
1288
.

Kiełbasa
SM
,
Wan
R
,
Sato
K
,
Horton
P
,
Frith
MC
.
2011
.
Adaptive seeds tame genomic sequence comparison
.
Genome Res
.
21
:
487
493
.

Kietlinski
KD
,
Jimenez
F
,
Jellen
EN
,
Maughan
PJ
,
Smith
SM
,
Pratt
D
.
2014
.
Relationships between the weedy Amaranthus hybridus (Amaranthaceae) and the grain amaranths
.
Crop Sci
.
54
:
220
228
.

Kim
M-S
,
Lozano
R
,
Kim
JH
,
Bae
DN
,
Kim
S-T
,
Park
J-H
,
Choi
MS
,
Kim
J
,
Ok
H-C
,
Park
S-K
, et al.
2021
.
The patterns of deleterious mutations during the domestication of soybean
.
Nat Commun
.
12
:
97
.

Korneliussen
TS
,
Albrechtsen
A
,
Nielsen
R
.
2014
.
ANGSD: analysis of next generation sequencing data
.
BMC Bioinform
.
15
:
356
.

Kozak
KM
,
Joron
M
,
McMillan
WO
,
Jiggins
CD
.
2021
.
Rampant genome-wide admixture across the Heliconius radiation
.
Genome Biol Evol
.
13
:
evab099
.

Kreiner
JM
,
Giacomini
DA
,
Bemm
F
,
Waithaka
B
,
Regalado
J
,
Lanz
C
,
Hildebrandt
J
,
Sikkema
PH
,
Tranel
PJ
,
Weigel
D
, et al.
2019
.
Multiple modes of convergent adaptation in the spread of glyphosate-resistant Amaranthus tuberculatus
.
Proc Natl Acad Sci U S A
.
116
:
21076
21084
.

Lawson
D
,
Hellenthal
G
,
Myers
S
,
Falush
D
.
2012
.
Inference of population structure using dense haplotype data
.
PLoS Genet
.
8
:
e1002453
.

Lightfoot
DJ
,
Jarvis
DE
,
Ramaraj
T
,
Lee
R
,
Jellen
EN
,
Maughan
PJ
.
2017
.
Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution
.
BMC Biol
.
15
:
74
.

Liu
S
,
Cornille
A
,
Decroocq
S
,
Tricon
D
,
Chague
A
,
Eyquard
J-P
,
Liu
W-S
,
Giraud
T
,
Decroocq
V
.
2019
.
The complex evolutionary history of apricots: species divergence, gene flow and multiple domestication events
.
Mol Ecol
.
28
:
5299
5314
.

López-Goldar
X
,
Agrawal
AA
.
2021
.
Ecological interactions, environmental gradients, and gene flow in local adaptation
.
Trends Plant Sci
.
26
:
796
809
.

Lozano
R
,
Gazave
E
,
dos Santos
JPR
,
Stetter
MG
,
Valluru
R
,
Bandillo
N
,
Fernandes
SB
,
Brown
PJ
,
Shakoor
N
,
Mockler
TC
, et al.
2021
.
Comparative evolutionary genetics of deleterious load in sorghum and maize
.
Nat Plants
.
7
:
17
24
.

Lu
J
,
Tang
T
,
Tang
H
,
Huang
J
,
Shi
S
,
Wu
C-I
.
2006
.
The accumulation of deleterious mutations in rice genomes: a hypothesis on the cost of domestication
.
Trends Genet
.
22
:
126
131
.

Luo
M-C
,
Yang
Z-L
,
You
FM
,
Kawahara
T
,
Waines
JG
,
Dvorak
J
.
2007
.
The structure of wild and domesticated emmer wheat populations, gene flow between them, and the site of emmer domestication
.
Theor Appl Genet
.
114
:
947
959
.

Lv
F-H
,
Cao
Y-H
,
Liu
G-J
,
Luo
L-Y
,
Lu
R
,
Liu
M-J
,
Li
W-R
,
Zhou
P
,
Wang
X-H
,
Shen
M
, et al.
2022
.
Whole-genome resequencing of worldwide wild and domestic sheep elucidates genetic diversity, introgression, and agronomically important loci
.
Mol Biol Evol
.
39
:
msab353
.

Ma
X
,
Vaistij
F
,
Li
Y
,
Jansen van Rensburg
W
,
Harvey
S
,
Bairu
MW
,
Venter
SL
,
Mavengahama
S
,
Ning
Z
,
Graham
IA
.
2021
.
A chromosome-level Amaranthus cruentus genome assembly highlights gene family evolution and biosynthetic gene clusters that may underpin the nutritional value of this traditional crop
.
Plant J
.
107
:
613
628
.

Malinsky
M
,
Matschiner
M
,
Svardal
H
.
2021
.
Dsuite – Fast D -statistics and related admixture evidence from VCF files
.
Mol Ecol Resour
.
21
:
584
595
.

Martin
SH
,
Belleghem
SMVan
.
2017
.
Exploring evolutionary relationships across the genome using topology weighting
.
Genetics
206
:
429
438
.

Meyer
RS
,
DuVal
AE
,
Jensen
HR
.
2012
.
Patterns and processes in crop domestication: an historical review and quantitative analysis of 203 global food crops
.
New Phytol
.
196
:
29
48
.

Muller
HJ
.
1942
.
Isolating mechanisms, evolution, and temperature
.
Biol Symp
.
6
:
71
125
.

Nabholz
B
,
Sarah
G
,
Sabot
F
,
Ruiz
M
,
Adam
H
,
Nidelet
S
,
Ghesquière
A
,
Santoni
S
,
David
J
,
Glémin
S
.
2014
.
Transcriptome population genomics reveals severe bottleneck and domestication cost in the African rice (Oryza glaberrima)
.
Mol Ecol
.
23
:
2210
2227
.

Niu
S
,
Song
Q
,
Koiwa
H
,
Qiao
D
,
Zhao
D
,
Chen
Z
,
Liu
X
,
Wen
X
.
2019
.
Genetic diversity, linkage disequilibrium, and population structure analysis of the tea plant (Camellia sinensis) from an origin center, guizhou plateau, using genome-wide snps developed by genotyping-by-sequencing
.
BMC Plant Biol
.
19
:
1
12
.

Orlando
L
,
Allaby
R
,
Skoglund
P
,
Der Sarkissian
C
,
Stockhammer
PW
,
Ávila-Arcos
MC
,
Fu
Q
,
Krause
J
,
Willerslev
E
,
Stone
AC
.
2021
.
Ancient DNA analysis
.
Nat Rev Methods Primers
.
1
:
14
.

Ostevik
KL
,
Andrew
RL
,
Otto
SP
,
Rieseberg
LH
.
2016
.
Multiple reproductive barriers separate recently diverged sunflower ecotypes
.
Evolution
70
:
2322
2335
.

Osuna-Mascaró
C
,
Rubio de Casas
R
,
Ómez
JM
,
Loureiro
J
,
Castro
S
,
Landis
JB
,
Hopkins
R
,
Perfectti
F
.
2023
Hybridization and introgression are prevalent in southern European Erysimum (brassicaceae) species
.
Ann Bot
.
131
:
171
184
.

Page
A
,
Gibson
J
,
Meyer
RS
,
Chapman
MA
.
2019
.
Eggplant domestication: pervasive gene flow, feralization, and transcriptomic divergence
.
Mol Biol Evol
.
36
:
1359
1372
.

Purcell
S
,
Neale
B
,
Todd-Brown
K
,
Thomas
L
,
Ferreira
MAR
,
Bender
D
,
Maller
J
,
Sklar
P
,
de Bakker
PIW
,
Daly
MJ
.
2007
.
PLINK: a tool set for whole-genome association and population-based linkage analyses
.
Am J Hum Genet
.
81
:
559
575
.

Razifard
H
,
Ramos
A
,
Valle
AL
,
Bodary
C
,
Goetz
E
,
Manser
EJ
,
Li
X
,
Zhang
L
,
Visa
S
,
Tieman
D
, et al.
2020
.
Genomic evidence for complex domestication history of the cultivated tomato in Latin America
.
Mol Biol Evol
.
37
(
4
):
1118
1132
.

Rendón-Anaya
M
,
Montero-Vargas
JM
,
Saburido-Álvarez
S
,
Vlasova
A
,
Capella-Gutierrez
S
,
Ordaz-Ortiz
JJ
,
Mario Aguilar
O
,
Vianello-Brondani
RP
,
Santalla
M
,
Delaye
L
, et al.
2017
.
Genomic history of the origin and domestication of common bean unveils its closest sister species
.
Genome Biol
.
18
:
1
17
.

Rieseberg
LH
,
Burke
JM
.
2001
.
The biological reality of species: gene flow, selection, and collective evolution
.
Taxon
.
50
:
47
67
.

Rodgers-Melnick
E
,
Bradbury
PJ
,
Elshire
RJ
,
Glaubitz
JC
,
Acharya
CB
,
Mitchell
SE
,
Li
C
,
Li
Y
,
Buckler
ES
.
2015
.
Recombination in diverse maize is stable, predictable, and associated with genetic load
.
Proc Natl Acad Sci U S A
.
112
:
3823
3828
.

Rogers
RL
,
Slatkin
M
.
2017
.
Excess of genomic defects in a woolly mammoth on Wrangel island
.
PLoS Genet
.
13
:
e1006601
.

Ross-Ibarra
J
,
Tenaillon
M
,
Gaut
BS
.
2009
.
Historical divergence and gene flow in the genus Zea
.
Genetics
181
:
1399
1413
.

Saban
JM
,
Romero
AJ
,
Ezard
TH
,
Chapman
MA
.
2023
.
Extensive crop-wild hybridisation during Brassica evolution, and selection during the domestication and diversification of Brassica crops
.
Genetics
223
:
iyad027
.

Sagnard
F
,
Deu
M
,
Dembélé
D
,
Leblois
R
,
Touré
L
,
Diakité
M
,
Calatayud
C
,
Vaksmann
M
,
Bouchet
S
,
Mallé
Y
, et al.
2011
.
Genetic diversity, structure, gene flow and evolutionary relationships within the Sorghum bicolor wild–weedy–crop complex in a western African region
.
Theor Appl Genet
.
123
:
1231
1246
.

Sauer
JD
.
1967
.
The grain amaranths and their relatives: a revised taxonomic and geographic survey
.
Ann Mo Bot Gard
.
54
:
103
137
.

Sedivy
EJ
,
Wu
F
,
Hanzawa
Y
.
2017
.
Soybean domestication: the origin, genetic architecture and molecular bases
.
New Phytol
.
214
:
539
553
.

Sexton
JP
,
Strauss
SY
,
Rice
KJ
.
2011
.
Gene flow increases fitness at the warm edge of a species’ range
.
Proc Natl Acad Sci U S A
.
108
:
11704
11709
.

Sheidai
M
,
Koohdar
F
.
2023
.
Evidence for ancient introgression and gene flow in the genus Tamarix L. (amicaceae): a computational approach
.
Genetic Resources and Crop Evolution
.
70
:
1
9
.

Siepel
A
,
Bejerano
G
,
Pedersen
JS
,
Hinrichs
AS
,
Hou
M
,
Rosenbloom
K
,
Clawson
H
,
Spieth
J
,
Hillier
LW
,
Richards
S
, et al.
2005
.
Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes
.
Genome Res
.
15
:
1034
1050
.

Smith
O
,
Nicholson
WV
,
Kistler
L
,
Mace
E
,
Clapham
A
,
Rose
P
,
Stevens
C
,
Ware
R
,
Samavedam
S
,
Barker
G
, et al.
2019
.
A domestication history of dynamic adaptation and genomic deterioration in sorghum
.
Nat Plants
.
5
:
369
379
.

Stetter
MG
.
2020
.
Limits and constraints to crop domestication
.
Am J Bot
.
107
:
1617
1621
.

Stetter
MG
,
Müller
T
,
Schmid
KJ
.
2017
.
Genomic and phenotypic evidence for an incomplete domestication of South American grain amaranth (Amaranthus caudatus)
.
Mol Ecol
.
26
:
871
886
.

Stetter
MG
,
Vidal-Villarejo
M
,
Schmid
KJ
.
2020
.
Parallel seed color adaptation during multiple domestication attempts of an ancient new world grain
.
Mol Biol Evol
.
37
:
1407
1419
.

Tenaillon
MI
,
Burban
E
,
Huynh
S
,
Wojcik
A
,
Thuillet
A-C
,
Manicacci
D
,
Gérard
PR
,
Alix
K
,
Belcram
H
,
Cornille
A
, et al.
2023
.
Crop domestication as a step towards reproductive isolation
.
Am J Bot
.
110
(
7
):
e16173
.

Tigano
A
,
Friesen
VL
.
2016
.
Genomics of local adaptation with gene flow
.
Mol Ecol
.
25
:
2144
2164
.

van Heerwaarden
J
,
Doebley
J
,
Briggs
WH
,
Glaubitz
JC
,
Goodman
MM
,
de Jesus Sanchez Gonzalez
J
,
Ross-Ibarra
J
.
2011
.
Genetic signals of origin, spread, and introgression in a large sample of maize landraces
.
Proc Natl Acad Sci U S A
.
108
:
1088
1092
.

Vasimuddin
M
,
Misra
S
,
Li
H
,
Aluru
S
.
2019
.
Efficient architecture-aware acceleration of BWA-MEM for multicore systems
. In
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
. IEEE. p.
314
324
.

Wang
L
,
Beissinger
TM
,
Lorant
A
,
Ross-Ibarra
C
,
Ross-Ibarra
J
,
Hufford
MB
.
2017
.
The interplay of demography and selection during maize domestication and expansion
.
Genome Biol
.
18
:
1
13
.

Wang
L
,
Josephs
EB
,
Lee
KM
,
Roberts
LM
,
Rellán-Álvarez
R
,
Ross-Ibarra
J
,
Hufford
MB
.
2021
.
Molecular parallelism underlies convergent highland adaptation of maize landraces
.
Mol Biol Evol
.
38
:
3567
3580
.

Wu
Y
,
Johnson
L
,
Song
B
,
Romay
C
,
Stitzer
M
,
Siepel
A
,
Buckler
E
,
Scheben
A
.
2022
.
A multiple alignment workflow shows the effect of repeat masking and parameter tuning on alignment in plants
.
Plant Genome
.
15
:
e20204
.

Wu
C-I
,
Ting
C-T
.
2004
.
Genes and speciation
.
Nat Rev Genet
.
5
:
114
122
.

Xu
J-L
,
Wang
J-M
,
Sun
Y-Q
,
Wei
L-J
,
Luo
R-T
,
Zhang
M-X
,
Li
Z-K
.
2006
.
Heavy genetic load associated with the subspecific differentiation of japonica rice (Oryza sativa ssp. japonica L.)
.
J Exp Bot
.
57
:
2815
2824
.

Yang
C-c
,
Kawahara
Y
,
Mizuno
H
,
Wu
J
,
Matsumoto
T
,
Itoh
T
.
2012
.
Independent domestication of asian rice followed by gene flow from japonica to indica
.
Mol Biol Evol
.
29
:
1471
1479
.

Yang
N
,
Wang
Y
,
Liu
X
,
Jin
M
,
Vallebueno-Estrada
M
, et al.
2023
.
2023 Two teosintes made modern maize
.
bioRxiv 2023-01
.

Author notes

José Gonçalves-Dias, Akanksha Singh and Markus G Stetter contributed equally to this work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Associate Editor: Camilla Whittington
Camilla Whittington
Associate Editor
Search for other works by this author on:

Supplementary data