-
PDF
- Split View
-
Views
-
Cite
Cite
Yao Li, Xingwang Zhang, Lu Wang, Victoria L Sork, Lingfeng Mao, Yanming Fang, Influence of Pliocene and Pleistocene climates on hybridization patterns between two closely related oak species in China, Annals of Botany, Volume 129, Issue 2, 1 February 2022, Pages 231–245, https://doi.org/10.1093/aob/mcab140
Close - Share Icon Share
Abstract
Contemporary patterns of genetic admixture reflect imprints of both ancient and recent gene flow, which can provide us with valuable information on hybridization history in response to palaeoclimate change. Here, we examine the relationships between present admixture patterns and past climatic niche suitability of two East Asian Cerris oaks (Quercus acutissima and Q. chenii) to test the hypothesis that the mid-Pliocene warm climate promoted while the Pleistocene cool climate limited hybridization among local closely related taxa.
We analyse genetic variation at seven nuclear microsatellites (1111 individuals) and three chloroplast intergenic spacers (576 individuals) to determine the present admixture pattern and ancient hybridization history. We apply an information-theoretic model selection approach to explore the associations of genetic admixture degree with past climatic niche suitability at multiple spatial scales.
More than 70 % of the hybrids determined by Bayesian clustering analysis and more than 90 % of the individuals with locally shared chloroplast haplotypes are concentrated within a mid-Pliocene contact zone between ~30°N and 35°N. Climatic niche suitabilities for Q. chenii during the mid-Pliocene Warm Period [mPWP, ~3.264–3.025 million years ago (mya)] and during the Last Glacial Maximum (LGM, ~0.022 mya) best explain the admixture patterns across all Q. acutissima populations and across those within the ancient contact zone, respectively.
Our results highlight that palaeoclimate change shapes present admixture patterns by influencing the extent of historical range overlap. Specifically, the mid-Pliocene warm climate promoted ancient contact, allowing widespread hybridization throughout central China. In contrast, the Pleistocene cool climate caused the local extinction of Q. chenii, reducing the probability of interspecific gene flow in most areas except those sites having a high level of ecological stability.
Introduction
Contemporary patterns of genetic admixture reflect imprints of both ancient and recent hybridization, which can provide us with valuable information on the temporal and spatial dynamics of a hybrid zone in response to climate change (Buggs, 2007; Hewitt, 2011; Taylor et al., 2014, 2015; Gompert et al., 2017; Ryan et al., 2018). At a geological time scale, sister species that have diverged for a long time may have experienced dramatic range shifts (Hewitt, 2004). They could produce a hybrid zone that potentially has moved hundreds to thousands of kilometres over its lifetime (Wang et al., 2014; Zohren et al., 2016; Wielstra, 2019). The historical movement of a hybrid zone would leave behind a molecular footprint because selectively neutral alleles are expected to introgress from a receding species into an advancing species (Currat et al., 2008; Wielstra et al., 2017). On the other hand, a particular hybrid zone may have formed more than once as climatic fluctuations could lead to repeated range contractions and expansions of sister lineages, causing their ranges to come into contact many times (Hewitt, 2004, 2011). The signatures of ancient introgression among species that have coexisted in the past can still be detected even if the ancestors or parental species have suffered local extinction (Kim et al., 2018; Ortego et al., 2018; Crowl et al., 2020). Given that present admixture patterns contain memories of early times that were particularly affected by range dynamics, it is possible to use such information, together with other lines of evidence from fossil records and species distribution modelling, to reconstruct the hybridization history among closely related species within the context of palaeoclimate change (Hewitt, 2011; Gompert et al., 2017; Wielstra, 2019).
Oaks (Quercus) are a classic system for studying natural hybridization and introgression in plants because they often exchange alleles without disrupting species integrity (Cannon and Petit, 2020). Ongoing hybridization has been widely detected within most sections of the genus, such as sect. Quercus (Lepais et al., 2009; Zeng et al., 2011; Ortego et al., 2014; Burge et al., 2019), sect. Lobatae (Dodd and Afzal-Rafii, 2004; Peñaloza-Ramírez et al., 2010; Moran et al., 2012), sect. Ilex (Neophytou et al., 2011) and sect. Cyclobalanopsis (An et al., 2017). Genomic signals of historical hybridization can be preserved over long periods of time, which can be detected in the genomes of modern oak taxa (e.g. Crowl et al., 2020). For oaks on different continents, the history of ancient gene exchange differs depending on both climatic and topographic contexts that influence range dynamics and demographic processes (Kremer and Hipp, 2020). In Europe and eastern North America, the temperate white oaks experienced extensive introgression when postglacial recolonization restored genetic contacts among populations from isolated refugia (Leroy et al., 2017, 2020; Petit et al., 2002a). This process created opportunities for cytoplasmic capture, allowing the range-wide spread of a small number of shared chloroplast haplotypes (Whittemore and Schaal, 1991; Petit et al., 2002b). In contrast, the western North American oaks maintained a relatively stable distribution and only experienced local migration across glacial–interglacial cycles, thus leaving a patchy distribution of a diverse set of haplotypes (Grivet et al., 2006; Gugger et al., 2013). The stable climate also enabled the long-term persistence of populations and preservation of hybridization events. Numerous cases reported here come from allopatric species that have coexisted in the past, associated with range shifts occurring at a much early time (e.g. the Pliocene or Pleistocene; Maze, 1968; Manos et al., 1999; McVay et al., 2017; Kim et al., 2018; Ortego et al., 2018).
East Asia is one of the global hotspots of oak diversity, including more than 100 species placed into four sections (Kremer and Hipp, 2020). Previous studies have stressed the role of ancient and recent gene flow in shaping the present admixture pattern among native oaks (Zeng et al., 2011; Tamaki and Okada, 2014; An et al., 2017), but the impact of palaeoclimate on hybridization history remains unclear. Compared with Europe and North America, East Asia was less affected by the Pleistocene glaciations (Feng et al., 2016). The relatively stable climate, interacting with complex landscapes, has contributed to the long-term in-situ survival of local populations across multiple refugia (Zeng et al., 2015; Zhang et al., 2015, 2018; Li et al., 2019a). Considering this history, we expect that the current admixture patterns among East Asian oaks would reflect the genetic imprints of ancient hybridization affected by historical climate change, probably before the Quaternary.
In this study, we reconstruct the hybridization history between two East Asian Cerris oaks, Quercus acutissima and Q. chenii, to assess the influence of palaeoclimate on present admixture patterns. We chose to focus on these particular taxa because they are sister species estimated to have diverged ~30–10 million years ago (mya) (Simeone et al., 2018; Hipp et al., 2020). Such a long evolutionary history makes it feasible to examine the impact of pre-Quaternary climate change. Currently, Q. acutissima is among the dominant tree species of the East Asian warm temperate deciduous forests (Fujiwara and Harada, 2015), while Q. chenii exhibits a much narrower distribution in East China. Diagnostic morphological features include leaf and acorn size. Nevertheless, a variety of Q. chenii showing an intermediate phenotype was identified in their overlapping ranges, suggesting that putative hybrids may occur (Liu, 1992).
Fossil evidence indicates that Q. chenii was a common tree species during the mid-Pliocene Warm Period (mPWP, ~3.264–3.025 mya) at more northerly latitudes than its current range limit. However, it later suffered local extinction as a result of both Pleistocene cooling and strengthening of the East Asian winter monsoon (Momohara, 2010, 2016). The range dynamics of Q. chenii would affect the hybridization history by triggering contact zone shifts. Here, we predict that the mid-Pliocene warm climate promoted while the Pleistocene cool climate limited the hybridization by influencing the extent of historical range overlap. To test this hypothesis, we first analyse the genetic variation found in bi-parentally inherited nuclear microsatellites (nSSRs) and maternally inherited chloroplast (cp) DNA sequences to determine the putative hybrids and present admixture pattern. Second, we combine the evidence from fossil records, demographic inference and ecological niche modelling (ENM) to explore the role of ancient gene flow in shaping the hybridization history. Finally, we apply an information-theoretic model selection approach to examine the associations between genetic admixture degree and past climatic niche suitability. If ancient hybridization has left imprints on admixture patterns, we expect to detect a positive association between past climatic niche suitability and degree of genetic admixture because hybrids are more likely to occur in habitats that have been suitable for parental species (Ortego et al., 2014).
MATERIAL AND METHODS
Sampling
Between May 2014 and September 2017, we sampled 30 populations of Quercus acutissima (n = 696 individuals) and 18 populations of Quercus chenii (n = 415 individuals), encompassing the majority of the distributions of both species in China (Fig. 1A and Supplementary Data Table S1). Leaf tissues were quickly dried with silica gel and stored at room temperature in the laboratory. Spatially explicit information was recorded for each population using a handheld GPS unit (Magellan, USA). Voucher specimens of all individuals were deposited in the Herbarium of Nanjing Forestry University (HNFU).
Results of STRUCTURE, PCA, NewHybrids and INTROGRESSION analyses based on the genetic variation at seven nuclear microsatellite loci. (A) Geographical distributions of two genetic clusters corresponding to Quercus acutissima (blue) and Q. chenii (orange) as inferred by STRUCTURE analyses. Circle area is proportional to the sample size. (B) Number of putative hybrids at each site. In A and B, orange and green areas indicate the climatically suitable areas for Q. chenii during the mid-Pliocene Warm Period (mPWP) and during the present, respectively. They were created from the logistic outputs of MAXENT using the maximum sensitivity plus specificity threshold (Gugger et al., 2013). Population codes (Q. acutissima: 1–30; Q. chenii: 31–48) are described in Supplementary Data Table S1. *Populations inside the ancient contact zone. (C) Histogram of an individual’s probability of assignment to Q. acutissima (blue) and Q. chenii (orange) in STRUCTURE analyses. Each vertical bar represents one individual. (D) PCA result for Q. acutissima, Q. chenii and their hybrids. (E) Posterior probability (PP) assigned to Q. acutissima for parental individuals of Q. acutissima (group PA, n = 642) and hybrids morphologically assigned to Q. acutissima (group HA, n = 54), and to Q. chenii for parental individuals of Q. chenii (group PC, n = 395) and hybrids morphologically assigned to Q. chenii (group HC, n = 20). The PP values are averaged over ten independent runs of NewHybrids analyses. (F) Maximum-likelihood hybrid index (H) estimated by the R package introgression for the PA, HA, HC and PC groups.
Ecological niche modelling
We used MAXENT 3.4.1 (Phillips et al., 2018) to estimate the distributions of Q. acutissima and Q. chenii for four periods whose climate data are available: the present, Last Glacial Maximum (LGM, ~ 0.022 mya), Marine Isotope Stage 19 (MIS19, ~ 0.787 mya, an analogue for the Holocene) in the Pleistocene and mPWP. Species occurrence records were collected from field observations as well as from specimen databases (Chinese Virtual Herbarium, http://www.cvh.ac.cn/; Specimen Resources Sharing Platforms for Education, http://mnh.scu.edu.cn/; Plant Photo Bank of China, http://ppbc.iplant.cn/). After removing duplicates and retaining only one record among all locations falling within the same 2.5′ × 2.5′ grid, a total of 457 and 54 presence points were obtained for Q. acutissima and Q. chenii, respectively. We selected four bioclimatic variables that show low to moderate correlations (| r | < 0.7) to prevent potential over-fitting: annual mean temperature (bio1), temperature seasonality (bio4), annual precipitation (bio12) and precipitation seasonality (bio15). Seasonal temperature extremes were not chosen because they are not available for the mPWP and MIS19. Environmental layers of the four variables were retrieved from the WorldClim 1.4 (Hijmans et al., 2005) and PaleoClim (Brown et al., 2018) databases at a grid resolution of 2.5 arc-minutes. Palaeoclimate data were generated using the downscaled outputs of general circulation models, including the Community Climate System Model version 4 (CCSM4) and the Hadley Centre Coupled Model version 3 (HadCM3; Hill, 2015). All the layers were clipped to the same spatial range (15°–45°N, 90°–145°E) using the package raster 2.8-19 (Hijmans, 2019) in R (R Core Team, 2018). ENM analyses were performed with default settings. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) estimated by 10-fold cross-validation. Niche suitability maps were created based on the logistic outputs of MAXENT.
DNA extraction, sequencing and microsatellite genotyping
Total genomic DNA was extracted from 30 mg leaf tissue of each individual using the Tiangen Plant Genomic DNA Kit (Tiangen, Beijing, China). Three chloroplast intergenic spacers (atpB-rbcL, psbA-trnH, trnS-trnG; Supplementary Data Table S2) were sequenced for 401 individuals of Q. acutissima and 175 individuals of Q. chenii following the protocols of Zhang et al. (2015). All the 1111 samples were genotyped using seven nuclear microsatellite markers (Supplementary Data Table S3) previously developed for other oak species. Polymerase chain reactions (PCRs) and genotyping were performed as described by Zhang et al. (2018). Deviation from Hardy–Weinberg equilibrium (HWE), linkage disequilibrium between each locus pair, and the presence of null alleles was tested for each population following Zhang et al. (2018).
Nuclear microsatellite diversity and differentiation
We estimated allelic richness (AR) and genetic diversity within populations (HS) for each population using Fstat 2.9.3.2 (Goudet, 2001). We calculated FST and the standardized measure G′ ST to determine the levels of genetic differentiation across all populations with MSA 4.05 (Dieringer and Schlötterer, 2003). Genetic statistics AR, HS and FST were compared between Q. acutissima and Q. chenii using 10 000 permutations with Fstat. We also performed an analysis of molecular variance (AMOVA) to examine the patterns of genetic variance partitioned among species, among populations within species and within populations using Arlequin 3.5 (Excoffier and Lischer, 2010). The significance of fixation indices (FCT, FSC and FST) was tested by 10 000 permutations.
Genetic structure and hybrid identification
To detect the underlying genetic structure and potential hybrids between the two species, we employed Bayesian clustering analyses. The analyses were performed using an admixture model with correlated allele frequencies in the program STRUCTURE 2.3.4 (Pritchard et al., 2000). Ten independent runs were conducted for each number of clusters (K) varying from 1 to 10. In each run, we used 1 000 000 Markov chain Monte Carlo (MCMC) iterations following a burn-in step of 100 000 replications. Both log probabilities of X|K (Pritchard et al., 2000) and the ΔK method (Evanno et al., 2005) were implemented in Structure Harvester (Earl and vonHoldt, 2012) to determine the best-supported number of clusters (Janes et al., 2017). Optimal alignments of inferred clusters across different replicates and different values of K were identified by the CLUMPAK pipeline (Kopelman et al., 2015). We expected that the most likely number of clusters was K = 2, and the two clusters corresponded to Q. acutissima and Q. chenii. In this case, individuals with a probability of membership (Q) to the cluster of the morphologically inferred parental species (i.e. QACU for Q. acutissima, QCHE for Q. chenii) ≥ 0.9 were classified as parental individuals, while those with Q < 0.9 were considered to be hybrids (Burgarella et al., 2009; Lyu et al., 2018).
Complementary to Bayesian clustering analyses, we applied three different methods to validate the results of STRUCTURE. First, we performed principal component analysis (PCA) using the R package ade4 1.7-13 (Dray and Dufour, 2007). PC scores on the first axis were compared between parental individuals and hybrids using the pairwise Wilcoxon rank-sum test. P-values were adjusted through the Holm–Bonferroni method (Holm, 1979). Second, we employed a Bayesian statistical approach implemented in NewHybrids 1.1 beta (Anderson and Thompson, 2002) to distinguish parental individuals from hybrids. According to the 3rd criterion of Burgarella et al. (2009), only individuals with posterior probability (PP) assigned to the parental species class ≥0.9 were classified as parental individuals. This analysis was repeated ten times with a 50 000-iteration burn-in followed by 150 000 steps. The Jeffreys-like priors were selected for both the mixing proportions and the allele frequencies. Finally, we calculated the maximum-likelihood hybrid index (H) for each individual using the R package introgress 1.2.3 (Gompert and Buerkle, 2010). This index is an estimate of the proportion of alleles that were inherited from one of the two parental species (Buerkle, 2005). Parental groups identified by the Bayesian clustering analyses were specified as parental populations. Members of these groups should have a value of H near 0 or 1.
Gene flow and demographic history
We inferred the temporal pattern of interspecific gene flow under an approximate Bayesian computation (ABC) framework. Four distinct demographic models were statistically compared: strict isolation (SI), isolation with migration (IM), ancient migration (AM) and secondary contact (SC) (Fig. 2). The SI model assumed that no gene flow occurred after the initial divergence of the two species at TDIV, whereas the IM model assumed that they continuously exchanged alleles after TDIV. Under the AM model, gene flow occurred after TDIV but stopped at TAM. Under the SC model, gene flow occurred after TSC following an early stage of strict isolation. For each model, we assumed that both the ancestral and the daughter populations had constant effective population sizes (NACU, NCHE, NANC) sampled from a log-uniform distribution between 10 and 1 000 000. A previous oak phylogenomic study has indicated that Q. acutissima and Q. chenii diverged ~30–10 mya (Hipp et al., 2020). Given an average generation time of 100 years for both species (Zeng et al., 2015), we used a uniform distribution from 100 000 to 300 000 for TDIV, and a log-uniform distribution from 1 to 300 000 for TAM (<TDIV) and TSC (<TDIV). The effective migration rate (4Nem) was drawn from a uniform distribution between 0 and 500.
Four demographic scenarios tested by approximate Bayesian computation (ABC): strict isolation (SI), isolation with migration (IM), ancient migration (AM) and secondary contact (SC). NACU, NCHE and NANC are effective population sizes of Quercus acutissima, Q. chenii and their ancestor, respectively. TDIV is the number of generations since the initial divergence. TAM is the number of generations since the two species stopped exchanging alleles. TSC is the number of generations since the two species began exchanging alleles. Dark grey arrows denote the effective migration rate from Q. acutissima to Q. chenii (4NCHEmA→C) and that in the opposite direction (4NACUmC→A).
We used a generalized stepwise mutation (GSM) model for the seven microsatellite loci (Estoup et al., 2002). This model included two parameters, mutation rate (μ) and geometric parameter (PGSM). We set mean μ across loci to 5 × 10−4 mutations per locus per generation (Aoki et al., 2019), and sampled individual locus μ from a Gamma distribution with shape and rate parameters. The value of shape was drawn from a uniform distribution between 0.5 and 2, and the rate parameter was calculated by shape/mean μ. We used a uniform distribution between 0 and 0.8 for mean PGSM and assumed individual locus PGSM followed a Beta distribution β(a, b), where a = 0.5 + 199 × mean PGSM and b = a × (1 − mean PGSM)/mean PGSM (Excoffier et al., 2005).
We simulated genotype data for 100 diploid individuals per population to reduce the computational costs. Model parameter values were sampled from the prior distributions using ABCsampler in ABCtoolbox (Wegmann et al., 2010). One million replicates were run for each scenario with fastsimcoal2 2.6.0.3 (Excoffier and Foll, 2011; Excoffier et al., 2013). Seventeen summary statistics were computed for both observed and simulated datasets in Arlsumstat 3.5.2 (Excoffier and Lischer, 2010). The observed dataset included 100 individuals randomly subsampled for each population, with sampling biases checked following Aoki et al. (2019). We employed both the multinomial logistic regression (MLR) approach in the R package abc 2.1 (Csilléry et al., 2012) and the random forest (RF) approach in the R package abcrf 1.8.1 (Pudlo et al., 2016) to estimate the posterior probability and classification error rate of each model. The goodness of fit (GoF) was assessed using the gfit function in the R package abc. The 1 % (i.e. 10 000) simulated datasets closest to the observed data were chosen to infer the posterior distributions of model parameters through a local linear regression approach implemented in the R package abc. Finally, posterior predictive checks were conducted by comparing the summary statistics of the observed dataset with those calculated for the 1000 new simulations generated from the model’s posterior (Csilléry et al., 2012).
Genetic admixture pattern
We used an information-theoretic model selection approach to assess the relationships between genetic admixture and three sets of explanatory variables for all Q. acutissima populations and those within the ancient contact zone. The probability of membership to the genetic cluster of Q. chenii (QCHE) was used to measure the degree of admixture. The predictor matrices included: (1) presence of Q. chenii in a given locality (Presence; 0/1); (2) distance to the nearest occurrence record of Q. chenii; and (3) climatic niche suitability (S) for both parental species during the present, LGM, MIS19 and mPWP. We constructed a series of general linear models (GLMs) with a normal error structure and an identity link function. Each model included one or two predictors and excluded those variables showing strong multicollinearity problems (variance inflation factor > 2). We selected among competing models using Akaike’s information criterion corrected for small sample size (AICc; Burnham and Anderson, 1998). Models with the lowest AICc and those with ΔAICc (AICc of a model − minimum AICc) ≤ 2 were considered equally plausible. We computed model-averaged estimates for each variable when multiple candidate models were supported. These analyses were performed at both population and individual levels using the R package AICcmodavg 2.2-2 (Mazerolle, 2019). To assess the modality of the frequency distributions of QACU values, we conducted Hartigans’ dip test (Hartigan and Hartigan, 1985) using the R package diptest 0.75-7 (Martin, 2016).
Chloroplast DNA sequence analyses
Sequences of the three cpDNA fragments were proofread, aligned and concatenated using BioEdit 7.2.5 (Hall, 1999) and FasParser 2.1.1 (Sun, 2017). Indels and inversions were coded as binary characters (Simmons and Ochoterena, 2000; Young and Healy, 2003). Chloroplast haplotypes were determined by DnaSP 5.10 (Librado and Rozas, 2009). A median-joining network was constructed with PopART 1.7 (Leigh and Bryant, 2015). Genetic diversity statistics were estimated for each species (average genetic diversity within populations, hS; total genetic diversity, hT) and each population (haplotype diversity, Hd) using Permut 2.0 (Pons and Petit, 1996) and DnaSP. The phylogeographical structure was assessed by testing the significance of the difference between two measures of population differentiation (GST and NST) with 10 000 permutations in Permut. AMOVA was performed using Arlequin with 10 000 permutations.
RESULTS
Ecological niche modelling
The mean AUC values (± s.d.) on the test dataset indicated a good fit of ENMs to the observed occurrence data of both species (Q. acutissima: 0.849 ± 0.021; Q. chenii: 0.968 ± 0.014; Supplementary Data Fig. S1). The models showed that Q. acutissima has had a relatively stable distribution since the mPWP, while Q. chenii has experienced more pronounced range shifts (Fig. 3). Furthermore, the two species have had a mid-Pliocene contact zone beyond the current northern range limit of Q. chenii (between ~30°N and 35°N; Fig. 3). Eleven Q. acutissima populations and nine Q. chenii populations were found to be within this region (Fig. 1A), most of which showed moderate to high mPWP climatic suitability for both species (Q. acutissima: SmPWP = 0.200 ± 0.089; Q. chenii: SmPWP = 0.806 ± 0.170). However, along with the following range contraction and southward retreat of Q. chenii (Fig. 3), this ancient contact zone may have disappeared during the cooler period of the Early Pleistocene.
MAXENT prediction maps showing the climatically suitable areas of Quercus acutissima (A–D) and Q. chenii (E–H) during the mid-Pliocene Warm Period (mPWP), Marine Isotope Stage 19 (MIS19) in the Pleistocene, Last Glacial Maximum (LGM) and the present. Plus signs represent the sampling sites of Q. chenii. Solid and open dots represent the sampling sites of Q. acutissima inside and outside the ancient contact zone, respectively.
Nuclear microsatellite diversity and differentiation
The frequency of null alleles was estimated to be lower than the threshold of 0.1 at each of the seven loci across populations (Supplementary Data Table S3). No significant linkage disequilibrium was detected for all locus pairs in each population (P < 0.05 after Bonferroni correction), and no evidence of deviation from HWE was observed for each population of both species (P < 0.05 after Bonferroni correction; Supplementary Data Table S4). Higher within-population genetic diversity was detected in Q. acutissima compared to Q. chenii (P = 0.001 for both AR and HS; Supplementary Data Table S5). Intraspecific genetic differentiation was significant but lower for both species (Q. acutissima: FST = 0.057, P < 0.001; Q. chenii: FST = 0.057, P < 0.001), and the levels did not differ substantially between the two taxa (P = 0.983; Supplementary Data Table S5). Interspecific differentiation was moderate and significant (FCT = 0.146, P < 0.00001), with 14.6 % of the variation partitioned among species, and the most (80.5 %) partitioned within populations (Supplementary Data Table S6).
Genetic structure and hybrid identification
Bayesian clustering analyses indicated an optimal value of K = 2 (Supplementary Data Fig. S2), separating all populations into two distinct genetic clusters congruent with the observed phenotypes of Q. acutissima and Q. chenii (Fig. 1C). Considering a threshold value of Q = 0.9, we identified 74 putative hybrids of which 54 were morphologically assigned to Q. acutissima (i.e. group HA) and 20 were classified as Q. chenii (i.e. group HC) (Supplementary Data Table S4). Although hybrids were widely detected in 28 populations, nearly 75.9 % of those in group HA (41/54) and 55.0 % of those in HC (11/20) were found to occur in the ancient contact zone recovered by the ENM analyses (Fig. 1B). A much higher proportion of hybrids was detected inside the contact zone than outside this zone for Q. acutissima (inside: 14.7 %, 41/279; outside: 3.1 %, 13/417), but not for Q. chenii (inside: 5.3 %, 11/209; outside: 4.4 %, 9/206). Groups HA and HC showed similar levels of genetic admixture (mean ± s.d., HA: QCHE = 0.290 ± 0.214, HC: QACU = 0.250 ± 0.142; Wilcoxon rank-sum test: W = 565, P = 0.766). Only five individuals (four in HA and one in HC) exhibited ~50 % admixture (0.4 < Q < 0.6) for both parental species.
The results of hybrid identification were supported by three different analyses. First, PCA separated all the samples into two distinct groups corresponding to the two parental species (Fig. 1D). No overlap along PC1 was detected between parental individuals of Q. acutissima (i.e. group PA, n = 642 individuals) and parental individuals of Q. chenii (i.e. group PC, n = 395 individuals). Putative hybrids in groups HA and HC were characterized by intermediate scores along PC1 (pairwise Wilcoxon rank-sum test: all P values < 0.001 after Holm–Bonferroni correction; Supplementary Data Table S7). Second, all the samples in group HC and 44 individuals (81.5 %) in group HA were classified as hybrids (PP assigned to the corresponding parental species class <0.9) in the NewHybrids analysis. Although ten of those in HA were identified as parental individuals of Q. acutissima, the PP values were found to be close to the threshold value of 0.9 (0.901–0.939). Almost all the samples in PA (99.8 %) and PC (98.2 %) were assigned to the parental groups successfully (Fig. 1E). Finally, the INTROGRESS analysis showed that compared with parental individuals of Q. acutissima (mean ± s.d., H = 0.034 ± 0.068) and parental individuals of Q. chenii (H = 0.971 ± 0.058), members in both hybrid groups exhibited intermediate values of the maximum-likelihood hybrid index (HA: H = 0.383 ± 0.141, HC: H = 0.654 ± 0.097) (Fig. 1F). All pairwise comparisons for H were highly significant (Wilcoxon rank-sum test: all P values < 0.001 after the Holm–Bonferroni correction; Supplementary Data Table S7), supporting the separation among the parental and hybrid individuals identified by Bayesian clustering analyses.
Gene flow and demographic history
ABC analyses favoured the AM model as it received the highest posterior probability when using both the MLR (0.950) and RF (0.791) approaches to select the best model (Supplementary Data Tables S8 and S9). Cross-validations indicated that the overall prior error rate was moderate (MLR: 0.334, RF: 0.332), while the risk of selecting the AM model instead of others was much lower (average type II error, MLR: 0.052, RF: 0.054; Supplementary Data Table S10). A comparison of variable importance showed that FST provided the most information in distinguishing the various models (Supplementary Data Fig. S3). This statistic was more easily underestimated under the SC and IM models, but overestimated under the SI model (Supplementary Data Fig. S4). Posterior predictive checks confirmed that the observed values of all the 17 summary statistics can be reproduced well under the AM scenario (Supplementary Data Figs S5–S8). The GoF test further demonstrated the ability of this model to generate simulations close to the observed data (P = 0.990; Supplementary Data Fig. S9). Under the AM model, all the posterior distributions of NACU, NCHE, TAM and mean PGSM contained a single peak different from the prior distributions (Supplementary Data Fig. S10). Posterior modes (95 % HPD) of NACU and NCHE were 4688 (676–19 324) and 6898 (261–16 642), respectively. The posterior mode (95 % HPD) of TAM was 7831 (42–51 749) generations ago. The posterior mode (95 % HPD) of mean PGSM was 0.135 (0.021–0.354) (Table 1). Posterior distributions of other parameters (NANC, TDIV, 4NCHEmA→C, 4NACUmC→A and shape) were found to be poorly differentiated from the prior distributions (Supplementary Data Fig. S10), indicating that the current microsatellite dataset cannot provide enough information for these parameters.
Prior and posterior distributions of model parameters under the ancient migration (AM) scenario.
| Parameter . | Prior [type (min, max)] . | Posterior . | . | . | . | . |
|---|---|---|---|---|---|---|
| . | . | Median . | Mean . | Mode . | 95 % HPD . | . |
| . | . | . | . | . | Lower . | Upper . |
| NACU | loguniform (10, 1000 000) | 4440.17 | 4223.77 | 4688.13 | 676.24 | 19 324.13 |
| NCHE | loguniform (10, 1000 000) | 5025.74 | 3921.93 | 6897.63 | 260.92 | 16 641.79 |
| NANC | loguniform (10, 1000 000) | 2926.85 | 2966.18 | 16 084.22 | 13.41 | 738 074.16 |
| TDIV | uniform (100 000, 300 000) | 204 728.78 | 203 128.13 | 281 470.94 | 104 866.99 | 296 033.33 |
| TAM | loguniform (1, 300 000) | 3390.00 | 2414.35 | 7830.69 | 41.85 | 51 748.77 |
| 4NCHEmA→C | uniform (0, 500) | 215.43 | 226.59 | 38.31 | 8.05 | 485.73 |
| 4NACUmC→A | uniform (0, 500) | 232.93 | 238.64 | 38.40 | 7.36 | 491.02 |
| mean PGSM | uniform (0, 0.8) | 0.17 | 0.17 | 0.13 | 0.02 | 0.35 |
| shape | uniform (0.5, 2) | 1.10 | 1.15 | 0.92 | 0.55 | 1.94 |
| Parameter . | Prior [type (min, max)] . | Posterior . | . | . | . | . |
|---|---|---|---|---|---|---|
| . | . | Median . | Mean . | Mode . | 95 % HPD . | . |
| . | . | . | . | . | Lower . | Upper . |
| NACU | loguniform (10, 1000 000) | 4440.17 | 4223.77 | 4688.13 | 676.24 | 19 324.13 |
| NCHE | loguniform (10, 1000 000) | 5025.74 | 3921.93 | 6897.63 | 260.92 | 16 641.79 |
| NANC | loguniform (10, 1000 000) | 2926.85 | 2966.18 | 16 084.22 | 13.41 | 738 074.16 |
| TDIV | uniform (100 000, 300 000) | 204 728.78 | 203 128.13 | 281 470.94 | 104 866.99 | 296 033.33 |
| TAM | loguniform (1, 300 000) | 3390.00 | 2414.35 | 7830.69 | 41.85 | 51 748.77 |
| 4NCHEmA→C | uniform (0, 500) | 215.43 | 226.59 | 38.31 | 8.05 | 485.73 |
| 4NACUmC→A | uniform (0, 500) | 232.93 | 238.64 | 38.40 | 7.36 | 491.02 |
| mean PGSM | uniform (0, 0.8) | 0.17 | 0.17 | 0.13 | 0.02 | 0.35 |
| shape | uniform (0.5, 2) | 1.10 | 1.15 | 0.92 | 0.55 | 1.94 |
Under the AM model, the initial divergence of Quercus acutissima and Q. chenii occurred at TDIV. The two newly formed populations continued to exchange alleles until time TAM (TAM < TDIV). NACU, NCHE and NANC are effective population sizes of Q. acutissima, Q. chenii and the ancestral population, respectively. 4NCHEmA→C and 4NACUmC→A are effective migration rates between the two species. Mean PGSM and shape are parameters of the generalized stepwise mutation (GSM) model. Note that the posterior estimate values were converted from log to linear scale.
Prior and posterior distributions of model parameters under the ancient migration (AM) scenario.
| Parameter . | Prior [type (min, max)] . | Posterior . | . | . | . | . |
|---|---|---|---|---|---|---|
| . | . | Median . | Mean . | Mode . | 95 % HPD . | . |
| . | . | . | . | . | Lower . | Upper . |
| NACU | loguniform (10, 1000 000) | 4440.17 | 4223.77 | 4688.13 | 676.24 | 19 324.13 |
| NCHE | loguniform (10, 1000 000) | 5025.74 | 3921.93 | 6897.63 | 260.92 | 16 641.79 |
| NANC | loguniform (10, 1000 000) | 2926.85 | 2966.18 | 16 084.22 | 13.41 | 738 074.16 |
| TDIV | uniform (100 000, 300 000) | 204 728.78 | 203 128.13 | 281 470.94 | 104 866.99 | 296 033.33 |
| TAM | loguniform (1, 300 000) | 3390.00 | 2414.35 | 7830.69 | 41.85 | 51 748.77 |
| 4NCHEmA→C | uniform (0, 500) | 215.43 | 226.59 | 38.31 | 8.05 | 485.73 |
| 4NACUmC→A | uniform (0, 500) | 232.93 | 238.64 | 38.40 | 7.36 | 491.02 |
| mean PGSM | uniform (0, 0.8) | 0.17 | 0.17 | 0.13 | 0.02 | 0.35 |
| shape | uniform (0.5, 2) | 1.10 | 1.15 | 0.92 | 0.55 | 1.94 |
| Parameter . | Prior [type (min, max)] . | Posterior . | . | . | . | . |
|---|---|---|---|---|---|---|
| . | . | Median . | Mean . | Mode . | 95 % HPD . | . |
| . | . | . | . | . | Lower . | Upper . |
| NACU | loguniform (10, 1000 000) | 4440.17 | 4223.77 | 4688.13 | 676.24 | 19 324.13 |
| NCHE | loguniform (10, 1000 000) | 5025.74 | 3921.93 | 6897.63 | 260.92 | 16 641.79 |
| NANC | loguniform (10, 1000 000) | 2926.85 | 2966.18 | 16 084.22 | 13.41 | 738 074.16 |
| TDIV | uniform (100 000, 300 000) | 204 728.78 | 203 128.13 | 281 470.94 | 104 866.99 | 296 033.33 |
| TAM | loguniform (1, 300 000) | 3390.00 | 2414.35 | 7830.69 | 41.85 | 51 748.77 |
| 4NCHEmA→C | uniform (0, 500) | 215.43 | 226.59 | 38.31 | 8.05 | 485.73 |
| 4NACUmC→A | uniform (0, 500) | 232.93 | 238.64 | 38.40 | 7.36 | 491.02 |
| mean PGSM | uniform (0, 0.8) | 0.17 | 0.17 | 0.13 | 0.02 | 0.35 |
| shape | uniform (0.5, 2) | 1.10 | 1.15 | 0.92 | 0.55 | 1.94 |
Under the AM model, the initial divergence of Quercus acutissima and Q. chenii occurred at TDIV. The two newly formed populations continued to exchange alleles until time TAM (TAM < TDIV). NACU, NCHE and NANC are effective population sizes of Q. acutissima, Q. chenii and the ancestral population, respectively. 4NCHEmA→C and 4NACUmC→A are effective migration rates between the two species. Mean PGSM and shape are parameters of the generalized stepwise mutation (GSM) model. Note that the posterior estimate values were converted from log to linear scale.
Genetic admixture pattern
The Q. acutissima populations inside the ancient contact zone showed a higher level of genetic admixture (QCHE) than those outside this zone, regardless of whether the two sympatric sites are included or not (t-test: all P values < 0.05; Fig. 4 and Supplementary Data Fig. S11). Both population- and individual-level GLMs indicated that the degree of admixture tended to be higher in localities where Q. chenii is present or those exhibiting more suitable climatic conditions for Q. chenii during the mPWP (Table 2; Supplementary Data Table S11 and Fig. S12). Within the ancient contact zone, we detected a positive effect of both the presence of Q. chenii in a given locality and the climatic niche suitability for Q. chenii during the Pleistocene (i.e. the MIS19 and LGM) (Table 2; Supplementary Data Table S11 and Fig. S12). Although the climatic niche suitability for Q. acutissima during the LGM was included in the individual-level model, it showed a non-significant effect with the unconditional 95 % confidence interval crossing zero (Table 2 and Supplementary Data Table S11). Hartigans’ dip test indicated that the ancient contact zone formed a sharp bimodal distribution of ancestry estimates (Fig. 5). Similar patterns were also observed for the two sympatric populations of Q. acutissima and Q. chenii, i.e. populations 23 and 45 (AQY and CQY), and populations 30, 42 and 43 (AWTM, CWTM, and CZN) (all P values < 0.05; Fig. 5).
General linear models (GLMs) showing relationships between genetic admixture and three sets of explanatory variables for all Quercus acutissima populations and those inside the ancient contact zone at both population and individual levels.
| Parameter . | Estimate . | USE . | Lower 95 % CI . | Upper 95 % CI . | Σωi . |
|---|---|---|---|---|---|
| All Q. acutissima populations (n = 28†) | |||||
| Intercept | 0.024* | 0.008 | 0.009 | 0.038 | – |
| Presence | 0.086* | 0.019 | 0.049 | 0.123 | 0.61 |
| SmPWP_CHE | 0.027* | 0.013 | 0.001 | 0.053 | 0.42 |
| All Q. acutissima individuals (n = 650†) | |||||
| Intercept | 0.021* | 0.006 | 0.010 | 0.033 | – |
| Presence | 0.080* | 0.015 | 0.050 | 0.109 | 0.77 |
| SmPWP_CHE | 0.032* | 0.012 | 0.009 | 0.055 | 0.77 |
| All Q. acutissima populations with the two sympatric ones removed (n = 26†) | |||||
| Intercept | 0.022* | 0.008 | 0.007 | 0.037 | – |
| SmPWP_CHE | 0.033* | 0.013 | 0.007 | 0.058 | 0.63 |
| SMIS19_CHE | −0.037 | 0.020 | −0.077 | 0.002 | 0.20 |
| SLGM_CHE | −0.155 | 0.097 | −0.346 | 0.036 | 0.12 |
| SLGM_ACU | −0.024 | 0.017 | −0.057 | 0.008 | 0.10 |
| SPRES_CHE | −0.055 | 0.042 | −0.137 | 0.027 | 0.08 |
| All Q. acutissima individuals with the two sympatric populations removed (n = 601†) | |||||
| Intercept | 0.023* | 0.007 | 0.010 | 0.037 | – |
| SmPWP_CHE | 0.038* | 0.011 | 0.016 | 0.059 | 0.65 |
| SMIS19_CHE | −0.043* | 0.020 | −0.082 | −0.005 | 0.34 |
| SLGM_CHE | −0.171 | 0.090 | −0.348 | 0.007 | 0.18 |
| SLGM_ACU | −0.024 | 0.014 | −0.052 | 0.004 | 0.13 |
| Q. acutissima populations inside the ancient contact zone (n = 11) | |||||
| Intercept | 0.046* | 0.010 | 0.026 | 0.066 | – |
| SLGM_CHE | 0.425* | 0.125 | 0.180 | 0.671 | 0.23 |
| Presence | 0.078* | 0.023 | 0.033 | 0.123 | 0.22 |
| SMIS19_CHE | 0.102* | 0.035 | 0.034 | 0.170 | 0.10 |
| Q. acutissima individuals inside the ancient contact zone (n = 279) | |||||
| Intercept | 0.070* | 0.026 | 0.018 | 0.121 | – |
| SLGM_CHE | 0.429* | 0.116 | 0.201 | 0.657 | 0.24 |
| Presence | 0.078* | 0.021 | 0.036 | 0.120 | 0.20 |
| SLGM_ACU | −0.082 | 0.052 | −0.183 | 0.018 | 0.25 |
| Parameter . | Estimate . | USE . | Lower 95 % CI . | Upper 95 % CI . | Σωi . |
|---|---|---|---|---|---|
| All Q. acutissima populations (n = 28†) | |||||
| Intercept | 0.024* | 0.008 | 0.009 | 0.038 | – |
| Presence | 0.086* | 0.019 | 0.049 | 0.123 | 0.61 |
| SmPWP_CHE | 0.027* | 0.013 | 0.001 | 0.053 | 0.42 |
| All Q. acutissima individuals (n = 650†) | |||||
| Intercept | 0.021* | 0.006 | 0.010 | 0.033 | – |
| Presence | 0.080* | 0.015 | 0.050 | 0.109 | 0.77 |
| SmPWP_CHE | 0.032* | 0.012 | 0.009 | 0.055 | 0.77 |
| All Q. acutissima populations with the two sympatric ones removed (n = 26†) | |||||
| Intercept | 0.022* | 0.008 | 0.007 | 0.037 | – |
| SmPWP_CHE | 0.033* | 0.013 | 0.007 | 0.058 | 0.63 |
| SMIS19_CHE | −0.037 | 0.020 | −0.077 | 0.002 | 0.20 |
| SLGM_CHE | −0.155 | 0.097 | −0.346 | 0.036 | 0.12 |
| SLGM_ACU | −0.024 | 0.017 | −0.057 | 0.008 | 0.10 |
| SPRES_CHE | −0.055 | 0.042 | −0.137 | 0.027 | 0.08 |
| All Q. acutissima individuals with the two sympatric populations removed (n = 601†) | |||||
| Intercept | 0.023* | 0.007 | 0.010 | 0.037 | – |
| SmPWP_CHE | 0.038* | 0.011 | 0.016 | 0.059 | 0.65 |
| SMIS19_CHE | −0.043* | 0.020 | −0.082 | −0.005 | 0.34 |
| SLGM_CHE | −0.171 | 0.090 | −0.348 | 0.007 | 0.18 |
| SLGM_ACU | −0.024 | 0.014 | −0.052 | 0.004 | 0.13 |
| Q. acutissima populations inside the ancient contact zone (n = 11) | |||||
| Intercept | 0.046* | 0.010 | 0.026 | 0.066 | – |
| SLGM_CHE | 0.425* | 0.125 | 0.180 | 0.671 | 0.23 |
| Presence | 0.078* | 0.023 | 0.033 | 0.123 | 0.22 |
| SMIS19_CHE | 0.102* | 0.035 | 0.034 | 0.170 | 0.10 |
| Q. acutissima individuals inside the ancient contact zone (n = 279) | |||||
| Intercept | 0.070* | 0.026 | 0.018 | 0.121 | – |
| SLGM_CHE | 0.429* | 0.116 | 0.201 | 0.657 | 0.24 |
| Presence | 0.078* | 0.021 | 0.036 | 0.120 | 0.20 |
| SLGM_ACU | −0.082 | 0.052 | −0.183 | 0.018 | 0.25 |
The model-averaged parameter estimates were computed among a set of candidate models with ΔAICC ≤ 2 (see Supplementary Data Table S11). USE, unconditional standard error; 95 % CI, unconditional 95 % confidence interval; Σωi, relative importance of a variable based on the sum of AICc weights of the models with ΔAICC ≤ 2; Presence, presence of Q. chenii in a given locality; S, climatic niche suitability for Q. acutissima (ACU) or Q. chenii (CHE) during the present (PRES), Last Glacial Maximum (LGM), Marine Isotope Stage 19 (MIS19) in the Pleistocene, and the mid-Pliocene Warm Period (mPWP).
*Significant estimate whose unconditional 95 % CI did not cross zero.
†Two Q. acutissima populations (ALYG and AQHD) were excluded from our analysis because they were below sea level during the mPWP (i.e. SmPWP values are missing).
General linear models (GLMs) showing relationships between genetic admixture and three sets of explanatory variables for all Quercus acutissima populations and those inside the ancient contact zone at both population and individual levels.
| Parameter . | Estimate . | USE . | Lower 95 % CI . | Upper 95 % CI . | Σωi . |
|---|---|---|---|---|---|
| All Q. acutissima populations (n = 28†) | |||||
| Intercept | 0.024* | 0.008 | 0.009 | 0.038 | – |
| Presence | 0.086* | 0.019 | 0.049 | 0.123 | 0.61 |
| SmPWP_CHE | 0.027* | 0.013 | 0.001 | 0.053 | 0.42 |
| All Q. acutissima individuals (n = 650†) | |||||
| Intercept | 0.021* | 0.006 | 0.010 | 0.033 | – |
| Presence | 0.080* | 0.015 | 0.050 | 0.109 | 0.77 |
| SmPWP_CHE | 0.032* | 0.012 | 0.009 | 0.055 | 0.77 |
| All Q. acutissima populations with the two sympatric ones removed (n = 26†) | |||||
| Intercept | 0.022* | 0.008 | 0.007 | 0.037 | – |
| SmPWP_CHE | 0.033* | 0.013 | 0.007 | 0.058 | 0.63 |
| SMIS19_CHE | −0.037 | 0.020 | −0.077 | 0.002 | 0.20 |
| SLGM_CHE | −0.155 | 0.097 | −0.346 | 0.036 | 0.12 |
| SLGM_ACU | −0.024 | 0.017 | −0.057 | 0.008 | 0.10 |
| SPRES_CHE | −0.055 | 0.042 | −0.137 | 0.027 | 0.08 |
| All Q. acutissima individuals with the two sympatric populations removed (n = 601†) | |||||
| Intercept | 0.023* | 0.007 | 0.010 | 0.037 | – |
| SmPWP_CHE | 0.038* | 0.011 | 0.016 | 0.059 | 0.65 |
| SMIS19_CHE | −0.043* | 0.020 | −0.082 | −0.005 | 0.34 |
| SLGM_CHE | −0.171 | 0.090 | −0.348 | 0.007 | 0.18 |
| SLGM_ACU | −0.024 | 0.014 | −0.052 | 0.004 | 0.13 |
| Q. acutissima populations inside the ancient contact zone (n = 11) | |||||
| Intercept | 0.046* | 0.010 | 0.026 | 0.066 | – |
| SLGM_CHE | 0.425* | 0.125 | 0.180 | 0.671 | 0.23 |
| Presence | 0.078* | 0.023 | 0.033 | 0.123 | 0.22 |
| SMIS19_CHE | 0.102* | 0.035 | 0.034 | 0.170 | 0.10 |
| Q. acutissima individuals inside the ancient contact zone (n = 279) | |||||
| Intercept | 0.070* | 0.026 | 0.018 | 0.121 | – |
| SLGM_CHE | 0.429* | 0.116 | 0.201 | 0.657 | 0.24 |
| Presence | 0.078* | 0.021 | 0.036 | 0.120 | 0.20 |
| SLGM_ACU | −0.082 | 0.052 | −0.183 | 0.018 | 0.25 |
| Parameter . | Estimate . | USE . | Lower 95 % CI . | Upper 95 % CI . | Σωi . |
|---|---|---|---|---|---|
| All Q. acutissima populations (n = 28†) | |||||
| Intercept | 0.024* | 0.008 | 0.009 | 0.038 | – |
| Presence | 0.086* | 0.019 | 0.049 | 0.123 | 0.61 |
| SmPWP_CHE | 0.027* | 0.013 | 0.001 | 0.053 | 0.42 |
| All Q. acutissima individuals (n = 650†) | |||||
| Intercept | 0.021* | 0.006 | 0.010 | 0.033 | – |
| Presence | 0.080* | 0.015 | 0.050 | 0.109 | 0.77 |
| SmPWP_CHE | 0.032* | 0.012 | 0.009 | 0.055 | 0.77 |
| All Q. acutissima populations with the two sympatric ones removed (n = 26†) | |||||
| Intercept | 0.022* | 0.008 | 0.007 | 0.037 | – |
| SmPWP_CHE | 0.033* | 0.013 | 0.007 | 0.058 | 0.63 |
| SMIS19_CHE | −0.037 | 0.020 | −0.077 | 0.002 | 0.20 |
| SLGM_CHE | −0.155 | 0.097 | −0.346 | 0.036 | 0.12 |
| SLGM_ACU | −0.024 | 0.017 | −0.057 | 0.008 | 0.10 |
| SPRES_CHE | −0.055 | 0.042 | −0.137 | 0.027 | 0.08 |
| All Q. acutissima individuals with the two sympatric populations removed (n = 601†) | |||||
| Intercept | 0.023* | 0.007 | 0.010 | 0.037 | – |
| SmPWP_CHE | 0.038* | 0.011 | 0.016 | 0.059 | 0.65 |
| SMIS19_CHE | −0.043* | 0.020 | −0.082 | −0.005 | 0.34 |
| SLGM_CHE | −0.171 | 0.090 | −0.348 | 0.007 | 0.18 |
| SLGM_ACU | −0.024 | 0.014 | −0.052 | 0.004 | 0.13 |
| Q. acutissima populations inside the ancient contact zone (n = 11) | |||||
| Intercept | 0.046* | 0.010 | 0.026 | 0.066 | – |
| SLGM_CHE | 0.425* | 0.125 | 0.180 | 0.671 | 0.23 |
| Presence | 0.078* | 0.023 | 0.033 | 0.123 | 0.22 |
| SMIS19_CHE | 0.102* | 0.035 | 0.034 | 0.170 | 0.10 |
| Q. acutissima individuals inside the ancient contact zone (n = 279) | |||||
| Intercept | 0.070* | 0.026 | 0.018 | 0.121 | – |
| SLGM_CHE | 0.429* | 0.116 | 0.201 | 0.657 | 0.24 |
| Presence | 0.078* | 0.021 | 0.036 | 0.120 | 0.20 |
| SLGM_ACU | −0.082 | 0.052 | −0.183 | 0.018 | 0.25 |
The model-averaged parameter estimates were computed among a set of candidate models with ΔAICC ≤ 2 (see Supplementary Data Table S11). USE, unconditional standard error; 95 % CI, unconditional 95 % confidence interval; Σωi, relative importance of a variable based on the sum of AICc weights of the models with ΔAICC ≤ 2; Presence, presence of Q. chenii in a given locality; S, climatic niche suitability for Q. acutissima (ACU) or Q. chenii (CHE) during the present (PRES), Last Glacial Maximum (LGM), Marine Isotope Stage 19 (MIS19) in the Pleistocene, and the mid-Pliocene Warm Period (mPWP).
*Significant estimate whose unconditional 95 % CI did not cross zero.
†Two Q. acutissima populations (ALYG and AQHD) were excluded from our analysis because they were below sea level during the mPWP (i.e. SmPWP values are missing).
Spatial interpolation for the probability of membership of each Quercus acutissima population to the Q. chenii genetic cluster (QCHE) as inferred by STRUCTURE analyses. The interpolated surface was obtained using the inverse distance-weighted method in ArcGIS 10.3 (ESRI). Plus signs represent the sampling sites of Q. chenii. Solid and open dots represent the sampling sites of Q. acutissima inside and outside the ancient contact zone, respectively.
Distribution of ancestry estimates in the ancient contact zone (A) and two sympatric populations of Quercus acutissima and Q. chenii, i.e. populations 23 and 45 (AQY and CQY) (B), and populations 30, 42 and 43 (AWTM, CWTM and CZN; see Fig. 1) (C). The ancestry is based on the probability of membership to the genetic cluster of Q. acutissima (QACU). Hartigans’ dip statistic D and its P value were computed using the R package diptest. Colours of bars represent four groups: parental individuals of Q. acutissima (group PA, blue), parental individuals of Q. chenii (group PC, orange), hybrids morphologically assigned to Q. acutissima (group HA, green) and hybrids morphologically assigned to Q. chenii (group HC, red).
Chloroplast DNA sequence analyses
The lengths of consensus sequences after alignment of atpB-rbcL, psbA-trnH and trnS-trnG were 718, 645, and 607 bp, respectively. We identified 29 haplotypes based on 18 nucleotide substitutions, seven indels and a 32-bp-long inversion (Supplementary Data Table S12). Fourteen haplotypes were specific to Q. acutissima, ten were private to Q. chenii and only five were shared by the two species (Fig. 6). The most widespread shared haplotype (H1) was found to occur in 23 populations of Q. acutissima (76.7 %) and seven populations of Q. chenii (38.9 %), while the most narrowly distributed one (H17) was only detected in a sympatric population in East China (i.e. populations 23 and 45; Fig. 6A). Nearly 93.8 % of the individuals with locally shared haplotypes (H7, H10, H17 and H18) were found to occur in the ancient contact zone detected by the ENM analyses (Fig. 6A and Supplementary Data Table S4).
Geographical distribution (A) and median-joining network (B) of 29 chloroplast DNA haplotypes of Quercus acutissima and Q. chenii. (A) Circle size is proportional to the sample size. Orange and green areas indicate the climatically suitable areas for Q. chenii during the mid-Pliocene Warm Period (mPWP) and during the present, respectively. They were created from the logistic outputs of MAXENT using the maximum sensitivity plus specificity threshold (Gugger et al., 2013). Population codes (Q. acutissima: 1–30; Q. chenii: 31–48) are described in Supplementary Data Table S1. *Populations inside the ancient contact zone. (B) Circle size is proportional to the frequency of a haplotype across all populations. Small black dots indicate inferred intermediate haplotypes not detected in this investigation. Numbers in brackets on branches indicate the number of mutations between haplotypes when branches represent more than one mutation. Three outgroups include Q. phillyraeoides (phi.; GenBank accession number MK105462; Pang et al., 2019), Q. dolicholepis (dol.; KU240010; Yang et al., 2016) and Q. baronii. (bar.; KT963087; Yang et al., 2017). *The haplotypes shared by the two species.
The median-joining network grouped all the haplotypes into three non-species-specific clades (Fig. 6B). Neither species exhibited a significant phylogeographical structure (Q. acutissima: NST > GST, P = 0.482; Q. chenii: NST < GST, P = 0.453; Supplementary Data Table S5). Interspecific differentiation at cpDNA markers was weak (FCT = 0.025, P = 0.096), with only 2.5 % of the variation partitioned among species (Supplementary Data Table S6).
Discussion
Genetic legacy of ancient hybridization in a mid-Pliocene contact zone
Our current study provides evidence from both nSSR and cpDNA markers in support of the occurrence of hybridization between Q. acutissima and Q. chenii. We identified a total of 74 putative hybrids (~6.7 % of all the samples) based on the results of STRUCTURE. These findings were confirmed by a combination of PCA, NewHybrids analysis and maximum-likelihood hybrid index calculations. We also detected 112 individuals (~19.4 % of the tested samples) harbouring locally shared chloroplast haplotypes between the two species. The geographically associated pattern of shared haplotypes is more likely to be a result of chloroplast capture, i.e. chloroplast of one species being transferred to another through hybridization (Acosta and Premoli, 2010), rather than incomplete lineage sorting, which would be evidenced by a random distribution of shared polymorphisms between species (Zhou et al., 2017). Notably, we found that more than 70 % of the hybrids determined by STRUCTURE analysis and more than 90 % of the individuals with locally shared haplotypes are concentrated within a region between ~30°N and 35°N. These trees occur not only in the sympatric populations of eastern China but also in the western populations of Q. acutissima separated by ~400–600 km from the closest presence points of Q. chenii. Two hypotheses may explain our findings: (1) long-distance gene flow (Dodd and Afzal-Rafii, 2004; Ortego et al., 2017) and (2) historical hybridization in an ancient contact zone followed by the local extinction of Q. chenii (Ortego et al., 2014, 2018). In this case, we combine multiple lines of evidence to argue for the latter scenario.
First, ENMs show that Q. acutissima and Q. chenii may have had an ancient contact zone between ~30°N and 35°N during the mPWP (Figs 1A and 3). This interpretation is supported by the common presence of Q. chenii macrofossils in Late Pliocene sediments (~3.6–2.6 mya) at more northerly latitudes (~34–35°N, the Osaka and Tokai Groups of central Japan) than the species’ current range limit (Momohara, 2010, 2016). Following this warm period, vegetation in northern China experienced a transition from open deciduous woodland with steppe to spruce–pine conifer forest (Liu et al., 2002; Salzmann et al., 2008). Together with other thermophilous subtropical tree species (Liu et al., 2002; Qin et al., 2011; Li et al., 2019b), Q. chenii has suffered local extinction at the northern margin of its distribution (last occurrence at ~2.5 mya in the Tokai Group, Japan; Momohara, 2016) as a result of both climate cooling and strengthening of the winter monsoon in East Asia (Li et al., 2004; Ge et al., 2013). Such a change coincides with the ENM predictions that Q. chenii is likely to have retreated southward to its current distribution at least since the Early Pleistocene (Fig. 3), implying that the ancient contact zone may have disappeared along with the increased dryness and coldness during the glacial periods.
Second, our ABC analyses strongly support the scenario that there had been a long period of ancient contact followed by a recent period of strict isolation. This model is exactly in agreement with the expectation of historical hybridization followed by the local extinction of one parental species. Using an average generation time of 100–150 years for Q. acutissima and Q. chenii (Zeng et al., 2015; Zhang et al., 2018), we inferred that the mode of the time when the two species stopped exchanging alleles was 1.175–0.783 mya (Table 1). Although the estimated date must be interpreted with extreme caution, given the uncertainty in mutation models and generation time (Lande et al., 2003), it is consistent with the ENM results that the degree of range overlap decreased significantly during the Early Pleistocene (Fig. 3). It should also be noted that we did not consider the variation of population size or recurrent cycles of contacts and isolation driven by the Pleistocene climate oscillations (Sun et al., 2015). Thus, our demographic inference reflects a simplified scenario that allopatric divergence between the two species has been reinforced through isolation since the onset of the glacial periods.
Finally, we detect hybrids exhibiting a low level of admixture in almost all the allopatric populations of Q. acutissima inside the ancient contact zone (Figs 1 and 4). One possibility is that the two species have hybridized widely, followed by recurrent backcrosses with Q. acutissima in the absence of Q. chenii (Lepais et al., 2009). Moreover, the observed hybrids present a scattered distribution across the mountainous areas in both western and eastern China. Previous phylogeographical studies have shown that these montane regions (e.g. the Qin Mountains, the Daba Mountains, the Dabie Mountains and the Tianmu Mountains) may have acted as glacial refugia for Q. acutissima (Zhang et al., 2015, 2018). Thus, although Q. chenii suffered local extinction, the genetically admixed individuals can still survive in situ among isolated habitats with high ecological stability. However, the complex landscape in the ancient contact zone could have reduced long-distance pollen dispersal. Using nuclear genetic markers, previous work has revealed a strong phylogeographical break associated with the topographic, climatic and vegetational differences between western and eastern China for many woody plants (Sun et al., 2016; Luo et al., 2020), including wind-pollinated trees (Zhang et al., 2018; Gao et al., 2021). Thus, even considering that oaks have a high potential for long-distance pollen dispersal (Buschbom et al., 2011), it seems unlikely that the reduced contemporary gene flow among western and eastern populations can fully explain the observed hybrid structure in the ancient contact zone.
Overall, we combine evidence from fossil records, demographic inference and ENM to support the occurrence of a mid-Pliocene contact zone between Q. acutissima and Q. chenii. These two species may have experienced extensive pre-glacial gene flow followed by reinforced post-glacial isolation. At present, we detect a clear-cut bimodal hybrid structure in the ancient contact zone (Fig. 5), suggesting that the accumulation of reproductive barriers may have significantly reduced the ongoing hybridization rate (Zeng et al., 2011; Ortego et al., 2014). Such a finding is consistent with the expanded biological species concept (Wang et al., 2020), which expects substantial gene flow at the onset of speciation until reproductive isolation evolves after the gradual development of geographical isolation. However, the demographic history estimated here is different from that of European white oaks. Using genome-wide nuclear markers, recent studies have shown that long-term strict isolation followed by recent postglacial contacts has occurred between four white oak species in Europe (Leroy et al., 2017, 2020). This difference in gene flow history may be attributed to the fact that the two East Asian Cerris oaks diverged ~ 20 mya earlier than their European congeners (Hipp et al., 2020). Such a long period would provide sufficient time for Q. acutissima and Q. chenii to reinforce the reproductive barrier, which may have been strong enough to completely suppress the recent interspecific gene flow (Roux et al., 2016).
Influence of Pliocene and Pleistocene climates on genetic admixture patterns
Our investigation demonstrates that the Q. acutissima populations inside the ancient contact zone exhibit a higher level of genetic admixture than those outside this zone (Supplementary Data Fig. S11). One variable, the climatic niche suitability index for Q. chenii (SCHE) during the mPWP, shows the best performance in explaining the overall patterns of admixture regardless of whether the two sympatric sites are included or not (Table 2; Supplementary Data Table S11 and Fig. S12). Such a result is partially consistent with the findings of Ortego et al. (2014), who concluded that climatic niche suitability is associated with the observed admixture patterns between Q. engelmannii and the widespread scrub oaks in California. However, in contrast to previous work, our most parsimonious models only include SCHE for the mPWP rather than that for the present as an explanatory variable, suggesting a much stronger effect of mid-Pliocene climate in constraining the admixture patterns that may have originated from the ancient contact events during the mPWP.
Within the ancient contact zone, our analyses indicate that SCHE for the LGM plays a more important role in determining the local patterns of admixture than that for other periods (Table 2; Supplementary Data Table S11 and Fig. S12). The observed positive trend between SLGM_CHE and QCHE is mainly driven by the two sympatric sites that present the most suitable LGM climatic conditions for Q. chenii (Supplementary Data Fig. S12). One explanation is that the extremely cool climate during the glacial period has exerted a strong ‘filter’ effect on the occurrence of interspecific gene flow through demographic forces (Ortego et al., 2018). This interpretation is supported by the ENM predictions showing that the two sympatric sites have maintained a moderate to high climatic suitability for both species throughout the Pleistocene (SMIS19 and SLGM > 0.18), whereas the other sites were unsuitable for Q. chenii at least in the LGM (SLGM < 0.04; Fig. 3), thus reducing the probability of historical gene exchange. Furthermore, we detect a high level of nuclear diversity as well as the fixation of locally shared chloroplast haplotypes in the two sympatric populations (Fig. 6), suggesting that the two species may have persisted in situ within an ecologically stable area (Zhang et al., 2018; Li et al., 2019a). Combining all these lines of evidence, we infer that the mountainous areas around the two sympatric sites have acted as a shared glacial refugium for Q. acutissima and Q. chenii, allowing the long-term coexistence and historical hybridization between them. In contrast, the remaining sites are characterized by low ecological stability, which would reduce the survival rate of offspring and further restrict interspecific gene exchange.
Another explanation for the correlation between SLGM_CHE and QCHE is that sympatry (i.e. Presence) creates the opportunity for contemporary hybridization, but the current local distribution of Q. chenii is still constrained by LGM climate. This situation may occur when range shifts of oak trees lag behind climate change, while interspecific gene flow continues in previous overlapping distributions (Nathan et al., 2011; Browne et al., 2019). However, our present study provides weak support for this hypothesis because all the scenarios considering contemporary gene flow (i.e. the IM and SC models) are rejected by demographic inference (Supplementary Data Tables S8 and S9). Furthermore, both the sharp bimodal hybrid structure and the lack of F1 hybrids suggest that the ongoing hybridization rate is rather low within the sympatric populations (Zeng et al., 2011; Ortego et al., 2014; Kim et al., 2018). Although a previous study has described a putative hybrid variety (Q. chenii var. linanensis) from one of the sympatric sites (i.e. Lin’an County; Liu, 1992), the asymmetry in parental contributions to the ancestry of the hybrids sampled here suggested that this variety is more likely to have arisen from ancient introgression rather than contemporary hybridization between Q. acutissima and Q. chenii (Zhang et al., 2019).
Our results highlight that palaeoclimate change shapes present admixture patterns between the two East Asian Cerris oaks by influencing the extent of historical range overlap (Maze, 1968). Specifically, the mid-Pliocene warm climate promoted ancient contact, allowing widespread hybridization throughout central China. In contrast, the Pleistocene cool climate caused the local extinction of Q. chenii, reducing the probability of interspecific gene flow in most areas except those sites having a high level of ecological stability.
Limitations and future directions
Our study has several limitations. First, the microsatellite dataset only includes a limited number of nuclear markers and represents a rather low proportion of the oak genome. These markers show good performance in identifying hybrids, but they do not provide sufficient information to reliably estimate the direction and extent of interspecific gene flow (Supplementary Data Fig. S10). Second, considering that most microsatellites are neutral loci reflecting the patterns of mutation, genetic drift and gene flow (Scotti-Saintagne et al., 2004), it is difficult to assess whether selective forces (e.g. adaptive introgression) also contribute to the admixture pattern. Finally, compared with other types of markers, microsatellites are more likely to be affected by homoplasy because of their high mutation rates and allele size constraints, which also hampers the estimation of long evolutionary history (Estoup et al., 2002; Camacho-Sanchez et al., 2020). To overcome these shortcomings, future studies should utilize genome-wide informative markers from reduced library sequencing (e.g. Ortego et al., 2018; Kim et al., 2018; Burge et al., 2019) or when possible whole genome sequences, to robustly evaluate the influence of past climate change on the extent, direction and timing of interspecific gene flow. In combination with the identification of genomic introgression regions, such work would provide novel insight into how ancient introgression has interacted with both non-adaptive and adaptive processes to shape the complex evolutionary history of East Asian Cerris oaks. Nonetheless, the use of multiple types of corroborative evidence supports the conclusions reported here regarding the associations between palaeoclimate change, historical range shifts and ancient hybridization history.
Conclusions
Our study demonstrates that the current admixture patterns among East Asian Cerris oaks reflect the genetic imprints of ancient hybridization affected by both Pliocene and Pleistocene climates. This finding differs from that reported for European white oaks, whose hybridization history is mainly constrained by the last glacial–interglacial cycle (Petit et al., 2002a; Leroy et al., 2017, 2020), but is generally consistent with the patterns found in western North America, where genetically distant or allopatric oak species exchanged alleles in much older times and left signals of historical introgression in contemporary populations (e.g. Maze, 1968; McVay et al., 2017; Kim et al., 2018; Ortego et al., 2018). Our results indicate that the hybridization history between sister species occurring in areas less affected by glaciations is more complex than that for species seriously threatened by Pleistocene climate change. The impact of pre-Quaternary climate should be taken into account especially when exploring ancient gene flow among species native to East Asia.
SUPPLEMENTARY DATA
Supplementary data are available online at https://academic.oup.com/aob and consist of the following. Table S1: Locations of 30 Quercus acutissima populations and 18 Q. chenii populations. Table S2: Primer sequences, annealing temperatures and lengths of consensus sequences for the three chloroplast intergenic spacers. Table S3: Primer sequences, annealing temperature and genetic statistics for the seven nuclear microsatellite loci. Table S4: Genetic statistics for each population of Q. acutissima and Q. chenii. Table S5: Genetic statistics of Q. acutissima and Q. chenii. Table S6: Analyses of molecular variance for Q. acutissima and Q. chenii. Table S7: Descriptive statistics and multiple comparisons for PC1 score and maximum-likelihood hybrid index among parental individuals of Q. acutissima and Q. chenii and their hybrids. Table S8: Model choice using the multinomial logistic regression approach. Table S9: Model choice using the random forest approach. Table S10: Fraction of models correctly predicted and classification error rate estimated by both multinomial logistic regression and random forest approaches. Table S11: Model selection to assess the relationships between genetic admixture and three sets of explanatory variables for all Q. acutissima populations and those inside the ancient contact zone at both population and individual levels. Table S12: Chloroplast haplotypes identified in this study based on 18 nucleotide substitutions, seven indels and one 32-bp inversion. Fig. S1: The receiver operating characteristic curves averaged over ten replicate runs. Fig. S2: ΔK and Ln Pr(X|K) obtained in Bayesian cluster analysis for each value of K. Fig. S3: Variable importance estimated by the random forest approach. Fig. S4: Genetic differentiation index between Q. acutissima and Q. chenii under the four demographic models. Fig. S5: Posterior predictive checks for the ancient migration model. Fig. S6: Posterior predictive checks for the isolation with migration model. Fig. S7: Posterior predictive checks for the secondary contact model. Fig. S8: Posterior predictive checks for the strict isolation model. Fig. S9: Goodness of fit for the four demographic models. Fig. S10: Prior and posterior distributions of parameters in the ancient migration model. Fig. S11: Probability of membership to the genetic cluster of Q. chenii for Q. acutissima populations inside and outside the ancient contact zone. Fig. S12: Linear correlations between genetic admixture and climatic niche suitability for Q. chenii during the Last Glacial Maximum or mid-Pliocene Warm Period.
ACKNOWLEDGEMENTS
We thank Qingliang Liu, Baokun Xu, Xuan Li, Zhongren Xiong, Xiaochen Zhang, Kaiwen Zhang, Sujing Fu, Heng Jia, Wenbin Xu, Huachen Wang, Gang Yao, Ye Tian, Xulan Shang, Shengzuo Fang and Qianru Liu for their help with fieldwork.
Funding
This work was supported by the National Natural Science Foundation of China (31770699, 31870506, 31370666), the China Postdoctoral Science Foundation (2020M681629), the Jiangsu Postdoctoral Research Funding Program (2021K038A), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31000000), the Natural Science Foundation of Jiangsu Province, China (BK20181398), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYLX15_0922), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
DATA ACCESSIBILITY
Sequence data are available on GenBank (http://www.ncbi.nlm.nih.gov/genbank/) under accession numbers KT152178–KT152200 and MH924168–MH92419. The microsatellite genotype dataset is available from Zenodo: https://doi.org/10.5281/zenodo.1442911. The input files and R scripts for the ABC analyses are available from Zenodo: https://doi.org/10.5281/zenodo.5631037.





