Power of Bayesian and Heuristic Tests to Detect Cross-Species Introgression with Reference to Gene Flow in the Tamias quadrivittatus Group of North American Chipmunks

Abstract In the past two decades, genomic data have been widely used to detect historical gene flow between species in a variety of plants and animals. The Tamias quadrivittatus group of North America chipmunks, which originated through a series of rapid speciation events, are known to undergo massive amounts of mitochondrial introgression. Yet in a recent analysis of targeted nuclear loci from the group, no evidence for cross-species introgression was detected, indicating widespread cytonuclear discordance. The study used the heuristic method HYDE to detect gene flow, which may suffer from low power. Here we use the Bayesian method implemented in the program BPP to re-analyze these data. We develop a Bayesian test of introgression, calculating the Bayes factor via the Savage-Dickey density ratio using the Markov chain Monte Carlo (MCMC) sample under the model of introgression. We take a stepwise approach to constructing an introgression model by adding introgression events onto a well-supported binary species tree. The analysis detected robust evidence for multiple ancient introgression events affecting the nuclear genome, with introgression probabilities reaching 63%. We estimate population parameters and highlight the fact that species divergence times may be seriously underestimated if ancient cross-species gene flow is ignored in the analysis. We examine the assumptions and performance of HYDE and demonstrate that it lacks power if gene flow occurs between sister lineages or if the mode of gene flow does not match the assumed hybrid-speciation model with symmetrical population sizes. Our analyses highlight the power of likelihood-based inference of cross-species gene flow using genomic sequence data. [Bayesian test; BPP; chipmunks; introgression; MSci; multispecies coalescent; Savage-Dickey density ratio.]


Introduction
Genomic sequence data are a rich source of information concerning the history of species divergences and cross-species gene flow. The past two decades have seen widespread use of genomic data to infer hybridization or introgression (Mallet et al., 2016). Gene flow has been detected in a variety of species including Arabidopsis (Arnold et al., 2016), butterflies (Martin et al., 2013), corals (Mao et al., 2018), lizards (Finger et al., 2022), birds (Ellegren et al., 2012), and mammals (Chan et al., 2013;Kumar et al., 2017;Shi and Yang, 2018). The studies have considerably enriched our understanding of the evolutionary dynamics of introgressed genes, and the role of introgression in speciation and ecological adaptation (Payseur and Rieseberg, 2016;Martin and Jiggins, 2017).
A number of statistical methods have been developed to analyze genomic sequence data to detect gene flow between species and to estimate its strength (as measured by the introgression probability or migration rate). Heuristic or summary methods are based on summaries of the multilocus sequence data and include the popular D-statistic or ABBA-BABA test (Patterson et al., 2012), HYDE (Blischak et al., 2018), and SNAQ (Solis-Lemus and Ane, 2016). The D-statistic and HYDE use the site-pattern counts for a species quartet to test for the presence of gene flow between nonsister species, while SNAQ uses the frequencies of estimated gene tree topologies. Likelihood methods use the multilocus sequence alignments directly and include the Bayesian implementations of the introgression model in PHYLONET/ MCMC-SEQ (Wen and Nakhleh, 2018), *BEAST (Zhang et al., 2018), and BPP , as well as the maximum-likelihood and Bayesian implementations of the continuous-migration model (also known as the isolation-with-migration or IM model) (Nielsen and Wakeley, 2001;Zhu and Yang, 2012;Dalquen et al., 2017;Hey et al., 2018). See Jiao et al. (2021) for a recent review. In theory, likelihood methods are expected to be more powerful because they use all information in the data about the model and parameters. However, summary and likelihood methods for inferring cross-species gene flow are seldom applied to the same real datasets with their utilities evaluated, partly because likelihood methods typically involve intensive computation and may not be computationally feasible for genome-scale datasets. In this regard, it is noteworthy that the BPP implementation of the multispecies-coalescent-with-introgression (MSci) model has been successfully applied to genomic datasets of more than 10,000 loci  Table 1; Thawornwattana et al., 2022; Table S4).
The Tamias chipmunks (sensu lato, but see Patterson and Norris, 2016) are a diverse group of at least 23 distinct species, occupying a variety of habitats in the western United States. Molecular phylogenetic studies have revealed a complex history of radiative speciations and cross-species gene flow involving morphologically and ecologically diverse lineages (Good and Sullivan, 2001;Good et al., 2003).
The Tamias quadrivittatus group of chipmunks currently consists of nine species that are distributed across the Great Basin along with the central and southern Rocky Mountains in North America (Fig. 1). Previous work on Tamias has highlighted the importance of genital morphology, specifically the baculum (a bone found in the penis) in male chipmunks, as a reliable indicator of species limits (Patterson and Thaeler Jr, 1982;White, 2010). The biogeographic history of the group likely included large range fluctuations that have periodically resulted in isolation and secondary contact among species, which would have affected opportunities for hybridization and/or introgression (Good et al., 2003). The current distributions of species in the group have extensive regions of overlap and broad parapatry in ecological transition zones (Fig. 1), with instances of Table 1 Summary of evidence for mitochondrial introgression in the T. quadrivittatus group (Sullivan et al., 2014)    both allopatry and parapatry, and the determinants of current distributions are thought to be related primarily to competitive exclusion and ecological preference (Brown, 1971;Heller, 1971;Root et al., 2001). The system provides an exciting opportunity to investigate the effects of introgression on genetic variation within and between species.
Hybridization between chipmunk species has been widely reported based on discrepancies between mtDNA, nuclear DNA, and morphology (Good and Sullivan, 2001;Good et al., 2003Good et al., , 2008Hird et al., 2010). Work in the past decade has documented widespread mitochondrial introgression among species of the group (Reid et al., 2012;Sullivan et al., 2014;Sarver et al., 2017Sarver et al., , 2021, which is often asymmetrical, possibly due to bacular morphology, which has been identified in at least six species (Good et al., 2003(Good et al., , 2008Reid et al., 2012;Sullivan et al., 2014). Recent work on six species in the T. quadrivittatus group found that four of them exhibited clear evidence of introgressed mitochondrial DNA: T. cinereicollis, T. dorsalis, T. quadrivittatus, and T. umbrinus (Table 1). The cliff chipmunk (T. dorsalis) was involved in local introgression with multiple other species, receiving mtDNA from whichever congeneric chipmunk it came into contact with. However, populations of T. dorsalis that are geographically isolated carry mtDNA haplotypes that are unique to the species (Sullivan et al., 2014;Sarver et al., 2017). Range overlap in transition zones plays an important role in mitochondrial introgression in Tamias (Brown, 1971;Bi et al., 2019). Sarver et al. (2021) used a targeted sequence-capture approach to sequence thousands of nuclear loci (mostly genes or exons) to estimate the species phylogeny of the T. quadrivittatus group and to infer possible nuclear introgression. The program HYDE (Blischak et al., 2018) was used to infer gene flow. Surprisingly, no significant evidence for gene flow involving the nuclear genome was detected between any species in the group, despite the evidence for widespread mitochondrial introgression. We note that HYDE, like the D-statistic, uses the four-taxon site-pattern counts pooled across the genome as data, and does not use information in the variation in genealogical history across the genome caused by the stochastic fluctuation of coalescent and introgression (Lohse and Frantz, 2014;Jiao et al., 2021;Zhu and Yang, 2021). As a result, neither the D-statistic nor HYDE can detect gene flow between sister lineages. Importantly, HYDE is designed to estimate the relative genetic contributions of the two parental species which hybridized to form a third species. When applied to detect other modes of gene flow, it makes restrictive assumptions about the direction of gene flow, and about species divergence times and population sizes that may be unrealistic (Fig. 7). The performance of HYDE when its model assumptions are violated is unexplored.
To examine whether the lack of evidence for nuclear introgression in the analysis by Sarver et al. (2021) may be due to the lack of power of HYDE, here we re-analyze the data of Sarver et al. (2021) using the BPP program (Flouri et al., 2018, which includes a Bayesian implementation of the MSci model. Borrowing ideas from stepwise regression or Bayesian variable selection, we add introgression events sequentially onto the binary species tree to construct a joint MSci model with multiple introgression events. We develop a Bayesian test of introgression, calculating the Bayes factor for comparing the null model of no introgression against the alternative model of introgression via the Savage-Dickey density ratio (Dickey, 1971), using a Markov chain Monte Carlo (MCMC) sample under the MSci model. This may have a computational advantage over cross-model MCMC algorithms such as reversible jump MCMC (Green, 1995) or calculation of Bayes factors using thermodynamic integration (Gelman and Meng, 1998;Lartillot and Philippe, 2006). Our re-analysis revealed robust evidence for several ancient introgression events affecting the nuclear genome in the Tamias group, involving both sister and nonsister species. We examine the model assumptions underlying HYDE and use computer simulation to demonstrate that the opposite conclusions reached in the two analyses may be explained by the lack of power of HYDE to detect gene flow. We then assess the impact of ignoring introgression on estimation of population parameters, highlighting serious biases in species divergence time estimation when introgression exists and is ignored. Our results highlight the power of coalescent-based likelihood methods in the analysis of genomic datasets to infer the history of species divergence and gene flow.

Bayes Factor Is Given by the Savage-Dickey Density Ratio in Comparisons of Nested Hypotheses
One can test for the presence of cross-species gene flow by comparing the introgression (MSci) model with the corresponding multispecies-coalescent (MSC) model with no gene flow. The model of no gene flow (H 0 ) is a special case of the introgression model (H 1 ), with H 1 reducing to H 0 when the introgression probability is 0.
The commonly used device for Bayesian model comparison is the Bayes factor, which is the ratio of the marginal likelihood values under the two compared models. When the two models are nested, the Bayes factor is given by the Savage-Dickey density ratio (Dickey, 1971). In general, suppose we wish to compare the null model H 0 : φ = φ 0 against the alternative model H 1 : φ = φ 0 , and suppose that both models have common (nuisance) parameters λ, while parameters ξ in H 1 become unidentifiable when φ = φ 0 . The parameter vector is λ for H 0 and (φ, λ, ξ) for H 1. Given data x, let the likelihood be L 0 (λ) under H 0 and L (φ, λ, ξ) = p (x | φ, λ, ξ) under H 1, with L (φ 0 , λ, ξ) = L 0 (λ) as the two models are nested. Let the prior be π 0 (λ) under H 0 and π (φ, λ, ξ) = π (φ) π (λ | φ) π (ξ | φ, λ) under H 1. The Bayes factor in support of H 1 over H 0 is defined as VOL. 72 where m 0 and m are the marginal likelihoods for the two models, respectively. Under the assumption that the priors on the common parameters (λ) agree between the two models (2) B 10 can be expressed as the ratio of the prior and posterior densities for φ in H 1, both evaluated at the null value φ 0 : A proof is provided in the Appendix. Note that equation (3) holds even when there exist nuisance parameters (λ) in both models, and also when the usual regularity conditions are not met: for example, when the null parameter values (φ 0 ) are at the boundary of the parameter space in H 1, and when some parameters in H 1 (ξ) become unidentifiable when the parameters of interest take the null values (when φ = φ 0 ). Such nonstandard conditions cause considerable difficulties for the likelihood ratio test (LRT), leading to unconventional or unknown null distributions for the test statistic (Self and Liang, 1987). It is interesting that they do not cause any difficulty for the Bayesian test.
If the condition on the priors (equation (2)) does not hold, a correction factor may be applied (Verdinelli and Wasserman, 1995). This is not needed in our application.

Calculation of the Savage-Dickey Density Ratio
The prior density π (φ 0 ) of equation (3) is typically available analytically. The posterior density π (φ 0 | x) can be estimated using a kernel density smoothing procedure using the MCMC sample under H 1 (Silverman, 1986). This means that calculation of B 10 using equation (3) requires running the MCMC under H 1 only and no cross-model algorithms such as reverse-jump MCMC (Green, 1995) are needed. Note that within-model MCMC typically has better mixing properties than cross-model algorithms (Yang, 2014, pp. 247-260).
Suppose φ (1) , φ (2) , · · · , φ (N) are an MCMC sample from the posterior π (φ | x). These are the φ values sampled during the MCMC, with the values for other parameters (λ and ξ) simply ignored. The kernel density estimator at the point φ 0 iŝ (5) The kernel function K is typically symmetrical around 0, with points further away from φ 0 make less contribution to the density at φ 0 . For example, the Gaussian kernel is given as However, this approach may be awkward to apply if the prior or posterior density at the null value, π(φ 0 ) or π(φ 0 | x), is 0 or ∞. In this paper, we use a more intuitive way of deriving the Savage-Dickey density ratio of equation (3), which also provides an approach to its calculation. This treats the problem of testing as a problem of estimation, and assesses how likely the parameter of interest (φ) differs from the null value (φ 0 ). Define a null region or region of null effects, / o : |φ − φ 0 | < ε, inside which φ is very close to φ 0 . The null region is a small part of the parameter space for H 1 that represents H 0 (Fig. 2). We then define a Bayes factor to represent the evidence for H 1 where the differential ∆ is the size of the null region, so that B 10,ε → π(φ 0 )/π(φ 0 | x), as in equation (3). Thus the same conclusion is reached whether the problem is considered a testing problem (equation (1) or (3))) or an estimation problem (equation (7)). The approach is illustrated in Fig. 2 using the simple problem of testing H 0 : µ = 0 against H 1 : µ = 0 using a sample of size n from N (µ, 1). The data are summarized as the sample mean |x|. We assign the prior µ ∼ N 0, σ 2 0 under H 1. The posterior is then µ|x ∼ N(µ 1 , σ 2 1 ), with µ 1 = nx/ [(n + 1)/σ 2 0 ] and 1/σ 2 1 = n + (1/σ 2 0 ). The prior and posterior probabilities of the null interval with the differential to be the width of the null interval, ∆ = 2ε.
The above-mentioned theory applies generally to Bayesian testing of nested hypotheses. Examples include comparison of different species delimitation models (e.g., one-species versus two-species models) (Yang and Rannala, 2010) and test of migration between species (e.g., two species with and without migration) (Nielsen and Wakeley, 2001). The theory may be applied to compare nonnested models as well if they both have the same null model as a special case. Suppose H 0 is a special case of both H 1 and H 2 , and let M 0 , M 1 , M 2 be their marginal likelihood values. Then B 12 = M 1 /M 2 = B 10 /B 20, where the Bayes factors B 10 and B 20 can be calculated using the Savage-Dickey density ratio by running MCMC under H 1 and H 2 . In practice, the approach has only limited precision and works only if the goodness of fit of the compared models is not drastically different. If both H 1 and H 2 fit the data much better than H 0 so that B 10 and B 20 are estimated to be ∞, no sensible estimate for B 12 can be generated.

Test of Introgression
When we use the Savage-Dickey density ratio (equation (3)) to test introgression, the nuisance parameters include species divergence times (τ ) and population sizes (θ) on the species tree. Since we use the same priors on τ and θ in models with and without introgresion, independent of the introgression probabilities (ϕ), the assumption of equation (2) holds. We consider two tests with different assumptions about the population size parameters (Fig. 3). In test 1, the MSci model assigns different θ parameters on the two segments of a branch broken by an introgression event; for example, in Fig. 3a branch RA is broken into two branches RX and XA and assigned θ X and θ A, respectively. The null model of no gene flow will have two θ parameters for the branch as well. Such a model can be implemented in BPP by including ghost species in the MSC model from which no sequences are sampled (Fig. 3a). In the second test, the MSci model assigns the same θ parameter for a branch on the species tree before and after an introgression event (which can be specified using the control variable thetamodel = linked-msci in BPP) (Fig. 3b). When the introgression probability takes the null value (0) in H 1, the introgression time τ X becomes unidentifiable. The proof of equation (A1) applies to both scenarios. In this study, we used test 1. Note that calculating the Bayes factor using the Savage-Dickey density ratio (equation (3) or (7)) requires an MCMC sample from H 1 and does not require any analysis or MCMC run under H 0 .
In our BPP analysis, the introgression probability ϕ is assigned a beta prior beta(a, b), and the null hypothesis corresponds to ϕ 0 = 0 in H 1. Let the null region be / o : ϕ < ε. Then P(/ o) = P(ϕ < ε) in equation (7) is given by the cumulative distribution function (CDF) for beta(a, b), while P(/ o | x) is simply the proportion of the sampled ϕ values that are < ε. Intuitively, the null region / o : ϕ < ε in H 1 represents absence of introgression (as the introgression probability ϕ is negligibly small), is the posterior odds, and B 10 measures the change in the odds in favor of gene flow when we move from the prior to the posterior. We used ε = 0.01 and confirm that use of ε = 0.001 gave very similar results. A cutoff of 20 for B 10 may be considered strong evidence in support of H 1 (corresponding to 95% posterior for H 1 if the prior model probabilities for H 0 and H 1 are 1/2 each), while 100 means extremely strong evidence (corresponding to 99% posterior for H 1).

Chipmunk Genomic Data
The dataset, generated and analyzed by Sarver et al. (2021), includes 1060 nuclear loci from six chipmunk species: T. rufus (R), T. canipes (C), T. cinereicollis (I), T. (a) Bayes factor expressed as the Savage-Dickey density ratio in the test of the null hypothesis H 0 : µ = 0 against the alternative hypothesis H 1 : µ = 0, using a data sample from N (µ, 1). The black and red curves represent the prior and posterior densities for µ in H 1, and the small interval (of width ε) in the parameter space for H 1 is the null interval ϕ (or interval of null effects), representing H 0 . The prior and posterior probabilities over the null interval (the gray and red areas) depend on the interval width (ε), but when ε → 0, their ratio converges to the Bayes factor B 10 = π (µ 0 )/π (µ 0 | x). If the area of null effects shrinks greatly when we move from the prior to the posterior, the data contain strong evidence against H 0 . (b) Approximate Bayes factor B 10,ε = P(/ o)/P(/ o | x) (equation (7)) plotted against ε for a dataset of size n = 100 with the sample mean x = 0.258. The prior is µ ∼ N(0, σ 2 0 ) with σ 0 = 2 (twice the sampling standard deviation). When ε → 0, B 10 = 1.381. (c) Bayes factor (equation (1) or (12)) plotted against the prior variance σ 2 0 for the same dataset showing the sensitivity of B 10 to the prior on the parameter of interest (µ). Note that in this dataset (with √ n|x| = 2.58) H 0 is rejected by the LRT with p -value 1%.

VOL. 72
umbrinus (U), T. quadrivittatus (Q), and T. dorsalis (D) (with 5, 5, 9, 10, 11, 11 individuals, respectively), as well as the outgroup T. striatus (3 individuals). We included all individuals whether or not their mtDNA was likely to be introgressed. Due to lack of a reference genome, Sarver et al. (2021) assembled genomic loci (targeted genes or exons) into contigs using an approach called Assembly by Reduced Complexity (ARC). Filters were then applied to remove missing data (contigs not present across all individuals) and sequences with likely assembly errors. The procedure generated a dataset of 1060 loci (1060 ARC contigs, Sarver et al., 2021), with sequence length ranging from 14 to 1026 bp among loci and the number of variable sites from 0.33% to 15.2%. High-quality heterozygous sites in the data, as identified by high mapping quality and depth of coverage, are represented using IUPAC ambiguity codes. They are accommodated using the analytical integration algorithm implemented in BPP (Gronau et al., 2011;Flouri et al., 2018). This takes the unphased genotype sequences as data and averages over all possible heterozygote phase resolutions, using their relative likelihoods based on the sequence alignment at the locus as weights (Huang et al., 2022).
We assigned inverse-gamma (IG) priors to parameters in the MSC model: θ ∼ IG(3, 0.002) with mean 0.001 for population size parameters and τ 0 ∼ IG(3, 0.01) with mean 0.005 for the age of the root. The shape parameter α = 3 means that those priors are diffuse, while the prior means are based on estimates from preliminary runs. Note that both θ and τ are measured in the expected number of mutations per site. The inverse gamma is a conjugate prior for θ and allows the θ parameters to be integrated out analytically, leading to a reduction of parameter space and improved mixing of the MCMC algorithm. We conducted 10 replicate MCMC runs, using different starting species trees. Each run generated 2 × 10 5 samples, with a sampling frequency of 2 iterations, after a burn-in of 16,000 iterations. Each run took about 70 hours using one thread on a server with Intel Xeon Gold 6154 3.0 GHz processors. Convergence was confirmed by consistency between runs. All runs converged to the same species tree (Fig.  4a), with ∼ 100% posterior probability, which had the same topology as the tree inferred by Sarver et al. (2021).

Stepwise Construction of the Introgression Model
As the species tree is well supported, apparently unaffected by cross-species introgression, we used the species tree to build an introgression model with multiple introgression events. Our procedure is similar to stepwise regression, the step-by-step method for constructing a regression model that involves adding or removing explanatory variables based on a criterion such as an For t-test.
Our procedure has two stages. In the first stage, we used BPP to fit a number of introgression models, each with only one introgression event, and rank candidate introgression events by their strength (indicated by the introgression probability ϕ). The analyses of Sarver et al. (2021) suggest that mitochondrial introgression affected mostly four species: T. umbrinus (U), T. dorsalis (D), T. quadrivittatus (Q), and T. cinereicollis (I). We considered introgression events involving all possible pairs among those four species, as well as another species, QI, the common ancestor of T. cinereicollis and T. quadrivittatus (Fig. 4a). The dataset of 1060 loci was analyzed under an MSci model with only one introgression event, estimating the introgression probability (ϕ) and introgression time (τ ). We assign the same inverse-gamma priors on θ and τ as above, and beta(1, 1) or U (0, 1) for the introgression probability ϕ. Two replicate runs were conducted for each analysis to confirm consistency between runs, and MCMC samples from the two runs in H 1 becomes unidentifiable at the null value ϕ 0 = 0. Here only the two species involved in introgression are shown. Including other species on the species tree adds the same set of parameters to the null and alternative hypotheses.
were then combined to produce posterior estimates of parameters. This analysis provides a ranking of the introgression events by the introgression probability. We calculated the Bayes factor for testing H 0 : ϕ 0 = 0 given by the Savage-Dickey density ratio (equation (3)), using the null interval / o = (0, 0.01) (equation (7)); use of (0, 0.001) produced virtually identical results. Only introgresssion events with B 10 ≥ 20 were considered further.
In the second stage, we added introgression events onto the binary species tree (Fig. 4a) sequentially in the order of decreasing strength (introgression probability). To reduce the computational cost and to examine the robustness of the analysis, this step was applied to two subsets of the 1060 loci: the first half and the second half, each of 530 loci. The priors used for population sizes and root age were as above. With multiple introgression events in the model, we extended the MCMC runs to be k-times as long if the model involved k introgression events. Three replicate runs were performed to check consistency between runs. Samples from the replicate runs were then combined to produce posterior summaries. At each step, the added introgression event was retained if it met the same cutoff as above in either of the two data subsets.
Our procedure produced a joint introgression model with three unidirectional introgression events. The joint model was then applied to the full dataset of 1060 loci to estimate the population parameters including introgression probabilities, introgression times, species divergence times, and population sizes (Fig. 4b), using the same prior settings. We conducted 3 replicate runs, using a burn-in of 50,000 iterations and then taking 10 6 samples, sampling every 2 iterations. Each run took 200 hrs.

Species Tree Estimation for the T. quadrivittatus Group
We analyzed the full data of 1060 loci under the MSC model without gene flow to estimate the species tree. The ten replicate runs using different starting species trees converged to the same maximum a posteriori probability (MAP) tree, with posterior probability ∼ 100% (Fig. 4a). Sarver et al. (2021) recovered the same species tree topology in their analysis of the same data using ASTRAL (Mirarab and Warnow, 2015) and SVDQUARTETS (Chifman and Kubatko, 2014), although with weaker support for some nodes, e.g., concerning the placement of T. rufus. The differences in support may be due to the fact that ASTRAL and SVDQUARTETS use summaries of the multilocus sequence data that are not sufficient  Table S3. The MSci model includes 6 species divergence times and 3 introgression times (τ ), 19 population size parameters (θ), and 3 introgression probabilities (ϕ).
VOL. 72 453 statistics, and are thus less efficient than the full likelihood method implemented in BPP (Xu and Yang, 2016;Zhu and Yang, 2021).

Stepwise Construction of the Introgression Model
In the first stage of our procedure, we fitted introgression models, each involving one introgression event, using the full dataset of 1060 loci. We considered introgression events between every contemporary pair of the five species: T. cinereicollis (I), T. dorsalis (D), T. quadrivittatus (Q), and T. umbrinus (U), and the ancestral species QI (Fig. 4a). Introgression events that passed our cutoff (B10 ≥ 20) are listed in Table 2. Introgression from QI into D had the highest probability, > 10%, while six more events had ϕ > 5%: Q → D, D → QI, QI → U, I → D, Q → I, and I → Q. We note that introgressions between Q and I, and between QI and D, were significant in both directions and the estimated introgressions times were close (Table 2). We thus replaced the two unidirectional introgression events by one bidirectional introgression in further analyses (model D in . The time of QI → U introgression was estimated to be 0.000408, very close to the species divergence time at node QIR (0.000417) (Fig. 4a), suggesting that the introgression was probably a more ancient event. Note that if an introgression event is assigned incorrectly to a daughter branch to the lineage truly involved in introgression, one would expect the estimated introgression time to collapse onto the species divergence time. We thus attempted to place the introgression onto more ancient ancestral branches on the species tree (Fig. 4a) and finally identified the lineage involved in introgression to be the ancestral species QIRCD. The QIRCD → U introgression had an estimated time that was away from the species divergence times, and the estimated introgression probability (62%) was the highest (Table  2).
In the second stage, we added introgression events identified in Table 2 onto the binary species tree of Fig.  4a, in the order of their introgression probabilities (Table  S1). This was applied to two data subsets (the full data split into two halves). While our procedure allows introgression events already in the model to drop out when new introgressions are added to the model, this did not happen in the analysis of the Tamias dataset. Instead the most important introgression events identified in stage 1 remained to be most important in the joint introgression models constructed in stage 2. Note that multiple introgression events may not be independent. An introgression event significant in stage 1 may not be significant anymore when other introgression events are already included in the model. For example, when the QI → D introgression was already included in the model, none of the introgressions Q → D, D → QI, I → D and I → Q was significant. Those introgressions may be expected to lead to similar features in the sequence data, such as reduced sequence divergences between Q or I and D. Similarly, introgression probability for an introgression event often became smaller when other introgressions were added in the model. However, the opposite may occur as well. For example, ϕ QIRCD→U was estimated to be 54-63% when this was the only introgression assumed in the model, but increased to 59-69% when other introgression events were added in the model (Table S1).
Results for the two data subsets were largely consistent, especially concerning introgression events with high introgression probabilities. We thus arrived at a joint introgression model with three unidirectional introgression events (Fig. 4b, Table S1).  Fig. 4a is used, with a single introgression event assumed in each analysis. The full dataset of 1060 loci is analyzed using bpp to estimate the introgression probability (ϕ) and the introgression time (τ ), together with the species divergence times (τ ) and population sizes (θ) on the species tree. Introgression events with B 10 < 20 (D → U and below) are not considered further in the stepwise approach of constructing the joint introgression model. The three introgression events that are selected in the joint introgression model are marked with asterisks. Bayes factor B 10 = ∞ occurs if all ϕ values in the MCMC sample are > ε = 1%.

Estimation of Introgression Probabilities and Species Divergence/Introgression Times
Finally, we fitted the joint introgression model of Fig.  4b to the full data of 1060 loci, as well as the two halves, with parameter estimates shown in Table S3. The fitted model is very parameter-rich, partly as we assign different θ parameters for different branches on the species tree: for example, branch Q in Figure 4b is broken into two segments by the introgression event, Q → I, which are assigned two independent θ parameters. As a result, population sizes for ancestral species tend to be poorly estimated, especially for those populations with a very short time duration. These patterns are consistent with simulation studies that examine the information content in multilocus datasets (Huang et al., 2020).
The estimated introgression probabilities from the full data are 0.625 with the 95% highest probability density (HPD) credibility interval (CI) to be (0.442, 0.794) for ϕ QIRCD→U , 0.106 (0.074, 0.139) for ϕ QI→D , and 0.050 (0.028, 0.074) for ϕ Q→I . The introgression probability ϕ QIRCD→U involved considerable uncertainty, with a large CI, possibly because the introgression is ancient and is between sister species, making it hard to estimate its strength, so that the dataset of 1060 loci may be too small.
We evaluated the impact of the prior for ϕ on parameter estimation in the analysis of the full dataset, using α = 0.2, 1, 5 and β = 0.2, 1, 5 in the prior ϕ ∼ beta(α, β) (Fig. 5). The prior had some effects on ϕ QIRCD→U , with the prior mean being more important than the prior variance. Under beta(0.2, 5) with the prior mean 0.0385, the posterior mean was lower, and the CI wider. Under beta(5, 0.2) with the prior mean 0.961, the posterior mean was higher, and the CI narrower. However, the posterior CIs overlapped considerably among the different priors, and overall the impact of the prior for ϕ on the estimate of ϕ QIRCD→U was minor. Estimates of ϕ QI→D and ϕ Q→I were insensitive to the prior used (Fig.  5).
Accommodating gene flow in the model had significant impacts on estimation of the time of divergence between species involved in gene flow (Figs. 4 and 6). While estimates of times for the recent divergences (τ QI , τ QIR , τ QIRC , and τ QIRCD ) were nearly identical between the MSC model ignoring gene flow and the MSci model incorporating gene flow, the estimated age of the T. quadrivittatus clade (τ QIRCDU ) was much greater under MSci than under MSC (Fig. 6). This can be explained by the fact that the MSC model ignored the QIRCD → U introgression, which had introgression probability 62.5%. Note that sequence divergence between any pair of species X and Y has to be older than species divergence (tXY > τ XY), and as a result, the minimum (rather than average) sequence divergence dominates the estimate of species divergence time. If gene flow is present between species and is ignored in the model, the reduced sequence divergence due to gene flow will be misinterpreted as recent species divergence, leading to underestimation of species divergence time. This effect has been noted in previous simulations (Leaché et al., 2014).
The estimated age of the root of the species tree (τ QIRCDUS ) was slightly smaller under MSci than under MSC. However, τ QIRCDUS is negatively correlated with the population size (θ QIRCDUS ) so that both parameters have large uncertainties (Burgess and Yang, 2008). Sullivan et al. (2014, Fig . 1) used the minimum divergence time of 7 Ma for the outgroup species T. striatus, based on fossil teeth thought to belong to Tamias found in the late Miocene, reported by Dalquest et al. (1996), to date the T. quadrivittatus clade to 1.8 Ma in a maximum-likelihood concatenation analysis of four nuclear genes, and to 1.2 Ma (with 95% CI 0.6-2.2) in a *BEAST (Heled and Drummond, 2010) analysis of the same data. Concatenation analysis is known to be biased as it does not accommodate the stochastic variation of gene tree topologies and divergence times among loci due to the coalescent process (Ogilvie et al., 2017). We used the same calibration to rescale the estimates of τ under the MSC and MSci models (Fig. 4). The minimum age for the T. quadrivittatus clade was 1.9 Ma (with 95% HPD CI to be 1.8-2.0) under the MSC model, comparable to the *BEAST estimate under the same model (Fig. 4a). Under the MSci model, the estimated minimum age was 4.1 Ma (with CI be 3.2-5.1) (Fig. 4b), much older than the estimates under the MSC model without gene flow. Note that here the CIs accommodate the uncertainty due to finite amounts of sequence data but not uncertainties in the fossil calibration.

Model Assumptions Underlying HYDE
Whereas the analyses of nuclear data by Sarver et al. (2021) using HYDE detected no significant signal of introgression at all, our BPP analyses of the same data revealed strong evidence of multiple introgression events, involving both sister and nonsister species (Fig.  4b). To understand the opposing conclusions reached in the two analyses, here we examine the model assumptions underlying HYDE. We then use simulation to compare the performance of HYDE and BPP under conditions that are representative of the Tamias data but may violate the assumptions of HYDE.
HYDE was developed under the hybrid-speciation model of Fig. 7a, with τ S = τ X = τ T , and θ S = θ T (Blischak et al., 2018). Formulated for quartet data, with one sequence from each of the four species, it uses the VOL. 72 455 counts or frequencies of three parsimony-informative site patterns: iijj, ijji, ijij, to estimate the genetic contributions of the two parental species to the hybrid species: ϕ and 1 − ϕ. Here pattern ijkl means a site with nucleotides i, j, k, l in O, P 1 , H, P 2, respectively (Fig. 7a). Under this model, the probabilities of gene trees and site patterns are both given by a mixture over the two binary species trees S 1 and S 2 (called parental species trees), with mixing probabilities ϕ and 1 − ϕ (Fig. 7b and c). Given species tree S 1, the matching pattern iijj has a larger probability (say, a) than the other two mismatching patterns (each with probability b, say, with b < a). Given species tree S 2, the matching pattern ijji has probability a while the two mismatching patterns have b each. The symmetry assumptions (τS = τ T and θ S = θ T) ensure that a, b for tree S 1 are equal to a, b for S 2. By averaging over the two species trees, the site-pattern probabilities under the hybridization model are given as Figure 5. Posterior means and 95% HPD CIs for the three introgression probabilities (ϕ) obtained from BPP analyses of the full data of 1060 loci using different beta priors, ϕ ∼ beta(α, β).
Setting those probabilities to the observed frequencies (p) and eliminating a and b from the system of equations gives the estimatê This is equation (3) by Blischak et al. (2018), although the derivation here is simpler than that of Kubatko and Chifman (2019). Note that the theory works if τ S = τ T > τ X and θ S = θ T, so that the method may be used under model A of Flouri et al. (2020, Fig . 1) with the symmetry assumption. The null hypothesis of no hybridization/introgression ( H 0 : ϕ = 0) can be tested by applying a normal approximation to the site-pattern counts (Kubatko and Chifman, 2019). To see which of the two assumptions (τS = τ T and θ S = θ T) has more impact, note that a change in τ is comparable with the same amount of change in 2/θ. Coalescent may occur in population RS (if the H sequence takes the left parental path in the model of Fig. 7a), at the rate 2/θ S over time period τ R − τ S, and it may occur in population RT (if the H sequence takes the right parental path), at the rate 2/θ T over time period the probability of coalescent (given that two sequences enter populations S or T) will be the same in the two populations. However, the probabilities of the site patterns depend on the time of coalescent as well as its occurrence. Thus for equation (9) to be valid, both the rates and the times have to be identical: τ S = τ T and θ S = θ T.
Note that HYDE or the D-statistic cannot be used to infer gene flow between sister lineages. One might think that HYDE or D could be applicable if two sequences were sampled from the recipient lineage to form a quartet. However this is not the case. With ancient introgression, the two sequences from the same lineage are interchangeable and have the same average genomic distance to the outgroup sequence. Suppose P 1 and H in Fig. 7a are two sequences from the same lineage. Then site patterns iijj and ijij will have the same probability even if ϕ > 0.

Simulations to Examine the Performance of HYDE
Our examination of assumptions underlying HYDE suggests that HYDE may not be suitable for testing gene flow in the Tamias data. The strongest introgression in the Tamias data detected using BPP was between sister species, with ϕ QIRCD→U = 0.625 (Fig. 4b). This is unidentifiable by HYDE. The next introgression involved outflow with ϕ QI→D = 0.106, whereas HYDE assumes inflow. The third introgression was again between sister species, with ϕ Q→I = 0.050. To verify those expectations and to explore the performance of HYDE and BPP under different scenarios of gene flow, we conducted simulations using four different model settings (Fig.  8a-d), based on parameter estimates obtained from the Tamias data (Fig. 4b, Table S3). Gene trees and sequence alignments at multiple loci were generated using the simulate option of BPP. HYDE analysis was conducted using PAUP (Swofford, 2003). The data were also analyzed using BPP. The results are summarized in Fig. 9.
Model a (Fig. 8a) assumes gene flow between sister lineages, based on the introgression event from QIRCD → U in the Tamias data (Fig. 4b). It was suggested that by including multiple sequences from the recipient lineage, HYDE or the D-statistic might be used to detect gene flow between sister lineages. We used species R and U, with introgression rate ϕ R→U = 0.625, including two sequences (Ua and Ub) from the recipient species U, while S was used as the outgroup. The divergence times (τ ) and population sizes (θ) were based on the real data (Table S3). When multiple branches in the full  (Fig. 4b) were merged into one branch in the tree of Fig. 8a, θ for the merged branch was calculated as a weighted average, with the branch lengths as weights. As our objective in this case was to confirm the lack of power of HYDE (and the D-statistic), we simulated large datasets, each with L = 8000 loci. The sequence length was 500 sites, and the number of replicates was 100. When the data were analyzed using HYDE and the D-statistic, the quartet tree (((Ua, Ub), R), S) was used, with Ua or Ub labeled the "hybrid" lineage. The same data were analyzed using BPP under the MSci model with three species (Fig. 8a).
As expected, HYDE and the D-statistic had no power to detect gene flow between sister lineages: indeed, the power of HYDE and D was not higher than the significant level (Fig. 9, Table S4). Note that a test that ignores data and produces 5% positives at random will have 5% of power. Also HYDE did not produce reliable estimates of ϕ; in about half of the datasets, the estimate was outside the range (0, 1).
Model b (Fig. 8b) was based on the next strongest introgression in the Tamias data, with ϕ QI→D = 0.106 (Fig. 4b). We used species D, Q, R, with S as the outgroup. This is a case of outflow, when gene flow from Figure 8. Introgression models (species trees with introgression) used for simulating data to evaluate the performance of HYDE and BPP. (a) Species tree for three species (R, U, and S) with R → U introgression at the rate of ϕ = 0.625, and with S to be the outgroup, based on BPP estimates from the Tamias data (Fig. 4b, Table S3). Population sizes (θ) are next to the branches and species divergence times (τ ) are next to the nodes. Two sequences are sampled from species U. When the data are analyzed using HYDE, either Ua or Ub is specified as the hybrid lineage. (b) Outflow model for three species (D, Q, R), with S to be the outgroup, with introgression from Q to D at the rate ϕ = 0.106 (Table S3). (c) Inflow asymmetrical model for three species, with asymmetrical divergence times and population sizes. (d) Inflow symmetrical model for three species, with τ M = τ QR and θ M = θ QR (see Fig. 7a). Note that only model (d) matches the assumption of HYDE. Figure 7. (a) HYDE assumes a hybrid-speciation model with the additional assumption of equal population sizes, or a symmetrical inflow model, with τ S = τ T and θ S = θ T (Blischak et al., 2018). (b, c) Two parental species trees S 1 and S 2 induced by the hybridization model of (a). Site patterns are a mixture over the two species trees. an ingroup species Q to a more distant species D. Our examination of the assumptions made by HYDE suggests that HYDE can be used to detect inflow, but not outflow. We generated datasets of various sizes with L = 500, 2000, or 8000 loci. The other settings were the same as for model a. When the data were analyzed using HYDE, Q was designated the "hybrid" lineage while R and D were the two parents. HYDE performed poorly (Fig. 9b), with very low power and frequent invalid estimates of ϕ (Table S5).
Model c (Fig. 8c) was the same as model b but the direction of gene flow was reversed. The model was then a case of inflow, as assumed by HYDE. However, species divergence times and population sizes did not satisfy the symmetry requirements of HYDE (in other words, τ M = τ QR and θ M = θ QR ). In this case, HYDE had considerable power in detecting gene flow (Fig. 9c). However, the estimates of ϕ by HYDE involved large biases, apparently converging to ≈ 0.32 when the true value was 0.106 (Table S5). This positive bias is apparently because coalescent occurs at a higher rate or over longer time period on the M branch than on the QR branch in Fig. 8c In the opposite case, the bias should be negative.
Model d (Fig. 8d) was the same as model c with inflow but in addition we enforced the symmetry assumptions, so that species Q was a hybrid species formed by hybridization between D and R. This is the hybrid-speciation model assumed by HYDE, and the method performed well (Fig. 9d). Its power was lower than that for BPP, as expected from statistical theory, but improved with the increase of data, rising from 10% at L = 500 loci to 90% at 8000 loci. The parameter estimate appeared to be consistent, converging to the correct value (0.106) when the number of loci increased, and there were not many invalid estimates (Table S5). Those results are consistent with previous simulations, which evaluated the performance of HYDE when all its assumptions were met and found the method to perform well (Blischak et al., 2018;Flouri et al., 2020).
In summary, our simulations suggest that it is important to apply HYDE to detect the correct mode of gene flow (that is, gene flow between nonsister lineages, and inflow instead of outflow) (Fig. 8d). Furthermore, the symmetry assumptions are important for HYDE to produce reliable estimates of introgression probability. When all model assumptions are met, HYDE performed well. However, HYDE had no power to detect gene flow between sister lineages, and very low power to detect outflow.
In all four models (Fig. 8a-d), the Bayesian test using BPP had good power (Fig. 9, Tables S4 and S5). Furthermore, the posterior means and 95% HPD CIs for parameters in the introgression models b-d were well behaved (Fig. 10). While HYDE can estimate only two parameters from the site-pattern counts (the internal branch length in coalescent units on the species tree and the introgression probability), the BPP analysis of the same data estimates all parameters in the model. The species divergence/introgression times were all well estimated with small CIs (Fig. 10). The introgression probability was accurately estimated with narrow CIs when ≥ 500 loci were used. Population size parameters for short branches were poorly estimated due to lack of coalescent events in those populations.
We also examined the false positive rate (type I error rate) of the HYDE and Bayesian tests, by simulating data using the inflow-asym (Fig. 8c) and inflow-sym (Fig.  8d) models but with ϕ = 0 fixed so that there was no introgression in the true model. The results are summarized in Table 3. Under the inflow-asym model, HYDE had higher false positive rate than the nominal significant level. For example, at the 5% significance level, the false positive rate was 7%, 13%, and 7% in datasets of 500, 2000, and 8000 loci, respectively. The high rate may be explained by the violation of the symmetry assumptions for HYDE. Under the inflow-sym model (or the HYDE model), the rate was 3%, 2%, and 3%, all within the allowed 5% (Table 3). Thus HYDE performed well when its assumptions were met and had elevated false positives when the assumptions were violated. In all settings, the false positive rate of the Bayesian test was estimated to be ∼ 0%. This is consistent with the expectation that the Bayesian test may be more conservative (with lower false positive rate and lower power) than the LRT (see discussions later).
Finally, to assess the information content in datasets of the size of the Tamias data, we used parameter estimates from the full dataset (Fig. 4b, Table S3) to simulate two datasets of the same size as the original, with 5, 5, 9, 10, 11, 11, 3 unphased sequences per locus for species R, C, I, U, Q, D, and S, respectively. The sequence length was 200 sites. We analyzed the datasets under the same MSci model of Fig. 4b using BPP to estimate all parameters. The estimates from the two datasets were similar, so we present those from one of them in Table S3. At this data size, BPP achieved relatively good precision and accuracy. The posterior means were close to the true values, and the CIs were also similar to those calculated from the real data. Similarly to analyses of the real data, divergence times and population sizes for modern species were well estimated, but ancestral population sizes, in particular those for populations of short time duration, were more poorly estimated.

Criteria for Testing Gene Flow
Hypothesis testing or model selection involves arbitrariness, and classical hypothesis testing and Bayesian model selection applied to the same data may produce strongly opposed conclusions, a situation known as Jeffreys's paradox (Jeffreys, 1939;Lindley, 1957). Furthermore, Bayesian model selection is known to be sensitive to priors on model parameters, especially on parameters that are not shared between the models under comparison. See Yang (2014, pp. 194-7) for a discussion of those issues. Here we review different strategies for testing, using as example a simple problem of testing the null hypothesis H 0 : µ = 0 against the alternative H 1 : µ = 0, using a data sample, , from the normal distribution N (µ, 1). We assume that a false positive error (of falsely rejecting H 0 when it is true) is more serious than a false negative error (of failing to reject H 0 when it is false). The data can be summarized as the sample mean x, with the likelihood given by x ∼ N (0, 1/n) under H 0 and x ∼ N (µ, 1/n) under H 1. Let φ x; µ, σ 2 be the probability density function (PDF) for N µ, σ 2 and Φ ( ) be the CDF for N (0, 1).
In hypothesis testing, the P-value can be calculated from the fact that under H 0 , √ n|x| ∼ N (0, 1) or n|x| 2 ∼ χ 2 1 . At the α = 5% significance level, we reject Alternatively one may consider this as an estimation problem and construct a confidence interval (CI) for µ and reject H 0 if the CI excludes the null value 0. This is equivalent to the LRT.
In a Bayesian analysis, we consider two approaches. The first is to examine whether the posterior 95% credibility interval (CI) for µ under H 1 excludes the null value 0. We assign the prior µ ∼ N 0, σ 2 0 under H 1. The posterior is then µ|x ∼ N(µ 1 , σ 2 1 ), with mean µ 1 = nx/(n + 1/σ 2 0 ) and precision 1/σ 2 1 = n + 1/σ 2 0 . Here the reciprocal of variance is known as precision. The sample precision is n and the prior precision is 1/σ 2 0 , while the posterior precision is the sum of the two. The 95% CI for µ is given as µ 1 ± 1.96σ 1 so that the CI excludes 0 (in which case we reject H 0 ) if |µ 1 | > 1.96σ 1, or if n|x| 2 > 3.84 1 + 1/(nσ 2 0 ) . (11) Figure 10. Posterior means and 95% HPD CIs for parameters in the three introgression models of Fig. 8: (b) outflow asym, (c) inflow asym and (d) inflow sym (HYDE model), in BPP analyses of 100 replicate datasets, each with 500, 2000, or 8000 loci. Note that in model (d) inflow sym, all populations had the same size (θ) although separate θ parameters were estimated for different populations when the data were analyzed using BPP. Parameters τ and θ are multiplied by 10 3 . The number above the CI bars is the coverage or the probability that the CI includes the true value.

461
The second approach is to use the Bayes factor to compare the null and alternative hypotheses (e.g., Yang, 2006, eq. 5.21).
The Bayes factor is closely related to (and 'calibrated' using) the posterior model probability. If the two models are assigned equal prior probabilities (π0 = π 1 = 1/2), the posterior model probability is so that a 95% cutoff on P (H 1 | x) corresponds to B 10 = 19, and H 0 is rejected based on the Bayes factor if and only if n|x| 2 > log 19 » 1 + nσ 2 0 × 2 1 + 1/(nσ 2 0 ) . (14) While the LRT (equation (10)) depends on √ n|x| only, both the posterior CI (equation (11)) and the Bayes factor (equation (14)) depend in addition on nσ 2 0 . Note that the three criteria (equations (10), (11), and (14)) have the ordering 3.84 < 3.84 1 + 1/(nσ 2 0 ) < log 19 Thus the LRT has more power and higher false positive rate than the posterior CI while the Bayesian test based on the Bayes factor is the most conservative. The result reflects the general perception that the LRT tends to reject the null hypothesis and favor parameter-rich models too often, especially in large datasets. Note that if H 0 is true, the false positive rate of the LRT stays at 5% when the sample size n → ∞, whereas in the Bayesian analysis, the true model H 0 will dominate, with P (H 0 | x) → 1 and B 10 → 0 when n → ∞.
Example calculations are given in Table 4 for two datasets with √ n|x| = 1.96 or 2.58 and n = 100. In both datasets, H 0 is rejected by the LRT (at the 5% and 1% levels, respectively), but the Bayes factor and the posterior model probabilities favor H 0 over H 1, with B 10 < 1 and P (H 1 | x) < 1/2. This analysis suggests that the difference in power between HYDE and BPP are due to the inefficient use of information in the data by HYDE, not to the different statistical philosophies. An LRT for testing introgression applied to the multilocus sequence alignments may be expected to have more power (and higher false positive rate) than the Bayesian test based on the Bayes factor.

The Power of Heuristic and Likelihood Methods to Detect Introgression
When applied to the Tamias dataset, HYDE and BPP produced opposite conclusions concerning gene flow. Our examination of the model assumptions for HYDE and our simulations suggest that this is because gene flow with the strongest signal in the Tamias group, either between sister species or involving outflow, may be of the wrong type or in the wrong direction for HYDE. Here we review and summarize the major issues with HYDE.
First, both HYDE and the D-statistic pool sites across loci when counting site patterns, so that the site-pattern counts are genome-wide averages. Cross-species gene flow creates genealogical variation across the genome, with the probabilistic distribution of the gene trees and coalescent times specified by parameters in the MSC model with gene flow, such as species divergence times, population sizes, and rates of gene flow (Barton, 2006;Lohse and Frantz, 2014). As a result, there is important information concerning gene flow in the variance of site-pattern counts among loci, but this information is ignored by those methods. In other words, sites at the same locus share the genealogical history under the assumption of no within-locus recombination (see Zhu et al., 2022 for an evaluation of the impact of this assumption on MSC-based analyses), and their differences reflect the stochastic fluctuation of the mutation process. Sites at different loci in addition may have different genealogical histories, reflecting the stochastic nature of the process of coalescent and introgression. When sites are pooled across loci, those two sources of variation are confounded, leading to loss of information (Shi and Yang, 2018; Zhu and Yang, 2021). As a consequence, certain forms of introgression, such as introgression between sister lineages, are unidentifiable by D or HYDE, while estimation of introgression rates between nonsister species suffers from larger variances (Jiao et al., 2021). Second, HYDE makes restrictive assumptions about gene flow. The underlying model is one of hybrid speciation with identical population sizes or equivalently the inflow model with symmetrical species divergence times and population sizes (Fig. 7a, with τ S = τ T and θ S = θ T) (Blischak et al., 2018;Kubatko and Chifman, 2019). Our simulation suggests that HYDE can indeed infer gene flow/hybridization and produce reliable estimates of introgression probability under this model ( Fig. 9d; Table S5; see also Blischak et al., 2018;. However, introgression in the wrong direction or violation of the symmetry assumptions may lead to loss of power and biased or invalid estimates by HYDE (Fig. 9b, c, Table S5).
Third, the approaches taken by HYDE to accommodate multiple samples per species and heterozygote sites in diploid genomes may be problematic. When multiple samples are available in the species quartet, HYDE counts site patterns in all combinations of the quartet. Let the numbers of sequences for species O, P 1 , H, P 2 be n O , n 1 , n H , n 2. There are then n O × n 1 × n H × n 2 combinations in which one sequence is sampled per species, and HYDE counts site patterns in all of them (Blischak et al., 2018). This ignores the lack of independence among the quartets and exaggerates the sample size. At the same time, multiple samples from the same species are never compared with each other, which should provide important information about the population size for that species. In a likelihood method such as BPP, all sequences at the same locus, both from the same species and from different species, are related through a gene tree, and genealogical information at the locus is used.
Similarly heterozygote sites are not treated properly in HYDE. If the site pattern is AGRG, with R representing an A/G heterozygote, HYDE adds 0.5 each to the site patterns ijjj (for AGGG) and ijij (for AGAG) (Blischak et al., 2018), in effect treating R as an unknown nucleotide that is either A or G whereas correctly it means a heterozygote (both A and G). The proportion of heterozygotes in each diploid genome should be informative about θ for that population, but such information is not used by HYDE. In BPP, heterozygote sites are resolved into their underlying nucleotides using an analytical integration algorithm (so that R means both A and G, say), with the uncertainty in the genotypic phase of multiple heterozygous sites in a diploid sequence accommodated by averaging over all possible heterozygote phase resolutions, weighting them according to their likelihoods based on the sequence alignment at the locus (Gronau et al., 2011;Flouri et al., 2018). Simulations suggest that this approach has nearly identical statistical performance to using fully phased haploid genomic sequences (Gronau et al., 2011;Huang et al., 2022).
In this paper, we have focused on the heuristic method HYDE and the likelihood method BPP, as they have been used to analyze the Tamias data. By choosing parameter values to be representative of the Tamias data, our simulation has evaluated a tiny portion of the parameter space and does not constitute a systematic evaluation of the performance of HYDE. The strengths and weaknesses of heuristic and likelihood methods for inference under models of gene flow were discussed by Degnan (2018) and Jiao et al. (2021), but a comprehensive comparative study has not yet been conducted. For estimation of the species phylogeny under the MSC without gene flow (Zhu and Yang, 2021, Fig. 3) demonstrated a dramatic information loss resulting from pooling sites across loci in the site-pattern based methods (also known as coalescent-aware concatenation methods), and from the failure to use information in coalescent times or gene-tree branch lengths in the two-step methods (which infer the gene trees and then treat them as data to infer the species tree). Both the site pattern-based and the two-step methods are used to infer gene flow and to estimate the introgression probability (e.g., HYDE and the D-statistic in the first category and SNAQ in the second) and similar information loss may be expected. A detailed analysis of the performance of heuristic methods in comparison with likelihood methods will be interesting. Currently, the gap between the heuristic and likelihood methods appears to be a large one. Heuristic methods are orders-of-magnitude more efficient computationally and can be applied to much larger datasets, whereas likelihood methods have far better statistical properties, being able to identify and estimate all parameters in the model. There are great opportunities for improving both the statistical Note: The Bayes factor B 10 is calculated assuming data size n = 100 in equation (12), while the posterior model probability is given by equation (13). Note that the P-value for the LRT is 5% (or 1%) in the dataset with √ n|x| = 1.96 (or 2.58).
VOL. 72 performance of heuristic methods and the computational efficiency of likelihood methods (including the mixing efficiency of MCMC algorithms).

Introgression in T. quadrivittatus Chipmunks
The joint introgression model for the T. quadrivittatus group (Fig. 4b) was constructed using a stepwise approach that iteratively adds introgression events to the binary species tree. We note several limitations with this approach. First the approach assumes the availability of a stable binary species tree, and may not be feasible if the species tree is large and highly uncertain, possibly influenced by introgression events (Leaché et al., 2014). The Tamias dataset analyzed here includes only six species, and the first stage of our procedure (i.e., the separate analysis) involved 16 possible introgression events, so that the computation was feasible. Second, the approach is not an exhaustive search in the space of introgression models and may miss certain introgression events. Note that introgression events not selected in the first stage of the procedure will not be incorporated in the final joint introgression model. In our analysis of the Tamias data, we considered introgressions between contemporary species, mostly based on phylogenetic analyses of the mitochondrial genome (Sarver et al., 2017), and moved certain events to older ancestral branches when the estimated introgression time coincided with the species divergence time. We did not evaluate introgressions involving ancestral branches systematically. Furthermore, the criterion based on the Bayes factor used in our test is a stringent one, and the dataset of 1060 loci is relatively small. All those factors suggest that we cannot rule out the possibility that we may have missed some introgression events; in other words, our analysis may suffer from false-negative errors. In contrast, the three introgression events identified in our analysis (Fig. 4b) appear to be robust and are unlikely to be false positives (Fig. 5, Table S2). We conclude that there is strong and robust evidence that gene flow has affected the nuclear genome in the T. quadrivittatus group of chipmunks.
Given the extensive mitochondrial introgression in the Tamias group (Sullivan et al., 2014;Sarver et al., 2017Sarver et al., , 2021, introgression affecting the nuclear genome was expected, and the failure to detect any significant evidence for it in the HYDE analysis was surprising (Sarver et al., 2021). Sarver et al. (2021) discussed the evidence for cytonuclear discordance in the pattern of introgression (Bonnet et al., 2017;McElroy et al., 2020;Sarver et al., 2021), as well as possible roles of purifying selection affecting the coding genes or exons that make up the nuclear dataset being analyzed. Our results suggest a simpler explanation, that gene flow in the Tamias group is of a wrong type or in the wrong direction, undetectable by HYDE.
Our analyses suggest that species involved in excessive mitochondrial introgression tend to be those involved in nuclear introgression as well. T. dorsalis was noted to be a universal recipient of mtDNA from other species (Sullivan et al., 2014;Sarver et al., 2017). Consistent with this, our separate analysis (Table 2) identified three introgression events into T. dorsalis with ϕ > 5% as well as one event with T. dorsalis to be the donor species, even though some of those events become non-significant after introgression involving older ancestors was incorporated in the model. It will be interesting to use expanded datasets to examine whether this is due to a lack of power to detect gene flow or a genuine lack of gene flow.
It will be very useful to generate more genomic data, especially the noncoding parts of the nuclear genome, including more species from the genus, to provide more power for detecting gene flow and estimating introgression rates. It will also be interesting to examine whether the noncoding and coding regions of the genome give consistent signals concerning species divergences and cross-species gene flow, and to examine how the effective rate of gene flow vary among chromosomes or across genomic regions. In a few genomic analyses, coding and noncoding parts of the genome were found to produce highly consistent results, with nearly proportional estimates of divergence times (τ ) and population sizes (θ), and with very similar estimates of introgression rates (Shi and Yang, 2018;Thawornwattana et al., 2018Thawornwattana et al., , 2022. One can also examine the posterior distribution of the gene trees to identify loci or genomic segments that are most likely to have been transferred across species boundaries, and to correlate with the functions of genes residing in or tightly linked to the segments.