Divergence-Based Introgression Polarization

Abstract Introgressive hybridization results in the transfer of genetic material between species, often with fitness implications for the recipient species. The development of statistical methods for detecting the signatures of historical introgression in whole-genome data has been a major area of focus. Although existing techniques are able to identify the taxa that exchanged genes during introgression using a four-taxon system, most methods do not explicitly distinguish which taxon served as donor and which as recipient during introgression (i.e., polarization of introgression directionality). Existing methods that do polarize introgression are often only able to do so when there is a fifth taxon available and that taxon is sister to one of the taxa involved in introgression. Here, we present divergence-based introgression polarization (DIP), a method for polarizing introgression using patterns of sequence divergence across whole genomes, which operates in a four-taxon context. Thus, DIP can be applied to infer the directionality of introgression when additional taxa are not available. We use simulations to show that DIP can polarize introgression and identify potential sources of bias in the assignment of directionality, and we apply DIP to a well-described hominin introgression event.


Introduction
Hybridization is an influential evolutionary force (Stebbins 1969) that is widespread in natural populations (Yakimowski and Rieseberg 2014;Mallet et al. 2016). Through backcrossing to parental populations, hybrids can serve as bridges for the transfer of alleles and adaptive traits between species or populations, a process known as introgression (Rieseberg and Soltis 1991;Rieseberg et al. 1996;Green et al. 2010;Dasmahapatra et al. 2012;Mallet et al. 2016;Suarez-Gonzalez et al. 2016). Whole-genome sequences and advances in phylogenetic methods (Soltis and Soltis 2003) have revealed signatures of historical introgression in scientifically and economically important groups, including well-studied examples in Neanderthals and non-African human populations (Green et al. 2010;Kuhlwilm et al. 2016). Several methods have been developed to identify taxa that exchanged genes during introgression (Huson et al. 2005;Than et al. 2008;Green et al. 2010;Durand et al. 2011;Liu et al. 2014;Martin et al. 2015;Pease and Hahn 2015;Stenz et al. 2015;Rosenzweig et al. 2016). Although these methods generally perform well across a variety of biological and experimental scenarios (Zheng and Janke 2018), theoretical and empirical studies have identified conditions under which each method is susceptible to bias (Eriksson and Manica 2012;Rosenzweig et al. 2016).
One challenging aspect of analyzing introgression is to identify taxa serving as donors versus recipients of genetic material during introgression (i.e., introgression directionality). If hybrids successfully backcross to both parents, alleles will move in both directions, meaning each parent will serve as donor for some introgressed loci and recipient for other loci. However, if backcrosses with one parent but not the other are favored by physiological (Rieseberg and Soltis 1991), selective (Orive and Barton 2002), or biogeographical (Currat et al. 2008) factors, it can lead to asymmetrical (Barton and Hewitt 1985) movement of alleles (directional introgression, denoted hereafter with )). Introgression has been shown to underlie the transfer of adaptive traits to recipient lineages (Whitney et al. 2006;Dasmahapatra et al. 2012;Dannemann et al. 2016;Figueir o et al. 2017), so the ability to infer the directionality of introgression (i.e., polarize introgression) is essential in order to form hypotheses about functional and adaptive consequences.
The majority of tests to detect the occurrence of introgression do not explicitly polarize directionality (Zheng and Janke 2018), and those that can only do so in certain cases. For example, the D-statistic (Green et al. 2010) is widely used to infer instances of introgression in a four-taxon system. Introgression polarization is possible under D only when data for a fifth taxon are available (Green et al. 2010;Eaton and Ree 2013;Eaton et al. 2015;Pease and Hahn 2015). Moreover, the fifth taxon must be sister to one taxon involved in introgression but cannot itself be involved in introgression. (Pease and Hahn 2015) define this specific configuration of introgressing taxa and sister taxa as "intergroup" introgression and describe how, when these specific five-taxon conditions are met, the branching order of introgressed gene trees indicates directionality. However, the authors also describe how other types of introgression (e.g., "ancestral" introgression) cannot be polarized. Moreover, there are many cases in which a fifth taxon with the required phylogenetic placement is either not sampled or does not exist. In these cases, it is possible to statistically identify introgression using existing methods but not necessarily to polarize introgression. Thus, there is a need for a more widely applicable statistical method to distinguish between bidirectional and unidirectional introgression, while identifying donor and recipient taxa.
Here, we describe and test a method for inferring directionality of introgression from genome-scale data, which we refer to as divergence-based introgression polarization (DIP). DIP is based on the observation that, when introgression occurs, it alters not only the level of nucleotide sequence divergence between the two species exchanging genes (Rosenzweig et al. 2016) but also divergences with related species that are not directly involved in introgression; these changes occur in systematic and predictable ways according to the directionality of introgression ( fig. 1) (Forsythe et al. 2018;Fontaine et al. 2015;Hibbins and Hahn 2019). DIP is calculated from pairwise sequence divergence between taxa involved in introgression and a sister taxon, comparing divergence values obtained from introgressed loci versus nonintrogressed loci. It takes as input the same types of data used to infer introgression by existing methods (whole-genome/ chromosome alignments or single-gene alignments of loci sampled throughout the genome). However, unlike most existing methods, DIP is applicable to cases in which only four taxa are sampled, thereby expanding inference of introgression directionality to a broader scope of evolutionary histories.
We present tools to implement the DIP method: https:// github.com/EvanForsythe/DIP. We also simulate whole-genome alignments in which a subset of loci was introgressed either unidirectionally, asymmetrically, or symmetrically. We use these simulated genome alignments to assess how accurately DIP polarizes asymmetrical introgression and to investigate the effects of parameters that are known to affect existing introgression inference methods, such as the proportion and timing of introgression (Durand et al. 2011;Martin et al. 2015;Zheng and Janke 2018). We have recently used the principles of DIP to document asymmetrical introgression among Brassicaceae species (Forsythe et al. 2018), and here, we also apply DIP to empirical data from modern and archaic hominins.

New Approaches
Introgression alters levels of sequence divergence between taxa, and these changes can differ depending on directionality (Forsythe et al. 2018;Hibbins and Hahn 2019) ( fig. 1). Although several statistics focus on the effects of introgression on sequence divergence between species involved in introgression (Feder et al. 2005;Joly et al. 2009;Rosenzweig et al. 2016), here, we describe how patterns of sequence divergence in a taxon that is sister to those involved in introgression can be indicative of the directionality of introgression. To define the properties of a divergence-based introgression test, we use hypothetical species P1, P2, P3 and an outgroup, O. Species P1 and P2 are sister within the species tree, and we model introgression between species P2 and P3. We denote the timing of the three successive speciation events among these taxa as T c , T b , and T a and the timing of the introgression event between P2 and P3 as T INT ( fig. 1A). When introgression has occurred between P2 and P3, some loci will reflect a history of introgression, whereas other loci will reflect a history of speciation. In applying DIP, a gene tree is inferred for each locus, and the resulting topology is used to distinguish  (Hudson 2002). (B) A gene tree depicting a gene that was introgressed P3)P2. (C) A gene tree depicting a gene that was introgressed P2)P3. DK values are calculated based on changes in mean divergence between pairs of taxa in the set of trees with the speciation topology versus the set of introgression trees (see eqs 1-3). Note that the expected profiles of DK values for P3)P2 introgression differs from that of P2)P3 introgression, forming the basis for the DIP test (see main text and fig. 2). introgressed loci from nonintrogressed loci. For all loci, we quantify pairwise sequence divergence values between P2 and P3 (K 23 ), between P1 and P2 (K 12 ), and between P1 and P3 (K 13 ) ( fig. 1). The values of K 23 , K 12 , and K 13 on a given gene tree are expected to correspond to T INT , T a , and T b in a way that depends on the introgression history of that gene. Note that K 23 is the divergence measurement that is most commonly used to indicate the presence of introgression (Feder et al. 2005;Joly et al. 2009;Rosenzweig et al. 2016) because introgression in either direction is expected to reduce K 23 relative to genes that reflect the species tree, as the divergence time between the sequences of these taxa is reduced from T b to T INT ( fig. 1). In contrast, changes in K 12 and K 13 will depend on the direction of introgression. For example, introgression can cause K 12 to increase corresponding to a change in divergence time from T a to T b but only if introgression occurred from P3 to P2 ( fig. 1B). Introgression in the other direction should not affect K 12 . The effects on K 13 are also sensitive to the direction of introgression. If it occurs from P2 to P3, introgression should decrease K 13 based on a change in divergence time from T b to T a ( fig. 1C), but there should be no effect on K 13 if introgression occurs in the other direction. To quantify these effects, differences are calculated between the mean values of K 23 , K 12 , and K 13 from all loci displaying the species topology (abbreviated SP loci in equations/figures) and the mean values of the same corresponding divergence measurements from all loci displaying the introgression topology (abbreviated INT loci in equations/figures) in the following fashion: Note that the order of subtraction used in defining these terms is not always the same with respect to species and introgression loci and was chosen such that the effects of relevant introgression are expected to yield positive (rather than negative) DK in each case. Together, this set of DK values composes the divergence profile of DIP. Below, we show the relative magnitudes of these values can be used to differentiate evolutionary histories based on the polarity of introgression. We also use coalescent-based simulations to identify biases that can be introduced by other sources of genealogical discordance such as incomplete lineage sorting (ILS), and we devise additional layers of DIP comparisons that can be used to partially alleviate these biases.

DIP: Distinguishing Modes of Unidirectional and Bidirectional Introgression
The simplest application of DIP involves testing whether DK 23 , DK 12 , and DK 13 are significantly >0 and compares these results to the expectations for DK under different introgression scenarios ( fig. 2). If introgression has occurred in both directions between P2 and P3, then all three DK values should be positive. However, as noted above, if introgression has occurred exclusively in one direction, the expectation for either DK 12 or DK 13 should remain zero ( fig. 2). To test the performance of DIP, we simulated alignments for thousands of loci (5,000 bp each) undergoing unidirectional introgression in each direction, as well as symmetric bidirectional introgression (see Materials and Methods and supplementary fig. S1, Supplementary Material online). We applied DIP to each simulated genome. For the genome simulated under unidirectional P2)P3 introgression, we observed DK 23 > 0, DK 12 ¼ 0, and DK 13 > 0 ( fig. 3A), which is the expected pattern for that direction of introgression ( fig. 1). For the genome simulated under symmetric bidirectional introgression, we observed DK 23 > 0, DK 12 > 0, and DK 13 > 0 ( fig. 3B), which is the expected pattern if some introgression is occurring in both directions. For the genome simulated under unidirectional P3)P2 introgression, we observed DK 23 > 0, DK 12 > 0, and DK 13 ¼ 0 ( fig. 3C), again reflecting our expected DIP profile for that direction. These results indicate that DIP can correctly classify all three types of introgression under these simulated conditions. Next, we explored the performance of DIP across a range of different parameter settings, including the proportions of genes in the genome that had been subject to introgression (pINT). We also varied the proportions of introgressed loci that moved in one direction or the other [p(P3)P2)]. We

Double-DIP: Detecting Asymmetry in Cases of Bidirectional Introgression
Existing introgression polarization methods tend to assume unidirectionality of introgression, but it is also important to consider the possibility of asymmetric bidirectional introgression that falls short of being strictly unidirectional (discussed in Martin et al. 2015). To more directly test for asymmetry in cases of bidirectional introgression, we developed an additional step in the DIP analysis, which we refer to as double-DIP or 2ÂDIP. The premise of 2ÂDIP is that DK 12 for loci introgressed P3)P2 and DK 13 for loci introgressed P2)P3 have the same expected values, as they are both based on a shift in divergence time between T b and T a ( fig. 1). Therefore, under symmetric bidirectional (P3 () P2) introgression, we expect genome-wide values of DK 12 and DK 13 to equal each other. Alternatively, if P3)P2 introgression exceeds P2)P3 introgression, we expect genome-wide DK 12 > DK 13 . 2ÂDIP compares the magnitudes of DK 12 and DK 13 by formulating a simple summary statistic, DDK, which is defined as follows: The expectation for the DDK summary statistic is zero under symmetric bidirectional introgression, positive under introgression that is biased toward P2, and negative under introgression that is biased toward P3 ( fig. 4).
We explored the performance of 2ÂDIP by simulating genomes in the same manner as described above for 1ÂDIP. For the genome simulated under unidirectional These results indicate that 2ÂDIP correctly classified all three types of simulated introgression events. As above, we also performed a parameter scan to explore 2ÂDIP. We found that genomes simulated with p(P3)P2) ¼ 0.5 (i.e., symmetric bidirectional introgression) returned FIG. 2.-Workflow of the DIP test. Point estimates of DK 23 , DK 12 , DK 13 are calculated from whole genomes, which are then resampled to yield distributions of DK 23 , DK 12 , DK 13. Unidirectional P3)P2 introgression is indicated by the profile, DK 23 > 0, DK 12 > 0, and DK 13 ¼ 0. Unidirectional P2)P3 introgression is indicated by DK 23 > 0, DK 12 ¼ 0, and DK 13 > 0. Bidirectional introgression is indicated by DK 23 > 0, DK 12 > 0, and DK 13 > 0. All other profiles are considered inconclusive regarding the occurrence and directionality of introgression. P values for testing whether each DK value significantly differs from 0 are obtained from the proportion of replicates for which DK 0. Colors reflect the black, red, and gray genealogical histories from figure 1. In this illustration, all introgression loci are in the P3)P2 (red) direction. However, we use the red/gray dashed lines for showing the distribution of introgression loci because, in general, the set of introgression loci can contain P3)P2 loci, P2)P3 loci, or both.
DDK value that did not significantly differ from zero ( fig. 5D, white boxes). We also found significant DDK < 0 for nearly all replicated genomes simulated with p(P3)P2) < 0.5 and significant DDK > 0 for nearly all replicated genomes simulated with p(P3)P2) > 0.5 ( fig. 5D). The only exception to these patterns was found when 10% or less of loci in the simulated genome (pINT 0.1) underwent nearly symmetrical introgression (p(P3)P2) ¼ 0.45 and 0.55).
To test the influence of recombination on DIP performance, we also applied an alternative simulation approach in which full chromosomes were simulated under different rates of recombination (resulting in varying haplotype block sizes), while applying the same 5,000-bp partition size used in our other analyses (see Materials and Methods). We found that 2ÂDIP correctly inferred unidirectional introgression regardless of recombination rate (supplementary fig. S2, Supplementary Material online; p(P3)P2) ¼ 0 and 1) and reliably detected slight (p(P3)P2) ¼ 0.4 and 0.6) directional asymmetries when the size of haplotype blocks was the same or smaller than the size of the sliding window applied during DIP (supplementary fig. S2B and C, Supplementary Material online). However, when haplotype blocks were an order of magnitude larger than the window size, we observed in- Material online), ultimately leading to increased sampling variance (see Discussion). Taken together, these results indicate that 2ÂDIP correctly inferred asymmetrical introgression, even in many cases in which there is only slight asymmetry, meaning it is a sensitive method for polarizing asymmetrical introgression that is robust across a variety of parameter values.

Robustness of DIP to Population Divergence Time
The task of accurately classifying loci as introgressed versus nonintrogressed (i.e., INT loci vs. SP loci, respectively) based on gene tree topology is an integral part of DIP; however, this task is confounded when the topology of a gene tree does not accurately reflect the history of introgression (or lack thereof) that occurred at that locus. For example, phylogenetic methods rely on diagnostic synapomorphies to infer gene tree topologies; scarcity of synapomorphies or large amounts of homoplasy in an alignment can lead to phylogenetic error and, thus, inaccurate classification. Another important confounding factor is ILS, which can result in gene trees that reflect a history of deep coalescence at a locus, as opposed to the underlying history of speciation and/or introgression at that locus. This process can result in nonintrogressed loci displaying the introgressed topology. Alternatively, because ILS and introgression are not mutually exclusive processes, ILS can also lead to introgressed loci displaying the species topology. Importantly, ILS is also expected to yield gene trees displaying an alternative third topology that is neither the species topology or the introgressed topology (Green et al. 2010) (see Triple-DIP below).
Both phylogenetic error and ILS are more pronounced during rapid divergence (i.e., short internal branches) (Fontaine DIP was applied to each genome to yield profiles of DK 23 , DK 12 , DK 13 . ** indicates significant departure from 0 (P < 0.01). (D) A plot scanning simulation parameters, proportion of the genome that was introgressed (pINT) (y axes) and proportion of introgressed loci transferred in each direction (p(P3)P2)) (x axis). Each square in the plot indicates the DIP results obtained from five replicated simulated genome alignments. Red boxes indicate the profile consistent with P3)P2 introgression (see panel C). Gray boxes indicate the profile consistent with P2)P3 introgression (see panel A). The shading of the boxes corresponds the percentage of replicates that indicate a given profile, as specified by the key to the right of the plot. Unshaded boxes indicate zero replicates yielded a significant unidirectional profile (i.e., all replicates yield the bidirectional introgression profile; see panel B). et al. 2015). Moreover, it has been shown that, because P3)P2 introgression trees have longer internal branch lengths than P2)P3 introgression trees, the latter are more prone to both phylogenetic error and ILS (Zheng and Janke 2018), ultimately leading them to be more prone to misclassification in DIP. This feature introduces the potential for directional bias in DIP (see Discussion). Therefore, we explored divergence times, as an additional parameter that may influence performance. We focus our discussion on the process of ILS, but it should be noted that phylogenetic error also has the potential to occur in empirical data sets.
All previous simulations were implemented with constant and large divergence times (see fig. 1). To explore the branch length parameter, we modified divergence times by multiplying all of the branch lengths by a scaling factor (SF) (see Materials and Methods), essentially modifying the height of the entire tree used for simulations. SFs >1 yield taller trees, whereas SFs <1 yield shorter trees. For each SF, we simulated five replicate genomes and calculated DDK for each replicate. We first classified introgressed and nonintrogressed loci based on the known history used to simulate the data and plotted the resulting DDK values (omniscient 2ÂDIP). We found that 2ÂDIP correctly inferred asymmetry (or lack thereof) at all branch lengths and that the magnitude of DDK was proportional to the SF ( fig. 6A, D, and G). However, when working with real data sets it is rare to know if individual loci with introgression topologies are the result of bona fide introgression, as opposed to ILS or errors in phylogenetic inference. To explore the impact of the SF on the ability of 2ÂDIP to distinguish between a signature of bona fide introgression versus the effects of ILS, we calculated DDK using topology-based (non-omniscient) classification. With this approach, we observed an upward bias in DDK at low SFs  (Bottom) A sampling distribution of DDK is calculated from resampled gene alignments (bootstrapping) obtained from the original genome. If the majority of DDK replicates are > 0, it is an indication of asymmetric P3)P2 introgression. In this case, the proportion of DDK replicates <0 determines the P value (doubled for a two-sided test) for asymmetric P3)P2 introgression. Asymmetric P2)P3 introgression is indicated by the opposite pattern.
We also explored the influence of the timing of introgression relative to speciation nodes. We held the timing of speciation constant while varying only the timing of the introgression event (i.e., relative introgression time). We found that 2ÂDIP accurately polarizes asymmetric introgression in all cases under omniscience (supplementary fig. S4A and D, Supplementary Material online). Under nonomniscience, 2ÂDIP is accurate when speciation and introgression are separated by a substantial period of time (i.e., relatively recent introgression times) (supplementary fig. S4B, Supplementary Material online). However, we observe a bias in favor of inference of P3)P2 introgression (similar to the bias described above) when introgression occurs immediately following speciation (supplementary fig. S4B, Supplementary Material online) and this effect is compounded when total tree-height is small (i.e., SF ¼ 0.1) (supplementary fig. S4E, Supplementary Material online). Below, we explore sources of bias and strategies for mitigating its effects.

Triple-DIP: Adjusting for Gene Tree Classification Bias
To address the directional bias in 2ÂDIP caused by gene tree ILS at short branch lengths, we developed an additional layer that can be applied in DIP analysis, which we refer to as triple-DIP or 3ÂDIP, so named because it includes an additional D component (i.e., the "delta of the delta of the delta"). Briefly, in addition to calculating the standard 2ÂDIP as above, we also calculate an alternative DDK (DDK ALT ) that substitutes gene trees with the alternative topology, ((P1, P3), P2), for the introgressed loci used in the standard DDK: Note that, K 23 values are substituted in place of K 13 values in calculating this version of DDK because we are now focusing on a conflicting topology in which P1 and P3 are sister to each other. Because P2 and P3 are the two taxa subject to introgression, loci with this alternative topology should arise only from ILS and not introgression. Following the logic of standard D-statistics (Green et al. 2010;Durand et al. 2011), we reasoned that ILS should be equally likely to produce each of the two topologies that conflict with the species tree. Therefore, this alternative 2ÂDIP calculation may provide a measure of the amount of bias that is introduced by ILS. In applying 3ÂDIP, we weight this value by the counts of loci with the expected (P3 () P2) introgression topology (N INT loci) and the alternative topology (N ALT loci). The DDDK summary statistic is calculated as follows (see Materials and Methods for derivation): It should be noted that calculation of a 3ÂDIP correction is only possible when there is at least some ILS because it relies on the presence of ((P1, P3), P2) loci. As such, when we applied 3ÂDIP to genomes simulated with different branch lengths, we were only able to consistently obtain measurements under short-branch conditions (SF < 1.0) where ILS is  fig. 6C, F, and I) because these were the only conditions that returned some loci with the relevant topology. Under these short-branch conditions, we found that 3ÂDIP reduced but did not eliminate the bias observed in 2ÂDIP. Although DDDK was still erroneously positive for the lowest branch length values ( fig. 6F and I), the magnitude of DDDK was less than that of DDK.
We further explored bias in 2ÂDIP and 3ÂDIP by simulating short branch trees (with SF of 0.1, 0.2, and 0.3) across a range of p(P3)P2) values. We first applied omniscient 2ÂDIP to give context to the bias introduced. (as opposed to total tree-height), we found some situations in which 3ÂDIP showed increased directional bias compared with 2ÂDIP (supplementary fig. S4, Supplementary Material online). 3ÂDIP bias exceeded 2ÂDIP bias in situations in which total tree-height was large (high SFs) (supplementary fig. S4G, Supplementary Material online) but the opposite was true for low SFs (supplementary fig. S4H, Supplementary Material online). Together, these results indicate that 3ÂDIP reduces bias in some (but not all) situations, meaning that information can be gained by applying both 2ÂDIP and 3ÂDIP when analyzing empirical data.

Analysis of Hominin Introgression
To understand the performance of DIP on empirical data, we applied DIP to existing genomic data for introgression that occurred between Neanderthal and a modern human European lineage (Green et al. 2010;Prü fer et al. 2014). Applying a five-taxon version of the D-statistic that made use of the phylogenetic position of multiple modern African populations, a previous study (Green et al. 2010) determined that unidirectional introgression occurred from Neanderthal to European lineages. We applied DIP to Chromosome 1 from a Neanderthal sample, a Denisovan sample, two modern human (San [African] and French [European]) samples, and the chimpanzee reference genome. The availability of a Denisovan sample allowed us to infer DIP in two different ways using two different taxon-sampling schemes (TSS1 and TSS2) ( fig. 8A and F). For both TSSs, there were three gene tree topologies present ( fig. 8B and G), indicating the possibility of misclassification due to phylogenetic error and ILS.
Using TSS1, 1ÂDIP yielded a profile indicating the presence of at least some bidirectional introgression ( fig. 8C), a scenario which was not ruled out by Green et al. (2010). However, it should be noted that, whereas DK 12 and DK 13 were both significantly positive, the DK 13 was much closer to zero, which would indicate a substantial asymmetry toward Neanderthal)French introgression. 2ÂDIP and 3ÂDIP indicated significantly positive DDK and DDDK, respectively ( fig. 8D and E), consistent with asymmetric introgression in the Neanderthal)French direction. However, when we applied DIP to TSS2, we saw contradictory results. While 1ÂDIP again indicated the presence of bidirectional introgression, although without the near-zero DK 13 ( fig. 8H), 2ÂDIP and 3ÂDIP yielded positive DDK and DDDK, respectively ( fig. 8I and J). 2ÂDIP and 3ÂDIP from TSS2 thus indicate French)Neanderthal introgression. Although introgression from modern humans has been inferred in other Neanderthal samples (Kuhlwilm et al. 2016), it is at odds with findings from TSS1 and Green et al. (2010).
To understand this discrepancy and put our empirical analyses in the context of our simulations, we plotted distributions of divergence estimates (K 23 , K 12 , and K 13 ) calculated from two simulated genomes and the TSSs used for the empirical analysis (supplementary fig. S6, Supplementary Material online). The empirical distributions display a wider spread than the simulated distributions, potentially introducing noise into the empirical analysis. Importantly, empirical data also show reduced levels of divergence, even compared with the data set simulated with the shortest branch lengths (SF ¼ 0.1). This suggests that the biasing factors explored above could be even more at play in the hominin analysis (see Discussion).

Intended Applications of DIP
Our simulation analyses provide a proof-of-principle that divergence data can be used to polarize introgression in a fourtaxon context, narrowing the methodological gap between our ability to identify introgression and our ability to determine the direction of gene transfer. It should be noted that DIP is not designed to replace existing methods and act as a frontline test of whether introgression has occurred. Instead, we recommend cases of introgression first be confidently identified with existing tools (Huson et al. 2005; Than et al. 2008; Green et al. 2010;Durand et al. 2011;Martin et al. 2015;Pease and Hahn 2015;Stenz et al. 2015;Rosenzweig et al. 2016). In these cases, DIP can then be used to polarize the direction of introgression, a critical step toward interpreting biological implications. As we have shown above, DIP has the potential to distinguish unidirectional and bidirectional introgression and, in cases of bidirectionality, to test for asymmetry between the two directions.
Although there are population genetic (Schrider et al. 2018) and five-taxon phylogenetic (Green et al. 2010;Pease and Hahn 2015) methods capable of polarizing introgression, DIP offers the ability to detect asymmetric introgression in both directions using a four-taxon context. This will be valuable because very little is known about the extent of reciprocal exchange that occurred during even well-studied introgression events (Green et al. 2010;Kuhlwilm et al. 2016), a deficit that likely stems from an absence of sensitive tools. Another group (Hibbins and Hahn 2019) has recently proposed an approach that overlaps with DIP. They introduce a statistic, D 2 , which is conceptually similar to DK 13 described here. As such, nonzero values of D 2 indicate the presence of P2)P3 introgression (B)C by their nomenclature). DIP goes further than this approach because it also uses DK 12 to test for introgression in the opposite direction and DDK to determine the predominant direction of introgression. The primary focus of the recent work by Hibbins and Hahn (2019) is the development of another statistic, D 1 , that assesses the timing of introgression relative to speciation events and can be used in assessing possible cases of homoploid hybrid speciation. This is an elegant application of the same type of divergencebased logic that underlies DIP to a biological question that cannot currently be addressed with our method. We suggest that further improvements in polarizing introgression can be made by combining the explicit coalescent-based modeling of Hibbins and Hahn (2019) with the more comprehensive summary provided by 1Â, 2Â, and 3ÂDIP.

Bias in DIP
It should be noted that the simulation branch length parameters used in figures 3 and 5 resulted in gene trees with relatively deep divergences. These branch lengths were chosen because they emphasize differences in divergence and minimize potential biasing factors, thus providing the clearest view of the general properties of DIP. However, it has been shown that timing of population divergence is an extremely influential parameter in introgression analyses (Durand et al. 2011;Martin et al. 2015;Zheng and Janke 2018). This is true, in part, because the length of internal branches is directly related to the extent of ILS that occurs (Maddison and Knowles 2006). Short branches lead to increased ILS (Degnan and Rosenberg 2013), which can mimic introgression and introduce noise and bias into introgression analyses. Coalescent simulations, such as those that we performed, capture this phenomenon (Hudson 2002;Degnan and Rosenberg 2009), introducing discordant gene trees at a rate dependent on branch length parameters.
Population divergence is additionally important for DIP for a more intuitive reason; the magnitude of the DK measurements, which are the cornerstone of DIP, is directly proportional to the length of internal branches, meaning that DIP gains power to differentiate between alternative hypotheses as branches are lengthened. Finally, there is a disparity in the accuracy of topology classification for loci introgressed P3)P2 versus the opposite direction (Zheng and Janke 2018). This disparity stems from the fact that the internal branch on P2)P3 introgression gene trees is shorter than the same branch on P3)P2 introgression gene trees, making for fewer diagnostic synapomorphies by which to infer the introgression topology. This disparity is most pronounced under conditions in which phylogenetically informative synapomorphies are scarce (i.e., short branch lengths). Moreover, the specific disparity between genes introgressed in each direction has an important consequence for simulation analyses, the short internal branch on P2)P3 introgression gene trees results in a higher rate of ILS for these loci compared with other categories of loci, meaning that ILS obscures the introgression history of these loci at a higher rate than loci introgressed in the opposite direction. This disparity is especially problematic for DIP because it is likely to introduce a directional bias, favoring inference of P3)P2 introgression.
For the above reasons, we performed parameter scans to explore the influence of branch lengths and timing of introgression. We found that 2ÂDIP performs as expected when the classification step is bypassed in omniscient mode ( fig. 6A, D, and G) but bias at short branch lengths arises when introgressed and nonintrogressed loci must be classified directly based on the data ( fig. 6B, E, and H). When working with empirical data sets, omniscience about origins and the effects of introgression versus ILS on individual loci is not possible. As such, classification error may be unavoidable, so we sought to develop a strategy to correct for bias that arises from it, leading to the development of 3ÂDIP. A benefit of 3ÂDIP is that it is applicable under the conditions in which bias is most pronounced. Following the logic of the D-statistic (Green et al. 2010), 3ÂDIP is based on the expectation that ILS is equally likely to produce the two topologies that conflict with the species tree: (P1(P2, P3)) and (P2(P1, P3)). Therefore, under the assumption that there has been no introgression between P3 and P1, the number of ALT loci, which are defined by having the (P2(P1, P3)) topology, provides an estimate for the number of identified loci displaying the introgressed topology that were actually the result of ILS. Accordingly, 3ÂDIP applies a correction for ILS that is proportional to the frequency of these ALT loci. We found that 3ÂDIP reduces directional bias at short branch lengths ( fig. 6C, F, and I and  fig. 7) and does not provide false positive results in the complete absence of introgression (supplementary fig. S5, Supplementary Material online). These results indicate that 3ÂDIP is a step toward overcoming directional bias; however, bias persisted for the shortest branch-length simulations, meaning that there are biological scenarios in which 3ÂDIP is not free from bias. Further, under situations in which introgression occurs immediately following speciation, we observed cases in which 2ÂDIP exhibited less bias than 3ÂDIP (supplementary fig. S4G, Supplementary Material online).
The basic premise of 3ÂDIP is that the number of ALT loci serves as a proxy for the number of loci that have a true history of speciation but display an introgression topology due to ILS. This assumption appears valid in a scenario with ILS but not introgression, as indicated by the ability of 3ÂDIP to eliminate bias under these simulated conditions (supplementary fig. S5, Supplementary Material online). However, 3ÂDIP does not account for the fact that ILS occurs not only for loci with a speciation history but also loci with an introgression history. In other words, some of the loci that exhibit the ALT topology will have a true history of introgression, making these loci an imperfect proxy for the number of loci with a speciation history affected by ILS. This can cause undesired behavior of 3ÂDIP in situations in which most or all of the ALT topologies stem from loci with a history of P2)P3 introgression. Therefore, we suggest that there is a benefit to applying all three variations of DIP to provide the most comprehensive view of introgression directionality.
Fully overcoming bias introduced into introgression analyses by classification error represents a future goal for the field. With current implementations of DIP, inferences of introgression in the P3)P2 direction should be viewed with caution, especially in taxa with very recent divergence times or when introgression occurred very shortly after a speciation event. On the other hand, it can be viewed as a conservative test for P2)P3 introgression, so identification of introgression in that direction can be interpreted as a much more confident prediction. As suggested above, further progress in this area may come through more complex models that explicitly include ILS that occurs at introgressed loci (Hibbins and Hahn 2019), rather than solely at nonintrogressed loci.
A related challenge to DIP analyses is associated with the question of how to partition the genome. Arbitrarily breaking chromosomes into loci of a fixed size may be problematic because the resulting "loci" may either be composed of multiple haplotype blocks with different genealogies due to intralocus recombination or, conversely, an individual haplotype block may contain multiple partitioned "loci," resulting in pseudoreplication as it will be sampled numerous times by DIP. Our simulations of introgression and recombination revealed that these issues do not introduce a directional bias but do dramatically increase the variance of DIP when the size of true haplotype blocks is much larger than the window size used by DIP. One potential strategy for mitigating this challenge would be to incorporate methods that explicitly infer recombination breakpoints (e.g., the four-game test; Hudson and Kaplan 1985) into the window-definition phase of DIP.
There are also unexplored factors that should be considered when implementing DIP because our simulations were run under simplifying assumptions such as random mating, constant population size, and a single bout of instantaneous introgression solely between P3 and P2. Violation of these assumptions in natural populations (Eriksson and Manica 2012;Prü fer et al. 2014;Kuhlwilm et al. 2016;Slon et al. 2018) may introduce additional sources of bias, Our simulation strategies also do not fully capture rate heterogeneity across the genome, branch-specific variation in effective population size/mutation rate, technical biases caused by readmapping, and introgression from unsampled taxa (i.e., "ghost lineages"). These factors should be investigated in future studies with more complex simulation scenarios.

DIP Performance on Empirical Data
We chose hominin introgression as a test case because it is one of the most famous and best-studied examples of introgression. An additional benefit is that the sampling in the group is dense; several modern human samples as well as samples from ancient Neanderthal and Denisovan tissues are available. A benefit of this dense taxon-sampling is that previous studies have been able to apply five-taxon statistics to polarize introgression, leading to the conclusion that "all or almost all of the gene flow detected was from Neandertals into modern humans" (Green et al. 2010). However, more recent analyses of additional archaic samples from different parts of the hominin geographical range also indicated introgression in the opposite direction (Kuhlwilm et al. 2016) as well as mating between Neanderthals and Denisovans (Slon et al. 2018).
An additional benefit of dense hominin taxon-sampling is that the phylogenetic placement of samples allows us to analyze the same introgression event with four-taxon statistics from two different angles. We devised a TSS in which Neanderthal and a modern human acted as P3 and P2, respectively (TSS1, fig. 8A) as well as one in which the roles were reversed (TSS2, fig. 8F). Importantly, these TSSs allowed us to evaluate whether the directional bias described above was strong enough to outweigh the true signature from introgression. DIP returned contradictory results for TSS1 and TSS2. In both cases, 2ÂDIP and 3ÂDIP favored P3)P2 introgression, despite the identity of P3 and P2 being reversed in the two cases. The fact that both analyses sided with the directional bias we documented above, suggests that bias may be outweighing the introgression signature. This is consistent with the observation that hominin divergence is both lower and more heterogenous than our simulated branch lengths (supplementary fig. S6, Supplementary Material online), suggesting that biasing factors are strong enough to bias even 3ÂDIP. It is worth noting, however, that the magnitude of DDK from TSS1 is higher than that from TSS2 and the variance of DDDK is much larger for TSS2 than for TSS1, meaning the signal favoring Neanderthal)French introgression (the expected direction) is stronger and less noisy than the signal in the opposite direction.
Our general takeaway from analysis of hominin data is that, like all introgression analysis tools, there are limits to the conditions under which DIP can be reliably applied. Although 3ÂDIP may represent a step in the right direction, in the case of hominin introgression, the level of ILS swamps out the signal of introgression. We suggest that incorporating an alternative means of identifying introgressed loci, such as f d (Durand et al. 2011;Martin et al. 2015), may yield more reliable results when ILS is prevalent, representing an area of future work. For the time being, DIP will be most reliable in cases of introgression that occurred at more ancient time scales (Forsythe et al. 2018;Dasmahapatra et al. 2012;Fontaine et al. 2015).

Resource Availability
URLs for downloading previously published data are provided in place in the following sections. Scripts for reproducing the analyses in this study are available at:https://github.com/ EvanForsythe/DIP. In addition included are R scripts for performing DIP on genomic data. All scripts are callable from the command line. Users have the choice of inputting either whole-chromosome alignments, which will be divided into single-window (i.e., locus) alignments in preparation for DIP. Alternatively, DIP takes single-locus alignments, bypassing the window partitioning step. DIP outputs descriptive statistics and PDF figures similar to figure 8.

Simulations of Sequence Evolution
We generated whole-genome alignments in which introgression has occurred in some (but not all) loci, and in which donor and recipient taxa for each introgressed locus are known. To accomplish this, we simulated sequence evolution of loci 5,000 nucleotides in length in a four-taxon system (three in-group taxa, P1, P2, and P3 and an outgroup, O) ( fig. 1). All simulations were performed with ms (Hudson 2002) and seq-gen (Rambaut and Grassly 1997) implemented in R v3.5.0 with phyclust v0.1-22 (Chen 2011) similar to Martin et al. (2015). Ms was used to generate a coalescence tree, which was passed to seq-gen in order to generate a sequence alignment. A portion of the loci were simulated to have evolved along a path of simple speciation. In the absence of ILS, the gene trees for these loci should match the speciation history, ((P1, P2)P3)O) (fig. 1A). These loci, denoted as species topology loci, were simulated with the following R commands: ret.msSP<-ms(nsam ¼ 4, nreps ¼ 1, opts ¼ "-T -t 50 -I 4 1 1 1 1 -ej 4 2 1 -ej 8 3 1 -ej 12 4 1r 5 5000") seqsSP<-seqgen(opts ¼ "-mHKY -l5000 -s 0.01", newick.tree ¼ ret.msSP[3]) In the above ms call, the -T argument directs ms to output gene trees, one of which is used as input for seq-gen. The -t argument sets the theta value used by ms, which was held constant across all simulations. The arguments -I 4 1 1 1 1 indicate that four populations were simulated with one individual sampled from each, which was also held constant across all simulations. Each -ej command represents a speciation event (in a forward-time context), the first number following the -ej flag being the timing of the event and the two following numbers being the two daughter populations arising from the speciation. The -r argument indicates the rate of recombination and the final number indicates the length of the segments being simulated by ms. However, for this simulation strategy, we only input one tree into seq-gen, essentially simulating nonrecombining loci (however, see below for our explicit treatment of recombination).
Loci with instantaneous unidirectional introgression occurring between P2 and P3 were also simulated. Introgression trees (transferred in either direction) will have the topology, (P3, P2)P1)O), and thus differ from the species tree. The direction of introgression for an individual locus was indicated by "donor taxon" and "recipient taxon" as in the following R command: ret.msINT <ms(nsam ¼ 4, nreps ¼ 1, opts¼ "-T -t 50 -I 4 1 1 1 1 -ej 4 2 1 -ej 8 3 1 -ej 12 4 1 -es 2 <recipient taxon> 0.4 -ej 2 5 <donor taxon> -r 5 5000") seqsINT<-seqgen(opts ¼ "-mHKY -l5000 -s 0.01", newick.tree ¼ ret.msINT[3]) We replicated the above commands for species and introgressed topology loci to create data sets representing simulated whole-genome alignments composed of a total of 5,000 loci (supplementary fig. S1, Supplementary Material online). The argument in the above command that specify introgression are the -es argument and the final -ej command. We define the proportion of all loci in the genome resulting from simulated introgression in either direction as pINT and the proportion of introgressed genes that were transferred in the P3)P2 direction as p(P3)P2). Because a single locus can only be transferred in one direction or the other, the proportion of loci transferred in the P2)P3 direction, p(P2)P3), is 1Àp(P3)P2). Whole-genome alignments with known values of pINT and p(P3)P2) were used to test the performance of DIP. We performed parameter scans by simulating genome alignments with varying combinations of pINT and p(P3)P2) (see supplementary fig. S1, Supplementary Material online).
Recognizing that the above simulation strategy does not realistically model recombination, we also employed an alternative simulation strategy in which we simulate whole chromosomes (rather than individual loci) while allowing for varying levels of recombination. Introgression in the presence of recombination was simulated with the following ms command in R.
ms(nsam ¼ 4, nreps ¼ 1, opts ¼ T -t 50 -I 4 1 1 1 1 -ej 4 2 1 -ej 8 3 1 -ej 12 4 1 -es 1 <recipient taxon> <pINT> -ej 1 5 <donor taxon> -r <recombination rate> 12500000) The output files from the above ms command (run twice in cases of bidirectional introgression-once for each direction of introgression) were combined into a single file, which was input to seq-gen in order to generate a whole-chromosome alignment. Seq-gen was called from the command line with the following command: seq-gen -mHKY -l 25000000 -s 0.01 -p <number of haplotype blocks from ms> < <ms_output_file> > <seqgen output file name> 2> <file name to store haplotype block positions> Whole-chromosome alignments were replicated five times for each parameter value and DIP analyses were performed with the 5,000-bp partitioning approach applied elsewhere in this article.
The default branch length parameters used for figures 3 and 5 are T INT ¼1, T a ¼4, T b ¼8, and T c ¼12 measured in coalescent units of 4 N generations (see fig. 1). To explore the effects of divergence times, we multiplied all branch length parameters by a range of different SF values. For example, SF ¼ 0.1 results in the following node depths: T INT ¼0.1, T a ¼0.4, T b ¼0.8, and T c ¼1.2.
As an additional means of exploring the effects of speciation and introgression timing, we also varied the timing of introgression in proportion to the most recent speciation even (relative introgression time). The timing of introgression was set relative to the T a speciation time. For example, under default SF described in the previous paragraph with T a ¼4, a relative introgression time of 0.8 translates to T INT ¼3.2. For parameter scans involving branch lengths, we generated point estimates of DDK and DDDK from five replicate genomes for each condition.

Classification of Introgressed and Nonintrogressed Loci
The first step in all versions of DIP is sorting loci to distinguish the loci that were introgressed from those that follow the species branching order (i.e., classification). Using simulated data affords us omniscience at this step (i.e., we know whether each locus was originally simulated as introgressed or not). However, unless specifically stated, we did not make use of the known history of simulated loci. Instead, DIP infers the introgression status of loci based on the topology of a neighbor-joining gene tree inferred for each locus using Ape v5.2 (Paradis et al. 2004). Loci displaying the ((P1, P2)P3)O) topology are marked as nonintrogressed loci. Loci displaying the ((P2, P3)P1)O) topology (introgressed topology) are designated as introgressed loci. Any loci displaying the alternative topology, ((P1, P3)P2)O), which are not produced by speciation or introgression, are omitted from 1ÂDIP and 2ÂDIP but used by 3ÂDIP to calculate a correction factor (see below).

Inferring Introgression Directionality with 1ÂDIP
We calculated the pairwise divergences, K 23 , K 12 , and K 13 (as indicated in fig. 1A) for each locus using the dist.dna command from the Ape package with default settings. Pairwise divergences, K 23 , K 12 , and K 13 are named for the taxa involved in the distance calculation. For example, K 23 measures the divergence of P2 and P3 (see fig. 1). DK 23 , DK 12 , and DK 13 were calculated based on difference in mean K values between SP and introgression loci as shown in equations (1-3). To test for significance, bootstrapped distributions were obtained by resampling (with replacement) loci from the genome to achieve genome alignments equal in number of loci to the original genome alignment. 1,000 such replicates were performed, recalculating DK 23 , DK 12 , and DK 13 for each replicate. P values for the significance of DK values were calculated as the proportion of replicates for which DK 0. For the parameter scan of 1ÂDIP ( fig. 3D), inference of a significant directional profile required that all three measures, DK 23 , DK 12 , and DK 13 , adhere to their expected profile with a significant (P < 0.05) P value for each (with the exception of cases in which the expectation is DK ¼ 0).

Inferring Introgression Directionality with 2ÂDIP and 3ÂDIP
DDK was calculated from DK 12 and DK 13 as described in equation (4). The bootstrap resampling scheme described in the previous paragraph was used to assess the significance of 2ÂDIP. DDK was calculated for each replicate and P values were obtained from the proportion of replicates for which DDK overlapped zero (multiplied by two for a two-sided test). Like 2ÂDIP, 3ÂDIP makes use of DDK to indicate the directionality of introgression. However, 3ÂDIP also introduces DDK ALT , which is calculated according to equation (5).
DDDK is obtained from the difference between DDK and DDK ALT (see eq. 6).
The rationale for the 3ÂDIP correction is that the observed value of DDK may be viewed as a weighted average of: 1) a corrected value (DDDK) that is based only on the loci that truly experienced a history of introgression and 2) a spurious signal (DDK ILS ) arising from the unknown number of loci that exhibit an introgression topology that is actually the result of ILS (N ILS ). (7) Based on the expected symmetry of ILS, we can use DDK ALT and N ALT as estimates of DDK ILS and N ILS , respectively.
Solving equation (8) for DDDK yields equation (6) (see Results). This approach is based on substantial simplifying assumptions. For example, it does not account for the misidentification of loci that have a true history of introgression but exhibit the species or ALT topology because of ILS (see Discussion). As for DDK above, significance of DDDK is obtained from resampled whole-genome alignments.