Estimating divergence times from DNA sequences

Abstract The patterns of genetic variation within and among individuals and populations can be used to make inferences about the evolutionary forces that generated those patterns. Numerous population genetic approaches have been developed in order to infer evolutionary history. Here, we present the “Two-Two (TT)” and the “Two-Two-outgroup (TTo)” methods; two closely related approaches for estimating divergence time based in coalescent theory. They rely on sequence data from two haploid genomes (or a single diploid individual) from each of two populations. Under a simple population-divergence model, we derive the probabilities of the possible sample configurations. These probabilities form a set of equations that can be solved to obtain estimates of the model parameters, including population split times, directly from the sequence data. This transparent and computationally efficient approach to infer population divergence time makes it possible to estimate time scaled in generations (assuming a mutation rate), and not as a compound parameter of genetic drift. Using simulations under a range of demographic scenarios, we show that the method is relatively robust to migration and that the TTo method can alleviate biases that can appear from drastic ancestral population size changes. We illustrate the utility of the approaches with some examples, including estimating split times for pairs of human populations as well as providing further evidence for the complex relationship among Neandertals and Denisovans and their ancestors.


Background
Many population genetic inference approaches compare levels of genetic variation within and across genomes, individuals and/or populations in order to uncover their evolutionary history. A multitude of demographic inference methods have been developed in order to capitalize on the wealth of information that comes with the availability of full genomes from multiple individuals (see Schraiber and Akey 2015, for a review).
The sheer scale and complexity of whole-genome data sets poses its own challenge for making inference of population demographic parameters. A common approach for inference has been to compare the observed data, often summarized in some statistic, to simulated data that can be generated under a range of population-genetic models. Building on this idea and combined with a rejection algorithm, Approximate Bayesian computation (ABC; Tavaré et al. 1997;Beaumont et al. 2002;Cornuet et al. 2014;Pudlo et al. 2016) has proven to be one useful tool for both model choice and parameter estimation. However, the problem of choosing which models to test is not trivial for most inference approaches, including ABC, as the set of models to choose from is very large.
In parallel, there have been recent developments in methods that use haplotype information. A challenge for these approaches is how to model the dependence of genealogies along a sequence. One solution has been to approximate the full ancestral recombination graph, a method used in the pairwise sequentially Markovian coalescent (PSMC) (Li and Durbin 2011) and similar approaches (Schiffels and Durbin 2014;Terhorst et al. 2017;Kelleher et al. 2019;Speidel et al. 2019;Wang et al. 2020).
Another strategy has been to rely on relatively short genetic fragments located sufficiently far away from each other to be able to assume linkage equilibrium between loci, combined with absolute linkage (absence of recombination) within each locus (e.g. Gronau et al. 2011). Both these approaches typically lead to set-ups that cannot be solved analytically and often rely on computationally heavy, advanced statistical methods in order to estimate parameters (but see Gattepaille et al. 2016;Lohse et al. 2016). A related strategy is to assume independence among sites, using a composite likelihood framework (Gutenkunst et al. 2009;Excoffier et al. 2013). From this assumption, the observed variables (i.e. frequency spectra) do not depend on the full distribution of genealogical branch lengths, they are functions only of the expected branch lengths (Griffiths and Tavaré 1998;Chen 2012). This observation greatly simplifies the probability computations. To the extent that closed-form solutions can be obtained, the assumption of independence between sites also leads to inference tools that are easier to integrate with other methods, and can provide useful insights into underlying processes (Beichman et al. 2017;Terhorst et al. 2017). Conversely, a disadvantage of assuming independence between sites is that only information concerning the expected values can be obtained, rather than the full distributions of stochastic variables.
For small samples, an alternative is to derive closed-form expressions for the probability of observing particular configurations of variants in simple divergence models, including the isolation-with-migration model (Wakeley and Hey 1997;Wilkinson-Herbots 2008;Chen 2012). Lohse et al. (2011Lohse et al. ( , 2016 showed that more generally, the probability of observing a particular variant configuration can be obtained from a generating function of genealogical branches. Assuming independence among sequence-blocks, Lohse et al. (2011Lohse et al. ( , 2016 outlined an approach for computing the likelihood under various demographic models and sampling schemes. Regardless of whether independence between sites is assumed or not, all of these methods can be useful for inferring the time of divergence between two populations. Examples of simple and direct methods used to estimate population divergence times include: Gutenkunst et al. (2009),Wakeley (2009, Green et al. (2010), Schlebusch et al. (2012), and Theunert and Slatkin (2018). These methods build on the principle of genetic drift accumulating as a function of effective population size and number of generations. Following a population backwards in time, and using that the accumulated drift at generation t is: where N(i) is the (effective) number of individuals at generation i, the divergence time is then the number of generations required to generate the estimated drift. Such estimates are typically not dependent on knowing the mutation rate but some assumptions regarding N(i) is required, either by assuming a fixed effective population size or depending on an estimated function of N(i). Alternatively, one can base the divergence time estimate on an assumed mutation rate (e.g. Wakeley and Hey 1997;Chen 2012;Pickrell et al. 2012). By assuming independence among sites, in a two-population divergence model (without migration), the probability of observed sample configurations (summarized as the full site frequency spectrum (SFS), including invariable sites) can be derived analytically. Using a likelihood framework, we can then estimate parameters of interest in the divergence model. Here, we present two simple approaches based on picking two gene copies from each of two populations: the "Two-Two" (TT) method, which was briefly introduced in Schlebusch et al. (2017) and the "Two-Two-outgroup" (TTo) method. These are sufficiently simple to allow for analytical solutions giving closed formulas for the estimates of the model parameters based on the counts of different sample configurations.
Specifically, assuming a mutation rate and generation time, we can estimate population divergence time separately from genetic drift since the model is parametrized with both a drift parameter and a time parameter.

Observed data
For the purpose of investigating the demographic relationship between two populations denoted population 1 and population 2, assume that two gene copies have been sampled from each population. For bi-allelic sites, assume that the ancestral (denoted "0") and the derived variant (denoted "1") is known. The number of derived alleles in a sample from population 1 combined with the number of derived alleles in as sample from population 2 is referred to as the joint frequency spectra (e.g. Chen 2012). In our set-up, the sample size from both populations is two so that the number of derived is either 0, 1, or 2, and there are 9 possible sample configurations, which are presented in Table 1. The observed number of sites with sample configuration O i;j will be denoted by m i;j and the total number of investigated sites by m tot .

Theory
We study a general population divergence model where the population-branch leading to population 1 and the populationbranch leading to population 2 merge (backwards in time) to become the ancestral population. The model makes no assumptions regarding population size and/or population structure changes in the daughter populations. The model assumes no migration between the two daughter populations and that these merge into a panmictic ancestral population. We use the following notation, with time measured in number of generations: t 1 , time to split for population 1; t 2 , time to split for population 2; a 1 , probability of two lineages in population 1 not coalescing before t 1 ; a 2 , probability of two lineages in population 2 not coalescing before t 2 , n 1 , expected time to coalescent in population 1 given coalescingbefore t 1 ; and n 2 , expected time to coalescent in population 2 given coalescing before t 2 .
In addition to the drift parameters a 1 and a 2 , the parameters 1 and 2 are needed because two branches with the same timelength and the same drift can have different distributions of coalescence times. To illustrate, a linearly growing population that starts with size N and ends with size 2N will have the same drift as a shrinking population that starts with size 2N and ends with size N but they will not have the same distribution of coalescent times within that interval. These parameters also cover cases when the daughter populations are not panmictic. A similar parametrization can be found, for instance, in Rogers and Bohlender (2015).
The composite likelihood assumption of independence between sites implies that the probability of a mutation on a specific branch in a genealogy is the expected length of that branch (given a demographic model) multiplied by the mutation rate. We denote the mutation rate per site and generation by l, assume independence between sites, and an infinite sites model.
We define the following events for the two sampled lineages for each population: H 1 : a coalescence in population 1 beforet 1 ; PðH 1 Þ ¼ 1 À a 1 H 2 : a coalescence in population 2 beforet 2 ; PðH 2 Þ ¼ 1 À a 2 : With 2 k 4 lineages surviving to enter the ancestral population (depending on whether coalescence events have occurred Table 1 Notation for the number of sites with 0, 1, or 2 derived variants in the sample from population 1 and the sample from population 2 0 in population2 1 in population2 2 in population2 in the daughter populations), we define A k to be the number of derived variants in a sample of size k drawn at the split time in the ancestral population and write a ki ¼ PðA k ¼ iÞ. To illustrate how the probabilities of the sample configurations are derived, we can take an example conditional on no coalescent event in population 1 and a coalescent event in population 2 (the event :H 1^H2 ). There are then three lineages entering the ancestral population. These lineages constitute a sample of size 3 from the ancestral population. Sample configuration O 1;0 will then be observed with probability ð2=3Þa 31 (the ancestral variant has to be assigned to the lineage entering population 2; an event with probability 2/3) plus the probability that a mutation occurs on either lineage entering population 1 during the time interval t 1 . The probability that a mutation hits a branch of length t 1 is lt 1 , and the probability that this happens and that the derived variant already exists in the ancestral population can be ignored as it requires two mutational events at the same site. Thus, conditional on :H 1^H2 ; PðO 1;0 Þ ¼ ð2=3Þa 31 þ 2lt 1 . The same reasoning can be applied to derive the conditional probabilities for all seven (polymorphic) sample configurations and these are shown in Table 2.
Since a subsample of size n randomly drawn from a larger sample of size n þ k has the same distribution as a sample of size n drawn directly from the population, we can reduce the number of parameters by replacing all a ij with i < 4 using a ij -terms with i ¼ 4 as follows: These equations together with Table 2 allow us to derive the probabilities for the different sample configurations. For instance: Using the same strategy for the derivation of the other six probabilities, we obtain the probabilities for all seven sample configurations in Table 2. Writing p i;j ¼ PðO i;j Þ for brevity, these are: Furthermore, if we assume a (indefinitely) panmictic ancestral population ( Figure 1A), we define: to be the number of generations a coalescent process that starts with four lineages at the (most recent) base of the ancestral population spends with i lineages, so that the time to the most recent common ancestor (T mrca ) is Writing s i ¼ lE½T 4i , and replacing the b i with their respective expression in terms of s i , the probabilities for the different sample configurations can be expressed as: 1 9 a 1 ð4 À a 2 Þð2s 2 þ 3s 3 Þ À 1 3 a 1 s 2 These eight equations point to two challenges: (1) it is not possible to completely separate s 4 from divergence times due to its co-occurrence with lt 1 and lt 2 , ii) disregarding s 4 , it is still an underdetermined set of equations with eight parameters but only seven equations/degrees of freedom (p 0;0 þ p 2;2 ¼ 1À P 0 < iþj < 4 p i;j ). It can be tempting to reduce the number of parameters by setting t 1 ¼ t 2 , but because [from equations (1) above] specifying t 1 ¼ t 2 would add additional dependence between the equations. Although this will decrease the number of parameters, it also decreases the number of independent equations. Furthermore, allowing for separate divergence times along the two branches is a valuable asset; not only does it allow the framework to be applicable for temporally structured samples, but separate estimates for each branch can be useful more generally. In fact, it turns out that the divergence time estimate based on the population branch represented by a modern-day individual alleviates the potential issue of residual ancient DNA-specific properties (DNA degradation, sequencing errors, and mapping errors) that could impact divergence time estimates (see below). In contrast, for contemporaneous samples, divergence time estimates should be the same along the two branches (assuming neutrality and the same mutation rate and generation time along the two branches). The challenges noted above can be dealt with either by assuming a constant ancestral population size (the"TT"-method) or by using an outgroup to increase the number of equations (the"TTo"-method).

Assuming a constant ancestral population size ("TT")
Assuming a constant ancestral population size N A reduces the number of parameters in the model ( Figure 1B Then the probabilities in equations (2) simplify as: We set m i;j mtot ¼ p i;j and solve for the parameters (and note that this guarantees that they are also maximum likelihood estimates (MLEs; Doob 1934;Wald 1949) to obtain: m 0;1 þ m 2;1 2 À m 1;1 2m 0;2 À m 2;1 2m 1;2 À m 1;1 : (3) These equations are referred to as the 'TT'-method in Schlebusch et al. (2017) where, in order to get the divergence time in years we used G ¼ 30 and l ¼ 1: where G is the length of a generation. Note also that if sequencing errors or DNA degradation mainly result in additional singletons, then errors in the sample from population 1 only affects m 1;0 and thus only c T 1 and c V 1 (m 1;0 occurs exclusively in the equations for estimating T 1 and V 1 ).

Adding an outgroup ("TTo")
The equations (1) are useful for data (including SNP-genotype data) where the derived variant at each site has been ascertained in a population that branched off prior to the investigated population split. Such data will ensure that derived variants in the studied sample will be older than the split so that there are no new mutations occurring in the branches. In such a case, lt 1 ; lt 2 ; l 1 , and l 2 can all be set to 0 in the equations (1) where * indicates that these are the corresponding parameters and sample configuration counts conditional on ascertainment in an outgroup. With this ascertainment procedure it is important that the population used to ascertain the SNPs represents a true outgroup to our studied populations and that the populations satisfy an assumption of bifurcating topology (or "tree-ness"). To validate such an assumption we can set up tests of tree-ness, since if should be 0 (see Appendix, where it is also shown that Y 2 is closely related to the D-statistic, Green et al. 2010).
The estimates b a 1 Ã and b a 2 Ã of a 1 and a 2 together with the equations in (2) can furthermore be used to obtain estimates of: Note the two alternative estimates (one using m 2;1 and one using m 1;2 ) for s 3 and s 2 and we take the average of these in estimates below.
Based on the obtained estimates of s 2 and s 3 , we can attempt to approximate s 4 as a combination of s 2 and 3s 3 . In a constant population, E½T 43 =E½T 42 ¼ 1=3 and E½T 44 =E½T 43 ¼ 1=2 or For this reason we propose to approximate s 4 =s 3 as ð3=2Þx where x is the estimated ratio of s 3 =s 2 . This leads to b We refer to this approach to estimate divergence time as:TTo" (as in"TT outgroup").
Picking two gene copies from population 1 and one gene copy from population 2 The method so far described can be seen as an expansion of the simpler case of picking two gene copies from one population, and only one gene copy from the other population. This simpler setup can be useful, for instance, when dealing with low-coverage genome data (e.g. ancient DNA sequence data). With this simpler approach, divergence time estimation needs an outgroup (only assuming a constant population size is not sufficient to solve the equations in this case). This 2 plus 1 approach does, however, provide reliable estimates of branch specific genetic drift (under often reasonable demographic assumptions, see Appendix and Wakeley 2009;Schlebusch et al. 2012;Skoglund et al. 2011).

Simulations and comparison to GPhoCS
The model underlying the TT method assumes a panmictic ancestral population of constant size prior to the split, and no geneflow between populations after the split. Although common to many coalescent-based approaches, such assumptions are rarely realistic for natural populations, and it is increasingly evident that mis-specification of an overly simplistic model may lead to substantially biased parameter estimates (Gronau et al. 2011;Mazet et al. 2016;Orozco 2016).
Here, we investigate the robustness of the TT-method parameter estimation against violation of the basic model assumptions [equations in (3)]. We compare its performance under these conditions against an alternative method for parameter inference, GPhoCS (Gronau et al. 2011). The analytical TT method and the Bayesian inference method GPhoCS are located to some degree at opposite ends of the statistical inference spectrum; instead of relying on independent single bi-allelic sites, GPhoCS assumes complete linkage between individual sites at a genetic locus (typically 10 kb), but independence between these loci. It should be noted that GPhoCS is capable of estimating parameters under more complicated demographic models than the simple split model we study here. In particular, GPhoCS allows users to specify migration rates and define migration bands between populations, such that it does not share the TT method assumption of no gene-flow occurring between populations after the population split. For this reason, the effect of migration on parameter estimation was investigated only for the TT method.
The software MS (Hudson 2002) was used to generate polymorphic datasets using a standard coalescent algorithm under a variety of demographic scenarios. The effects of changes in ancestral population size ( Figure 2A) and migration between branches since the split ( Figure 2B) were investigated. In each model, the ancestral population size, N A , was fixed at 34,000, corresponding to 17,000 diploid individuals. This value is in line with recent estimates of African ancestral effective population size $1 million years ago (Li and Durbin 2011;Schiffels and Durbin 2014;Schlebusch et al. 2017). MS scales time by 4N e , and simulations were constructed with true split times of 10,000 and 1500 generations. Assuming a generation time of 30 years this equates to split times of 300,000 and 45,000 years, respectively. These were chosen to keep simulations relevant to the findings of previous work where the deepest split among human groups was estimated at >260,000 years , together with more recent divergence events.
MS generates samples assuming h ¼ 4N 1 l, where N 1 is the diploid population size of population 1, and l is the neutral mutation rate. Mutation rates vary across the human genome and estimates vary depending on the method used (Scally and Durbin 2012). Li and Durbin (2011) calculated a human neutral mutation rate of 2:5 Â 10 À8 per generation (assuming 25 years per generation), whilst recent consensus suggests a lower rate of 1:25 Â 10 À8 per base pair per generation is more accurate (Moorjani et al. 2016). The latter is the mutation rate used across all simulations. Results were filtered such that only those simulations resulting in all sample configurations represented by > 10; 000 sites were used in subsequent analyses.
The Bayesian inference method GPhoCS is based on likelihood estimation and in order to allow adequate convergence of parameter estimates, a burn-in period of 100,000 iterations was used when applied to MS simulated data.

The effect of varying ancestral population size
In simulating the demographic scenario shown in Figure 2A, populations 1 and 2 are constant backwards in time, but not (necessarily) equal in size; each population size is independently drawn from a uniform distribution between 170 and 1,700,000 diploid individuals. A total of 1000 of such demographies were generated. Populations 1 and 2 merge at 10,000 generations to form a single ancestral population of initial size N A ¼ 17; 000 individuals for / generations. The ancestral population then changes to kN A , [drawn uniformly from (1.7 Â 10 2 , 1.7 Â 10 6 )] for s generations, before returning to N A . We investigate the impact of that change in ancestral population size (kN A for s generations) on TT estimates of population divergence time ( b t) and ancestral population size, c N A , for s ¼ 0; 100; 500; 1; 000 generations and / ¼ 0; 100; 500; 2000 generations (Supplementary Figure  S1). Supplementary Figure S1 shows that increasing ancestral population size for s generations can have the effect of inflating divergence time estimates for the TT method. This behavior is expected; if we imagine an expansion of the ancestral population to infinite size for s generations, no coalescence events would occur during that time, and divergence time estimates would be upwardly biased by s generations. This bias however seems to be relatively minor compared with that arising from severe bottlenecks. For instance, when the true t ¼ 10,000, a bottleneck in the ancestral population of 1500 individuals lasting for 100 generations results in b t % 9000 generations. However, the same severity of bottleneck lasting for 500 generations will result in a greater underestimate of b t % 5000 generations. Similarly, N A is underestimated when severe bottlenecks occur, though these estimates seem to be more robust than estimates of divergence time (Supplementary Figure S2). When studying more recent splits, (1500 generations), we observe that severe bottlenecks have the potential to result in nonsensical negative split time estimates (Supplementary Figure S3). Figure 3 shows a comparison between TT method and GPhoCS estimates of population divergence time (t) and ancestral population size (N A ) in cases where the duration of the bottleneck (s) is fixed at 1000 generations and true split time is 10,000 generations. Results suggest that both methods react similarly to violations of the assumption of a change in ancestral population size; each being particularly susceptible to bias when severe bottlenecks have occurred. GPhoCS performs somewhat better than the TT method, with severe bottlenecks resulting in less of an underestimate of population divergence time.
An interesting effect appears when a bottleneck of sufficient severity occurs, whereby both methods' b t estimates begin to rebound towards the true split time of 10,000 generations. Again this behavior is expected as all lineages will coalesce in a bottleneck of sufficient severity prior to a population divergence event. In this case the bottleneck itself will act as the constant ancestral population size, and as long as it occurs in close proximity to the split, divergence time estimates are not affected much. For the same reason, when a severe bottleneck occurs a long time prior to the split, both methods produce a (slight) overestimate of the true divergence time ( Figure 3E).

The effect of migration between branches
In simulations based on the demographic scenario shown in Figure 2B, a pulse of admixture occurs d generations ago, with proportion 0 c 1 of one daughter population made up of migrants from the other daughter population. Thus we examine the effect of increasing proportion of migration occurring at various times between present and the split time. All populations are kept fixed and constant at 17,000 diploid individuals (N 1 ¼ N 2 ¼ N A ). Figures 4 and 5 show the effect of increasing proportion of migrants on TT divergence time estimates when true split time is 10,000 generations. Divergence times are reliably estimated when the proportion of migrants is below 0.1, and as expected, even at higher proportions the bias decreases the nearer the admixture event is to split time. Note that under this set-up, the TT method returns b t % 3000 generations even when the proportion of migrants (c) is 1 and admixture time (d) is 0. We would expect in this case a b t ¼ 0, but it seems that this scenario (where all populations are equal in size) is equivalent to a violation of the assumption of a constant ancestral population. As described previously, this has the effect of biasing b t upwards, and shows that in cases of high proportion of recent admixture, differences between the size of the daughter populations and the ancestral population also has the potential to result in biased b t. Estimates of ancestral population size on the other hand, are only very slightly affected ( Supplementary Figures S4 and S5). Very similar results are observed when a more recent true split time of 1500 generations is studied ( Supplementary Figures S6 and S7).
Although simulations have shown the TT method to be relatively robust to violations of its assumptions in general, it is evident that extensive, recent gene flow between daughter populations or strong, prolonged bottlenecks in the ancestral population has the potential to introduce bias. If however we obtain external estimates of a 1 and a 2 through the outgroup ascertainment procedure outlined above (TTo), we can obtain estimates of divergence time that are much less dependent on assumptions concerning the ancestral population. Figure 6 shows a comparison of TT and TTo method results in scenarios of increasing duration of ancestral population size change. The true values of a 1 and a 2 have been used in equations in (7) to obtain estimates of B 1 , B 2 , s 2 and s 3 that in turn have been used to approximate s 4 and divergence times following equation (8). These results show that by using external estimates of drift, there is the potential to considerably reduce bias in divergence time estimates when severe bottlenecks have occurred in the ancestral population. Furthermore, Figure 6 also compares estimates of N A ,   from that of the TT method to one based on the TTo estimate of s 4 (6 b s 4 =l), which is found to be much more sensitive to ancestral bottlenecks.

Application to data
The TT method requires good quality sequence data, typically high-coverage genome sequence data since diploid genotype calls are utilized, including invariable sites and singletons (in the sample of four chromosomes). Alternatively, instead of one highcoverage diploid genome, several low-coverage genomes from the same population data can be combined to produce (sufficiently many) sites with two gene copies. It is also important to be careful when filtering the genome for reliable regions as this can cause an artificial bias of the mutation rate.
The formulas are sufficiently simple to allow for asymptotic confidence intervals based on MLE theory, and one can imagine thinning the genome data to make sites independent of each other (to overcome potential dependence via linkage). However, we chose a more conservative approach of estimating confidence; the weighted block jackknife procedure (Busing et al. 1999), which should be more robust to large-scale "outlier" regions driving the signal. We conducted pairwise comparisons among the 11 HGDP individuals and the Denisovan genome and the Altai Neandertal genome from (Meyer et al. 2012) and (Prü fer et al. 2014) and estimate population divergence times (see Appendix for a description of data cleaning and calling of ancestral states). The 11 individuals from the Human Genome Diversity Project (HGDP) include 5 individuals from Africa: one Khoe-San ("San"), one rainforest hunter-gatherer ("Mbuti"), two West-African ("Mandenka" and "Yoruba") and one East-African ("Dinka"). The other individuals were two Europeans ("French" and "Sardinian"), two East-Asians ("Han" and "Dai"), one individual from Oceania ("Papuan") and one individual representing a South-American indigenous population ("Karitiana"). In addition, we used the high-coverage ancient southern African hunter-gatherer genome ("Balito Bay A"; Schlebusch et al. 2017) as an outgroup for some divergence estimates (see below).

Split model parameter estimates
We refer to split time estimates in years in the TT-method as b t i and under the TTo method as b t i Ã . These are obtained by setting G ¼ 30 and l ¼ 1:25 Â 10 À8 in equation (4) and applying this to the estimates of T 1 and T 2 in equations (3) and (8), respectively. Comparisons are grouped according to the population split they represent. For instance, the comparison between French and San is referred to as the "Khoe-San split."

Divergence estimates according to the TT-method
Assuming a constant ancestral population size, N A , it is possible to estimate N A , a 1 , a 2 , 1 , 2 as well as t 1 , t 2 without relying on ascertainment procedures. Estimates of a, N A and are shown in Supplementary Figures S8-10, respectively. From Supplementary Figure S10 it is apparent that is often poorly estimated and the uncertainty of the estimate appears to be closely linked to the amount of branch-specific genetic drift (Supplementary Figure  S11). A closer look at any of the formulas for p i;j reveals that the impact of on the probabilities disappears as aapproaches 1 (no drift).
Estimates of the ancestral population size remain remarkably constant at around N A ¼ 17; 000, regardless of choice of individuals (Supplementary Figure S8). Figure 7. To summarize, estimates of the different split times are (in descending order):

Values of b t are shown in
• the split between Neanderthal and Denisovans 962 À 979 kya; • the split between archaic humans and modern humans 510 À 707 kya; • the deepest split among modern human population (between Khoe-San and other human populations) 233 À 266 kya (see Schlebusch et al. 2017) for the consequence of using the ancient southern African Balito Bay A genome); • the split between Mbuti and other modern humans (excluding Khoe-San populations) 186 À 220 kya • the split between West-and East-Africans 96 À 117 kya; • the split between East-Africans and non-African 66 À 82 kya; and • splits between non-African < 0 ya.
Here, the range for the split between archaic and modern humans takes into account the fact that the archaic genomes are older than 40 ky. There are two obvious odd sets of estimates among these: the negative times for non-Africans, and the deep time between Denisovans and Neandertals contrasted to the younger time between Denisovans/Neandertals and modern humans (note that we assume a constant ancestral population size here). We discuss each of these split time estimates below, but first we revisit the utility of ascertaining variants in an outgroup. Divergence estimates according to the TTo method By comparing two individuals using only those sites where the derived variant was present in an outgroup, it is possible to: (1) test whether the outgroup represents a true outgroup, and (2) obtain estimates of a 1 and a 2 that do not rely on assumptions concerning the ancestral population. We utilized the Mbuti, Balito Bay A, or Neandertal/Denisovan as outgroups. The estimates of a conditional on the derived variant being present in an outgroup are shown in Supplementary Figure S15. These three options were variably suitable as outgroups depending on the comparison being made. For instance, when comparing an individual from outside Africa to an African individual, Neandertal/ Denisovan would not be true outgroups given the archaic admixture shared among non-African individuals (Green et al. 2010). This was also visible in the tests based on equations (5) and (6) above (as well as the D-test, see Supplementary Figure S12). A likely consequence of the documented additional Denisovan ancestry in Papuan (Meyer et al. 2012) is that no comparison involving Papuan passed the outgroup tests. Perhaps more surprising, any comparison involving Mbuti failed the tests when Balito Bay A was used as the outgroup. Moreover, both Mbuti and Balito Bay A were expected to be true outgroups for the comparison of Neandertal vs Denisovan, but the test; however, pointed to them not being true outgroups.
Comparisons between estimates of h (assuming a constant ancestral population size) to estimates of s 2 and 3s 3 using different outgroups for ascertainment are shown in Supplementary  Figures S16-18. Since there is presently no suitable outgroup for comparisons between a modern human and one of the two archaic humans-this would require a genome from an archaic human that split off before the Neandertal/Denisovan branch-it was not possible to estimate s 2 and 3s 3 for such comparisons.
When reliable outgroup ascertained estimates of a 1 and a 2 can be obtained, we estimate s 2 , s 3 , B 1 and B 2 using equations (7) that are used in equation (8) to obtain an estimate of T Ã i . This in turn gives b t Ã that are shown in Supplementary Figures S16-21. For the majority of comparisons, such an approach does not yield different estimates compared with assuming a constant ancestral population size. The major exceptions to this are those comparisons involving non-Africans that show positive and realistic divergence time estimates using the ascertainment scheme ( Figure 8).

Divergence times outside Africa
The divergence time estimates for non-African populations under a constant model ( b t i ) are nonsensical, (negative values). This is likely a consequence of the severe out-of-Africa bottleneck that leads to s 4 ¼ lE½T 44 being much smaller than s 2 =6, which then violates the assumption of a constant N A (E½T nk ¼ 2N A =kðk À 1Þ in a constant population with N A chromosomes). Estimates based on the three outgroup ascertainment schemes ( b t i Ã ) give more reasonable values as shown in Figure 8.
These estimates are generally consistent with the prevailing view of the demographic history outside Africa. For instance, the deep split between Sardinians and French may reflect previous findings that while Sardinians trace their ancestry mostly to the early Neolithic farmers, the French are more admixed with European hunter-gatherers and components of the Yamnaya expansion (Skoglund et al. 2014;Allentoft et al. 2015;Gü nther et al. 2015;Haak et al. 2015). To interpret the split times between Han, Dai, and Karitiana, it should be noted that Karitiana is best modeled as a combination of three source populations (an ancient Siberian Eurasian source, a north East Asian source and an Australasian source), where the north East Asian contribution is substantially greater than the other two sources combined (Raghavan et al. 2015;Skoglund et al. 2015). The fact that the Karitiana show a more recent divergence with Han than with Dai likely reflects north East Asians contributing substantially to Native Americans, and that the Dai has a south East Asian component (closer to an Australasians; Raghavan et al. 2015;Skoglund et al. 2015). This admixture pattern results in shallower divergence between Karitiana and Han, and deeper divergence between Karitiana and Dai and between Han and Dai (see e.g. Figure 2 in Raghavan et al. 2015).

Western vs Eastern Africa and timing of the outof-Africa event
Assuming a constant ancestral population size, we estimate the split between non-Africans and East-Africans (Dinka) to between 66 and 82 kya. The split between Mandenka and Yoruba is estimated to 100 kya while the split between Western Africans and Eastern Africans (Dinka and non-Africans) is estimated to between 96 and 117 kya.
The estimates based on the three outgroup ascertainment schemes ( b t i Ã ) are generally older. Although the demographic history of Western and Eastern Africa appears to be particularly complex (Pickrell et al. 2014;Gurdasani et al. 2015;Triska et al. 2015;Busby et al. 2016;Hollfelder et al. 2017), and the SD estimates suggest one should not over-interpret the b t i Ã values, there are a few interesting tendencies among these estimates ( Figure 9). First, estimated split times are consistently lower among Yoruba, Mandenka, and Dinka than between any of these populations and a non-African population; likely an effect of

Deepest splits among modern human populations
The split between Khoe-San and other modern human populations are estimated to around 250 kya using the TT-method  Above we showed the effect of assuming a constant ancestral population size, and how violations of this assumption by a bottleneck in the ancestral population can bias divergence time estimates. However, we find no evidence for such a bottleneck in the common ancestral population to all modern humans, and hence.
In general, we find very little difference between the TT and the TTo estimates (Supplementary Figure S19). In fact, there is a tendency for b t i Ã to be lower than b t i for the Mbuti-split, providing additional support that the Khoe-San split is the deepest split among modern human populations (Schlebusch et al. 2012).

Archaic split times
The split between modern humans and both Neandertal and Denisovan is estimated to between 510 amd 707 kya, which is in line with previous such estimates (e.g.Prü fer et al. 2014). In fact, restricting the analysis to split times on the non-ancient branch (to alleviate issues with fossil dating and potential excess ancient DNA damage) and only to comparisons between Africans and the two archaic humans, gives a range of estimates from 609 to 681 kya ( Figure 10). Unfortunately, there is (presently) no suitable outgroup for comparisons between modern humans and archaic humans in order to utilize the outgroup ascertainment approach.
The estimated split between Neandertal and Denisovan is around 970 kya; more than 250 ky older than the split between modern humans and archaic humans. This is likely an artifact of violating the model assumptions; the existence of a more complex demography is indicated by our finding that neither Mbuti nor Balito Bay A were found to be true outgroups to the Figure 10 Split time estimates between the five African individuals and the two archaic humans assuming a constant ancestral population. Only estimates based on the non-ancient branch are shown. A mutation rate of 1:25 Â 10 À8 and a generation time of 30 years is assumed. Figure 9 Different estimates of split times using outgroup ascertainment assuming a mutation rate of 1:25 Â 10 À8 and a generation time of 30 years. Three estimates are shown: estimates where outgroup ascertainment is performed in Mbuti (þ), in Balito Bay A (᭺) and in Neandertal/Denisovan (Â). Transparent gray represents SD and for comparisons that failed the outgroup tests.
Neandertal-Denisovan comparison according to our outgroup test and the D-test ( Supplementary Figures S14 and S13). Some studies hypothesize that the demographic relationship between Neandertals and Denisovans was governed by meta-population dynamics (Rogers et al. 2017). Others suggest complicating demographic factors such as admixture between Denisovans and Homo erectus (Prü fer et al. 2014) or admixture from the modern human branch into the Altai Neandertal (Kuhlwilm et al. 2016).

Conclusion
We present a simple approach to estimate parameters under a comparatively general split model. In particular, no assumptions are needed concerning the population size processes/changes in the daughter populations (i.e. more recent than the split). The underlying model does not include gene-flow between daughter population; however, we can show that moderate violation of this assumption has little impact on the population divergencetime estimates. Assuming a constant ancestral population size, this approach provides an unbiased estimate of divergence time. However, when the ancestral population is not constant, and particularly in the case of severe bottlenecks, divergence time estimates can be biased. Indeed, simulations comparing the TTmethod to GPhoCS-an alternative, fundamentally different approach to demographic inference, has shown that the two methods are sensitive to violations of the same assumptions. The reason for this can be intuitively understood in terms of the tMRCA in the ancestral population; most of this time is spent with two lineages and the duration of this is utilized by both methods to estimate the size of the ancestral population. Since, by assumption, the ancestral size is constant, if the time to the first coalescent event in the ancestral population is shorter than expected, (for instance, due to a bottleneck shortly before the divergence), then both methods underestimate the true population divergence time. When such severe bottlenecks have occurred, we have shown that it is possible to reduce much of this bias through the outgroup ascertainment procedure implemented in the TTo method.
Applying the TT-method to a sample of 11 genomes from the HGDP panel together with the Neandertal and Denisovan genomes, we provide further information on the details of the various splits within the sample and corroborate many previously estimated population divergence times.
Finally, accumulating evidence suggests that human evolution is highly reticulated, and perhaps not well approximated by the sort of bifurcating tree-models studied here (Schlebusch and Jakobsson 2018;Henn et al. 2018;Scerri et al. 2018;Stringer 2016). Nonetheless, the framework presented here is still a useful tool: different population genetic methods vary in their assumptions and sensitivities to model violations, thus it is important when investigating the complex demographies underlying the evolution of humans to have access to a variety of different methods. The TT-method is relatively robust to model violations and provides a simple and transparent analytic framework that can be compared with, and potentially even integrated with, other, more computationally demanding methods (Beichman et al. 2017;Terhorst et al. 2017;Wang et al. 2020).

Data availability
Genome sequence data from the following publications was extracted and reprocessed in order to avoid mapping and filtering biases: Prü fer et al.