-
PDF
- Split View
-
Views
-
Cite
Cite
Trevor Cousins, Arun Durvasula, Insufficient Evidence for a Severe Bottleneck in Humans During the Early to Middle Pleistocene Transition, Molecular Biology and Evolution, Volume 42, Issue 2, February 2025, msaf041, https://doi.org/10.1093/molbev/msaf041
- Share Icon Share
Abstract
A recently proposed model suggests a severe bottleneck in the panmictic ancestral population of modern humans during the Early to Middle Pleistocene transition. Here, we show this model provides a worse fit to the data than a panmictic model without the bottleneck.
Recently, Hu et al. (2023) developed a new site frequency spectrum (SFS)-based approach, FitCoal, to infer effective population size changes in a panmictic population through time. The authors inferred a severe population bottleneck (>20-fold reduction in size) in the ancestors of modern humans approximately 1 million years ago (Mya) by applying FitCoal to present-day genome sequences from African populations. However, there are several issues with the analysis and conclusions. First, the authors did not evaluate how well their model fits the observed data relative to simpler models without a severe bottleneck. When performing demographic inference, it is important to recognize that the true history cannot be precisely determined. Instead, the principal goal of demographic history inference is to find the simplest model that can fit the observed data. New models that add complexity should only be preferred if they provide a better explanation to the observed data than simpler models. Without this crucial model comparison, it is unclear whether the model Hu et al. propose provides additional explanatory power. Second, FitCoal does not infer this bottleneck in out-of-Africa (OOA) populations, even though genetic events 1 Mya should be shared by all present-day modern humans. Hu et al. argue this is due to OOA populations losing a substantial amount of genetic diversity after their ancestors left Africa ∼60 kya, known as the OOA bottleneck, obscuring the ability to infer events in the deeper past. However, the amount of information about the ancient bottleneck available from the SFS of a population that experienced the OOA bottleneck remains unclear. Third, Hu et al. demonstrate with simulations that other methods, including PSMC (Li and Durbin 2011), Relate (Speidel et al. 2019), SMC++ (Terhorst et al. 2017), and Stairway Plot (Liu and Fu 2015), would have power to detect the bottleneck if it were truly there, but do not detect it in real data. This raises concerns about the robustness of the signal reported by Hu et al.
Here, we reexamine the findings of Hu et al. First, we perform formal model comparisons to test whether the bottleneck inferred by Hu et al. fits the data better than models without a bottleneck. Second, we test whether a bottleneck of the magnitude inferred by Hu et al. would leave a detectable signature in the SFS of OOA populations. Third, we discuss the implications of recently proposed models of ancestral structure for interpretations of panmictic effective population size.
We processed data from the Yoruba (YRI) sequenced at high coverage by the 1000 Genomes Project (Byrska-Bishop et al. 2022), aligned to GRCh38. We filtered out nonneutral regions of the genome (background selection statistic < 0.8; Murphy et al. 2023) and low-quality regions according to a mappability mask and generated an SFS using alternate allele frequency. We refer to this data set as “GRCh38, neutral, unpolarized.” Using this SFS, we inferred an effective population size trajectory using FitCoal and saw a strong population bottleneck approximately 1 Mya, consistent with the results published by Hu et al. (2023) (Fig. 1). Next, we ran mushi (DeWitt et al. 2021) on the same SFS and found it did not infer a bottleneck with a similar magnitude to that inferred by FitCoal in the history of YRI (Fig. 1), echoing a similar recent failure to replicate the signal (Terhorst 2024).

Demographic histories inferred by FitCoal and mushi. Models are fit to the “GRCh38, neutral, unpolarized” data set. Log-likelihood values of each model are reported in Table 1. See supplementary figs. S3 and S4, Supplementary Material online, for fits on GRCh37.
The fact that FitCoal infers a severe bottleneck and mushi does not could reflect FitCoal’s sensitivity to ancient demographic history or statistical problems in the fitting procedure, as has been recently suggested (Deng et al. 2024). Regardless, for a model to be favored over another model, it must fit the data better. We assessed this by computing the log-likelihood ( of the FitCoal and mushi models (Table 1). We find the model fit by mushi provides a better fit to the data (). This result suggests there is no reason to favor a model of panmictic human history with a severe bottleneck ∼1 Mya compared to a panmictic model without such a severe bottleneck.
Model . | Log-likelihood () . | with mushi model . |
---|---|---|
mushi ( | −1,882 | – |
FitCoal ( | −2,246 | 364 |
FitCoal | −2,966 | 1,084 |
Model . | Log-likelihood () . | with mushi model . |
---|---|---|
mushi ( | −1,882 | – |
FitCoal ( | −2,246 | 364 |
FitCoal | −2,966 | 1,084 |
We report the log-likelihood for the two demographic models in Fig. 1 and difference in log-likelihoods with the best fitting model from mushi. We also report the log-likelihood for the FitCoal model incorporating ancestral allele misidentification ().
Model . | Log-likelihood () . | with mushi model . |
---|---|---|
mushi ( | −1,882 | – |
FitCoal ( | −2,246 | 364 |
FitCoal | −2,966 | 1,084 |
Model . | Log-likelihood () . | with mushi model . |
---|---|---|
mushi ( | −1,882 | – |
FitCoal ( | −2,246 | 364 |
FitCoal | −2,966 | 1,084 |
We report the log-likelihood for the two demographic models in Fig. 1 and difference in log-likelihoods with the best fitting model from mushi. We also report the log-likelihood for the FitCoal model incorporating ancestral allele misidentification ().
We carried out two secondary analyses to test the robustness of this result. First, we explored varying the ridge hyperparameter used by mushi, which is a penalty that controls the extent to which large and sudden charges in the effective population size trajectory are allowed. For weak (0.0001) and moderate (100) penalties, we find that mushi does not infer a severe bottleneck and provides a better fit to the data than the FitCoal model (supplementary fig. S1 and table S1, Supplementary Material online). For the strongest penalty (1,000), we find mushi does not infer a bottleneck, but the log-likelihood of the FitCoal model exceeds the log-likelihood of the mushi model, which is reflective of the tradeoff between goodness of fit and flexibility.
Second, we explored using an older, low coverage (2 to 4×) version of the 1000 Genomes data, aligned to GRCh37, which was used by Hu et al. We processed these data in two different ways. First, we filtered out nonneutral regions of the genome (background selection statistic < 0.8; Murphy et al. 2023) and low-quality regions according to a mappability mask, and used the alternate allele frequency, following our procedure for the “GRCh38, neutral, unpolarized” data set. We call this the “GRCh37, neutral, unpolarized” data set. Second, we followed the analysis pipeline of Hu et al. and filtered out coding regions and regions that fell outside of the 1000 Genomes strict mask and polarized the ancestral alleles according to annotations provided in the 1000 Genomes data set. We call this the “GRCh37, noncoding, polarized” data set. In all three data sets, we observe an increase in high-frequency alleles, with the “GRCh37, noncoding, polarized” data set showing the largest increase (supplementary fig. S2, Supplementary Material online). Under a panmictic population size history, the SFS should be a monotonically decreasing function of frequency (Sargsyan and Wakeley 2008), which suggests the “GRCh37, noncoding, polarized” SFS may suffer from issues in polarization. We inferred effective population size history using FitCoal and mushi on these two data sets. In the “GRCh37, neutral, unpolarized,” FitCoal fits a severe bottleneck, but mushi does not infer a severe bottleneck even for weak penalties (supplementary fig. S3, Supplementary Material online). In the “GRCh37, noncoding, polarized” data set, which closely follows the processing of Hu et al., FitCoal inferred a severe bottleneck. We observed that mushi inferred a severe bottleneck when given a weak penalty but did not for a strong penalty (supplementary fig. S4, Supplementary Material online). This is likely reflecting the attempt to fit the large increase of high-frequency alleles observed in this data set. Taken together, these analyses suggest our findings of no support for an ancient, severe bottleneck are robust.
Next, we investigated the discordance between the OOA and African populations in the Pleistocene bottleneck. We computed the expected SFS of two demographic models: (i) both the Pleistocene bottleneck and the OOA bottleneck and (ii) only the OOA bottleneck (Fig. 2). We find substantial differences between the SFS for each of these models. This suggests a Pleistocene bottleneck of this magnitude would have left an identifiable signature in the genomic data of OOA populations, contrary to the claim of Hu et al. The fact that FitCoal did not infer a bottleneck in OOA populations raises further concerns about the robustness of the inference. We stress there is strong evidence for shared history for all modern humans prior to ∼200 kya and any model with features that are not shared across all human populations in these ancient times would require a substantial amount of robust evidence (Nielsen et al. 2017; Bergström et al. 2021).

Effect of ancient severe bottleneck on site frequency spectra for OOA populations. a) British (GBR) demographic history inferred by Hu et al. with FitCoal (solid, blue) and the same demography with an added ancient severe bottleneck (dashed, red). b) Expected SFS without (solid, blue) and with (dashed, red) ancient severe bottleneck for OOA populations.
Finally, there is growing evidence that humans do not descend from a single, panmictic ancestral population. Instead, recent studies suggest that humans descend from two (or more) divergent populations that admixed together prior to the OOA event (Lorente-Galdos et al. 2019; Ragsdale and Gravel 2019; Durvasula and Sankararaman 2020; Wang et al. 2020; Fan et al. 2023; Ragsdale et al. 2023; Cousins et al. 2024). These studies leverage different summaries of the genetic data, including linkage disequilibrium (Ragsdale and Gravel 2019; Ragsdale et al. 2023), the SFS (Lorente-Galdos et al. 2019; Durvasula and Sankararaman 2020; Fan et al. 2023), and heterozygosity across the genome (Wang et al. 2020; Cousins et al. 2024), providing multiple lines of support for a structured evolutionary history. In particular, Cousins et al. (2024) proposed a model where modern humans descend from two ancestral populations that diverged 1.5 Mya and admixed 300 kya in a ratio of 80:20%, and that the majority ancestral population went through a bottleneck immediately after divergence. However, we do not believe this reflects the same event as inferred by Hu et al. using FitCoal. This is because methods used to infer population size history, including FitCoal, PSMC, Relate, SMC++, Stairway Plot, and mushi, estimate coalescence rates and assume the inverse of this rate reflects population size. While this assumption holds for panmictic populations, it does not in structured populations (Charlesworth 2009; Mazet et al. 2016). The coalescence rate in Cousins et al.’s model aligns with the estimates from PSMC, Relate, and SMC++, but the coalescence rate from Hu et al.’s model differs from all of these. We simulated data under the structured model proposed by Cousins et al. to test whether a structured model could produce signals of a severe bottleneck when panmixia is assumed (Fig. 3). We found that mushi does not infer an ancient, severe bottleneck when the true demographic history is a structured model. We ran FitCoal on the same simulated data and found that it inferred a constant size population history until ∼55 kya, where it inferred a severe bottleneck and recovery (supplementary fig. S5, Supplementary Material online). Taken together, these results suggest that the inference of a bottleneck in real data with a structured model is not consistent with FitCoal’s inference of a bottleneck in a panmictic model.

Inference of panmictic population history on a simulated structured model as inferred by cobraa. a) Yoruba (YRI) demographic history inferred on real data by cobraa from Cousins et al. (2024). We used a structured demography in modern humans as inferred by cobraa, with population A (solid, green) and population B (dashed, green) diverging ∼1.5 Mya and subsequently admixing ∼300 kya. We used mushi to infer a panmictic demography (blue) on data simulated from cobraa’s structured model. b) Simulated SFS from cobraa’s model (green) and expected SFS from the inferred mushi model (blue).
In summary, we find there is insufficient evidence to support a model of human evolution where a panmictic population undergoes a severe bottleneck approximately 1 Mya. We stress that newly proposed models should be shown to fit the data just as well or better than traditional models.
Materials and Methods
Processing SFS in Real Data
“GRCh38, Neutral, Unpolarized” Data Set
We downloaded variants aligned to GRCh38 from the Byrska-Bishop high coverage resequencing of 1KGP (Byrska-Bishop et al. 2022). We used the phased VCF from the publicly available data and excluded all positions with a B < 0.8, based on the map of background selection from Murphy et al. (2023), which was lifted over from GRCh37 to GRCh38 using LiftOver (Hinrichs et al. 2006). Positions that could not be confidently lifted over were excluded from the analysis. We further removed positions that did not pass the strict mappability mask (ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/working/20160622_genome_mask_GRCh38). To compute the SFS, we use the alternate allele counts at each position, assuming the reference allele was ancestral, and we modeled ancestral allele misidentification (i.e. the proportion of sites where the ancestral allele is not polarized correctly) using mushi (see below).
“GRCh37, Neutral, Unpolarized” Data Set
We downloaded variants aligned to GRCh37 from the phase 3 resequencing of 1KGP (1000 Genomes Project Consortium et al. 2015) (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) and used the phased VCF. We excluded all positions with a B < 0.8, based on the map of background selection from Murphy et al. (2023). We further removed positions that did not pass the mappability mask (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/accessible_genome_masks/). To compute the SFS, we use the alternate allele counts at each position, assuming the reference allele was ancestral, and we modeled ancestral allele misidentification using mushi (see below).
“GRCh37, Noncoding, Polarized” Data Set
We used the same VCF as the “GRCh37, neutral, unpolarized” data set. We masked all coding sequence according to annotations from HAVANA and ENSEMBL. To determine the ancestral allele, we used the annotated reference allele included in the VCF. This processing closely matches the pipeline used by Hu et al.
Running FitCoal
We ran FitCoal (Hu et al. 2023) on the SFS data and varied the number of high-frequency SFS bins that are truncated, following Hu et al. We used truncation parameters of 40, 30, 26, 21, 20, 10, and 1. The truncation parameter did not significantly affect the results (as shown in supplementary fig. S6, Supplementary Material online). We used a per-basepair per-generation mutation rate of 1.25e−8 to convert from coalescent time to generations (Scally 2016). We used a generation time of 29 years to convert from generations to years (Fenner 2005).
Running mushi
We run mushi (DeWitt et al. 2021) on the SFS data using the parameters trend_1 = 0, trend_2 = 1, and ridge = 0.0001. mushi includes a parameter for ancestral allele misidentification, , which was estimated as 0.0017 in the “GRCh38, neutral, unpolarized” data set, 0.0015 in the “GRCh37, neutral, unpolarized” data set, and 0.01 in the “GRCh37, noncoding, polarized” data set. We explored the impact of using other ridge parameters and found similar results in our main analyses (supplementary figs. S1, S3, and S4, Supplementary Material online). We used a per-basepair per-generation mutation rate of 1.25e−8 to convert from coalescent time to generations (Scally 2016). We used a generation time of 29 years to convert from generations to years (Fenner 2005).
Computing Expected SFS
We used mushi to compute the expected SFS from the FitCoal model. We verified that mushi produces the correct SFS by running coalescent simulations of the same demography and generating an SFS using msprime (Kelleher et al. 2016).
Evaluating Model Fits
The log-likelihood of the data was calculated by assuming each entry in the SFS is an independent Poisson variable (Sawyer and Hartl 1992; Gutenkunst et al. 2009).
Suppose there are N haploid genomes, and let be a set of demographic parameters that has expected SFS , and let denote the observed SFS. The likelihood of the data is the following:
The log-likelihood of the data () then is as follows:
We calculate using the gamma function. To compute the log-likelihood of the FitCoal model with ancestral misidentification, we calculated the expected SFS as , where maps to .
OOA Bottleneck Demography
We used mushi to compute the expected SFS for OOA bottleneck models. The inferred demographic history was taken from Figure 3 of Hu et al. and downloaded from their Zenodo repository. To simulate a demography with an ancient bottleneck, we added a bottleneck of the same timing and magnitude as inferred in African populations by Hu et al.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Data Availability
All data are publicly available from the 1000 Genomes Project website: “GRCh38, neutral, unpolarized” data set is available at http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/; “GRCh37, neutral, unpolarized” and “GRCh37, noncoding, polarized” data sets are available at https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. Code to process data is available at https://github.com/trevorcousins/insufficient_evidence_panmictic_bottleneck.