Biases in Demographic Modeling Affect Our Understanding of Recent Divergence

Abstract Testing among competing demographic models of divergence has become an important component of evolutionary research in model and non-model organisms. However, the effect of unaccounted demographic events on model choice and parameter estimation remains largely unexplored. Using extensive simulations, we demonstrate that under realistic divergence scenarios, failure to account for population size (Ne) changes in daughter and ancestral populations leads to strong biases in divergence time estimates as well as model choice. We illustrate these issues reconstructing the recent demographic history of North Sea and Baltic Sea turbots (Scophthalmus maximus) by testing 16 isolation with migration (IM) and 16 secondary contact (SC) scenarios, modeling changes in Ne as well as the effects of linked selection and barrier loci. Failure to account for changes in Ne resulted in selecting SC models with long periods of strict isolation and divergence times preceding the formation of the Baltic Sea. In contrast, models accounting for Ne changes suggest recent (<6 kya) divergence with constant gene flow. We further show how interpreting genomic landscapes of differentiation can help discerning among competing models. For example, in the turbot data, islands of differentiation show signatures of recent selective sweeps, rather than old divergence resisting secondary introgression. The results have broad implications for the study of population divergence by highlighting the potential effects of unmodeled changes in Ne on demographic inference. Tested models should aim at representing realistic divergence scenarios for the target taxa, and extreme caution should always be exercised when interpreting results of demographic modeling.


Introduction
Since Alfred Wallace noted that "Every species has come into existence coincident both in space and time with a pre-existing closely allied species" (Wallace 1855), understanding the processes by which new species arise (speciation) has been one of the major quests of evolutionary biology. In the case of sexual organisms, speciation can be defined as the evolution of reproductive isolation among populations, leading to distinct gene pools (Bolnick and Fitzpatrick 2007). The study of population pairs where reproductive isolation is incomplete can therefore provide insight into the processes leading to the evolution of different species. According to the genic view of speciation (Wu 2001), in the early stages of divergence reproductive isolation is a byproduct of differential adaptations and/or genetic incompatibilities, and therefore restricted to regions of the genome under exogenous and endogenous selection (often referred to as "barrier loci"). Such process results in heterogeneous differentiation across the genome, with barrier loci appearing as areas ("islands") of higher differentiation due to selection, whereas unimpeded gene flow homogenizes the rest of the genome (Nosil 2012;Nosil and Feder 2012;Roux et al. 2016;Ravinet et al. 2017). Partial reproductive isolation could arise via the gradual erosion of gene flow (primary divergence), for example because of multifarious selection across environments (Nosil 2008;Nosil et al. 2009;Nosil and Feder 2012). Alternatively, successive stages of strict isolation and secondary contact (secondary divergence) may facilitate the evolution of reproductive barriers among populations, whether they are due to ecological selection or to the evolution of genetic incompatibilities (Roux et al. 2013(Roux et al. , 2014Rougeux et al. 2017;Rougemont and Bernatchez 2018). Both processes can lead to similar genomic landscapes of differentiation, as following secondary contact gene flow can erode genetic differentiation across the genome (with the exception of barrier loci and regions around them) to the point at which any signature of the initial stage of strict isolation is lost (Ravinet et al. 2017). Therefore, distinguishing whether divergence initiated in the presence or absence of gene flow, while important to understand how reproductive isolation arises, is not a trivial task.
Recent events of primary divergence and secondary introgression among ancient lineages are however expected to generate distinctive genomic landscapes surrounding barrier loci. Recent selection on rare or novel mutations is likely to temporally reduce genetic diversity (p) surrounding barrier loci in the population experiencing selection (Smith and Haigh 1974), revealing signatures typical of selective sweeps (Tavares et al. 2018). Absolute divergence among populations (d xy ) will initially remain low, as in the early stages of divergence d xy in regions surrounding a barrier locus is expected to reflect ancestral genetic diversity (see discussion in Ravinet et al. 2017). Instead, if barrier loci are of ancient origin, increased genetic differentiation (F ST ) around barrier loci is likely to be driven by an increase in d xy rather than a decrease in p (Cruickshank and Hahn 2014). Indeed, there are several examples where islands of divergence that originated during long allopatric phases show both elevated F ST and elevated d xy with respect to the rest of genome, where the original signatures of divergence have been eroded by unimpeded gene flow (Duranton et al. 2018(Duranton et al. , 2020Gagnaire et al. 2018;Nelson and Cresko 2018). Unfortunately, the interpretation of genomic landscapes of differentiation is not always strait-forward (reviewed in Ravinet et al. [2017]), and it requires data (highly contiguous genome assembly) that are still lacking for most non-model organisms.
Demographic modeling provides a framework to reconstruct how gene flow has changed through the evolutionary history of diverging populations. Within the past two decades, several computational methods have been developed to reconstruct demographic history from genomic data. Such approaches usually rely on comparing summary statistics obtained from empirical data to simulations performed under competing divergence scenarios, of which the most commonly tested ones include isolation with continuous migration (IM), secondary contact (SC), strict isolation (SI), and ancient migration (AM) (Roux et al. 2013). Approximate Bayesian computation (ABC) approaches (Beaumont et al. 2002;Excoffier et al. 2013), as well as methods based on the diffusion approximation of the joint site frequency spectrum (jAFS) (dadi, Gutenkunst et al. 2009) or its direct computation using a model of ordinary differential equations (moments, Jouganous et al. 2017), have been broadly applied to test among competing demographic models of divergence. The models usually assume that an ancestral population of size N ANC gives rise to two populations of size N 1 and N 2 respectively at a time of split T S , after which several migration scenarios are contrasted. ABC, dadi and moments allow users to define complex demographic scenarios, explicitly modeling the effect of barrier loci-modeled as heterogeneous migration rates across the genome (Roux et al. 2013;Tine et al. 2014)-as well as the effect of linked selection-modeled as heterogeneous effective population size (N e ) across the genome (Roux et al. 2016;Rougemont et al. 2017). Failing to account for heterogeneity in linked selection and migration rates may lead to biases in model choice and parameter estimation (Roux et al. 2014(Roux et al. , 2016Ewing and Jensen 2016;Pouyet et al. 2018).
Such models have been used to test among competing gene flow scenarios across a broad range of divergence times, from a few thousand to millions of generations. For example, demographic modeling has been extensively used to test whether sympatric and parapatric lineages within environments that were shaped during the last glacial cycle arose via rapid ecologically driven divergence or are the result of postglacial secondary contact between more ancient lineages (Tine et al. 2014;Le Moan et al. 2016Rougeux et al. 2017Rougeux et al. , 2019Van Belleghem et al. 2018;Jacobs et al. 2020). The same approach has been extensively used to infer demographic models of divergence among incipient species that diverged hundreds of thousands to millions of generations ago (Roux et al. 2013(Roux et al. , 2014(Roux et al. , 2016Stuglik and Babik 2016;Bourgeois et al. 2019). Most of these studies concluded that contemporary heterogenous gene flow is a result of recent secondary contact (Roux et al. 2013(Roux et al. , 2014(Roux et al. , 2016Tine et al. 2014;Le Moan et al. 2016;Rougemont et al. 2017;Gagnaire et al. 2018;Rougemont and Bernatchez 2018;Rougeux et al. 2019) providing support to the hypothesis that initial allopatric phases of differentiation play a central role in the evolution of reproductive isolation (Roux et al. 2014), and hence primary divergence with gene flow due to ecological selection is rarer than suggested by some authors (Nosil 2008). However, although these models can provide important insight into demographic history, they also show significant limitations. If the models tested are not close enough to the real divergence scenario, both model choice (e.g., the choice between an SC and IM model) and parameter estimation may be affected. Although great effort has been placed recently to overcome potential biases due to barrier loci and linked selection (Roux et al. 2013(Roux et al. , 2016Bhaskar and Song 2014;Tine et al. 2014;Le Moan et al. 2016;Rougemont et al. 2017;Gagnaire et al. 2018;Rougemont and Bernatchez 2018), it remains unclear how unmodeled demographic events, such as size changes in both ancestral and daughter populations, may affect model choice and parameter estimation.
Most recent studies of non-model organisms, where prior knowledge of past demographic events is limited, assume that divergence starts from an ancestral population at mutation-drift equilibrium, with an instantaneous split into two populations of constant size (as in Roux et al. [2013]). If, for example, there was an unmodeled size change in the ancestral population (such as a population expansion or contraction), we can expect an overestimation of divergence time, since the models allows changes in N e only at time T S , pushing estimates of T S toward the time of ancestral population size change. Similarly, unmodeled bottlenecks followed by exponential growth in one of the daughter populations may bias estimates of T S , as small populations experience faster genetic drift (potentially leading to overestimate recent divergence). Both bottlenecks (Luikart et al. 1998) and SC (Alcala et al. 2015) are expected to generate an excess of middle-frequency variants, and hence an unmodeled bottleneck could bias model choice toward SC. It is less clear how changes in N e in ancestral populations affect model choice, as to the best of our knowledge no one has addressed this question. This is a matter of concern, as few studies explicitly model growth in daughter populations, and very few studies of non-model organisms have tested for changes in ancestral population size. A brief search in Web of Science for published studies Momigliano et al. . doi:10.1093/molbev/msab047 MBE using demographic modeling in non-model organisms (using combinations of keywords "Isolation with Migration," "Secondary Contact," "dadi," "abc," "fastsimcoal") suggests that less than one fifth of papers published between 2016 and 2020 accounted for changes in N e in the ancestral population or in at least one of the daughter populations.
Here, we used both simulations and empirical data to demonstrate that unmodeled demographic events in both ancestral and daughter populations can strongly affect both model choice and parameter estimation. Using coalescent simulations of IM and SC scenarios under the Wright-Fisher neutral model, we demonstrate that failure to account for changes in N e in ancestral and daughter populations leads to extreme biases in estimates of T S and to a strong bias toward the choice of SC models. We then reconstruct the demographic history of the Atlantic and Baltic Sea populations of the turbot Scophthalmus maximus, which a recent study (Le Moan et al. 2019) suggested have diverged before the last glacial maximum (>50 kya) and experienced secondary contact following the end of the last glaciation. We argue that these inferences were likely biased because of the failure to account for a past demographic expansion (which led to overestimate T S ) and to model a bottleneck during the invasion of the Baltic Sea (leading to the erroneous choice of a SC model). Furthermore, we show that genomic patterns of differentiation are also consistent with a scenario of very recent divergence with gene flow. We discuss the potential implications of our findings for inferring the demographic history of non-model organisms in general.

Analyses of Simulated Data
For all simulated data ( fig. 1A and B), we optimized parameters for the simple IM and SC models and tested, using Akaike weights (W AIC see Materials and Methods) the support for the correct gene flow scenario (IM). For the recent divergence scenarios, we tested (and optimized parameters for) all basic demographic models ( fig. 1C), and used W AIC to test the support for the correct gene flow scenario (SC or IM depending on simulation). Details of all models used for demographic inference are given in supplementary table 1, Supplementary Material online.

Biases in IM and SC Models in Older Divergence Scenarios
An unmodeled recent bottleneck always led to severe biases in model choice (favoring the SC model, as shown by low W AIC ) and estimates of T S , even when the simulated bottleneck was mild ( fig. 2C). Estimates of T S tended to reflect the strength of the bottleneck rather than divergence time ( fig. 2B), suggesting the effects of a recent bottleneck on the joint allele frequency spectrum (jAFS) obscure signatures of divergence. This led to a severe overestimation of T S in recent divergence scenarios (simulation T S ¼ 0:05) and a severe underestimation of T S in older divergence scenarios (simulation T S > 0:4). The estimated time of strict isolation as a proportion of divergence time (T SI =T S ) is a positive function of the strength of the unmodeled bottleneck ( fig. 2D).
Non-equilibrium states in the ancestral population also led to biases in both model choice and parameter estimation ( fig. 3), but the severity of these biases was affected by both T S and the time before T S at which the last ancestral change in N e occurred (T AE for AE models and T AB =2 for AB models). When T S was relatively shallow (T S < 0:4) unaccounted changes in N e in the ancestral population led to overestimate T S . This bias was most severe when the last ancestral change in N e (T AE in AE model, and T AB =2 in AB models) occurred 0:5 Â 4N REF generations before T S . Biases toward the choice of SC models were most severe at intermediate T S , whereas the effects of not accounting for non-equilibrium in the ancestral population become irrelevant when T S > 0:8 ( fig. 3).

Biases in IM and SC Models in Recent Divergence Scenarios
For the simulations of recent divergence scenarios, we conducted analyses based on simulations of 1 million or 100,000 loci. These simulations gave similar results (compare fig. 4   We simulated data under an IM scenario with divergence times from 0.05 to 1.6. In IM B scenarios one population experiences a contraction to 1-64% of its previous size at time T B followed by exponential growth. The IM AB scenario reflects an ancestral population size contraction followed by expansion. In the IM AE scenario, the ancestral population undergoes an expansion. For both AB and AE scenarios, we tested a range of times (0.25, 0.5, 1) and strengths of the ancestral contraction (AB ¼ 1/4, 1/16, 1/64 of previous size) and expansion (AE ¼ 4Â,16Â,64Â the ancestral N e ). (B) Simulations of recent divergence scenarios, with a T S of 0.05 for IM scenarios and 0.1 for SC scenarios. Data were simulated under the IM and SC scenarios with 64 possible combinations of ancestral expansions and bottlenecks. Simulations were ran both with symmetric and asymmetric migration, for a total of 256 demographic scenarios. (C) demographic models used for inference with moments and dadi, that is, the basic IM and SC models as well as modifications that include an ancestral population size change as well as a bottleneck followed by growth in one of the daughter populations. Migration rate is asymmetric (i.e., different m 12 and m 21 parameters) in all inference models. For the analyses of the empirical data, we used the models graphically represented in panel C, and modifications of these models accounting for heterogeneity in migration rates (2M) and N e (2N models) across the genome, as well as modifications accounting for both (2N2M models). Momigliano et al. . doi:10.1093/molbev/msab047 MBE However, when an unmodeled ancestral expansion led to incorrectly choose the SC model, the proportion of time of strict isolation inferred was low (<50% of the total divergence time) unless the simulation included also a strong bottleneck (supplementary fig. 12A and B, Supplementary Material online). Interestingly, in SC simulations without strong bottlenecks, an unmodeled ancestral expansion led in some cases to a bias toward IM models ( fig. 4B), and in general to an underestimate of the proportion of divergence time for which the model inferred strict isolation (supplementary fig. 12A and B, Supplementary Material online). Not surprisingly, unmodeled ancestral expansions led also to strong overestimates of the ancestral population size N ANC (supplementary fig. 13, Supplementary Material online). All the above-reported biases were more severe when the unfolded jAFS was used for demographic inference. Particularly, T S misestimation was roughly twice as high when inference was carried out using the unfolded jAFS compared with the folded jAFS ( Material online). It should be noted, however, that estimates of contemporary N e and migration rates were always fairly accurate (see supplementary figs. 14 and 15, Supplementary Material online). Furthermore, in general, these biases were less severe when the smaller data set was used for inference (supplementary fig. 16, Supplementary Material online). These biases were not unique to the simulation engine we used, as the analyses of a subset of the data using dadi gave nearly identical results (supplementary fig. 18, Supplementary Material online). Gross model mis-specifications were, however, identifiable by inspecting the jAFS residuals produced by dadi and moments (supplementary fig. 19, Supplementary Material online).

Biases in 8-Model Comparisons in Recent Divergence Scenarios
Testing demographic scenarios that more closely approximate the real demographic history of simulated populations resulted in much less severe biases in model choice. Weight of evidence for the correct gene flow scenario was much stronger, with all simulations of SC scenarios being correctly identified and IM simulations being sometimes misidentified as SC only when population two experienced very severe bottlenecks followed by exponential growth (supplementary fig. 2A, Supplementary Material online). However, when SC scenarios were incorrectly chosen, the length of the inferred periods of strict isolation was usually very short (< 1 3 of total divergence), suggesting that IM and SC models were converging toward the same demographic history (supplementary figs. 2C and 12C and D, Supplementary Material online). Although ancestral expansions and bottlenecks still caused a systematic bias toward overestimating divergence times, this bias was negligible (supplementary figs. 2B and 11C and D, Supplementary Material online). . d xy was highly correlated with p in the North Sea (R 2 ¼ 0.85, P<1 eÀ16 Þ, which is the expectation since in early stages of divergence d xy is expected to approximate p in the ancestral population, which is likely still represented by the North Sea. The correlation between p in the North Sea and Baltic Sea populations breaks down within the two genomic islands of differentiation in chromosomes 1 and 13, where increased allelic differentiation (F ST ) is driven by a dramatic decrease in genetic diversity in the Baltic Sea, rather than an increase in absolute divergence ( fig. 6C 5D). The representative population for the Baltic Sea (DAB) shows signs of population contraction followed by growth following the end of the last glaciation, and lower contemporary N e compared with the North Sea. We also tested three simple one-population models using moments on all individuals from the North Sea and Baltic Sea separately: a standard neutral model (SNM, assuming constant population size at equilibrium), a two epochs model (2EP, assuming a sudden population change at time T 1 ), and a three epochs model including a sudden demographic change at time T 1 followed by a bottleneck at time T 2 and exponential growth (3EP B ). The results give strong support for population size changes, rejecting the SNM neutral mode for both populations. In the North Sea, the 2EP model had stronger support than the 3EP B model (W AIC of 0.88 and 0.12, respectively), whereas for the Baltic Sea population, both models had similar support (0.57 and 0.43, respectively). One-population models provide guidance in selecting realistic demographic scenarios to test in more complex models but should otherwise be interpreted with extreme caution as they ignore the effects of gene flow. Here, results from both stairway plots and one-population models performed in moments suggest that both an ancestral population expansion and a bottleneck followed by growth in the Baltic Sea population should be formally tested in two-population models.

Population Genetics Analyses
The folded jAFS used for testing two-population models included 16,270 biallelic SNPs. We tested the eight basic scenarios presented in figure 1C (IM, SC, IM B , SC B , IM AE , SC AE , IM AEB , and SC AEB ) as well as modifications of these eight basic models accounting for heterogeneous migration rates (2M models) and N e (2N models) across the genome as well as a combination of the two (2M2N models). Model choice and parameter estimates in two-population models were strongly affected by the inclusion or exclusion of an ancestral expansion and a bottleneck in the Baltic Sea population ( fig. 7). Scenarios including an ancestral population expansion (AE) and a bottleneck (B) had lower AIC than simple IM and SC models, with AEB models performing best ( fig. 7A). Within basic IM and SC models, SC models fitted the data significantly better showing much lower AIC than IM models. However, as more complex demographic scenario (AE and B models) were compared, the difference in AIC between IM and SC models became smaller, with IM models showing the lowest AIC in AEB scenarios. Interestingly a very similar pattern was observed in our simulations. In figure 7C, we show the AIC of IM and SC models (and their AE, B, and AEB variations) for a simulation with an ancestral expansion (N AE ¼ 2 Â N ANC ) and bottleneck (where N 2 at time of split is 4% of current N 2 ). Models accounting for heterogenous migration rates (2M) and N e (2N) across the genome fitted the data better ( fig. 7A), but the inclusion or exclusion of these parameters in the models did not change the relative support to IM and SC models.
The two best-fitting models were IM AEB2N and SC AEB2N , with a W AIC of 0.66 and 0.31, respectively ( fig. 7B). However, these two models converged to approximately the same scenario ( fig. 8). Both models suggest an ancestral population expansion approximately 30-100 kyr before divergence to about 1.7 times the ancestral size, a colonization of the Baltic Sea <5.5 kya, and a strong reduction in N e at the time of colonization followed by growth in the Baltic Sea population. The estimate of the period of strict isolation

Discussion
Reconstructing the demographic history of diverging populations is of central importance to understand the role of gene flow and periods of strict isolation in shaping the process of speciation. Here, we demonstrate that when the demographic history of the simulated taxa deviates from the tested scenarios, model choice and parameter estimation can be severely biased. Specifically, unmodeled changes in both ancestral and daughter populations led to biases in estimates of divergence times and to favor scenarios that include periods of strict isolation. These biases can be minimized by comparing more realistic models, but a small systematic bias toward the choice of secondary contact models always remained. Using data from turbot populations from the Baltic Sea and the North Sea, we further demonstrate using an empirical case study that not accounting for potential demographic changes in both ancestral and daughter populations can lead to overestimate divergence times and conclude that these populations diverged during a long allopatric phase (Le Moan et al. 2019) whereas our results indicate a very recent divergence with constant gene flow.

Lessons from Simulations
Testing basic models of divergence, such as AM, SC, IM, and SI, relies on the assumption that model choice is largely robust to unmodeled demographic events. Surprisingly, until now this assumption was never formally tested, though it has been shown that in the original IM program (Hey and Nielsen 2004) departures from assumptions can lead to strong biases in parameter estimation (Becquet and Przeworski 2009). Using extensive simulations, we demonstrate that under a Bias in Demographic Modeling . doi:10.1093/molbev/msab047 MBE very broad range of divergence scenarios this assumption does not hold. Recent bottlenecks followed by growth in a daughter population always lead to strong support for SC models, regardless of T S . However, not accounting for a recent bottleneck can lead to overestimate or underestimate T S depending on whether divergence is recent or older. This is because estimates of T S tend to reflect the strength of the recent bottleneck, rather than divergence time. Another pattern emerging from our simulations is that failure to model a change in N e in the ancestral population can lead to biases in model choice and to overestimate divergence time, but this effect depends both on how recent divergence is and on how much time has passed between T AE or T AB and T S . Biases caused by unmodeled demographic changes in the ancestral population are most severe for recent divergence scenarios (i.e., when T S (T AE or T AB ) and their effect on demographic inference fades when T S % 1. Similarly, the effect of a change in N e in the ancestral population on parameter estimation and model choice becomes irrelevant when the last change in N e occurred %4N e generations before T S . This is theoretically expected, as it takes approximately 4N e generations for a population to reach mutation-drift equilibrium (Kimura and Ohta 1969;Lande 1980). The most commonly tested demographic models assume an ancestral population at mutation-drift equilibrium and a change in N e is permitted only at the time T S . Therefore, an unmodeled ancestral expansion or contraction could push estimates of T S back to the time of the ancestral change in N e . When T S is small, demographic models that did not account for ancestral expansion tended to overestimate T S by up to a factor of ten (as T AE þ T S ¼ 0:5, i.e., 10 Â T S ). In our SC coalescent simulations of recent divergence scenarios T S ¼ 0:1 and T AE þ T S ¼ 0:5, and as expected, the overestimation of T S was up to a factor of five. In IM simulations, unmodeled ancestral expansions also led to a bias toward SC models.
In our simulations of recent divergence scenarios under the SC model, an ancestral population expansion in some cases led to a slight bias toward choosing IM models. This is most likely because the effect of not modeling an ancestral expansion is to push T S back in time, which under the scenario of long strict isolation will result in much longer divergence times. Roux et al. (2016) also demonstrated that when the period of strict isolation preceding SC is a small proportion (<60%) of the total divergence time, distinguishing between IM and SC can be very difficult. When the true divergence scenario is a basic SC model, we observed that IM models that included abrupt population size changes performed better than IM models that did not (but always worse than SC models). For example, IM B models fitted better than IM models, and IM AEB models had even stronger support (supplementary figs. 2-9, Supplementary Material online).
All these observations taken together suggest that extreme caution should be exercised when choosing among competing divergence scenarios using methods based on the jAFS or its summary statistics. It is known that, if one allows competing models to be arbitrarily complex, there is an infinite number of demographic histories that can produce the same AFS (Myers et al. 2008). In reality, when comparing more biologically realistic models, a unique function producing the expected AFS is identifiable (Bhaskar and Song 2014;Rosen et al. 2018). However, we demonstrate that when the models compared do not match closely the demographic history of the simulated populations, several demographic parameters in the model can yield better fits but lead to the wrong biological conclusion. Herein lies the major issue in interpreting results from demographic modeling: models are always extreme simplification of complex biological processes, and we often do not know what complexities can be safely excluded. Several studies (Roux et al. 2014(Roux et al. , 2016Ewing and Jensen 2016;Pouyet et al. 2018) clearly showed that the effects of barrier loci and linked selection should be accounted for and here we demonstrate that historical changes in N e cannot be ignored.
The systematic bias toward the choice of SC models when the real scenario generating the data had continuous gene flow, and the general overestimation of the proportion of MBE strict isolation for SC models, suggest that the use of simple IM and SC models to differentiate between primary and secondary divergence may often lead to the wrong conclusion. These findings have important repercussions on how we interpret demographic analyses of recent divergence scenarios; changes in N e in ancestral populations have almost inevitably happened during past glacial cycles, and bottlenecks followed by population expansions are a classic signature of colonization of novel environments (Hewitt 1999;Liu et al. 2016;Feng et al. 2020). Furthermore, as testing on a smaller number of simulations demonstrated, this bias is not unique to the main method employed in this manuscript (moments), but also applies to another very widely utilized approach to estimate demographic parameters based on the jAFS of multiple populations, dadi ( fore are also expected to be affected. However, statistics such as the decay of linkage disequilibrium are not, and could perhaps be less sensitive to these biases (Jay et al. 2019). It is interesting to notice that when reconstructing the demographic history of model organisms, it is common to test more realistic demographic scenarios modeling past demographic changes in ancestral and daughter populations as well as bottlenecks followed by population growth (Gutenkunst et al. 2009;Gravel et al. 2011;Garud et al. 2015;Jouganous et al. 2017). However, these realistic demographic scenarios are more seldom considered when testing IM and SC models in non-model species. This possibly is in part due to the assumption that the data are inadequate to deal with such model complexity, and in part due to the assumption that model choice and parameter estimation is robust to such unmodeled demographic events. Here, we showed that neither of these assumptions is correct. It should also be noted that a novel unsupervised approach for inferring demographic histories that performs jointly model structure and parameter optimization (GADMA; Noskova et al. 2020) has the potential to alleviate the biases we describe here. It is not our intention to suggest that most studies that compared IM and SC models and found strong support for SC have likely chosen the wrong scenario. Several very recent studies have indeed modeled the potential effects of bottlenecks (Montano et al. 2015;Christe et al. 2017;Rougeux et al. 2017Rougeux et al. , 2019Hartmann et al. 2020;Rougemont et al. 2020), and formal model testing is often only one of several lines of evidence suggesting secondary contact (Roux et al. 2014;Tine et al. 2014;Le Moan et al. 2016;Rougemont et al. 2017). For example, a correlation between F ST and d xy (i.e., elevated divergence in genomic islands of differentiation) provides further evidence for SC (Cruickshank and Hahn 2014;Duranton et al. 2018Duranton et al. , 2020Gagnaire et al. 2018;Nelson and Cresko 2018). Nevertheless, the results clearly show that realistic demographic scenarios can generate strong false support for SC models with long periods of strict isolation. This has clear implications for studying incipient speciation and recent population divergence, for example, as a result of range expansions and colonization of novel habitats following the end of the last glaciation.
It must also be considered that we explored a limited number of unmodeled demographic events. Spatial genetic structure, recent range expansions and admixture from ghost populations are all common demographic events, and these scenarios can also lead to biases in demographic inference Bias in Demographic Modeling . doi:10.1093/molbev/msab047 MBE (Delser et al. 2019). Similarly, although it is relatively simple to account (albeit in a coarse way) for linked background selection by modeling heterogeneity of N e across the genome, it is more difficult to model the possible effects of a reduction in N e in genomic regions of a single population (i.e., the expectation for linked positive selection). Since selective sweeps have the same local effect of a bottleneck (and can indeed lead to false inferences of changes in N e , e.g. Schrider et al. 2016), it is reasonable to assume that the presence of large recent selective sweeps may also lead to a bias toward SC models.

Lessons from the Turbot's Demographic History
A clear example of when the biases we describe in this study are especially problematic is given by the study of the origin of the Baltic Sea biodiversity. The Baltic Sea is a large body of brackish water which became connected to the North Sea about 8 kya. Its marine fauna has probably more than one origin, with evidence of populations and species in the Baltic Sea being the result of both primary and secondary divergence (reviewed in Johannesson et al. 2020). The evolutionary origin of some specific taxa, such as the Baltic Sea populations of Pleuronectiformes, remains controversial (Momigliano et al. 2017(Momigliano et al. , 2018Jokinen et al. 2019;Le Moan et al. 2019). In our empirical study, our model fit suggests that S. maximus from the Baltic Sea originated from a very recent invasion (<6 kya) from the North Sea and diverged with continuous gene flow. One-population models suggest that North Sea and Baltic Sea S. maximus share the same demographic history, with an ancestral population expansion that occurred 35-102 kya, until approximately 5 kya, roughly the divergence time estimated by our two-population models. After this, both stairway plots and two-population models show support for a founder event coincident with the time at which the Baltic Sea had the highest salinity in its history (Gustafsson and Westman 2002). At this time, as noted by Momigliano et al. (2017), there would have been broad opportunity for marine fish to colonize the Baltic Sea. Failure to account for these realistic demographic events would have resulted in a very strong (W AIC > 0.999) support for SC models and estimates of divergence times that predate the origin of the Baltic Sea. Le Moan et al. (2019) reconstructed the demographic history of North Sea-Baltic Sea population pairs for five flatfish species (including S. maximus) using similar data and approaches as in this study, but assuming an ancestral population at equilibrium and no bottleneck followed by population growth associated with the invasion of the Baltic Sea. The authors found strong support for SC in four out of the five populations studied, with estimates of divergence time for each of these population pairs predating the origin of the Baltic Sea by at least a factor of five (Le Moan et al. 2019). Since their estimates of timing of secondary contact was different for the four species, the authors concluded that the population-pairs diverged in strict isolation in several unidentified marine refugia, and colonized the Baltic Sea at different times following the end of the last glaciation (Le Moan et al. 2019). As there is no other evidence for such scenario apart from testing of IM and SC demographic models, it is possible that Le Moan et al. (2019) results are a product of the biases we described here. We also note that when we do not assume an ancestral population at equilibrium and model potential bottlenecks, IM and SC models converge toward the same demographic scenario ( fig. 8) giving strong support for postglacial divergence between flatfish populations in the North Sea and the Baltic Sea. Furthermore, the likelihood ratio test provided no support for the secondary contact model. Interestingly, Momigliano et al. (2017) used ABC to model the divergence of the flounder species pair (Platichthys flesus and P. solemdali) in the Baltic Sea taking into account both potential changes in N e in ancestral and daughter populations (but with no formal testing of IM and SC scenarios), finding support for postglacial colonization of these two flatfish species. Improved modeling in this study has led to support for a more biologically plausible history of S. maximus's invasion of the Baltic Sea but we wish to caution that it is entirely possible that new data and/or better modeling approaches will provide in the future support for an alternative evolutionary scenario. We also wish to caution against overinterpreting scaled parameters, since their value is dependent on the mutation rate, the exact value of which is not know.
The genomic landscape of differentiation between North Sea and Baltic Sea turbot populations also points toward shallow divergence and recent selection in the Baltic Sea. Differentiation across the genome was generally low, and genomic islands of high allelic differentiation (F ST ) were driven by strongly reduced p in the Baltic Sea rather than by a local increase in d xy , that is, the classic signature of a recent selective sweep. Indeed within these genomic islands, d xy approximates p in the North Sea; assuming the North Sea still represents ancestral population diversity, this suggests that net divergence among the selected haplotypes is close to 0. As noted by Cruickshank and Hahn (2014), genomic islands of differentiation that evolved in allopatry and resist introgression (the expected pattern under a model of SC and heterogeneous gene flow) are expected to instead show increased levels of absolute divergence.

Conclusion
In conclusion, using very extensive simulations as well as empirical data on turbots, we demonstrate that testing IM and SC models can be difficult when the demographic history of the studied taxa deviates from the tested scenarios. We conclude that statistical support for SC or IM in model testing can often be an artifact of unmodeled demographic events, and that estimates of T S can often reflect recent changes in N e , rather than divergence time. Given the centrality of formal testing between competing divergence scenarios in current research on local adaptation and speciation, these biases should not be ignored. Testing one-population models can provide guidance in identifying what demographic scenarios need to be incorporated in formal model testing, and testing more realistic demographic scenarios is paramount for avoiding at least the most severe biases described in this manuscript. However, even when testing more realistic divergence Momigliano et al. . doi:10.1093/molbev/msab047 MBE models, extreme caution should be exercised when interpreting results.

Simulations of Older Divergence Scenarios
We tested the effect of recent bottlenecks as well as ancestral expansions and contractions on model choice and parameter estimation within divergence scenarios with T S ranging from 4,000 to 128,000 generations. To do this, we first generated several simulations under an Isolation with Migration model using the software ms (Hudson 2002). All simulated scenarios within this study share a few common parameters. The simplest models represent a scenario where an ancestral population of size N ANC splits at time T S in two populations of size N 1 (which is always fixed at 20,000 individuals, and is used as the reference population size N REF ) and N 2 (N 2 ¼ 0:25 Â N 1 , i.e., 5,000 individuals). The migration rate m is, unless explicitly stated, symmetrical and set to four. The migration rate is given in units where M is the fraction of each population which is made up of migrants at a given generation. We explored the effects of unmodeled demographic events across six divergence times, with T S values ranging from 0.05 to 1.6 in units of 4N REF generations (i.e., 4,000 to 128,000 generations) in log2 steps (i.e., T S ¼ 0.05, 0.1, 0.2, 0.4, 0.8, 1.6).
For each T S , we generated simulations including a bottleneck in population two ( fig. 1A). The time of the bottleneck remains constant at 0.05 (T B ), as we aim to represent the effect of a recent bottleneck associated with fluctuations in N e within the last glacial cycle. We simulated bottlenecks of different strengths, so that N 2 at times T B ranges from 1% to 64% in log2 steps of current N 2 , and following the bottleneck population two experiences an exponential growth that starts at time T B and continues until present. This resulted in six simulations without bottlenecks (one for each T S ) as well as 48 simulations including a bottleneck in population two (six T S and eight strengths of bottleneck).
We further investigated the effects of changes in effective population size in the ancestral population, that is, an ancestral expansion (AE) and an ancestral bottleneck (AB) (fig. 1A). We included nine scenarios of population expansion at time T AE þ T S , and ancestral expansions were modeled as different values of ancestral population size N ANC (to 1 4 ; 1 16 and 1 64 of N REF ). We modeled every possible combination of six values of T S (as above), three values for T AE (T AE ¼ 0.025, 0.5, 1) and three strengths of ancestral expansion for a total of 54 independent simulations.
We simulated, in the same way, scenarios where the ancestral population underwent first a demographic contraction followed by expansion (fig. 1A). The time of the demographic contraction is set to T AB þ T S , where T AB represents the number of generations (in units of 4N REF ) before T S at which the demographic contraction takes place. At time T S þ T AB =2, the ancestral population returns to its original size, which is equal to N REF . As per the AE scenario, we modeled every possible combination of six values of T S (as above), three values for T AB (T AB ¼ 0.025, 0.5, 1) and three strengths of ancestral contraction for a total of 54 independent simulations.
For each of the simulations above, we simulated sampling of 20 individuals per populations and of 1 million unlinked loci of a length of 36 bp, which, when using a standard germline mutation rate (m) of 1 Â 10 À8 and a N REF of 20,000 individuals, results in roughly 80,000-120,000 unlinked SNPs (depending on the specific model). We then used the unfolded jAFS from the simulated data for demographic inference.

Simulations of Recent Divergence Scenarios
We furthermore tested more extensively the effects of unmodeled demographic events on model choice and parameter estimation on recent divergence scenarios (when T S ¼ 0.05 and 0.1, i.e., 4,000-8,000 generations). These simulated scenarios are particularly relevant to the empirical case study we present later (the divergence of turbot populations in the North Sea and Baltic Sea). Here, we simulated data under four gene flow scenarios: SC and IM, each with symmetric (M ¼ 4) and asymmetric gene flow (m 12 ¼ 4 and m 21 ¼ 16 where M ij ¼ 4N REF m ij and M ij is the proportion of individuals in population i which is made up of migrants from population j). Modeling gene flow as the proportion of migrants in a given population (rather than the proportion of individuals migrating from a population), while not always realistic, has the advantage to keep gene flow constant even while N e fluctuates among populations exchanging genes. For each of these gene flow scenarios, we modeled 64 combinations of demographic events in a fully orthogonal design ( fig. 1B). The simplest models represent IM and SC scenarios where an ancestral population of size N ANC ¼ N REF splits at time T S in two populations of size N 1 and N 2 (which have the same values as given above). We then included seven scenarios of population expansion at time T AE (T SC þ T AE ¼ 0:5, in units of 4 Â N REF generations, i.e., 40,000 generations), ranging from 2 Â N ANC to 128 Â N ANC on log2 steps. Since our reference population size (N REF ) for parameter scaling is always N 1 , ancestral expansions were modeled as different values of N ANC (to 1 2 ; 1 4 ; 1 8 ; 1 16 ; 1 32 ; 1 64 and 1 128 of N 1 ). Furthermore, we modeled seven scenarios of bottlenecks followed by exponential growth in population two. In these scenarios N 2 at time T S can be 1%, 2%, 4%, 8%, 16%, 32%, and 64% of contemporary N 2 , reflecting a range of strong to very mild reduction in N e at time of divergence. Following the bottleneck, exponential growth took place during a period lasting 2,000 generations following T S , after which the N e of population two reached N 2 . This kind of scenario is meant to reflect the invasion of a novel habitat, for example, a new environment that became available after the end of the last glaciation. In IM models, the time of divergence is set at 4,000 generations ago (T S ¼ 0:05 Â 4N REF ). In SC models, the time of divergence is set at 8,000 generations ago (T S ¼ 0:1 Â 4N REF ), whereas secondary contact is established at time T SC (T SC ¼ 0:25 Â T S ), that is, 2,000 generations ago.
Bias in Demographic Modeling . doi:10.1093/molbev/msab047 MBE Fully orthogonal combinations of all the demographic scenarios outlined above (SC and IM, symmetric and asymmetric gene flow, ancestral expansions, and bottlenecks) resulted in a total of 256 simulated scenarios. Firstly, for each of the 256 scenarios, we simulated sampling of 20 individuals per population and of 1 million unlinked loci of a length of 36 bp (as for the simulations above). We then estimated the folded and unfolded jAFS from the simulated data. These data sets represent standard data sets when working with high-quality whole-genome data, assuming both scenarios whereby the genome of a closely related species is and is not available to polarize the jAFS. Secondly, for each of the 256 scenarios, we simulated sampling of ten individuals per population and of 100,000 unlinked loci of a length of 36 bp, resulting in roughly 8-12,000 unlinked SNPs (depending on the specific model) and estimated again both the unfolded and the folded jAFS. These data sets represent a standard, small-scale 2b-RAD data set for a non-model species (i.e., the kind of data that most people working of non-model organisms can easily access). This led to a total of 512 coalescent simulations of recent divergence scenarios.

Demographic Modeling of Simulated Data
Demographic modeling of simulated data was carried out using the software package moments (Jouganous et al. 2017), which is a development of the dadi (Gutenkunst et al. 2009) method for demographic inference from genetic data based on diffusion approximation of the allele frequency spectrum. Moments introduces a new simulation engine based on the direct computation of the jAFS using a model of ordinary differential equations for the evolution of allele frequencies that is closely related to the diffusion approximation used in dadi but avoids some of its limitations (Jouganous et al. 2017). Firstly, for all simulated scenarios, we tested whether a simple isolation with migration (IM) or a secondary contact (SC) model fitted the data best (IM and SC models in fig. 1C). The models consist of an ancestral population of size N ANC that splits into two populations of sizes N 1 and N 2 at time T S . In the IM model, there is continuous asymmetric migration. In the SC model, there is a period of strict isolation starting at time T S followed by a period of secondary contact with asymmetric migration starting at time T SC .
For the 256 simulations of recent divergence scenarios, we also tested models that accounted for population size changes in the ancestral population (AE models) and bottlenecks followed by growth in population two (B models), as well as both demographic changes (AEB) models ( fig. 1C). Therefore, for both IM and SC scenarios, we had four alternative models: a basic scenario assuming an ancestral population at equilibrium and instantaneous size changes at time T S , and the three above-mentioned combinations of demographic changes in the ancestral population and in population two (AE, B, and AEB models). It is to be noted that models including bottlenecks intentionally do not match exactly the coalescent simulations ( fig. 1B), but rather mirror how growth is modeled in previously utilized models in speciation research (Rougeux et al. 2017). In IM B models, exponential growth starts at time T S and continues until the present, whereas in SC B models, exponential growth starts at time T SC and continues until the present ( fig. 1B).
Models were optimized for three rounds following an approach similar to Portik et al. (2017). In the first round, a Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm optimization (function "optimize.log"; max-iter ¼ 10) was run for ten sets of three-fold randomly perturbed parameters. In the second round, the parameters from the replicate with the highest likelihood from each model were used as a starting point and the same optimization algorithm was used on ten sets of two-fold randomly perturbed parameters, increasing max-iter to 20. We repeated the same procedure for round three, but using one-fold perturbed parameters and setting max-iter to 30. For models including ancestral sizes changes and bottlenecks, we optimized parameters using four rounds of optimization. We ran the entire optimization procedure ten times to check for convergence among independent optimizations. We selected the replicate with the highest likelihood for each of the two-population models and calculated the Akaike Information Criterion as À2ðlog ÀlikelihoodÞ þ 2K where K is the number of model parameters, and the DAIC i ¼ AIC i À AIC min . We then calculated Akaike weight of evidence (W AIC ) as outlined in Rougeux et al. (2017). The equation is outlined below, and R represents the total number of models compared.
Firstly, we looked at the effect of unaccounted demographic events when our model choice was restricted to the basic IM and SC models (i.e., excluding models that account for demographic size changes in the ancestral population and a bottleneck and growth in population two). We evaluated not only if the IM and SC models were correctly identified but also whether not accounting for ancestral expansion and bottlenecks affected parameter estimation.
Second, for all scenarios of recent divergence (i.e., the ones more relevant to our empirical study), we looked at whether including models with ancestral population expansions and bottlenecks improved model choice and parameter estimation. When comparing the eight full models, we calculated the W AIC for each gene flow scenario (IM vs. SC) as the sum of W AIC for all IM and SC models. This comparison among family of models was carried out since at times the correct gene flow scenario was identified, but one of the demographic changes modeled was not (i.e., this process maximized our chance to recover the simulated gene flow scenario). For the simulations of ancestral expansions (AE) and ancestral bottlenecks (AB), we used only the unfolded jAFS. Since in our empirical study we use the folded jAFS, for all 512 simulations of recent divergence scenarios, model optimization and parameter estimation were carried out using both the folded and unfolded jAFS, to test whether demographic inference using the unpolarized jAFS is less or more susceptible to biases in model choice and parameter estimation. Momigliano et al. . doi:10.1093/molbev/msab047

Comparison between Methods
To determine whether the biases reported in this study were specific to moments or reflected a general issue in inferring demographic histories from the jAFS, we repeated a portion of the analyses using the diffusion approximation approach implemented in dadi. We performed model choice and parameter estimation for the 126 simulations of recent divergence scenarios under the IM model ( fig. 1B), with symmetric and asymmetric gene flow and using only the simulations of the larger data sets (1 million loci). We used exactly the same optimization strategy, but only ran three independent optimization routines, which were sufficient to get convergence for independent runs of the simple IM and SC models.  Florin and Höglund (2007), whereas samples from the North Sea were originally collected by Nielsen et al. (2004).

Library Preparation
We built 2b-RAD libraries following the approach described by Wang et al. (2012), but with degenerate adaptors to allow identification of PCR duplicates. The protocol is described in detail by Momigliano et al. (2018). In short, DNA was extracted using a modified salting out protocol, and about 200 ng of DNA was digested with the type II b enzyme BcgI (New England Biolabs). This enzyme cuts both upstream and downstream of the 6-bp recognition site, creating fragments of a length of exactly 36 bp with 2-bp overhangs. Adaptors, one of which included degenerate bases, were ligated and the fragments amplified via ten cycles of PCR as described in Momigliano et al. (2018). Fragments of the expected size were isolated using a BluePippin machine (Sage Science). Libraries were sequenced on Illumina machines (NextSeq500 and Hiseq 4,000) to achieve a mean coverage of approximately 20Â.

Bioinformatics and Basic Population Genetic Analyses
Raw reads were demultiplexed and PCR duplicates were removed as per Momigliano et al. (2018). Reads were then mapped to the latest version of S. maximus reference genome (Figueras et al. 2016, Assembly ASM318616v1, GenBank accession number: GCA003186165) using Bowtie2 (Langmead and Salzberg 2012). SAM files were converted to BAM files and indexed using SAMTOOLS (Li et al. 2009). A genotype likelihood file in beagle format was produced in the software ANGSD (Korneliussen et al. 2014), using the following filters: no more than 20% missing data, retaining only biallelic loci, removing bases with mapping quality below 30 and Phred quality scores below 20. See supplementary figure 1, Supplementary Material online, for summary statistics. A principal component analysis (PCA) based on genotype likelihoods was performed using the software PCAngsd (Meisner and Albrechtsen 2018) using only variants with a minor allele frequency above 0.02. The folded AFS for each population as well as the jAFS were produced in ANGSD. ANGSD calculates folded jAFS where the minor allele is computed separately for each AFS, whereas moments expects minor alleles to be estimate for the jAFS. Thus, for the jAFS for North Sea and Baltic Sea, we produced the unfolded jAFS in ANGSD in the form of a dadi data dictionary. We then selected a random SNP within each locus and folded the jAFS in moments.
We produced also a variant call file (VCF) using the UnifiedGenotyper function from GATK v.3.8. Following UnifiedGenotyper, we removed individuals with an average read depth below seven. Then, we used four technical replicate pairs (i.e., four pairs of individuals for which we constructed and sequenced two independent libraries) to generate a list of SNPs for which we have high confidence (i.e., which show 100% matches between all replicate pairs). We used this list of SNPs to carry out variant quality score recalibration (VQSR), following GATK best practice (Dixon et al. 2015). Finally, we removed genotype calls with low sequencing depth (<7), removed indels, triallelic SNPs, SNPs with minor allele frequencies below 0.01 and with more than 10% missing data. This resulted in a final VCF containing 12,678 sites genotyped for 154 individuals. This data set was used to calculate Weir and Cockerham F ST for each SNP. Furthermore, we thinned the data retaining, for each tag, the SNP with the highest minor allele frequency and used this data for inferring population structure using fastSTRUCTURE (Raj et al. 2014). Bioinformatic steps, scripts for analyses, and the jAFS used are publicly available (see Data Availability).

Inferring the Genomic Landscape of Differentiation
We used several approaches to identify potential islands of differentiation across the genome between North Sea and Baltic Sea turbots. Firstly, we calculated from the called genotypes F ST between the North Sea and Baltic (excluding individuals from the transition zone) using VCFTOOLS (Danecek et al. 2011). This is the only measure of differentiation we calculated from called genotypes. Secondly, we used the software package PCAngsd to run a selection scan using an extended model of fastPCA (Galinsky et al. 2016) working directly on genotype likelihoods, based on the input beagle file we used for the PCA. This approach identifies variants whose differentiation along a specific principal component (in our case the first PC) is greater than the null distribution under genetic drift. To account for multiple comparisons, we converted P values to q-values (false discovery rate, FDR) following Benjamini and Hochberg (1995). As a second approach to classify SNPs as outliers, we used a Hidden Markov model (HMM) approach to detect genomic islands, Bias in Demographic Modeling . doi:10.1093/molbev/msab047 MBE based on the uncorrected P values from PCAngsd selection scan (Hofer et al. 2012;Marques et al. 2016). We counted as candidate outliers SNPs that show and FDR <0.1 and that simultaneously were identified as outliers by the HMM test. If adjacent SNPs identified by both approaches lied within a distance of <500 kb, we identified them as part of the same genomic island of differentiation.
We then obtained estimates of within population genetic diversity (p) and absolute divergence (d xy ) across the genome. In order to maximize the usable data and account for differences in coverage among samples, we performed all analyses in ANGSD directly from genotype likelihoods. Firstly, we generated windows of 250 kb across the genome using BEDTools (Quinlan and Hall 2010). Then, we calculated the unfolded AFS (for the North Sea and Baltic Sea individuals) as well as the jAFS for each 250-kb window across the genome with ANGSD, using only filters that do not distort the AFS (-uniqueOnly 1 -remove_bads 1 -minMapQ 20 -minQ 20 -C 50). We finally used custom R scripts to calculate p and d xy for each window, retaining only windows for which the AFS was derived from at least 1,000 sequenced bases. All scripts to calculate the AFS in windows and derive summary statistics are available from GitHub and Zenodo (sea Data Availability). Note that, for diversity analyses, we used the unfolded AFS even if we do not have an appropriate outgroup, assuming the reference allele as the ancestral state. However, this is not an issue since the summary statistics calculated are based on allele frequencies, which are symmetrical with respect to folding.

Demographic Modeling of Empirical Data
The demographic history of the North Sea and Baltic Sea populations of S. maximus were reconstructed using several approaches based on the 1d-AFS (for one-population models) and the jAFS (for two-population models). Since taking into account possible changes in effective population size may have important effects on the estimation of parameters such as migration rate and divergence times (Gravel et al. 2011), we first determined the demographic history of each population independently using the 1d-AFS, using both moments and stairway plots (Liu and Fu 2015). We then proceeded to compare 32 two-population models to determine the demographic history from the jAFS. We used a m of 1 Â 10 À8 and a generation time of 3.5 years for scaling demographic events to make results directly comparable to Le Moan et al. (2019).

One-Population Models
Firstly, we estimated past demographic changes in the North Sea populations and from the Baltic Sea populations (i.e., excluding samples from the transition zone) using the multi-epoch model implemented in the software Stairway Plot v2 (Liu and Fu 2015). Stairway plots use composite likelihood estimations of H at different epochs, which is then scaled using the mutation rate. For Baltic Sea populations, we estimated past demographic changes from the 1d-SFS from each sampling location independently. Stairway plots were generated including singletons, using 2/3 of the sites for training and four numbers of random break points for each try ( 1 4 ; 1 2 ; 2 3 , and 1 time the number of samples in each population). Since demographic histories were similar in all locations, and there was no evidence of population structure from other analyses, all subsequent analyses were performed using the jAFS estimated from pooling all samples from populations in the Baltic Sea.
Secondly, we compared three simple one-population models using moments: a standard neutral model (SNM, assuming constant population size at equilibrium), a two-epoch model (2EP, assuming a sudden population change at time T1), and a three-epoch model including a sudden demographic change at time T AE followed by a bottleneck at time T B followed by exponential growth (3EP B ). The 2EP model represents a single demographic change and a scenario where a demographic expansion/contraction occurred either in the ancestral population from which the North Seaand Baltic populations are derived or in the North Sea and Baltic Sea populations themselves. The 3EP B model represents a scenario where, in addition to an ancestral expansion/contraction, there was a recent bottleneck followed by growth; this could be a realistic scenario for the Baltic Sea populations, which must have invaded the Baltic Sea following its connection to the North Sea within the past 8 kyr.

Two-Population Models
Given the existence of a well-known hybrid-zone between the Baltic Sea and the North Sea ), we tested the two main divergence scenarios that include contemporary migration: the isolation with migration model (IM), and the secondary contact (SC) models. All models consist of an ancestral population of size N ANC that splits into two populations of size N 1 and N 2 at time T S . Migration is continuous and asymmetric in the IM model. The SC model includes a period of isolation starting at time T S and a period of secondary contact when asymmetric migration starts at time T SC . For each of these basic models, we tested models that included heterogeneous migration rates across the genome (2M, i.e., islands resisting migration), and heterogeneous N e across the genome (2N, a way to model linked selection) as described in Rougeux et al. (2017). We therefore had four possible variations of each of the basic model (e.g., SC, SC 2M , SC 2N , SC 2M2N ), for a total of eight basic divergence models.
Both stairway plots and moments analyses of the empirical data suggest (see Results) a demographic expansion 20-100 kya, and that the Baltic Sea population may also have undergone a recent bottleneck. If such demographic changes had happened at the time of split between the two populations, this would be well captured by the eight models described above which take into account a single change in N e from N ANC to N 1 and N 2 at time T S . However, given that the timing and magnitude of the population expansion are very similar in all populations, another possibility is that the ancestral population underwent a demographic expansion prior to the split between the North Sea and the Baltic Sea. We Momigliano et al. . doi:10.1093/molbev/msab047 MBE incorporated this hypothesis by extending the eight models described above to include an ancestral population expansion (AE, ancestral expansion models), a recent bottleneck followed by population growth in the Baltic Sea (B, bottleneck models) or both (AEB models). In the AE models, the ancestral population undergoes a demographic change at time T AE , after which population size remains constant until time of split (T S ). In the B models, the Baltic Sea population undergoes a bottleneck followed by population growth at time T S . This scenario mimics a possible invasion of the Baltic Sea from a small founder population. For both IM and SC models, all possible combinations of heterogeneous N e (2N models), heterogeneous migration rates (2M models), ancestral expansion (AE models), and bottlenecks (B models) were tested, yielding 16 variations of the IM and SC models and a total of 32 models tested. It should be note that in 2N2M models, it is assumed that regions experiencing lower migration rates and regions experiencing lower effective population size do not overlap. This made convergence of the complex models easier, and it should not be problematic assuming the proportion of the genome experiencing lower migration rates and with lower N e are relatively small (as is our case, see supplementary table 3, Supplementary Material online).

Model Optimization and Model Selection
With the exception of the SNM models, which has no free parameters, all one-and two-population models were optimized in five independent optimization routines, each consisting of five rounds of optimization using an approach similar to Portik et al. (2017). In the first round, we generated 30 sets of three-fold randomly perturbed parameters and for each ran a BFGS optimization setting max-iter to 30. In the second round, we chose for each model the replicate with the highest likelihood from the first round and generated 20 sets of three-fold randomly perturbed parameters, followed by the same optimization strategy. We repeated the same procedure for rounds three, four, and five, but using two-fold (round three) and one-fold (round four and five) perturbed parameters, respectively. In the final round, we also estimated 95% confidence intervals (CI) of the estimated parameters using the Fisher Information Matrix, as described by Coffman et al. (2016), and 95% CI of T S and T SC were estimated according to the rules of propagation of uncertainty. We selected the best replicate for each of the final models (three one-population models and 32 two-population models). We then calculated W AIC as a relative measure of support for each model. Parameters in coalescent units were scaled based on estimates of N ANC as outlined in dadi's manual. N ANC was calculated as H=ð4LlÞ, where L is the total sequence length from which SNPs used in demographic analyses originated and l ¼ 1 Â 10 À8 . The total sequence length was calculated as S Â ðV U =V T Þ, where S is the number of sites (variant and invariant) retained in ANGSD to calculate the jAFS, V U is the number of unlinked SNPs retained for demographic modeling and V T is total number of variants in the jAFS before linked SNPs were removed. Time parameters were scaled assuming a generation time of 3.5 years to be directly comparable with a previous study on the same species (Le Moan et al. 2019).
The use of AIC to rank models and of the Fisher Information Matrix to estimate parameter uncertainty relies on the assumption that genetic data are independent. For RAD data, it is generally assumed that keeping one SNP per RAD locus is sufficient to satisfy this assumption. Nevertheless, we carried out further analyses to ensure that unaccounted linkage did not lead to 1) biased estimates of parameter uncertainty and 2) favoring more complex models. Firstly, we estimated parameter uncertainty for the two best models (that according to W AIC had similar support) using block-bootstraps and the Godambe Information Matrix (GIM). Secondly, after the two best-fitting models had been identified (in this case, IM AEB2N and SC AEB2N ), we used LRT to test for the support of specific parameters among nested models, controlling for Type I error by normalizing the difference in log-likelihoods as outlined in Coffman et al. (2016).
We used LRT to test support for an SC scenario by comparing IM AEB2N and SC AEB2N models. Since the test failed to reject the simpler IM scenario, we further used LRT to test for support for a past bottleneck in the Baltic Sea population (comparing IM 2N and IM B2N models) and for an ancestral expansion (comparing IM 2N and IM AE2N models). The correct weighting of v 2 distributions for the LRT were calculated according to Ota et al. (2000).