GADMA: Genetic algorithm for inferring demographic history of multiple populations from allele frequency spectrum data

Abstract Background The demographic history of any population is imprinted in the genomes of the individuals that make up the population. One of the most popular and convenient representations of genetic information is the allele frequency spectrum (AFS), the distribution of allele frequencies in populations. The joint AFS is commonly used to reconstruct the demographic history of multiple populations, and several methods based on diffusion approximation (e.g., ∂a∂i) and ordinary differential equations (e.g., moments) have been developed and applied for demographic inference. These methods provide an opportunity to simulate AFS under a variety of researcher-specified demographic models and to estimate the best model and associated parameters using likelihood-based local optimizations. However, there are no known algorithms to perform global searches of demographic models with a given AFS. Results Here, we introduce a new method that implements a global search using a genetic algorithm for the automatic and unsupervised inference of demographic history from joint AFS data. Our method is implemented in the software GADMA (Genetic Algorithm for Demographic Model Analysis, https://github.com/ctlab/GADMA). Conclusions We demonstrate the performance of GADMA by applying it to sequence data from humans and non-model organisms and show that it is able to automatically infer a demographic model close to or even better than the one that was previously obtained manually. Moreover, GADMA is able to infer multiple demographic models at different local optima close to the global one, providing a larger set of possible scenarios to further explore demographic history.

"1) Supporting data / archival snapshot of code in GigaDB: Our data curators will contact you shortly to prepare any supporting data (and/or an archival copy of the code) for release via our repository GigaDB. Once this is sorted, please include a citation to your upcoming GigaDB dataset (including the DOI link) to your reference list, and cite this dataset in the "data availability" section and elsewhere in the manuscript, where appropriate.
Please follow this example for the reference: [xx] Author1 N, Author2 N, AuthorX N. Supporting data for "Title of your manuscript". GigaScience Database. 2019. http://dx.doi.orgxxxxxxxxxxxx You will get the exact citation, including the DOI link, from our curators." We have published the supplementary materials as GigaDB dataset and cited it in the manuscript.
"2) Citing your bitbucket repository. I see you link to a bitbucket repository in the supplemental files. Please also cite this in in the "availability" section of the paper and include a reference to the bibliography, including a DOI if available. (i.e. please cite the bitbucket repo the same way as the GigaDB data, see above).
If data and code has been modified in the revision process please be sure to update the public versions (e.g. in github or bitbucket) of this too." Bitbucket repository was added to the references and cited in the "Availability of supporting data and materials" section of the manuscript.
"3) We seem to have the tables and parameters in the supplemental files in PDF format only. I feel it would be better for re-use and reproducibility if these data and parameters are made available in re-usable, tabulated form (you may be able host them in GigaDB or bitbucket -our data curators can give advice if needed)." Supplementary tables are presented in both formats (PDF and tabular format) in GigaDB dataset. "4) Please register your new software application in the bio.tools and SciCrunch.org databases to receive RRID (Research Resource Identification Initiative ID) and biotoolsID identifiers, and include these in the "availability and requirements" section of your manuscript. This will facilitate tracking, reproducibility and re-use of your tool." We have registered GADMA in this databases and included received ID's in the "Availability of source code and requirements" section of the manuscript.
"5) At this stage, please remove any yellow highlighting that was made for the purpose of peer review." We removed highlighting.

Introduction
To understand the evolution of species and their populations, it is important to understand what events occurred in their past and when. The genetic diversity and structure of species are shaped by the combined processes of changes in e ective population size, population divergence and/or migration (gene ow) operating over the course of thousands of generations. Records of population history are imprinted in the genomes of individ-uals within species and this history can be inferred using a variety of algorithmic and statistical methods. With the rise of next-generation sequencing (NGS) technologies and abundant genome data, it has become possible to explore complex and parameter-rich demographic models that include the estimation of mutation rate, changes in e ective population size, nonrandom mating, admixture, and selection [1,2]. However, given the in nitely large number of permutations at which these processes operate over various time intervals, there is no method that can guarantee to nd the demographic model that best ts the observed data.
One of the primary methods for inferring demographic models from genomic data is based on the allele frequency spectrum (AFS), also known as the site frequency spectrum [2,3]. In essence, the AFS describes the distribution of derived allele frequencies of bi-allelic loci (SNVs) in a population or sample of populations based on the number of sequenced chromosomes [4]. An AFS can provide information about how the populations developed based on observed genetic variation sampled from current individuals of those populations. Many studies have been devoted to testing and understanding the behavior of allele-frequency spectra under di erent demographic scenarios [5,6,7,8,9].
Two of the most popular methods of historical demographic inference based on AFS are the faster continuous-time sequential Markovian coalescent approximation (fastsimcoal2, [10]) and the di usion approximation (∂a∂i, [11]). fastsimcoal2 can successfully handle more than three populations, but it is computationally challenging because it simulates multiple AFS simultaneously to estimate the most stable one. ∂a∂i simulates AFS using a numerical solution of the partial di usion equation (PDE), which corresponds to the presented demographic model and then provides the likelihood of the model (Figure 1). Unfortunately, PDE leads to some computational di culties associated with analyses of complicated demographic models and large sample sizes. As a result, ∂a∂i can only handle up to three populations. More recently, two new methods called moments [12] and Moran Models of Inference (momi2; [13]) have been introduced. moments, like ∂a∂i, is based on two models of population genetics: the Wright-Fisher generation model and the in nite sites mutational model, whereas momi2 is based on another generation model -the Moran model [14]. As momi2 is a new method, we have not considered it in the present study. moments uses ordinary di erential equations to simulate AFS, which is faster and more stable than di usion approximation in ∂a∂i, based on simulations comparing the two methods [12]. moments presents a tradeo between speed and accuracy in AFS-based demographic inference, can handle up to ve populations and provides the same interface as ∂a∂i.
Ideally, researchers seek to nd the model of demographic history that best describes or " ts" their data. ∂a∂i and moments provide an opportunity to run multiple optimizations to help t parameters of a given demographic model that maximizes the value of the composite likelihood. But optimizations based on gradient descent, for example, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [15,16,17,18] or its modi cations, use numerical di erentiation and are ine ective in practice, because of the complex structure of the search space for demographic models. Optimizations may also be ine cient due to another set of methods that o er existing solutions based on local search algorithms without gradients, such as the Nedler and Mead method [19] or Powell's method [20]. As a result, all existing optimizations nd local optima close to the initial values and require many runs to be performed using di erent initial model parameters, most of which are unknown or lack empirical data. Despite these drawbacks, ∂a∂i and moments are e cient instruments, since they can simulate an allele frequency spectrum from any demographic model. In other words, the problem of nding a demographic model from the allele frequency spectrum is the inverse problem, which can be solved by solving the direct problem, that is simulation of the AFS from a given demographic model, with approximate numerical methods, such as di usion or moments approximations. The lack of accurate and rapid differentiation and the complex structure of the search space led us to consider the usage of global optimization methods, such as the genetic algorithm.
The genetic algorithm (GA) [21] is one of the most e cient heuristic algorithms for global searching of complex and rich parameter space. Its primary application is optimization of a tness function, which is either not di erentiable or it cannot be di erentiated in a su ciently e ective way, for example, when the function is not representable in "closed-form expression". GA is based on the principle of evolution and simulates natural selection using operations of "mutation" and "crossover", which ideally results in the most adapted individual, the one which has the best value of the tness function. The versatility of the GA has led to its wide application, including reconstruction of phylogenetic trees [22], ancestral genome composition inference [23], and evolutionary biology in general [24].
In this paper we present a new method based on the genetic algorithm to automatically infer the best tting demographic model from AFS data for two or three populations. Our method assumes the ability to simulate the AFS from the demographic model, for example, using either ∂a∂i or moments. The GA framework overcomes limitations of local search optimizations and is more exible in handling the complexity of demographic models by allowing an increase in the number of parameters and estimation of parameters such as functions of population size changes that are sudden, linear or exponential. We implemented our method in the software GADMA (Genetic Algorithm for Demographic Model Analysis), which is written in Python and available at Github (https://github.com/ctlab/GADMA).

Materials and Methods
This section provides de nitions of the allele frequency spectrum and the composite likelihood scheme that is used in existing optimizations (∂a∂i and moments) and implemented in our method. After this background, the problem of demographic model search from observed AFS data is formulated in terms of computer science. We then introduce a developed representation of a demographic model for a general approach to our method, including the genetic algorithm with its operations of "mutation" and "crossover".

Basic de nitions and concepts
Assume there are P populations and for each population i there exists information about n i chromosomes. The AFS is the Pdimensional array A, where each entry A[d 1 , . . . , d P ] ∈ N, d i ∈ [0, n i ], ∀i ∈ [1, P] records the number of SNVs (relative to the common reference genome), which are exactly seen at d 1 chromosomes from population 1, d 2 chromosomes from population 2, . . . and d P chromosomes from population P. For example, if we have two populations, then the AFS is a two-dimensional matrix, where A[i, j] represents the number of polymorphisms that occurred exactly in i individuals in the rst population and in j individuals in the second population ( Figure 1). Several examples of allele frequency spectra are presented in Figure 2.
Assume that we can simulate the AFS M from the proposed demographic model. Assuming no linkage between derived alleles, each element of the allele frequency spectrum S[d 1 , . . . , d P ] is an independent Poisson value with a mean equal to M[d 1 , . . . , d P ]. We then calculate the likelihood -the probability of obtaining the observed spectrum S, if the expectation spectrum is M, as the product of (n 1 + 1)(n 2 + 1) . . . (n P + 1) Poisson likelihood functions: In the case of linked alleles, L(M|S) is the composite likelihood. Demographic models inferred by ∂a∂i and moments can be compared by computing the log(L(M|S)). Because L(M|S) ∈ [0, 1], then log(L(M|S)) ∈ [-inf, 0] and the greater is the log-likelihood, the better the model ts the observed AFS. In this paper, log(L(M|S)) was chosen as the tness function of the genetic algorithm, as discussed below.

Formulation of the problem
Consider a function f(Θ, A, C) that takes the parameters Θ = and returns the measure of the correspondence between the parameters Θ and the AFS, A.
The function f(Θ, A, C) builds a demographic model with respect to the parameters Θ that unambiguously determine it, calculates the expected AFS M with respect to the constants C, and then determines the degree of similarity between M and the observed A by the composite likelihood. The constants can be various parameters of algorithms for calculating the expected AFS, such as grid sizes for the numerical solution of a di erential equation in ∂a∂i, or population models parameters, including the average number of new mutated sites per individual in a generation θ 0 for the in nite-sites mutational model, or the time T g for one generation in the Wright-Fisher model. The function f can have di erent implementation details. Here ∂a∂i and moments were selected for this purpose.
The purpose of this work is to develop an algorithm to search for the demographic model that best corresponds to the observed AFS. Formally, the problem can be formulated as follows: -the set of constants.

Output
• The set Θ ∈ R N Θ of values, that maximize the value of f: There are approximate solutions of this problem with an additional input -a demographic model with a xed number of parameters, using various local search algorithms, but in practice, as mentioned above, these algorithms have proven to be ine ective. We present a new algorithm for the approximate solution of this more general problem using one of the most e ective methods of global optimization -the genetic algorithm.

Representation of the demographic model and its structure
Assume a division of one ancestral population into two new isolated subpopulations. Then the number of population splitting events directly depends on P, the number of considered populations. We represent the demographic model as a sequence of "time intervals" and population splits, each of which has a xed number of parameters. Assume a xed temporal order of the current observed populations: from an ancient ancestral population to the more recently formed subpopulations. This temporal order is usually known or can be imputed. If the number of populations is ≤ 3, then each split will divide the last formed population. Thus, a splitting event has only one parameter -the fraction of the population, which separates to form a new subpopulation.
The next important component of the demographic model is the concept of the time interval. First, we de ne this as a segment of time during which a certain dynamic of change of e ective size is maintained for each population. We consider three main demographic dynamics of population growth: sudden, linear and exponential change of the e ective population size ( Figure S1). Sudden change is very popular for its simplicity, but exponential change is a commonly used model for population growth as well. We include linear change as it is tradeo between sudden and exponential change and is more realistic than sudden change. Secondly, the parameters of migration rates between populations are constant during a given time interval. Thus, each time interval has the following parameters: • time, • e ective population sizes at the end of the time interval, • demographic dynamics of e ective population size change, • migration between populations, if there is more than one population.
The sizes of the populations at the beginning of any time interval are equal to the sizes of the populations at the end of the previous interval. The rst time interval is a special one: we consider that it lasts from the beginning of the existence of the species and assume a demographic dynamic of sudden change for the e ective population size of the ancestral population [11]. Therefore, in this interval the only parameter estimated is the size of the ancestral population. Note that the number of splitting events is determined by the number of populations under consideration, but the number of intervals can be varied and thus change the number of parameters of the demographic model, its detail, and complexity.
We now can de ne the concept of the structure of the demographic model. In the case of an ancestral population splitting into two subpopulations, the structure of the model will include a number of time intervals that occur before and after a single splitting event. In the case of three populations, the structure includes a number of time intervals prior to the rst split, those between the rst and second split, and the ones after the second split. For example, assume we observe three populations. At the beginning, there was an ancestral population (P A ) and this population started to grow in e ective size. Then a split occurred that divided this ancestral population into two daugh- Example of a demographic model with a (2,1,1) structure. Time is shown along the x-axis and population size on the y-axis. Four time intervals are shown: T 1 , T 2 , T 3 and T 4 , and two populations splits: S 1 and S 2 . The structure of this model is (2,1,1) because T 1 and T 2 are time intervals for one ancestral population, followed by split S 1 , T 3 is the time interval for two populations, and T 4 is the time interval for three nal populations after the second split S 2 . Time interval T 2 has the following parameters: time of this interval, size of the ancestral population at the end and the dynamic of size change. Time intervals T 3 and T 4 for each population will contain the same parameters plus migrations between populations. Split events S 1 and S 2 have fraction of size split as parameter. The rst interval contains the size of an ancestral population, but it can be ignored as it could be implicitly evaluated from other parameters [11].
ter populations (P 1 and P 2 ) that changed in e ective size during one interval, followed by a split of the second population (into P 2a and P 2b ), resulting in three descendant populations that changed within one subsequent time interval. The structure of such a model would be described as (2,1,1) ( Figure 3). The simplest model structures would be for two populations -(1,1), and for three populations -(1,1,1).
More formally, the structure of the model is a sequence of the form where P ∈ {2, 3} is the number of populations. In this case, the number of parameters N Θ (S * ) of the demographic model with the structure S * will be determined as follows: The term (P -1) corresponds to the number of split parameters, and P i=0 N i θ (s * i ) is the number of time intervals parameters. This number of parameters is valid for the genetic algorithm that is described below, but the number of nal model parameters is di erent. During the local search, which occurs after the genetic algorithm, the dynamics of population size change are xed and the nal number of parameters N Θ (S * ) is: Thus, we can unambiguously interpret the demographic model according to the list of parameters and its structure by xing for each time intervals a certain order of parameters.

General approach
The general algorithm is a series of executions of the genetic algorithm ( Figure S2). Suppose we have the initial and nal structures that de ne the initial and nal number of parameters (de nition of the model structure is presented in the previous section) of the demographic models, derived from considerations about the populations we are trying to model. During the genetic algorithm, the structure of the model does not change; that is, the model's parameters for the current structure are optimized. This restriction makes the procedure of crossover in the genetic algorithm simpler. As soon as the genetic algorithm stops, the parameters of the best-tting model are adjusted in order to improve the likelihood with a local search algorithm. If the structure of the model is not quite complex enough, that is, some value in the model structure (e.g., model structure (1,1,1)) is less than the corresponding value in nal model structure (e.g., model structure (2,1,1)), its complexity (in terms of parameters) is increased and the genetic algorithm is run again for the new model structure. The genetic algorithm and local search are executed until the best-tting parameters of the model with the nal model structure are obtained.

Akaike information criterion.
With an increase in the number of model parameters, we risk over tting the model. A model with a large number of parameters will be better able to nd parameter values corresponding to the observed data than a model with a smaller number of parameters, but at the same time it will correspond less to reality, for example, due to data errors. The Akaike information criterion (AIC) [25] is commonly used to compare models with di erent numbers of parameters: where k is number of parameters of the model and log(L(M(θ)|S)) is the value of the log-likelihood function.
The composite likelihood Akaike information criterion (CLAIC) is a modi cation of the AIC for composite likelihoods, which is important to implement when SNPs that are used to build the AFS are linked [26,27]. CLAIC is de ned as follows: where J(θ) and H(θ) are the variability and Hessian matrices, respectively: The smaller the AIC or CLAIC score is, the better the model ts the observed data. In practice, calculation of the CLAIC is very challenging. Co man et al., 2015 [26] applied bootstrapping to adjust composite likelihoods during statistical inference of demographic history using the programs ∂a∂i and TRACTS [28] and thereby calculate the CLAIC. This implementation was included in GADMA. Therefore, to obtain an accurate CLAIC score, one should perform block bootstrapping over unlinked regions of the genome.
Obviously, for AIC it is enough to compare only the nal models for each model structure after the local search, since the number of parameters between the increases in model structure does not change, and therefore the value of the AIC score depends only on the likelihood values. In the implementation of GADMA, if the models with the best likelihood and best AIC or CLAIC (depending on whether SNPs are linked or not) score do not match, the user is informed about the over tting. There are other methods for determining the selection of the best demographic model given an AFS data set . For example, likelihood ratio tests, which were introduced by Co man et al., (2015) [26]. However, it is not possible to use them, because they assume nested demographic models, which is incorrect in our case, as the dynamics of population change can vary during the genetic algorithm.

Genetic algorithm
The genetic algorithm is one of the most e ective heuristic algorithms [21]. It is based on the principles of evolution, where the aim of the algorithm is to nd an approximate solution to a problem that has the maximum or minimum value of the tness function. At the beginning of the algorithm, a xed-sized set of random solutions, called individuals, is created. The set itself is called a generation. Each individual is assigned a value of tness, which is determined by the value of the tness function. After this, new generations are iteratively produced with the help of mutations, crossover and selection of the ttest individuals (i.e,. the models with the highest likelihood scores). All these operations can be either deterministic or random, and their order can vary in di erent implementations. In our case, individuals are demographic models of the same structure, and the tness function is the log-likelihood, log(L(M|S)), as described in Materials and Methods.
In the rst step of GADMA, a set of demographic models are randomly generated, if they have not been already speci ed. To form a new generation of demographic models, we select the most adapted models among a set of mutated, crossed, and random models in the previous generation. The value of the tness function is used to select the most adapted models. The choice of models to be mutated or have crossover is random, but the probability of choice is directly proportional to the value of tness: the better the tness of the model is, the more likely it is to be selected. The genetic algorithm stops when it can no longer obtain a better demographic model by the value of the tness function for several iterations ( Figure 4).

Mutation of the demographic model.
The mutation of the demographic model ( Figure 5) is equal to the process of changing the values of several parameters. There are two constants associated with mutation of the model: the rate and the strength of mutation. The number of parameters to be mutated is sampled from a binomial distribution with a mean equal to the mutation strength. Parameters that are mutated are chosen with the probability that is directly proportional to their weights, which at the beginning are equal (that is, the choice is equally probable), and then each weight can be increased when a mutation of the corresponding parameter has occurred, which leads to an improvement in the model. The measure of how much the value of each parameter is mutated is determined by the sign (+1 or -1, which are equally probable) and the rate of the mutation that is randomly sampled from the normal distribution, with the mean equal to the mutation rate and a variance equal to half of the mean. Among the parameters that can be mutated during estimation of the demographic model is the mode of population size change (sudden, growth, and exponential). If this parameter is chosen to be mutated, then the value (mode) will change to one of the other two population size change dynamics with equal probability.

Adaptive mutation rate and mutation strength.
In the initial iterations of the genetic algorithm, strong mutations of a large number of parameters are much more e ective than weaker mutations of a small number of parameters, whereas when approaching the optimal solution the opposite is true. Therefore, the rate and strength of mutation can be adaptive, that is, it changes during the operation of the algorithm. There are several ways to make an adaptive value, one of the most popular being the one-fth algorithm [29]. First, we apply it to the mutation rate: at each iteration, if we have a "successful" solution, that is, it becomes better after mutation, then we multiply the mutation rate by the constant C ∈ [1,2]. If the solution is not "successful", then we divide by a fourthdegree root of C, decreasing the mutation rate. In the case of the mutation strength for the "successful" solution, it is necessary to additionally check whether the decision has become the best solution during the entire course of the algorithm's run.
Thus, often getting a new best solution with a mutation that occurs at the beginning of the genetic algorithm, we increase the number of parameters that are changed during the mutation operation and the degree to which they are changed. As we approach the optimum solution and decrease in frequency, the number of parameters will also decrease and lead to a more accurate search. An increase in the number of parameters being mutated leads to a more e cient crossover. At the same time, the mutation rate is changed more frequently than the mutation strength, which makes the mutation procedure more e ective.

Crossover of two demographic models.
In order to have crossover of two demographic models, the models are represented as sequences of parameters. Each parameter of their descendant is chosen randomly with equal probability from one or the second parent ( Figure 6). Since the structure of models do not change during the operation of the genetic algorithm, the number of parameters for all models will be the same. Consequently, the parameters can be unambiguously interpreted and easily crossed according to these criteria.

Local search algorithms.
Local optimizations are e ective in cases when the initial solution is close to optimal. They are more accurate in adjusting parameters than the genetic algorithm, and can signi cantly improve its result. ∂a∂i and moments provide the following choice of algorithms for local optimization: • The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. • L-BFGS-B -a modi cation of the BFGS algorithm, which is more e cient when the optimal parameters are close to the bounds. • The Nelder-Mead method or simplex-method. • Powell's method.
The rst two methods are gradient descent optimization algorithms [30] and the last two do not use a gradient. Powell's method was proposed by the authors of moments and was noted as the most e ective, so it was adapted for usage with ∂a∂i and for our experimental studies (see below) it was chosen as the local search algorithm.

Increasing the complexity of the structure of the demographic models
We need to be able to increase the complexity of the structure of the demographic model in order to nd an optimal solution. To do this, a time interval is selected and then divided into two equal intervals (i.e., the division based on the median). The time interval is chosen randomly on the basis that the new structure S * should not become greater than the nal S F according to one of its values: The values of the parameters of the newly formed time intervals are calculated for the parent: the size of the population of the rst time interval is equal to the size in the middle of the parental time interval, and the parameters of the second time interval are equal to the population size at the end, with the time of both intervals equal to half of the parental time, while migration between populations and the demographic dynamics of population size change remain the same. In essence, the demographic model has more parameters after its structure is increased, however the history and likelihood remain the same.

Results
We implemented the method described above in the program GADMA (Genetic Algorithm for Demographic Model Analysis), written in the Python programming language. We explored the e ciency of our method using simulated data and several previously published data sets. It is important to note that tests using simulated AFS data should be interpreted with caution, these analyses and their associated demographic models are simpli ed for the sake of computational e ciency and additional tests must be performed. For previously published data sets, GADMA was used to rst infer demographic models for two and three modern human populations using the data set analyzed by Gutenkunst et al., (2009) [11]. For these analyses, di erent parameter values within GADMA were examined, including the initial structure of the demographic models and using either ∂a∂i or moments to infer the optimal model of demographic history. The case of two human populations when initial structure and the nal structure are not equal corresponds to the usage of the increase of the model complexity feature that improve the result by nding simple models rst and detail them later. We then inferred demographic models for the history of two populations of the Gillette's checkerspot buttery, Euphydryas gillettii, and these were compared to the previous models reported by McCoy et al., (2014) [31]. Lastly, we used GADMA to reconstruct the demographic history of the Gaboon forest frog, Scotobleps gabonicus, which occurs in Central and Western African rainforests, based on a data set generated and originally reported by Portik et al. (2017) [32].

Tests on the simulated AFS data
In order to demonstrate that GADMA is computationally and statistically e cient, three datasets were simulated with moments using the following sudden population size dynamics (all parameter values are presented in Tables S6-S8): i. Bottleneck model for one population (4 parameters), ii. Simple ancestral population division with asymmetric migration between two descendant populations (5 parame-ters), iii. Secondary contact with symmetrical migration for three populations following split of one the two descendant populations (8 parameters).
All simulated AFS were unfolded with a size of 20 chromosomes per population. Powell's method was chosen as local search algorithm. All runs were repeated 50 times for one and two populations and 10 times for three populations.
GADMA was compared with two methods: 1) local searches starting from di erent initial values, and 2) using the ∂a∂i pipeline, which was readjusted for moments usage. The number of initial parameters for the rst method and the number of replicates in the ∂a∂i pipeline were selected so that the mean number of tness function evaluations was almost the same as in GADMA. For example, in case of one population: 40 initial points for the local search and ve rounds of 10, 10, 10, 10 and 20 replicates for the ∂a∂i pipeline. For the comparison, GADMA was launched for the same demographic models as the local search approaches. But two additional GADMA variations were included: one with all parameters, but with xed dynamics of population sizes (sudden), and a second with variation of these dynamics. Therefore, ve di erent optimizations were compared and the results including parameters values, maximum, mean and standard deviation of log-likelihood are provided in Tables S6-S8. For one population, 50 runs of the ∂a∂i pipeline showed the best demographic model compared to the other methods. Although, the best model obtained using GADMA had slightly higher likelihood score, GADMA was better on average. During demographic inferences without limitations on the population dynamics, the known problem about the uncertainties of the AFS appeared: the model with an early exponential bottleneck has almost the same likelihood as the model with sudden population size changes. All three versions of GADMA showed nearly identical results with the better mean and standard deviation of the likelihood score compared to local search and ∂a∂i pipeline.
For two populations, all methods were comparable in terms of the best model inferred after 50 replicate runs. All optimizations except the GADMA without limitation on size dynamics were comparable in terms of mean and variation of likelihood. The local search from di erent initial points and GADMA with prior knowledge about model were both able to infer the true model. Two additional simulations with GADMA also showed close to the optimum demographic models and the optimization without limitation on size dynamics inferred sudden changes as expected.
For three populations GADMA with presized model received the best maximum, mean and variation of likelihood among all observed methods. Both additional optimizations without prior demographic model knowledge showed symmetric migrations and all extra parameters close to true values. GADMA without limitation on size change dynamics inferred not zero migration after split of ancestral population, however, we assume that this could be because not enough runs were performed. All inferred dynamics in spite of the fact that not all of them are sudden showed the constancy of size change during time intervals. Thus we could argue that GADMA could infer the true demographic history without any knowledge of its model.

Testing the human Out of Africa model with GADMA
One of the most popular demographic history models for human populations is the so-called "Out of Africa" model, which consists of three populations [11,12,33,34]: • YRI -Yoruba individuals from Ibadan, • CEU -Utah residents with ancestry from northern and western Europe, • CHB -Han Chinese individuals from Beijing.
In order to demonstrate the e ectiveness of our method, we choose to use the allele frequency spectrum from the paper of Gutenkunst et al, (2009) [11], in which the ∂a∂i method is introduced and the demographic history models for two (YRI, CEU) and three (YRI, CEU, CHB) populations are inferred from this spectrum. These models (Figure 10a and Figure 11a) were obtained from a large number of local optimization launches and also have a number of restrictions on the number of population parameters: the size of the YRI population does not change after the rst expansion of the ancestral population, migrations are symmetrical and the dynamics of population size change are xed as sudden, except for the last time interval for CEU and CHB, where exponential growth occurs. The model for two populations has a total of 6 parameters ( Table 2) whereas the model for three populations has a total of 14 ( Table 4).
The 21 × 21 × 21 AFS was constructed in Gutenkunst et al., (2009) [11] on the basis of the Environmental Genome Project [35]. All biallelic SNVs from non-coding regions of 219 genes (totaling 5.01 Mb) were considered and the e ective length of the used sequence was equal to L = 4.04 × 10 6 . We used the same neutral mutation rate equal to µ = 2.35 × 10 -8 per site per generation and the same generation time equal to T g = 25 years as in Gutenkunst et al., (2009). Thus, the average frequency of mutation in one individual per generation is equal to θ 0 = 4µL = 0.37976.
The following parameters of the genetic algorithm were used: the size of the generation of the demographic models was chosen to be equal to 10, the strength and mutation rate was equal to 0.2, and the proportions of the best, mutated, crossed and random models in the new generation were 0.2 : 0.3 : 0.3 : 0.2. The strength and rate of mutation were adaptive with the constants of 1.05 and 1.02, respectively. The AFS was simulated using ∂a∂i with a G = {40, 50, 60} grid size, the value of the likelihood was considered signi cant to the second decimal point, and the genetic algorithm stopped after 100 iterations without improvement. As for the local optimization search, the Powell method was chosen.
For the human population data, we used the block bootstrapped data set from Gutenkunst et al. 2009, where it was done over 219 sequenced loci under the assumption that the loci are well separated and can be treated as independent. The con dence intervals reported in Table 2 and Table 4 were calculated as θ * ± σ(θ * ), where θ * denotes the maximum likelihood values of parameters, θ * and σ(θ * ) -the mean and standard deviation of the bootstrap results. All our model parameters are positive by de nition, so their logarithms were used to calculate con dence intervals. In case of three human population con dence intervals are di erent to those from the original paper Gutenkunst et al., (2009). However this fact popped out in our analysis and we argue that is because of di erent way of optimization: for each bootstrapped data only one local optimization with wider search intervals for parameters values was launched from found optimum point.

The YRI, CEU two population example
Three demographic models were inferred from the same AFS: using the same parameters as in Gutenkunst et al. (2009) [11] (model 1) and two with all possible nine parameters (models 2, 3) but with di erent initial demographic model structures. Model 2 had a structure (1,1), which then expanded to (2,1) during the GADMA run, whereas model 3 had an initial structure of (2,1). We ran GADMA 10 times for each of the three models (Table 2, Figure 7). All three models resulted in likelihoods better than the nal demographic model originally inferred in Gutenkunst et al., (2009). The parameters of models 1 and 2 are not signi cantly di erent, and model 3 has the best likelihood value and CLAIC score. Model 3 shows a lower population size of Europeans, a larger rate in their growth (from 25 individuals to 9 thousand) and a shorter separation time than the best model found in Gutenkunst et al., (2009) as well as models 1 and 2. Migration rates between the populations were chosen to be asymmetric, but they are largely equal to each other and coincide with values among the three models.
To demonstrate the ine ciency of the methods of local optimization for the model from Gutenkunst et al., (2009) [11], one of the methods proposed by ∂a∂i, BFGS, was launched 100 times. In each run, the initial parameters values were chosen randomly. The best value of the log-likelihood was -1629.24, which is quite far from the optimal value of -1066.35 reported in Gutenkunst et al., (2009). The average time of one optimization run was about 21 minutes. We then used a more e cient local optimization method implemented in the dadi-pipeline developed by Portik et al., (2017) [32]. It implements a scheme of sequential runs using the Nelder-Mead local optimization with initial random parameter values and perturbation of values between runs. We used the following settings of the dadi-pipeline tool: three rounds with 10, 20, 50 replications with 3-, 2-, and 1-fold perturbations, respectively, for each round. The dadipipeline was launched 50 times and the best resulting model had a log-likelihood equal to -1073.98. The average run time for one launch of optimization was approximately 10 minutes. To compare the runs with di erent initial demographic model structures (models 2 and 3), characteristics such as time for one iteration of the genetic algorithm, number of iterations, and mean and standard deviation of the log-likelihood value were calculated (Table 1). Launches with a simple initial model structure show a more stable result in terms of the likelihood value but they have a longer average run time for one iteration. Furthermore, all the models obtained for simplestructure launches have the same demographic dynamics of population size change and similar parameters as the nal model reported in Gutenkunst et al., (2009), which is incorrect in cases involving launches of complex-structure models, as the best model shows a linear growth of the European (CEU) population as opposed to exponential growth in other cases. At the same time, although the launches of models with complex model structures result in a nal model with a better loglikelihood score, it di ers from the other models in terms of parameters values, which may indicate that it is inaccurate.

The YRI, CEU, CHB three population example
We also applied GADMA to infer demographic models for the case of three human populations based on same the AFS used in Gutenkunst et al., (2009) [11]. The rst model (model 1) used the same parameters as in Gutenkunst et al., (2009), and their corresponding values were inferred (Table 4). GADMA yielded The demographic model from the same allele frequency spectrum inferred using GADMA with 12 parameters. Note that the di erences between the inferred models is the slightly later age of the split between the YRI and CEU populations and the linear population growth (as opposed to exponential growth) in the CEU population in the model obtained with GADMA. Table 2. Maximum likelihood parameters for di erent demographic models for two human populations (YRI and CEU) inferred using either ∂a∂i or GADMA (the latter under three di erent parameter settings). The migration rates are per generation. 95% con dence intervals are indicated in brackets.

Model from ∂a∂i example
Model from GADMA (1) Model from GADMA (2) Model from GADMA (3  better log-likelihood values for parameters than those reported in Gutenkunst et al., (2009). However, the timing of the split between the YRI population and the CEU+CHB populations was dated to 400,000 years ago, which is 250 thousand years older than that inferred in any previous studies. To correct this, we restricted the age of this splitting event to 150,000 years ago based on previously published estimates [36,37,38]. A demographic model (model 2) was inferred with this age restriction, which yielded a better log-likelihood value than that in Gutenkunst et al., (2009). Next, we inferred the demographic model (model 3) that included all 20 parameters. Here we also observed an unrealistic earlier age for the divergence between YRI and CEU+CHB (results not shown). When we applied the 150, 000 year age constraint as in model 2, we inferred a demographic model (model 3) that not only showed the highest log-likelihood value, but also the best CLAIC value.
As in the case of the two human population example, we also tested the three population case with the BFGS local optimization used in ∂a∂i. We launched the optimization 100 times from randomly selected initial parameters and the best log-likelihood value obtained was -6323.99, which slightly differs from the optimal log-likelihood -6316.89, which is much less than that from the comparison for two populations.
With the exception of the earlier age of divergence between the YRI and CEU+CHB populations in model 1, demographic models 1 and 2 and the one inferred by Gutenkunst et al., (2009) have similar log-likelihood values and parameter estimates. In model 3, which has 20 parameters and the best CLAIC value, some parameters are also quite similar to the values in the other two models. The major exceptions, however, are the inferred migration rates and population size of the Eurasian population, which exponentially grows from 200 to 1500 individuals after the split of the ancestral population. For comparison, in other models this number is a constant equal to 2,000 in-dividuals, which seems less realistic than exponential growth. Migration rates vary considerably: they are higher in model 3 compared to models 1 and 2 or the one found by Gutenkunst et al., (2009). Model 3 shows that the largest migration occurred between the YRI and CEU populations, and following the ancestral division, between the CEU and CHB populations. Moreover, the more geographically distant the populations are, the smaller is the observed migration rate. GADMA was launched 10 times for each of the three models using ∂a∂i and the best solutions were observed ( Table 2). We also launched GADMA using moments 10 times to compare its e ectiveness with ∂a∂i. The authors of moments conducted similar comparisons on simulated data [12]. Various characteristics of run time and stability of the log-likelihood value based on the results of 20 GADMA runs (10 using ∂a∂i, 10 using moments) are presented in the Table 3. Log-likelihood values of models inferred using moments were recalculated using ∂a∂i with the G = {40, 50, 60} grid size simulated AFS so that log-likelihoods obtained with the two methods were comparable. Analyses using moments were 7.5 times faster than ∂a∂i,   whereas those using ∂a∂i were more accurate: the average and variance of the likelihood values of the inferred models were better than the values inferred using moments.

Estimation of the CLAIC for human population data
In order to compare demographic models of the human data set with di erent numbers of parameters, we estimated the CLAIC scores for each model.

Demographic history of Gillette's checkerspot buttery
We next tested GADMA using data from McCoy et al., (2013) [31], which examined the demographic history of Gillette's checkerspot butter y (Euphydryas gillettii). We used the same AFS as that used in the original paper, which consisted of eight individuals from a population in Colorado (CO) and eight individuals from the native population in Wyoming (WY). For our ana that used in the original paper, which consisted of eight individuals from a population in Colorado (CO) and eight individuals from the native population in Wyoming (WY). For our analyses, we used two allele frequency spectra of size 13 × 13, one for synonymous SNVs only and one including all SNVs.
McCoy et al. tested three types of demographic models: 1) type A -models without migration between populations; 2) type B1 -models with unidirectional migration from CO to WY, and 3) type B2 -models with bidirectional migration between CO and WY. In the original paper, the three demographic models were tested using the AFS based on synonymous SNVs only. Model A had the best CLAIC value, so the type A demo- graphic model was inferred from the AFS using all SNVs. We used models A and B2 to infer the demographic models from both allele frequency spectra (synonymous SNVs and all SNVs) with GADMA.
Without considering migration, the models used in McCoy et al., (2013) had the following structure: there was one population of N A size, which at some point in time divided into two subpopulations, the size of which did not change further (sudden change of population size). All parameters were calculated with respect to N A and had the following notation: η WY , η CO -relative population sizes at the current time; τ SPLIT -time/age of the population splitting event; and M WY-CO , M CO-WY -scaled migration rates. For the models we inferred using GADMA, we also included the parameter η WY0 ∈ [0, 1]the size of the WY population immediately after the splitting event or the fraction of the size of the ancestral population that forms the WY population. The size of CO population immediately following the splitting event is equal to 1η WY0 , because N A = 1 before the ancestral population splits. However, in the case when the population size change is sudden, the size of the populations following the splitting event is equal to the size at the present time.
We ran four executions of GADMA with di erent data inputs: 1) the AFS using synonymous SNVs only without migration; 2) the AFS using synonymous SNVs only with migration; 3) the AFS using all SNVs without migration; and 4) the AFS using all SNVs with migration. For each execution, the analysis was repeated 50 times. moments was used to simulate the AFS due to its faster computational speed. However, of the nal log-likelihood scores that are presented in Tables S1, 5  The length of one generation of the demographic models was selected to be 10, the strength and mutation rate were set to 0.2, with constants 1.0 and 1.02, respectively, and the proportions of the best, mutated, crossed and random models in the new generation equal to 0.2 : 0.3 : 0.3 : 0.2. The likelihood was considered signi cant to two signi cant digits and the genetic algorithm stopped after 100 iterations without improvement. As for the local search, the Powell method was chosen. Since the structure of the demographic model from McCoy et al., (2013) corresponds to the simplest structure (1,1), it was chosen as both the initial and nal structure of the demographic model.
The con dence intervals were calculated for all resulting models (Table S1 and Table 5) the same way as was done for the human population data. In the case of the demographic models for the Gillette's checkerspot butter y, all our parameters are positive. McCoy et al., (2013) did not use logarithms to calculate con dence intervals; they just assumed the lower bound of the intervals to be positive. However, when we used logarithms to calculate con dence intervals, we found them to be extremely wide for migration between the two populations. Another problem we encountered was the performance associated with bootstrapping the data for the Gillette's checkerspot butter y. The data was derived from RNA-seq data, in which di erent alleles are likely linked. Bootstraping should be performed over unlinked regions of the genome. McCoy et al. 2013 provide the assembly of the transcriptome, so it became possible for us to perform bootstrapping over the contigs of the assembly. This was applied to generate con dence intervals and CLAIC scores.
Several runs of GADMA produced di erent local minima and the resulting alternative models and their inferred parameters are presented in Figure 9 and Table 5, respectively. We note that one of the models selected by GADMA, Model 26 (Figure 12d For all models, the CLAIC values were calculated and are Table 5. Demographic models and their associated parameters inferred in GADMA using an allele frequency spectrum based on all SNVs for Gillette's checkerspot butter y populations. 95% con dence intervals are indicated in brackets. The rst column shows the parameters inferred by McCoy et al., (2013  presented in the Table S5. The scores were calculated on the bootstrapped data to generate con dence intervals by sampling over the contigs of the transcriptome assembly to avoid linkage (see above). For the AFS generated from synonymous SNPs, the model without migrations showed the best CLAIC score. However, the model with migration, which had the best likelihood, had the best CLAIC score for the AFS generated from all SNPs.
One of the ndings reported by McCoy et al., (2013) was that the demographic models that included migration may not be applicable when the real population history shows no evidence for migration. This conclusion was based on two factors: 1) the type A demographic model, inferred from the AFS using synonymous SNVs, had the best AIC score; and 2) the estimate of the splitting time of the ancestral population in models that included migration had wide 95% con dence intervals, such that the parameter boundaries included zero (see McCoy et al., (2013)). However, during our regeneration of these authors' results, we found errors associated with some of the parameters of the demographic models: the split time of the ancestral population and migrations were mixed up with each other, resulting in migrations, but not split times, having wide con dence intervals that included zero. Moreover, our analyses using GADMA showed that the 95% con dence interval for the split time of the ancestral population is rather good (e. g. [0.189 -0.323] for the case of the best B2 model) and that the migration rates are so small that they can be considered as zero. Therefore, the best-tting model that includes migration suggests a demographic history without migrations. The demographic models inferred by GADMA also have negligible migration values, and demographic models of type B2 (those with bidirectional migration), inferred as the best-tting models, had better CLAIC values than type A models.
The average population size of butter ies from Colorado (CO) was estimated as N CO = 34 individuals by McCoy et al., (2013). If we scale the parameters of the best-tting model so that the average size of the CO population is equal to this value (x · (η CO + (1η WY0 ))/2 = N CO ), then for best model we get 33.6 generations (t SPLIT = 2 · x · τ SPLIT ) after the division of the ancestral population, which corresponds to the actual 33 generations observed . Such estimates were made for all resulting models (Table 5). However, the best-tting model had the best estimated value of split time of the ancestral population (33.6 generations).

Demographic history of the Gaboon forest frog
For our third evaluation of performance, we compared the demographic models inferred using GADMA with those inferred with the recently developed optimization method implemented in the dadi pipeline [32]. This pipeline uses ∂a∂i to simulate an AFS and infers parameters of the researcher-speci ed demographic model by several rounds of consistent runs using the Nelder-Mead local optimization method. During the rst round, random parameter values are estimated and during each successive round, the best parameters from the previous rounds are used. Prior to each Nelder-Mead local optimization, current parameter values are perturbed. Portik et al., (2017) demonstrated the dadi pipeline using allele frequency spectra generated from the Gaboon forest frog (Scotobleps gabonicus) and these same data was used to perform the analyses using GADMA. Sampling included eighty-four samples from 33 localities of Lower Guinea, West Africa, which were divided into Northern and Southern populations according to hierarchical Bayesian clustering analysis of 7, 633 unlinked SNPs generated using RADseq. Each Northern and Southern group was further divided into three clusters: Cameroon Volcanic Line North (CVLN) and Cameroon Volcanic Line South (CVLS) and Cross River (CrossRiver) populations for the Northern group, and North Coast, South Coast and East Gabon populations for the Southern group.
In order to perform our analyses using GADMA, three twodimensional folded allele frequency spectra were chosen from Portik et al., (2017) [32]: 1) a 41 × 19 spectrum for Northern and Southern populations; 2) a 31 × 19 spectrum for CVLN and CVLS; and 3) a 15 × 31 spectrum for the CrossRiver and CVLN populations. In generating the allele frequency spectra, only a single SNP per RAD locus was kept, so loci are assumed to be independent. For each spectrum, we estimated best-tting loglikelihood values, AIC scores, Akaike weights (ω i ) [39] and parameters for 14 demographic models. Two demographic models assumed no divergence between populations, while the remaining 12 models assumed a split of the ancestral population and di erent assumptions related to migration rates, isolation, and population size changes. All population size changes were considered to have sudden change dynamics, which we included in our analyses as well. For all models, we provide inferred parameter values in Tables S2, S3, and S4, including the value of θ = 4N A µL, where µ is the mutation rate per generation per site and L is e ective sequence length.
The best-tting demographic model inferred using the dadi pipeline for the Northern and Southern populations only suggests a population expansion, followed by secondary contact and symmetrical migration between populations (∆AIC = 13.6, ω i = 0.99). For the CVLN and CVLS populations, the best demographic model included population expansion, secondary contact and asymmetrical gene ow from CVLN to CVLS (∆AIC = 94.2, ω i = 0.99). Finally, two demographic models explain the AFS data equally well for the CrossRiver and CVLN populations: one with secondary contact but no population size change and asymmetrical migration from CVLN to Cross River (ω i = 0.53) and another that included an ancestral population division with continuous asymmetric gene ow (ω i = 0.43).
We inferred all 12 demographic models with divergence as in Portik et al., (2017) for each of the three allele frequency spectra with GADMA. Each of 36 runs was repeated 10 times as opposed to the 50 runs launched with the the dadi pipeline. We originally had planned to conduct the analyses with ∂a∂i to simulate the allele frequency spectra with the same grid size as that used by Portik Tables  S2-S4. We report demographic models estimated with moments, and if a similar model was inferred by both ∂a∂i and moments, we only report the result with the best log-likelihood value.
For almost all 12 demographic scenarios, GADMA was able to infer better models in terms of both log-likelihood scores and parameters than were previously inferred using the dadi pipeline in Portik et al., (2017) [32]. Moreover, we obtained better alternative models for the CrossRiver, CVLN, Northern, and Southern populations. These models included an initial split of the ancestral population with ancestral asymmetric migration and a population size change (∆AIC = 3.24, ω i = 0.82) for the CrossRiver and CVLN populations; and secondary contact with asymmetric migration and population size change (∆AIC = 6.82, ω i = 0.97) for the Northern and Southern pop-   2 Divergence, unidirectional migration, followed by population size growth and bidirectional asymmetric migrations. 3 Divergence, secondary contact with no population change and asymmetric migration. 4 Divergence, secondary contact with no population change and asymmetric migration. We next inferred the demographic model with a structure equal to (1,2) with GADMA (Table 6). For all three allele frequency spectra, such a model had the best log-likelihood among previously inferred models (Tables S2, S3 and S4). For the Northern and Southern populations, the demographic model with the (1,2) structure also had the best AIC score. This model showed similar features for all observed data: an interval of time after the splitting of the ancestral population with unidirectional migration, followed by a time interval with bidirectional migration and population size change. The direction of the unidirectional migration was as follows: from CVLN to CrossRiver; from CVLN to CVLS; and from Southern to Northern. We also constructed two additional demographic models based on observed features and inferred their parameters: splitting of the ancestral population with unidirectional migration followed by either symmetric or asymmetric migrations with population size change. The demographic model with unidirectional migration followed by asymmetric migration and population size change was found to be the best for all three allele frequency spectra (Table 6) demographic models that explained the data equally well: the same demographic model as just described but which also included symmetric migration for CrossRiver, CVLN (ω i = 0.43) and secondary contact with asymmetric migration and population size change for CVLN and CVLS (ω i = 0.34).

Discussion
We report the development and mathematical justi cation of GADMA and demonstrate its e ectiveness using several previously published data sets. Our method is based on the genetic algorithm and uses existing solutions from either ∂a∂i and moments to simulate the AFS from the proposed demographic model. GADMA is the rst program that allows the automatic inference of demographic history of up to three populations from an allele frequency spectrum. Existing optimizations, implemented in either ∂a∂i and moments, require prior speci cation of demographic models to be tested and are thus ine cient in practice, given the large number of possible demographic scenarios that can be constructed for one or more populations. Our method is implemented in the GADMA software, which is openly accessible via the link https://github.com/ctlab/GADMA.
GADMA was shown to be e cient in performance: it was applied to both simulated data and to three empirical datasets, representing di erent organismal systems and associated demographic histories. The inferred demographic models had better log-likelihood scores than those reported in the original papers, which were derived from optimizations using either ∂a∂i or moments alone. Moreover, the demographic his-tories inferred with GADMA were consistent with the known history of the three taxa [11,37,31]. We also demonstrated the stability of the search, starting with demographic models with simpler structures rather than more complex ones, which reects the pro tability of using a search scheme that includes an increase in model structure complexity. Additionally, we compared pipelines using moments or ∂a∂i and showed that the computational speed of moments was much greater than for ∂a∂i. Thus, GADMA is the rst software that e ectively infers a demographic model from an allele frequency spectrum with nothing required from the user, except the structure of the demographic model, which determines the extent of the model complexity and associated details.
Despite the increasing use of allele frequency spectra for inferring demographic history, there are some limitations of the informativeness of AFS with regards to historical demographic inference that have been noted (see Beichman et al., (2018) [3]). For example, previous studies have shown that the AFS of a single panmictic population can be matched to di erent demographic scenarios [9]. We also observed this issue in our simulated data for the demographic model that included a bottleneck event of a panmictic population. The inferred model with an earlier exponential bottleneck had almost the same likelihood score as the model with the sudden population size dynamic. We should expect the same behavior in the case of multiple populations, which requires estimating the joint AFS. Moreover Rosen et al., (2018) [40] showed the limitation of AFS based methods and their pathological behaviour. They also proved that expected AFS data from n samples under any demographic model could be generated by a piecewise-constant model with at least 2n -1 time intervals. This research is a complement to the paper Terhorst and Song, (2015) [41], where the minimax error of the inference of population size history was shown to be at least O(1/ log (s)), where s is the number of segregating sites. This result means that the accuracy of demographic inference does not depend on the size of AFS but on the number of considered segregating sites. Although it was provided only for the populations that have experienced a bottleneck, the authors argue that this behaviour should be expected for the real data. We should consider described limitations of AFS and because of such behavior, the structure of the demographic model should not be very complex. We suggest to use structures no more than (2,1) and (2,1,1). This limitation can be solved by using additional information about observed populations, for example, two-locus statistics [42] or focus on IICR summary [43]. Incorporating genetic linkage information with AFS data could also improve the accuracy of inference of demographic history [42]. Including such information in the GADMA pipeline will allow the use of more complex demographic model structures in the future.
In our analyses using the AFS from three human populations (YRI, CEU and CHB), we inferred a best-tting demographic model that showed an expansion out of Africa around 400 thousand years ago, which is not supported by previous studiess [11,44,45]. This can be caused by the fact that treebased models often do not take into account processes such as admixture and introgression that can under-or overestimate population divergence times (e.g., Kamm et al., (2019) [13]). Alternatively, these di erences could result from the limitation of the informativeness of the AFS or by noise in the spectrum as a result of including low-quality variant calls as AFS based methods should be highly sensitive to noise [40]. However, Gutenkunst et al., (2009) [11] noted the high quality of this data set. The demographic model inferred with GADMA using the same parameters as in Gutenkunst et al., (2009), under the assumption that expansion took place not earlier than 150 thousand years ago, resulted in parameter values similar to those reported in Gutenkunst et al., (2009). We also in-ferred all possible parameters, including asymmetric migration rates and di erent dynamics of population size changes, and obtained a demographic model with the best CLAIC score. With GADMA, we observed higher asymmetric migration rates and the growth of the CEU+CHB population after its split from the African population.
Allele frequency spectra of two isolated populations of Gillette's checkerspot buttery from Wyoming and Colorado showed several alternative models with values very close to the best value of the composite likelihood found by McCoy et al., (2014) [31]. All migrations that were inferred are negligible, which con rmed the isolation of the two populations. One of the inferred models is consistent with the demographic history that was estimated previously by McCoy et al., (2014). The demographic model inferred using GADMA with the best likelihood value seems to be a better model overall because, in addition to the best CLAIC score (for the AFS using all SNPs), it correctly inferred the timing of the population split to the actual known value of~33 years. However, the model without migration showed the best CLAIC score for the AFS using all synonymous SNPs and could be the right choice too, so we suggest that further research is necessary to identify alternative models that may better t the demographic history of these checkerspot butter y populations.
We conducted a series of experiments for selecting demographic models for the Gaboon forest frog, also repeating the analyses performed by Portik et al., (2017) [32]. Nearly all of the 12 models inferred previously were found to be suboptimal. For two of the three population sets analyzed with GADMA, demographic models with higher log-likelihoods were chosen compared to those previously inferred. For the comparison that included the CVLN and CVLS populations, the demographic model with the highest log-likelihood was consistent with model inferred by Portik et al., 2017, but new values of parameters with better likelihood values were found. Demographic model optimization using ∂a∂i proved to be unstable, and we were consequently forced to either specify lower limits on speci c parameter values or instead use moments for model optimization. moments proved to be indeed more stable in simulating the expected AFS from the demographic model. We then inferred the full parameters of the demographic model with a structure equal to (1,2) for each of three AFS datasets. For Northern and Southern populations, models using this structure resulted in higher log-likelihood scores. However, we noticed some peculiar properties in parameter values and generated new demographic models based on these properties and inferred their parameters using the three allele frequency spectra. These analyses resulted in a model with improved AIC scores for each of the three allele frequency spectra. This model contains divergence of the ancestral population, a time interval with unidirectional migration followed by a time interval with population size change and bidirectional asymmetric gene ow.
In this work, we focused on benchmarking the e ectiveness of the genetic algorithm for demographic model inference on real data. We found that GADMA managed to infer better demographic models than those that were found in the original analyses for all tested data sets. However, we highlight two caveats that warrant further research. First, our method only used allele frequency spectrum data. Other data, such as those based on haplotypes or SNP data may prove to be more informative about demographic history, but current methods using such data are restricted to simple population size change dynamics [e.g., IBS tract method [34], DIYABC [46], MSMC [47]] whereas ∂a∂i and moments, which are implemented in GADMA, are less restricted in these dynamics. Second, it is possible to add other solutions like fastsimcoal2 or momi2 for simulating the expected AFS from demographic models and make compar-isons between all methods. Also, even though we performed some analyses on simulated data that included models with known global optima, we suggest our method could be further veri ed through the use of additional simulated data sets.
While the optimization search implemented in GADMA is able to nd demographic models with the best likelihood score and their associated parameters, it is important to minimize the number of parameters so as to avoid the possibility of overtting the model to the AFS data. Fortunately, such a strategy is included in in GADMA using AIC and CLAIC scores: it informs the user about over tting when the demographic model with the best likelihood score and best AIC (or CLAIC) score do not match. Additionally, GADMA can infer demographic models with all possible parameters, allowing researchers to explore additional models based on the inferred model, as we demonstrated in the case of the Gaboon forest frog. However, we note that GADMA does not sort out all possible numbers of parameters, so it is not guaranteed to nd the model(s) with best AIC (or CLAIC) score(s). Moreover, we recommend performing block bootstrapping of data sets over unlinked regions of the genome for accurate estimation of the CLAIC score. We note, however, that detection of such regions can be di cult.
Another direction in the further development of our work is increasing the number of considered populations. Currently, GADMA can analyze up to three populations, similar to ∂a∂i. In contrast, moments can simulate allele frequency spectra for up to ve populations. As the limitations of AFS based methods that were presented above are the important restrictions in increasing the number of populations and complexity of demographic models (we again propose the simplest structures then), we expect that this modi cation should be done in parallel with the incorporation of additional summaries of genetic data. Moreover, including the estimation of selection coecients (e.g., Gutenkunst et al., (2009) [11]) and the development of a user-friendly interface for various types of data sets (e.g., all SNPs, synonymous SNPs only, etc.), will help to further expand the capabilities of GADMA. In this work we were focused on the application of the genetic algorithm as global optimization on the demographic inference. We have shown that e ective global search is possible and we assume the existence of more powerful optimization method. So it is also possible to improve the proposed method, using another optimizations or various modi cations of the genetic algorithm, for example, one that infers deliberately di erent demographic models [48].

Availability of supporting data and materials
All data, parameters for GADMA runs and results are available in the Bitbucket repository [49]. The supplementary materials and the archival snapshot of code are published as GigaDB dataset [50]. Supplementary tables are presented both in PDF and tabular formats.

Declarations Competing Interests
The authors declare that they have no competing interests. Click here to access/download; Figure;figure_12.pdf  Figure 10 Click here to access/download; Figure;figure_13.pdf