-
PDF
- Split View
-
Views
-
Cite
Cite
Sarah P Flanagan, Adam G Jones, Constraints on the FST–Heterozygosity Outlier Approach, Journal of Heredity, Volume 108, Issue 5, July 2017, Pages 561–573, https://doi.org/10.1093/jhered/esx048
Close -
Share
Abstract
The FST–heterozygosity outlier approach has been a popular method for identifying loci under balancing and positive selection since Beaumont and Nichols first proposed it in 1996 and recommended its use for studies sampling a large number of independent populations (at least 10). Since then, their program FDIST2 and a user-friendly program optimized for large datasets, LOSITAN, have been used widely in the population genetics literature, often without the requisite number of samples. We observed empirical datasets whose distributions could not be reconciled with the confidence intervals generated by the null coalescent island model. Here, we use forward-in-time simulations to investigate circumstances under which the FST–heterozygosity outlier approach performs poorly for next-generation single nucleotide polymorphism (SNP) datasets. Our results show that samples involving few independent populations, particularly when migration rates are low, result in distributions of the FST–heterozygosity relationship that are not described by the null model implemented in LOSITAN. In addition, even under favorable conditions LOSITAN rarely provides confidence intervals that precisely fit SNP data, making the associated P-values only roughly valid at best. We present an alternative method, implemented in a new R package named fsthet, which uses the raw empirical data to generate smoothed outlier plots for the FST–heterozygosity relationship.
In the field of evolutionary biology, interest in identifying genome-level targets of selection has grown dramatically since the introduction of next-generation sequencing. Population genetic approaches tackle this problem by comparing loci from multiple populations of the same species to identify loci with divergent allele frequencies. The most common measure of differentiation used is Wright’s fixation index, FST (Wright 1943). Lewontin and Krakauer (1973) proposed that all loci experience the same background processes such as drift and breeding structure, so the overall distribution and variance of FST values can be used to identify loci that are more or less divergent than expected and are likely under selection. Beaumont and Nichols (1996) used the relationship between FST and heterozygosity to identify loci putatively under balancing or positive selection. They also implemented a null model approach, in which an empirical dataset is compared to datasets simulated under a null model, to identify outliers.
The FST–heterozygosity relationship has been widely used to identify putative loci under selection and to select a subset of loci that are likely neutral for other analyses that require neutrality. This relationship serves as the basis of both a command-line program, FDIST2 (Beaumont 2000), and a program with a graphical user interface, LOSITAN (Antao et al. 2008). Additional methods for identifying outlier loci exist, including bootstrapping FST values to generate confidence intervals (Catchen et al. 2013) and approaches that correct for evolutionary non-independence of populations (Günther and Coop 2013), since FST is notoriously poor at measuring differentiation in populations with hierarchical structure (Nei and Maruyama 1975; Robertson 1975; Jakobsson et al. 2013; Jost 2008). Recent comparisons of the various approaches to outlier analysis have found mixed results for the power and reliability of the different methods. FDIST2, which uses the FST–heterozygosity relationship to identify outliers, had higher power than other methods in an isolation-by-distance population structure but performed less well in a strict island model (Lotterhos and Whitlock 2014). However, in the same isolation-by-distance scenario FDIST2 had more false positives than other methods (Lotterhos and Whitlock 2014).
The advent of next-generation sequencing has led to an increase in the number of loci included in outlier studies. With more loci, the distribution of the FST–heterozygosity relationship becomes more obvious. A surprising pattern, which serves as the motivation for this article, is that the distribution of points for actual loci assayed in empirical studies often deviates substantially from the null expectation generated by FDIST2 or LOSITAN. Thus, articles have been reporting significant outlier loci, even when the majority of loci show clear departures from the null model implemented by these FST–heterozygosity outlier approaches.
Many caveats surrounding the use of FDIST2 or LOSITAN have been discussed, but these warnings have not stopped apparent misapplication of the method. For example, as Beaumont and Nichols (1996) stated, the FST–heterozygosity relationship performs poorly when the number of sampled demes is small, when mean FST values are large, and when the selection coefficients associated with selected loci are less than about 5 times the migration rate (Caballero et al. 2008). In addition, if the actual population structure in the organism under consideration differs considerably from the idealized island model used by FDIST2 and LOSITAN, then inferences from these methods may be unreliable (Excoffier et al. 2009). Several simulation studies have found that FDIST and its companion program for dominant markers, DFDIST, have lower power (François et al. 2016) and result in more false positives than other programs (Narum and Hess 2011; Vilas et al. 2012; De Mita et al. 2013; Lotterhos and Whitlock 2014), even under a neutral island model (Pérez-Figueroa et al. 2010). Mathematical bounds on the relationship between FST and heterozygosity, as described by Jakobsson et al. (2013), also result in situations under which FST–heterozygosity outlier approaches will not perform well. While several studies, including those that proposed the methods, have used simulation-based approaches to evaluate their efficacy (Beaumont and Nichols 1996; Pérez-Figueroa et al. 2010; Narum and Hess 2011; Vilas et al. 2012; De Mita et al. 2013; Lotterhos and Whitlock 2014; François et al. 2016), none of these studies have focused specifically on extremely large numbers of biallelic single nucleotide polymorphism (SNP) markers, as would be available from a next-generation, population genomics study.
To understand how widely comparisons of FST and heterozygosity are used to identify outliers, and to understand how frequently the FST–heterozygosity distribution deviates from the expected distribution in LOSITAN, we perform a survey of the literature, paying particular attention to the suitability of the FST–heterozygosity outlier approach for each sampling design. We then use forward-in-time simulations to investigate the impacts of population structure parameters (number of demes, migration rate, strength of selection, and number of sampled populations) on the FST–heterozygosity relationship, especially in nonideal cases. We demonstrate that each parameter set has a characteristic distribution of neutral points and that LOSITAN confidence intervals correctly reflect the shape of the distribution only under ideal circumstances. Our analysis further extends the results of other simulation-based studies by providing examples of patterns in empirical datasets that can serve as indicators that LOSITAN may not be reliably identifying outlier loci. Finally, we present an alternative approach, fsthet, which generates smoothed quantiles for the FST–heterozygosity relationship from the empirical distribution.
Methods
Surveying the Literature
Studies that cited LOSITAN (Antao et al. 2008) were found on Google Scholar. We focused on the use of LOSITAN because it can use multiple CPU cores when running FDIST2 and therefore is better suited to large datasets (Antao et al. 2008). In each article that cited the program, we looked for a figure containing the plot of FST–heterozygosity in either the main text or the supplementary material and evaluated the distribution of the FST–heterozygosity relationship by eye. For those studies providing a graph, we examined the FST–heterozygosity relationship and compared the actual data points to the null expectation generated by LOSITAN. We observed 3 categories of distributions: (1) “well-behaved,” where the distribution approximated the null model, with the exception of the outliers (Figure 1A); (2) “inclined,” where the points followed a line representing a positive relationship between heterozygosity and FST with a dearth of points representing low heterozygosity and large FST (Figure 1B); and (3) “skewed,” which lacked points with low heterozygosity and high FST but had many points falling near a heterozygosity of 0.5 that spanned a large range of FST values (Figure 1C). For each of these mutually exclusive patterns, LOSITAN produced confidence intervals that were either smooth (Figure 1A,C) or jagged (Figure 1B), so each distribution type could also be associated with a jagged confidence interval. In addition to classifying the FST–heterozygosity relationship for the studies citing LOSITAN, for each study we recorded information regarding the type of markers used, the number of populations sampled, and the number of clusters identified by population structuring programs, if such an analysis was conducted.
Examples of the different patterns we observed in the literature. In the left-hand column, we present examples of FST–heterozygosity distributions from studies using SNP data. In the right-hand column, we show the number of populations sampled and the number of demes underlying the populations for the studies with the FST–heterozygosity distribution shown in that row. The top row shows a well-behaved pattern, and data in the left-hand column are from Hess et al. (2013). The second row shows an incline pattern and data are from Hebert et al. (2013). The data also show a jagged confidence interval. The bottom row shows a skewed pattern, where only highly divergent loci also have high heterozygosity, and the data are from Dann et al. (2012).
Examples of the different patterns we observed in the literature. In the left-hand column, we present examples of FST–heterozygosity distributions from studies using SNP data. In the right-hand column, we show the number of populations sampled and the number of demes underlying the populations for the studies with the FST–heterozygosity distribution shown in that row. The top row shows a well-behaved pattern, and data in the left-hand column are from Hess et al. (2013). The second row shows an incline pattern and data are from Hebert et al. (2013). The data also show a jagged confidence interval. The bottom row shows a skewed pattern, where only highly divergent loci also have high heterozygosity, and the data are from Dann et al. (2012).
Because the majority of the articles providing a figure from LOSITAN did not have a well-behaved distribution (see Results), we were motivated to explore which parameter combinations resulted in well-behaved distributions when the FDIST2 simulator was run on its own. Additionally, we wished to ensure that the results were not unique to the coalescent simulations, so we used forward-in-time simulations to test the FDIST2 algorithm.
A Note on Terminology
Throughout this article, we will refer to independent populations as demes (i.e., corresponding to “islands” in Wright’s island model). Demes are distinct from the number of sampled populations, and the true number of demes is often unknown in empirical studies. In principle, a population clustering algorithm could reveal the number of demes represented by a suite of population samples. However, in practice, almost any level of differentiation is possible among demes connected by gene flow, so an absolute cutoff for the diagnosis of distinct demes cannot be established. An important additional consideration is that any empirical study has the potential to collect multiple population samples per deme, because the underlying structure of demes is not normally known at the outset. In this article, we include this possibility in our simulations, without formally investigating hierarchical population structure, which is beyond the scope of this article and has been addressed elsewhere (e.g., Excoffier et al. 2009).
Testing FDIST2 Simulations with Various Parameter Values
Since LOSITAN is a user-friendly wrapper for FDIST2, we begin by using the FDIST2 simulator to see if the coalescent model recapitulates the patterns present in our literature survey. The FDIST2 coalescent simulation model uses the statistic to estimate FST, where 1 − is the between-populations measure of heterozygosity and 1 − is the within-populations measure of heterozygosity (Beaumont and Nichols 1996). This statistic is a sample-size corrected version of Cockerham and Weir’s (1993)’s (Beaumont and Nichols 1996; Lotterhos and Whitlock 2014). The value is:
where N is the number of populations, n is the sample size in each population j, there are a alleles, and cij is the count of allele i in population j. And, is given by:
The value 1 − is what is termed heterozygosity by FDIST2, and here, we will refer to it as HB. The coalescent simulations use an island model with independent demes, where each sample originates from a different deme. Genetic variation depends only on the scaled mutation rate, θ = Nµ. The distribution of is characterized by estimating the quantiles of a Johnson’s distribution (Hill et al. 1976), and these quantiles are output as confidence intervals by FDIST2 and LOSITAN (Beaumont and Nichols 1996; Antao et al. 2008).
To run FDIST2, the user specifies 6 different parameters: the total number of demes, the number of sampled demes, the number of individuals sampled in each sampled deme, the estimated FST value, the number of iterations (which is equivalent to the number of loci), and whether the demes adhere to an infinite alleles model or a stepwise mutation model. We tested over 500 combinations of these 6 parameters. We specified 100 demes, sampling 10%, 25%, 50%, 75%, or 100% of the demes. The sample sizes we tested were 10, 25, 50, 75, and 100 individuals, and we ran simulations using 4000 and 20000 loci (Supplementary Table 1). For each parameter combination, we generated FST–heterozygosity plots in R (R Core Team 2016), calculated the proportion of loci that were outside of the confidence intervals and visually inspected each plot to evaluate whether the confidence intervals fit the simulated distributions. This analysis relied solely on the coalescent simulator built into FDIST2 and LOSITAN.
Forward-in-Time Simulations of the Island Model
To test the performance of FDIST2 for biallelic loci, as might be obtained from a next-generation sequencing study of SNPs, we wrote a C++ program to run a simulation of an infinite islands model for a sexually reproducing diploid organism with 2 alleles per locus. Starting with uniform allele frequencies, we simulated genetic drift by drawing alleles from a binomial distribution, which arises from the binomial probability equation:
where p is the frequency of allele A and N is the population size (Hartl and Clark 2007). This approach simulates a single generation of genetic drift, producing a new frequency of A alleles (pd) in each of the d simulated demes after drift. We then simulated migration using the island model equation:
where m is the migration rate, pd is the frequency of the A allele in deme d before migration, and is the allele frequency of the total population (i.e., the average allele frequency across an infinite number of demes), which was assigned a random value for each simulation run (Wright 1931, 1951). Each run represented one locus and we simulated 2000 loci by running the simulation repeatedly (thus assuming the loci were in linkage equilibrium). Thus, each locus was treated as being independent from other loci in the genome, an unrealistic approach that is nevertheless very common due to the difficulties associated with modeling and measuring linkage disequilibrium. An example of this problem comes from the estimation of Ne, where ignoring linkage disequilibrium creates bias (Waples et al. 2016). By extension, statistical tests of outliers that do not account for nonindependence of loci may well be biased. A thorough treatment of this topic was beyond the scope of this article, because FDIST2 and LOSITAN also assume all loci are independent. The drift-migration process was repeated for 5000 generations to allow the populations to reach migration-drift equilibrium (although the rate at which populations reach equilibrium depends on factors including population size and migration rate, Slatkin 1991). We chose not to include mutation because the effects of mutation on the allele frequencies of an individual SNP are negligible compared to migration and genetic drift in an island model (Wright 1951). Although mutation is often needed in models involving a small number of demes to maintain polymorphism, it is not necessary in this case because the allele frequencies in the total population do not change over time in the island model (Wright 1931, 1951).
To evaluate both the FST–heterozygosity relationships that emerged from these simulations and the performance of LOSITAN, we sampled the populations after 5000 generations. Sampling occurred by drawing alleles at random based on their frequencies, and we sampled 50 diploid genotypes per locus per sampled population. These samples were then run through LOSITAN, which calculated HB and values as described above and generated 95% confidence intervals.
To ensure that our simulations accurately represented the island model, we compared observed FST values to theoretical expectations. We calculated HS, HT, and FST metrics that were uncorrected for sample size to approximate values similar to those from LOSITAN for comparison. We calculated the FST statistic as defined by Wright (1943), without a correction for sampling bias:
We calculated expected heterozygosity as 2p(1 − p) within each sampled population and averaged across populations to calculate HS. The total heterozygosity was calculated as , where is the mean frequency of the A allele (Wright 1943) across samples. The expected value of FST was calculated as 1/(4Nm + 1). We compared the observed and expected FST values from the simulation model when it was run with 100 demes, the same number of demes used in our FDIST2 simulations. Each run used a population size of 1000, 100 random population samples, and Nm = 0.1, Nm = 1, or Nm = 10. For each value of Nm we ran 2000 replicates to generate a mean observed FST value and confidence intervals around the mean. In all 3 cases, the expected FST was well within the 95% confidence intervals surrounding the mean observed FST (Supplementary Figure S1). We also calculate this expected FST value in all of our simulation runs for comparison as a reference point, but acknowledge that the 1/(4Nm + 1) expectation is based on the assumption of infinite demes.
To identify the parameter combinations (Nm, the number of demes, and the number of sampled populations) under which LOSITAN produced the expected proportion of false positives (5% of the sampled neutral loci at the 95% confidence level), we output the sampled genotypes from our simulations in GENEPOP format and imported them into LOSITAN. We used the stepwise mutation model, 50000 simulations, and a confidence interval of 0.95. The LOSITAN manual also recommends using the “Neutral” mean FST function, and since we were interested in emulating empirical datasets we used this option. This function removes potential loci under selection after a first simulation run to compute the initial mean FST. After running LOSITAN, the number of loci output as under balancing and positive selection were counted and compared to expectations (i.e., 5% at α = 0.05 since we simulated only neutral loci).
The parameter combinations we tested with our forward-in-time simulations are found in Table 1. We altered 3 parameters: the number of demes, the number of sampled populations, and Nm (Nm = population size × migration rate). Each parameter combination was run with 2000 repetitions, representing 2000 independently segregating loci, with set to a randomly drawn value between 0.05 and 0.95. We either sampled each population an exact number of times or drew a specified number of samples from randomly chosen demes (such that a given deme could be sampled multiple times). When a deme was sampled multiple times, 50 new diploid genotypes were drawn from the distribution of alleles, such that each population sample contained an independent set of individuals from the deme. Although Beaumont and Nichols (1996) recommend sampling a large number (≥10) of independent populations (i.e., demes), this approach is not commonly used (see Results). Many researchers sample a set of locales once and determine both population structure (the independence of populations) and the signature of selection in one fell swoop. However, provisions are generally not made for population clustering (e.g., nonindependence of sampled populations) when analyzing data with LOSITAN (see Results), even though clustering analyses could be used to remove or merge samples that originated from the same, apparently panmictic, deme. As a result, some samples may be from the same deme in terms of underlying population structure, even though the researchers treat each sample as a separate population in their outlier analysis.
The results of running LOSITAN on the output from the forward-in-time simulations
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | ||
| 2 | 0.1 | 0.000 | 0.426 | 0.426 | 0.178 | 0.275 | 0.453 | 0.348 | 0.301 | 0.649 | 0.387 | 0.420 | 0.807 |
| 1 | 0.000 | 0.245 | 0.245 | 0.019 | 0.196 | 0.216 | 0.289 | 0.186 | 0.475 | 0.438 | 0.184 | 0.622 | |
| 10 | 0.000 | 0.057 | 0.057 | 0.000 | 0.061 | 0.061 | 0.000 | 0.089 | 0.089 | 0.004 | 0.156 | 0.160 | |
| 5 | 0.1 | 0.004 | 0.402 | 0.406 | 0.053 | 0.175 | 0.229 | 0.106 | 0.243 | 0.349 | 0.187 | 0.271 | 0.458 |
| 1 | 0.000 | 0.283 | 0.283 | 0.059 | 0.146 | 0.205 | 0.099 | 0.145 | 0.244 | 0.205 | 0.185 | 0.390 | |
| 10 | 0.000 | 0.060 | 0.060 | 0.000 | 0.054 | 0.054 | 0.000 | 0.071 | 0.071 | 0.019 | 0.105 | 0.124 | |
| 10 | 0.1 | 0.001 | 0.395 | 0.396 | 0.029 | 0.129 | 0.158 | 0.050 | 0.171 | 0.220 | 0.095 | 0.186 | 0.281 |
| 1 | 0.000 | 0.277 | 0.277 | 0.039 | 0.107 | 0.145 | 0.064 | 0.112 | 0.175 | 0.108 | 0.147 | 0.255 | |
| 10 | 0.000 | 0.061 | 0.061 | 0.000 | 0.033 | 0.033 | 0.003 | 0.042 | 0.045 | 0.010 | 0.080 | 0.090 | |
| 25 | 0.1 | 0.005 | 0.368 | 0.373 | 0.012 | 0.128 | 0.140 | 0.025 | 0.111 | 0.136 | 0.035 | 0.123 | 0.157 |
| 1 | 0.000 | 0.306 | 0.306 | 0.021 | 0.107 | 0.127 | 0.034 | 0.083 | 0.117 | 0.054 | 0.104 | 0.158 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.036 | 0.036 | 0.001 | 0.035 | 0.036 | 0.008 | 0.040 | 0.048 | |
| 50 | 0.1 | 0.001 | 0.478 | 0.479 | 0.010 | 0.108 | 0.118 | 0.015 | 0.090 | 0.105 | 0.031 | 0.102 | 0.133 |
| 1 | 0.000 | 0.296 | 0.296 | 0.023 | 0.096 | 0.119 | 0.025 | 0.076 | 0.101 | 0.044 | 0.061 | 0.105 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.037 | 0.037 | 0.002 | 0.036 | 0.037 | 0.006 | 0.030 | 0.035 | |
| 75 | 0.1 | 0.002 | 0.403 | 0.405 | 0.008 | 0.126 | 0.134 | 0.015 | 0.083 | 0.097 | 0.021 | 0.082 | 0.103 |
| 1 | 0.000 | 0.289 | 0.289 | 0.023 | 0.086 | 0.109 | 0.024 | 0.098 | 0.122 | 0.026 | 0.072 | 0.098 | |
| 10 | 0.000 | 0.078 | 0.078 | 0.000 | 0.036 | 0.036 | 0.001 | 0.027 | 0.028 | 0.006 | 0.029 | 0.034 | |
| 100 | 0.1 | 0.000 | 0.372 | 0.372 | 0.012 | 0.115 | 0.127 | 0.009 | 0.093 | 0.102 | 0.017 | 0.085 | 0.101 |
| 1 | 0.000 | 0.312 | 0.312 | 0.016 | 0.087 | 0.102 | 0.026 | 0.069 | 0.095 | 0.027 | 0.053 | 0.079 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.034 | 0.034 | 0.003 | 0.026 | 0.029 | 0.003 | 0.021 | 0.023 | |
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | ||
| 2 | 0.1 | 0.000 | 0.426 | 0.426 | 0.178 | 0.275 | 0.453 | 0.348 | 0.301 | 0.649 | 0.387 | 0.420 | 0.807 |
| 1 | 0.000 | 0.245 | 0.245 | 0.019 | 0.196 | 0.216 | 0.289 | 0.186 | 0.475 | 0.438 | 0.184 | 0.622 | |
| 10 | 0.000 | 0.057 | 0.057 | 0.000 | 0.061 | 0.061 | 0.000 | 0.089 | 0.089 | 0.004 | 0.156 | 0.160 | |
| 5 | 0.1 | 0.004 | 0.402 | 0.406 | 0.053 | 0.175 | 0.229 | 0.106 | 0.243 | 0.349 | 0.187 | 0.271 | 0.458 |
| 1 | 0.000 | 0.283 | 0.283 | 0.059 | 0.146 | 0.205 | 0.099 | 0.145 | 0.244 | 0.205 | 0.185 | 0.390 | |
| 10 | 0.000 | 0.060 | 0.060 | 0.000 | 0.054 | 0.054 | 0.000 | 0.071 | 0.071 | 0.019 | 0.105 | 0.124 | |
| 10 | 0.1 | 0.001 | 0.395 | 0.396 | 0.029 | 0.129 | 0.158 | 0.050 | 0.171 | 0.220 | 0.095 | 0.186 | 0.281 |
| 1 | 0.000 | 0.277 | 0.277 | 0.039 | 0.107 | 0.145 | 0.064 | 0.112 | 0.175 | 0.108 | 0.147 | 0.255 | |
| 10 | 0.000 | 0.061 | 0.061 | 0.000 | 0.033 | 0.033 | 0.003 | 0.042 | 0.045 | 0.010 | 0.080 | 0.090 | |
| 25 | 0.1 | 0.005 | 0.368 | 0.373 | 0.012 | 0.128 | 0.140 | 0.025 | 0.111 | 0.136 | 0.035 | 0.123 | 0.157 |
| 1 | 0.000 | 0.306 | 0.306 | 0.021 | 0.107 | 0.127 | 0.034 | 0.083 | 0.117 | 0.054 | 0.104 | 0.158 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.036 | 0.036 | 0.001 | 0.035 | 0.036 | 0.008 | 0.040 | 0.048 | |
| 50 | 0.1 | 0.001 | 0.478 | 0.479 | 0.010 | 0.108 | 0.118 | 0.015 | 0.090 | 0.105 | 0.031 | 0.102 | 0.133 |
| 1 | 0.000 | 0.296 | 0.296 | 0.023 | 0.096 | 0.119 | 0.025 | 0.076 | 0.101 | 0.044 | 0.061 | 0.105 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.037 | 0.037 | 0.002 | 0.036 | 0.037 | 0.006 | 0.030 | 0.035 | |
| 75 | 0.1 | 0.002 | 0.403 | 0.405 | 0.008 | 0.126 | 0.134 | 0.015 | 0.083 | 0.097 | 0.021 | 0.082 | 0.103 |
| 1 | 0.000 | 0.289 | 0.289 | 0.023 | 0.086 | 0.109 | 0.024 | 0.098 | 0.122 | 0.026 | 0.072 | 0.098 | |
| 10 | 0.000 | 0.078 | 0.078 | 0.000 | 0.036 | 0.036 | 0.001 | 0.027 | 0.028 | 0.006 | 0.029 | 0.034 | |
| 100 | 0.1 | 0.000 | 0.372 | 0.372 | 0.012 | 0.115 | 0.127 | 0.009 | 0.093 | 0.102 | 0.017 | 0.085 | 0.101 |
| 1 | 0.000 | 0.312 | 0.312 | 0.016 | 0.087 | 0.102 | 0.026 | 0.069 | 0.095 | 0.027 | 0.053 | 0.079 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.034 | 0.034 | 0.003 | 0.026 | 0.029 | 0.003 | 0.021 | 0.023 | |
To evaluate the parameter space in which LOSITAN might be most relevant, we varied 3 parameters in the simulations: the number of demes, the number of sampled populations, and Nm, with a per-deme population size of 1000. We analyzed the output from our simulations in LOSITAN and calculated the proportion of loci that fell outside the confidence intervals, excluding loci with an expected total heterozygosity of zero (i.e., loci with no variation). LOSITAN reports loci that are either putatively experiencing “balancing” or “positive” selection, so we used that terminology for consistency. The combinations resulting in a proportion of outlier loci near the expected number (0.025–0.075) are in italicized, boldface font.
The results of running LOSITAN on the output from the forward-in-time simulations
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | ||
| 2 | 0.1 | 0.000 | 0.426 | 0.426 | 0.178 | 0.275 | 0.453 | 0.348 | 0.301 | 0.649 | 0.387 | 0.420 | 0.807 |
| 1 | 0.000 | 0.245 | 0.245 | 0.019 | 0.196 | 0.216 | 0.289 | 0.186 | 0.475 | 0.438 | 0.184 | 0.622 | |
| 10 | 0.000 | 0.057 | 0.057 | 0.000 | 0.061 | 0.061 | 0.000 | 0.089 | 0.089 | 0.004 | 0.156 | 0.160 | |
| 5 | 0.1 | 0.004 | 0.402 | 0.406 | 0.053 | 0.175 | 0.229 | 0.106 | 0.243 | 0.349 | 0.187 | 0.271 | 0.458 |
| 1 | 0.000 | 0.283 | 0.283 | 0.059 | 0.146 | 0.205 | 0.099 | 0.145 | 0.244 | 0.205 | 0.185 | 0.390 | |
| 10 | 0.000 | 0.060 | 0.060 | 0.000 | 0.054 | 0.054 | 0.000 | 0.071 | 0.071 | 0.019 | 0.105 | 0.124 | |
| 10 | 0.1 | 0.001 | 0.395 | 0.396 | 0.029 | 0.129 | 0.158 | 0.050 | 0.171 | 0.220 | 0.095 | 0.186 | 0.281 |
| 1 | 0.000 | 0.277 | 0.277 | 0.039 | 0.107 | 0.145 | 0.064 | 0.112 | 0.175 | 0.108 | 0.147 | 0.255 | |
| 10 | 0.000 | 0.061 | 0.061 | 0.000 | 0.033 | 0.033 | 0.003 | 0.042 | 0.045 | 0.010 | 0.080 | 0.090 | |
| 25 | 0.1 | 0.005 | 0.368 | 0.373 | 0.012 | 0.128 | 0.140 | 0.025 | 0.111 | 0.136 | 0.035 | 0.123 | 0.157 |
| 1 | 0.000 | 0.306 | 0.306 | 0.021 | 0.107 | 0.127 | 0.034 | 0.083 | 0.117 | 0.054 | 0.104 | 0.158 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.036 | 0.036 | 0.001 | 0.035 | 0.036 | 0.008 | 0.040 | 0.048 | |
| 50 | 0.1 | 0.001 | 0.478 | 0.479 | 0.010 | 0.108 | 0.118 | 0.015 | 0.090 | 0.105 | 0.031 | 0.102 | 0.133 |
| 1 | 0.000 | 0.296 | 0.296 | 0.023 | 0.096 | 0.119 | 0.025 | 0.076 | 0.101 | 0.044 | 0.061 | 0.105 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.037 | 0.037 | 0.002 | 0.036 | 0.037 | 0.006 | 0.030 | 0.035 | |
| 75 | 0.1 | 0.002 | 0.403 | 0.405 | 0.008 | 0.126 | 0.134 | 0.015 | 0.083 | 0.097 | 0.021 | 0.082 | 0.103 |
| 1 | 0.000 | 0.289 | 0.289 | 0.023 | 0.086 | 0.109 | 0.024 | 0.098 | 0.122 | 0.026 | 0.072 | 0.098 | |
| 10 | 0.000 | 0.078 | 0.078 | 0.000 | 0.036 | 0.036 | 0.001 | 0.027 | 0.028 | 0.006 | 0.029 | 0.034 | |
| 100 | 0.1 | 0.000 | 0.372 | 0.372 | 0.012 | 0.115 | 0.127 | 0.009 | 0.093 | 0.102 | 0.017 | 0.085 | 0.101 |
| 1 | 0.000 | 0.312 | 0.312 | 0.016 | 0.087 | 0.102 | 0.026 | 0.069 | 0.095 | 0.027 | 0.053 | 0.079 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.034 | 0.034 | 0.003 | 0.026 | 0.029 | 0.003 | 0.021 | 0.023 | |
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | Balancing . | Positive . | Total . | ||
| 2 | 0.1 | 0.000 | 0.426 | 0.426 | 0.178 | 0.275 | 0.453 | 0.348 | 0.301 | 0.649 | 0.387 | 0.420 | 0.807 |
| 1 | 0.000 | 0.245 | 0.245 | 0.019 | 0.196 | 0.216 | 0.289 | 0.186 | 0.475 | 0.438 | 0.184 | 0.622 | |
| 10 | 0.000 | 0.057 | 0.057 | 0.000 | 0.061 | 0.061 | 0.000 | 0.089 | 0.089 | 0.004 | 0.156 | 0.160 | |
| 5 | 0.1 | 0.004 | 0.402 | 0.406 | 0.053 | 0.175 | 0.229 | 0.106 | 0.243 | 0.349 | 0.187 | 0.271 | 0.458 |
| 1 | 0.000 | 0.283 | 0.283 | 0.059 | 0.146 | 0.205 | 0.099 | 0.145 | 0.244 | 0.205 | 0.185 | 0.390 | |
| 10 | 0.000 | 0.060 | 0.060 | 0.000 | 0.054 | 0.054 | 0.000 | 0.071 | 0.071 | 0.019 | 0.105 | 0.124 | |
| 10 | 0.1 | 0.001 | 0.395 | 0.396 | 0.029 | 0.129 | 0.158 | 0.050 | 0.171 | 0.220 | 0.095 | 0.186 | 0.281 |
| 1 | 0.000 | 0.277 | 0.277 | 0.039 | 0.107 | 0.145 | 0.064 | 0.112 | 0.175 | 0.108 | 0.147 | 0.255 | |
| 10 | 0.000 | 0.061 | 0.061 | 0.000 | 0.033 | 0.033 | 0.003 | 0.042 | 0.045 | 0.010 | 0.080 | 0.090 | |
| 25 | 0.1 | 0.005 | 0.368 | 0.373 | 0.012 | 0.128 | 0.140 | 0.025 | 0.111 | 0.136 | 0.035 | 0.123 | 0.157 |
| 1 | 0.000 | 0.306 | 0.306 | 0.021 | 0.107 | 0.127 | 0.034 | 0.083 | 0.117 | 0.054 | 0.104 | 0.158 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.036 | 0.036 | 0.001 | 0.035 | 0.036 | 0.008 | 0.040 | 0.048 | |
| 50 | 0.1 | 0.001 | 0.478 | 0.479 | 0.010 | 0.108 | 0.118 | 0.015 | 0.090 | 0.105 | 0.031 | 0.102 | 0.133 |
| 1 | 0.000 | 0.296 | 0.296 | 0.023 | 0.096 | 0.119 | 0.025 | 0.076 | 0.101 | 0.044 | 0.061 | 0.105 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.037 | 0.037 | 0.002 | 0.036 | 0.037 | 0.006 | 0.030 | 0.035 | |
| 75 | 0.1 | 0.002 | 0.403 | 0.405 | 0.008 | 0.126 | 0.134 | 0.015 | 0.083 | 0.097 | 0.021 | 0.082 | 0.103 |
| 1 | 0.000 | 0.289 | 0.289 | 0.023 | 0.086 | 0.109 | 0.024 | 0.098 | 0.122 | 0.026 | 0.072 | 0.098 | |
| 10 | 0.000 | 0.078 | 0.078 | 0.000 | 0.036 | 0.036 | 0.001 | 0.027 | 0.028 | 0.006 | 0.029 | 0.034 | |
| 100 | 0.1 | 0.000 | 0.372 | 0.372 | 0.012 | 0.115 | 0.127 | 0.009 | 0.093 | 0.102 | 0.017 | 0.085 | 0.101 |
| 1 | 0.000 | 0.312 | 0.312 | 0.016 | 0.087 | 0.102 | 0.026 | 0.069 | 0.095 | 0.027 | 0.053 | 0.079 | |
| 10 | 0.000 | 0.070 | 0.070 | 0.000 | 0.034 | 0.034 | 0.003 | 0.026 | 0.029 | 0.003 | 0.021 | 0.023 | |
To evaluate the parameter space in which LOSITAN might be most relevant, we varied 3 parameters in the simulations: the number of demes, the number of sampled populations, and Nm, with a per-deme population size of 1000. We analyzed the output from our simulations in LOSITAN and calculated the proportion of loci that fell outside the confidence intervals, excluding loci with an expected total heterozygosity of zero (i.e., loci with no variation). LOSITAN reports loci that are either putatively experiencing “balancing” or “positive” selection, so we used that terminology for consistency. The combinations resulting in a proportion of outlier loci near the expected number (0.025–0.075) are in italicized, boldface font.
Using Smoothed Quantiles to Identify Outliers in the F ST–Heterozygosity Distribution
As an alternative to the LOSITAN approach to calculating confidence intervals, we propose using smoothed quantiles of the empirical distribution of FST–heterozygosity to identify outliers. Although quantiles identify a number of outlier loci based on the threshold used (e.g., 5% of the loci will be outliers with a 95% confidence level), and this approach does not use a null model based on population genetics first principles, using the empirical distribution itself has the advantage of not assuming a particular distribution or model of evolution. This feature is appealing because the FST–heterozygosity distribution changes based on many potentially unknowable features of population structure (see Results).
We implement our smoothed quantiles approach in R, available as an R package called fsthet. For the analyses conducted here, we use the heterozygosity and FST values output by LOSITAN to ensure valid comparisons between software packages. Our package has the capability to read a GENEPOP file and calculate FST and heterozygosity according to 1 of 4 methods specified by the user: Wright’s FST (Wright 1943), Weir’s θ (Weir 1990), the β based on the variance in allele frequencies (Cockerham and Weir 1993), or the sample size corrected (Cockerham and Weir 1993). FST and heterozygosity are calculated and loci are binned based on their HT values, so that 25 equally sized bins are created. Bins overlap, ultimately covering the entire range of HT values in the dataset. Bins containing fewer than 20 loci are merged with neighboring bins. Within each bin, loci are sorted by FST value and quantiles are calculated by discarding the most extreme FST values. A proportion of α/2 values is discarded from each end of the distribution, where α is a user-specified threshold (the default is 0.05). The fsthet package contains a function to generate plots similar to those output by LOSITAN, and fsthet generates a list of all outlier loci and their corresponding FST and heterozygosity values. We note that fsthet simply identifies outliers from the background FST–heterozygosity distribution, and therefore these loci merely represent candidates for further investigation.
To test the utility of fsthet, we used the GENEPOP files generated by the forward-in-time simulations above and ran LOSITAN. We used fsthet to generate smoothed quantiles for the FST and heterozygosity values output by LOSITAN (rather than values calculated by fsthet) to facilitate comparisons between the output of fsthet and LOSITAN. These simulations did not contain selected loci, so we evaluated how closely the smoothed quantiles came to designating 5% of the loci as outliers at the 5% level under each parameter combination.
We also added selection to the simulations and evaluated the ability of LOSITAN to identify selected loci. We imposed selection on 10 randomly chosen loci per simulation run. Half of the demes were assigned to experience selection and the other half were not. For loci under selection, we modeled allele frequency changes according to the following equations:
Directional selection was simulated by changing the values of ω11, ω12, and ω22, such that ω11 > ω12 > ω22. Values for ω11, ω12, and ω22 were set by changing the value of s, so that ω11 = 1, ω12 = 1 − s/2, and ω22 = 1 − s (Wright 1931; Herron and Freeman 2014). To test the effect of selection strength, number of migrants (Nm), and number of demes, we ran the simulations with selection using the parameter combinations presented in Table 3 and randomly sampled the demes 20 times.
Results
Literature Survey
In our survey of the literature, we found 504 articles that cited LOSITAN, only 112 of which provided the FST–heterozygosity plots either in the main text or in the Supplementary Material (Supplementary References). The 112 articles that provided plots used a variety of marker types, including amplified fragment length polymorphisms, allozymes, microsatellites, and SNPs. The majority of the 112 articles (81%) also performed an analysis of population structure. Of the 112 articles, 63% analyzed fewer than 10 populations, disregarding the advice provided by Beaumont and Nichols (1996; Figure 1). Moreover, 18% of the 112 articles compared only 2 populations. Importantly, of the articles that analyzed population structure, 68% found hierarchical population structure, suggesting that the populations they treated as independent in their LOSITAN analysis were re-sampling the same demes multiple times (i.e., the number of clusters was fewer than the number of sampled populations). Only one of those studies (Mwacharo et al. 2013) used the information regarding population structure to inform their FST–heterozygosity analysis (by conducting separate analyses for each cluster). Only 7% of the 91 articles containing clear population clusters included 10 or more clusters, which can be interpreted as distinct demes.
Each distribution was easily classified by eye into 1 of the 3 categories (well-behaved, inclined, or skewed), and confidence intervals were similarly classified as jagged or smooth. Of the 112 studies, 25 (39%) had a well-behaved distribution with smooth confidence intervals (Figure 1). We also observed 21 (19%) studies with an inclined pattern and smooth confidence intervals and 39 (35%) with smooth confidence intervals and a skewed distribution, with the majority of points falling near a heterozygosity of 0.5 and spanning a large range of FST values (Figure 1). An additional 12 (11%) studies had well-behaved distributions but LOSITAN produced jagged confidence intervals. The remaining 15 studies (13%) presented a combination of either the skewed or inclined patterns and jagged confidence intervals (Figure 1). By testing the FDIST2 simulations and running the simulations, we were able to identify several important factors resulting in these different distributions.
A Small Number of Total Demes
A small number of demes results in skewed or inclined distributions, in which large values are only possible when the overall heterozygosity (HB) is high, in both the FDIST2 coalescent simulations (Figure 2) and the forward-in-time simulations (Figure 3). As Jakobsson et al. (2013) demonstrated, for the 2-population case and using Wright’s FST estimator (Wright 1943), the upper bound on FST is a function of total homozygosity (1 − HT):
where hT is total homozygosity (Jakobsson et al. 2013). This can be re-written in terms of heterozygosity, HT, as:
In either formulation, FST is maximized when each allele occurs in only one population (e.g., when different alleles are fixed in different populations), such that HT = 0.5 for 2 populations (Jakobsson et al. 2013). In the 2-population case, FST can only range from 0 to 1 when HT is 0.5, and fixation of different alleles can only occur when HT = 0.5. Thus, in the 2-population case, the inclined pattern is a clear consequence of this mathematical relationship between HT and FST. A similar constraint occurs when the samples come from a small number of demes, regardless of how many samples are taken from each deme (see below).
While this relationship between FST and HT is a real constraint, as appreciated by Jakobsson et al. (2013), it becomes problematic for outlier analyses only because the LOSITAN confidence intervals do not capture the true FST–heterozygosity relationship for biallelic loci when few demes are sampled. This limitation arises from the fact that LOSITAN was developed to analyze samples originating from many independent populations (Beaumont and Nichols 1996). Because the FST–heterozygosity space is discrete in cases such as those explored here (finite sample sizes from a small number of populations using biallelic loci), the relationship is not easily captured by any smooth function (Jakobsson et al. 2013). Even though FDIST2 and LOSITAN use a different measure of heterozygosity (HB, see Methods) than Jakobsson et al. (2013) used, HT and HB are highly correlated (Supplementary Figure S2) and the same patterns in the datasets emerge regardless of which FST estimator is used. Consequently, when the true population structure contains a small number of demes, many of the loci diagnosed by LOSITAN to be under selection may simply be following the expected FST–heterozygosity distribution and may not actually be outliers.
Resampling of a Small Number of Demes
In the FDIST2 coalescent simulations, if 5–50% of the total number of demes were sampled, regardless of the overall sample size, the inclined pattern appeared (Figure 2). The inclined pattern also emerged whenever only 2 populations were sampled, regardless of the underlying number of demes (Figures 3–5 and Supplementary Figure S4). These results are due to the constraints imposed on FST by HT (see above; Jakobsson et al. 2013).
Typical patterns from the FDIST2 coalescent simulations. All 4 figures present the results of simulations run with a total of 100 demes, a sample size of 50, an infinite alleles model, and 20000 iterations. The expected heterozygosity (HB) is output from FDIST2. The left column had an estimated FST of 0.2 and the right column had an estimated FST of 0.8. The top row was run with 10 sampled demes (10% of the total number of demes) and the bottom row was run with 75 sampled demes (75% of the total number of demes).
Typical patterns from the FDIST2 coalescent simulations. All 4 figures present the results of simulations run with a total of 100 demes, a sample size of 50, an infinite alleles model, and 20000 iterations. The expected heterozygosity (HB) is output from FDIST2. The left column had an estimated FST of 0.2 and the right column had an estimated FST of 0.8. The top row was run with 10 sampled demes (10% of the total number of demes) and the bottom row was run with 75 sampled demes (75% of the total number of demes).
The underlying structure of populations played a role in whether the confidence intervals generated by LOSITAN reflected the empirical distribution of points. We found in the forward-in-time simulations that the skewed and inclined patterns were maintained if the underlying population structure consisted of only a few demes even if each deme was sampled multiple times (Figure 5 and Supplementary Figures S4 and S5).
Although the structure of populations played a role in shaping the FST–heterozygosity distribution, population size was unimportant as long as the migration rate was held constant. Population size merely marginally increased the observed mean FST value at lower heterozygosity values (i.e., the distribution was still skewed but not as drastically; Supplementary Figure S3).
Jagged Patterns in the Distribution
In addition to the 3 patterns observed in the empirical datasets (normal, skewed, and inclined), a jagged pattern emerged in both the coalescent and forward-in-time simulations when the number of sampled demes was small or the total number of demes was small and FST was large (0.6–0.8; Figures 2 and 5). This pattern is simply a feature of the FST–heterozygosity distribution, but it nevertheless causes problems in the identification of outliers because the confidence intervals usually do not fit the actual distribution of points and therefore many more loci than expected are identified as outliers by LOSITAN (Table 2; Figure 5 and Supplementary Figures S4–S7).
Jagged Confidence Intervals
Jagged confidence intervals produced by LOSITAN were observed in the datasets in the literature (Figure 1) and occasionally when the results of the forward-in-time simulation were run through LOSITAN, especially when only 2 demes were sampled (Figure 5 and Supplementary Figure S4) or when the underlying distribution was jagged (Figure 5). As in the case of jagged distributions, jagged confidence intervals are not necessarily a problem provided they actually fit the data. The confidence intervals used by Beaumont and Nichols (1996) are calculated by binning the loci into overlapping sets of simulated FST–heterozygosity points and using Johnson distributions to estimate the confidence intervals. An inappropriate bin size, given the number of simulated values or the underlying FST–heterozygosity distribution, could be one cause of the jagged confidence intervals. Jagged confidence intervals may signify other, less obvious, issues with the confidence intervals as well. An adapted version of FDIST2 that is appropriate for dominant markers, DFDIST and its wrapper program Mcheza (Antao and Beaumont 2011), replaces the Johnson distribution approach with an iterative smoothing method to estimate confidence intervals from the coalescent simulations. This approach ameliorates the problem of jagged confidence intervals, but has yet to be implemented for codominant markers.
Evaluating LOSITAN
All of the loci in our initial set of forward-in-time simulations were neutral, so we expected LOSITAN to identify approximately 5% of loci as being under either balancing or positive selection. However, in most cases LOSITAN identified many more significant loci than should have been found at the 0.05 level (mean = 0.185, σ2 = 0.161; Table 1). The 23 (27.4%) parameter combinations that resulted in a proportion of selected loci near the expected 0.05 (Table 1) only occurred when the number of migrants was high (Nm = 10). When migration rates were high, the skewed and inclined patterns were not always obvious (Figure 3 and Supplementary Figures S4–S7), which may explain the lower rates of false positives. LOSITAN produced the best results when many independent demes were each sampled once, and when the subpopulations had high migration rates and a low overall FST (e.g., Figure 3, right column).
The results of the forward-in-time simulations for different values of Nm with 2 demes (top row) or 5 demes (bottom row). This figure illustrates the effect of migration rate (Nm) and of the number of demes on the FST–heterozygosity distribution. For all of the runs presented here, the model was run for 5000 generations, with 2000 loci, and a population size of 1000. The points represent the actual values of and HB. Each deme was sampled once, with each sample containing 50 diploid individuals, to generate the 95% confidence intervals in LOSITAN, which are the red lines. In the top of each panel, we present the mean FST from the sample and the expected FST (“Exp. FST”), which was calculated as FST = 1/(4Nm + 1).
The results of the forward-in-time simulations for different values of Nm with 2 demes (top row) or 5 demes (bottom row). This figure illustrates the effect of migration rate (Nm) and of the number of demes on the FST–heterozygosity distribution. For all of the runs presented here, the model was run for 5000 generations, with 2000 loci, and a population size of 1000. The points represent the actual values of and HB. Each deme was sampled once, with each sample containing 50 diploid individuals, to generate the 95% confidence intervals in LOSITAN, which are the red lines. In the top of each panel, we present the mean FST from the sample and the expected FST (“Exp. FST”), which was calculated as FST = 1/(4Nm + 1).
Smoothed Quantiles as an Alternative Method to Generate Confidence Intervals
Using smoothed quantiles significantly decreased the average proportion of false positives compared to LOSITAN (1-tailed paired t-test, t = −6.7446, df = 83, P = 9.558 × 10−10; Figure 6). Of the 84 parameter combinations tested, fsthet identified as outliers a proportion of loci between 0.025 and 0.075 (near the expected 0.05) in 65 (77.38%) combinations (Table 2). However, using quantiles is not a perfect solution. Several parameter combinations yielded unexpectedly high proportions of loci called as outliers, primarily when only 2 samples were taken or when Nm = 0.1. In all cases when 20 samples were taken, the quantiles identified approximately 5% of the loci as outliers (Table 2).
The results of running fsthet on the simulated data from the forward-in-time simulation without selection
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . |
|---|---|---|---|---|---|
| 2 | 0.1 | 0.0993 | 0.0841 | 0.0559 | 0.0421 |
| 2 | 1 | 0.1044 | 0.0651 | 0.0618 | 0.0638 |
| 2 | 10 | 0.0945 | 0.0635 | 0.0590 | 0.0585 |
| 5 | 0.1 | 0.0897 | 0.0840 | 0.0751 | 0.0628 |
| 5 | 1 | 0.0789 | 0.0612 | 0.0660 | 0.0610 |
| 5 | 10 | 0.0825 | 0.0625 | 0.0620 | 0.0600 |
| 10 | 0.1 | 0.0591 | 0.0770 | 0.0712 | 0.0610 |
| 10 | 1 | 0.0772 | 0.0627 | 0.0615 | 0.0565 |
| 10 | 10 | 0.0885 | 0.0560 | 0.0575 | 0.0580 |
| 25 | 0.1 | 0.0780 | 0.0787 | 0.0716 | 0.0600 |
| 25 | 1 | 0.0628 | 0.0656 | 0.0635 | 0.0590 |
| 25 | 10 | 0.0890 | 0.0645 | 0.0660 | 0.0580 |
| 50 | 0.1 | 0.0631 | 0.0736 | 0.0715 | 0.0535 |
| 50 | 1 | 0.0683 | 0.0610 | 0.0625 | 0.0595 |
| 50 | 10 | 0.0715 | 0.0670 | 0.0585 | 0.0580 |
| 75 | 0.1 | 0.0742 | 0.0830 | 0.0700 | 0.0570 |
| 75 | 1 | 0.0786 | 0.0626 | 0.0606 | 0.0580 |
| 75 | 10 | 0.0775 | 0.0580 | 0.0535 | 0.0590 |
| 100 | 0.1 | 0.0536 | 0.0724 | 0.0715 | 0.0605 |
| 100 | 1 | 0.0805 | 0.0631 | 0.0575 | 0.0605 |
| 100 | 10 | 0.0685 | 0.0590 | 0.0595 | 0.0600 |
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . |
|---|---|---|---|---|---|
| 2 | 0.1 | 0.0993 | 0.0841 | 0.0559 | 0.0421 |
| 2 | 1 | 0.1044 | 0.0651 | 0.0618 | 0.0638 |
| 2 | 10 | 0.0945 | 0.0635 | 0.0590 | 0.0585 |
| 5 | 0.1 | 0.0897 | 0.0840 | 0.0751 | 0.0628 |
| 5 | 1 | 0.0789 | 0.0612 | 0.0660 | 0.0610 |
| 5 | 10 | 0.0825 | 0.0625 | 0.0620 | 0.0600 |
| 10 | 0.1 | 0.0591 | 0.0770 | 0.0712 | 0.0610 |
| 10 | 1 | 0.0772 | 0.0627 | 0.0615 | 0.0565 |
| 10 | 10 | 0.0885 | 0.0560 | 0.0575 | 0.0580 |
| 25 | 0.1 | 0.0780 | 0.0787 | 0.0716 | 0.0600 |
| 25 | 1 | 0.0628 | 0.0656 | 0.0635 | 0.0590 |
| 25 | 10 | 0.0890 | 0.0645 | 0.0660 | 0.0580 |
| 50 | 0.1 | 0.0631 | 0.0736 | 0.0715 | 0.0535 |
| 50 | 1 | 0.0683 | 0.0610 | 0.0625 | 0.0595 |
| 50 | 10 | 0.0715 | 0.0670 | 0.0585 | 0.0580 |
| 75 | 0.1 | 0.0742 | 0.0830 | 0.0700 | 0.0570 |
| 75 | 1 | 0.0786 | 0.0626 | 0.0606 | 0.0580 |
| 75 | 10 | 0.0775 | 0.0580 | 0.0535 | 0.0590 |
| 100 | 0.1 | 0.0536 | 0.0724 | 0.0715 | 0.0605 |
| 100 | 1 | 0.0805 | 0.0631 | 0.0575 | 0.0605 |
| 100 | 10 | 0.0685 | 0.0590 | 0.0595 | 0.0600 |
To evaluate the performance of fsthet, we varied 3 parameters in the simulations: the number of demes, the number of sampled populations, and Nm, with a population size of 1000. We calculated the smoothed quantiles with fsthet, excluding loci with an expected total heterozygosity of zero (i.e., loci with no variation in the simulated samples). The combinations resulting in a proportion of outlier loci near the expected number (0.025–0.075) are in italicized, boldface font.
The results of running fsthet on the simulated data from the forward-in-time simulation without selection
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . |
|---|---|---|---|---|---|
| 2 | 0.1 | 0.0993 | 0.0841 | 0.0559 | 0.0421 |
| 2 | 1 | 0.1044 | 0.0651 | 0.0618 | 0.0638 |
| 2 | 10 | 0.0945 | 0.0635 | 0.0590 | 0.0585 |
| 5 | 0.1 | 0.0897 | 0.0840 | 0.0751 | 0.0628 |
| 5 | 1 | 0.0789 | 0.0612 | 0.0660 | 0.0610 |
| 5 | 10 | 0.0825 | 0.0625 | 0.0620 | 0.0600 |
| 10 | 0.1 | 0.0591 | 0.0770 | 0.0712 | 0.0610 |
| 10 | 1 | 0.0772 | 0.0627 | 0.0615 | 0.0565 |
| 10 | 10 | 0.0885 | 0.0560 | 0.0575 | 0.0580 |
| 25 | 0.1 | 0.0780 | 0.0787 | 0.0716 | 0.0600 |
| 25 | 1 | 0.0628 | 0.0656 | 0.0635 | 0.0590 |
| 25 | 10 | 0.0890 | 0.0645 | 0.0660 | 0.0580 |
| 50 | 0.1 | 0.0631 | 0.0736 | 0.0715 | 0.0535 |
| 50 | 1 | 0.0683 | 0.0610 | 0.0625 | 0.0595 |
| 50 | 10 | 0.0715 | 0.0670 | 0.0585 | 0.0580 |
| 75 | 0.1 | 0.0742 | 0.0830 | 0.0700 | 0.0570 |
| 75 | 1 | 0.0786 | 0.0626 | 0.0606 | 0.0580 |
| 75 | 10 | 0.0775 | 0.0580 | 0.0535 | 0.0590 |
| 100 | 0.1 | 0.0536 | 0.0724 | 0.0715 | 0.0605 |
| 100 | 1 | 0.0805 | 0.0631 | 0.0575 | 0.0605 |
| 100 | 10 | 0.0685 | 0.0590 | 0.0595 | 0.0600 |
| Demes . | Nm . | 2 Samples . | 5 Samples . | 10 Samples . | 20 Samples . |
|---|---|---|---|---|---|
| 2 | 0.1 | 0.0993 | 0.0841 | 0.0559 | 0.0421 |
| 2 | 1 | 0.1044 | 0.0651 | 0.0618 | 0.0638 |
| 2 | 10 | 0.0945 | 0.0635 | 0.0590 | 0.0585 |
| 5 | 0.1 | 0.0897 | 0.0840 | 0.0751 | 0.0628 |
| 5 | 1 | 0.0789 | 0.0612 | 0.0660 | 0.0610 |
| 5 | 10 | 0.0825 | 0.0625 | 0.0620 | 0.0600 |
| 10 | 0.1 | 0.0591 | 0.0770 | 0.0712 | 0.0610 |
| 10 | 1 | 0.0772 | 0.0627 | 0.0615 | 0.0565 |
| 10 | 10 | 0.0885 | 0.0560 | 0.0575 | 0.0580 |
| 25 | 0.1 | 0.0780 | 0.0787 | 0.0716 | 0.0600 |
| 25 | 1 | 0.0628 | 0.0656 | 0.0635 | 0.0590 |
| 25 | 10 | 0.0890 | 0.0645 | 0.0660 | 0.0580 |
| 50 | 0.1 | 0.0631 | 0.0736 | 0.0715 | 0.0535 |
| 50 | 1 | 0.0683 | 0.0610 | 0.0625 | 0.0595 |
| 50 | 10 | 0.0715 | 0.0670 | 0.0585 | 0.0580 |
| 75 | 0.1 | 0.0742 | 0.0830 | 0.0700 | 0.0570 |
| 75 | 1 | 0.0786 | 0.0626 | 0.0606 | 0.0580 |
| 75 | 10 | 0.0775 | 0.0580 | 0.0535 | 0.0590 |
| 100 | 0.1 | 0.0536 | 0.0724 | 0.0715 | 0.0605 |
| 100 | 1 | 0.0805 | 0.0631 | 0.0575 | 0.0605 |
| 100 | 10 | 0.0685 | 0.0590 | 0.0595 | 0.0600 |
To evaluate the performance of fsthet, we varied 3 parameters in the simulations: the number of demes, the number of sampled populations, and Nm, with a population size of 1000. We calculated the smoothed quantiles with fsthet, excluding loci with an expected total heterozygosity of zero (i.e., loci with no variation in the simulated samples). The combinations resulting in a proportion of outlier loci near the expected number (0.025–0.075) are in italicized, boldface font.
The smoothed quantiles detected at least 50% of loci subjected to strong directional selection (s ≥ 0.1) when Nm = 10 (Table 3) in 23 of the parameter combinations. We tested the detection of selected loci in the case when fsthet behaved as expected with neutral loci (i.e., when 20 samples were taken). LOSITAN had significantly more false positives compared to fsthet (1-tailed paired t-test, t = −7.1865, df = 51, p = 1.381 × 10−9) with only 13.46% of the runs with the expected number of false positives compared to 100% of the analyses based on fsthet. However, even under ideal circumstances and with corrections for multiple testing, it can be impossible to differentiate between selected loci and false positives.
The results of running fsthet on the output from the forward-in-time simulations with selection
| . | s = 0.01 . | s = 0.1 . | s = 0.5 . | S = 1 . | |||||
|---|---|---|---|---|---|---|---|---|---|
| Demes . | Nm . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . |
| 2 | 0.1 | 0.050 | 0.000 | 0.056 | 0.000 | 0.057 | 0.100 | 0.058 | 0.000 |
| 2 | 1 | 0.066 | 0.100 | 0.063 | 0.100 | 0.064 | 0.400 | 0.061 | 0.300 |
| 2 | 10 | 0.060 | 0.200 | 0.059 | 0.500 | 0.058 | 0.600 | 0.063 | 0.600 |
| 5 | 0.1 | 0.065 | 0.000 | 0.066 | 0.200 | 0.060 | 0.200 | 0.063 | 0.400 |
| 5 | 1 | 0.060 | 0.500 | 0.062 | 0.500 | 0.062 | 0.500 | 0.061 | 0.700 |
| 5 | 10 | 0.054 | 0.100 | 0.057 | 0.800 | 0.061 | 0.800 | 0.060 | 0.700 |
| 10 | 0.1 | 0.060 | 0.000 | 0.061 | 0.200 | 0.057 | 0.200 | 0.059 | 0.300 |
| 10 | 1 | 0.060 | 0.600 | 0.058 | 0.300 | 0.059 | 0.500 | 0.059 | 0.500 |
| 10 | 10 | 0.060 | 0.100 | 0.058 | 0.700 | 0.061 | 0.900 | 0.061 | 0.900 |
| 50 | 0.1 | 0.056 | 0.000 | 0.058 | 0.500 | 0.060 | 0.200 | 0.054 | 0.300 |
| 50 | 1 | 0.059 | 0.400 | 0.058 | 0.500 | 0.062 | 0.900 | 0.057 | 0.900 |
| 50 | 10 | 0.061 | 0.100 | 0.062 | 1.000 | 0.058 | 1.000 | 0.062 | 0.900 |
| . | s = 0.01 . | s = 0.1 . | s = 0.5 . | S = 1 . | |||||
|---|---|---|---|---|---|---|---|---|---|
| Demes . | Nm . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . |
| 2 | 0.1 | 0.050 | 0.000 | 0.056 | 0.000 | 0.057 | 0.100 | 0.058 | 0.000 |
| 2 | 1 | 0.066 | 0.100 | 0.063 | 0.100 | 0.064 | 0.400 | 0.061 | 0.300 |
| 2 | 10 | 0.060 | 0.200 | 0.059 | 0.500 | 0.058 | 0.600 | 0.063 | 0.600 |
| 5 | 0.1 | 0.065 | 0.000 | 0.066 | 0.200 | 0.060 | 0.200 | 0.063 | 0.400 |
| 5 | 1 | 0.060 | 0.500 | 0.062 | 0.500 | 0.062 | 0.500 | 0.061 | 0.700 |
| 5 | 10 | 0.054 | 0.100 | 0.057 | 0.800 | 0.061 | 0.800 | 0.060 | 0.700 |
| 10 | 0.1 | 0.060 | 0.000 | 0.061 | 0.200 | 0.057 | 0.200 | 0.059 | 0.300 |
| 10 | 1 | 0.060 | 0.600 | 0.058 | 0.300 | 0.059 | 0.500 | 0.059 | 0.500 |
| 10 | 10 | 0.060 | 0.100 | 0.058 | 0.700 | 0.061 | 0.900 | 0.061 | 0.900 |
| 50 | 0.1 | 0.056 | 0.000 | 0.058 | 0.500 | 0.060 | 0.200 | 0.054 | 0.300 |
| 50 | 1 | 0.059 | 0.400 | 0.058 | 0.500 | 0.062 | 0.900 | 0.057 | 0.900 |
| 50 | 10 | 0.061 | 0.100 | 0.062 | 1.000 | 0.058 | 1.000 | 0.062 | 0.900 |
To evaluate conditions under which fsthet accurately detects selected loci, we varied 3 parameters: the number of demes, Nm, and the strength of selection (s), with a population size of 1000 and sampling 20 demes. We used fsthet to calculate smoothed quantiles. Here, we present the proportion of loci that fell outside the smoothed quantiles, excluding loci with an expected total heterozygosity of zero (i.e., loci with no variation). The combinations resulting in a proportion of outlier loci near the expected number (0.025–0.075) are in boldface font in the “neutral” columns. We also calculated the proportion of selected loci that fell outside of the 95% smoothed quantile and present those proportions in the “selected” columns. Parameter combinations resulting in 50% or more of the selected loci being designated as outliers are in italicized, boldface font.
The results of running fsthet on the output from the forward-in-time simulations with selection
| . | s = 0.01 . | s = 0.1 . | s = 0.5 . | S = 1 . | |||||
|---|---|---|---|---|---|---|---|---|---|
| Demes . | Nm . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . |
| 2 | 0.1 | 0.050 | 0.000 | 0.056 | 0.000 | 0.057 | 0.100 | 0.058 | 0.000 |
| 2 | 1 | 0.066 | 0.100 | 0.063 | 0.100 | 0.064 | 0.400 | 0.061 | 0.300 |
| 2 | 10 | 0.060 | 0.200 | 0.059 | 0.500 | 0.058 | 0.600 | 0.063 | 0.600 |
| 5 | 0.1 | 0.065 | 0.000 | 0.066 | 0.200 | 0.060 | 0.200 | 0.063 | 0.400 |
| 5 | 1 | 0.060 | 0.500 | 0.062 | 0.500 | 0.062 | 0.500 | 0.061 | 0.700 |
| 5 | 10 | 0.054 | 0.100 | 0.057 | 0.800 | 0.061 | 0.800 | 0.060 | 0.700 |
| 10 | 0.1 | 0.060 | 0.000 | 0.061 | 0.200 | 0.057 | 0.200 | 0.059 | 0.300 |
| 10 | 1 | 0.060 | 0.600 | 0.058 | 0.300 | 0.059 | 0.500 | 0.059 | 0.500 |
| 10 | 10 | 0.060 | 0.100 | 0.058 | 0.700 | 0.061 | 0.900 | 0.061 | 0.900 |
| 50 | 0.1 | 0.056 | 0.000 | 0.058 | 0.500 | 0.060 | 0.200 | 0.054 | 0.300 |
| 50 | 1 | 0.059 | 0.400 | 0.058 | 0.500 | 0.062 | 0.900 | 0.057 | 0.900 |
| 50 | 10 | 0.061 | 0.100 | 0.062 | 1.000 | 0.058 | 1.000 | 0.062 | 0.900 |
| . | s = 0.01 . | s = 0.1 . | s = 0.5 . | S = 1 . | |||||
|---|---|---|---|---|---|---|---|---|---|
| Demes . | Nm . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . | Neutral . | Selected . |
| 2 | 0.1 | 0.050 | 0.000 | 0.056 | 0.000 | 0.057 | 0.100 | 0.058 | 0.000 |
| 2 | 1 | 0.066 | 0.100 | 0.063 | 0.100 | 0.064 | 0.400 | 0.061 | 0.300 |
| 2 | 10 | 0.060 | 0.200 | 0.059 | 0.500 | 0.058 | 0.600 | 0.063 | 0.600 |
| 5 | 0.1 | 0.065 | 0.000 | 0.066 | 0.200 | 0.060 | 0.200 | 0.063 | 0.400 |
| 5 | 1 | 0.060 | 0.500 | 0.062 | 0.500 | 0.062 | 0.500 | 0.061 | 0.700 |
| 5 | 10 | 0.054 | 0.100 | 0.057 | 0.800 | 0.061 | 0.800 | 0.060 | 0.700 |
| 10 | 0.1 | 0.060 | 0.000 | 0.061 | 0.200 | 0.057 | 0.200 | 0.059 | 0.300 |
| 10 | 1 | 0.060 | 0.600 | 0.058 | 0.300 | 0.059 | 0.500 | 0.059 | 0.500 |
| 10 | 10 | 0.060 | 0.100 | 0.058 | 0.700 | 0.061 | 0.900 | 0.061 | 0.900 |
| 50 | 0.1 | 0.056 | 0.000 | 0.058 | 0.500 | 0.060 | 0.200 | 0.054 | 0.300 |
| 50 | 1 | 0.059 | 0.400 | 0.058 | 0.500 | 0.062 | 0.900 | 0.057 | 0.900 |
| 50 | 10 | 0.061 | 0.100 | 0.062 | 1.000 | 0.058 | 1.000 | 0.062 | 0.900 |
To evaluate conditions under which fsthet accurately detects selected loci, we varied 3 parameters: the number of demes, Nm, and the strength of selection (s), with a population size of 1000 and sampling 20 demes. We used fsthet to calculate smoothed quantiles. Here, we present the proportion of loci that fell outside the smoothed quantiles, excluding loci with an expected total heterozygosity of zero (i.e., loci with no variation). The combinations resulting in a proportion of outlier loci near the expected number (0.025–0.075) are in boldface font in the “neutral” columns. We also calculated the proportion of selected loci that fell outside of the 95% smoothed quantile and present those proportions in the “selected” columns. Parameter combinations resulting in 50% or more of the selected loci being designated as outliers are in italicized, boldface font.
Discussion
Here, we demonstrate that many studies have misused the FST–heterozygosity outlier analysis by using too few independent populations. The misapplication of LOSITAN results in an excess of false positives, particularly when the number of independent populations sampled is small and the migration rate is low. The excess of false positives arises from variability in the FST–heterozygosity relationship, based on population structure and demographics, which is often not appropriately represented by FDIST2 and LOSITAN. To circumvent these problems, we suggest using a smoothed quantile of the empirical dataset to identify loci with extreme values.
The Shape of the F ST–Heterozygosity Distribution
Recent studies involving hundreds or thousands of loci have made the patterns in FST–heterozygosity distributions obvious. With more data being incorporated into population genetics analyses, the full distribution of the FST–heterozygosity relationship is being better represented by empirical data, so the discordance between expectations and observed patterns are much more obvious. In our survey of articles presenting figures from LOSITAN, 67% displayed skewed or inclined distributions, and 68% included nonindependent samples due to hierarchical population structure (i.e., analogous to sampling demes multiple times). However, we did not include any studies using Arlequin (Excoffier et al. 2009) or other methods that account for hierarchical population structure in their search for outlier FST values. Additionally, the FDIST2 simulation module and an infinite islands simulation resulted in similar FST–heterozygosity distributions. These skewed and inclined patterns result from the mathematical relationship between HT and maximum FST for biallelic loci, which has been explored by Jakobsson et al. (2013) for the case of 2 populations.
The jagged patterns that emerge in the simulations are likely tied to the relationship between FST and the major allele frequency described by Jakobsson et al. (2013). Their analysis found that, in the case of numerous alleles in 2 populations, when the major allele frequency is less than 0.5, FST is bound by a jagged curve. The jagged curves we observed are likely due a similar constraint on the relationship between FST and HT.
The Role of Migration, Number of Demes, and Nonindependence of Populations in Shaping the F ST–Heterozygosity Distribution
The number of demes and migration rate greatly impact the FST–heterozygosity distribution by affecting the distribution of HT, which constrains the possible FST values (Jakobsson et al. 2013). Generally, the FST–heterozygosity distribution results in a pattern that is poorly fit by the confidence intervals from LOSITAN when the number of demes and Nm are small (Figures 3–5). In other words, when there are few populations that have little gene flow, the FST–heterozygosity distribution generally exhibits a ramp-like shape that is not well captured by LOSITAN, resulting in an excess of false positives (Table 1). Neither of these parameters is known a priori in most population genetics studies, nor can they be manipulated by researchers. Increasing the number of samples per deme does not change the distribution of the FST–heterozygosity relationship (Figure 4). This point is important because, even if researchers are thorough in their sampling efforts, the FST–heterozygosity distribution in their dataset may still be skewed or have an incline, rendering the LOSITAN analysis inappropriate for that dataset. Moreover, it is important to recognize that some organisms may possess underlying patterns of population structure that preclude LOSITAN (or any other related method) from detecting outliers, regardless of the design of sampling efforts.
The results of the forward-in-time simulations for sampling each of 2 demes (top row) or 5 demes (bottom row) multiple times. This figure demonstrates that sampling a few demes multiple times does not greatly affect the FST–heterozygosity distribution. Thus, if the true population structure of a species of interest includes only a few well-differentiated populations, sampling more localities will not repair the FST–heterozygosity relationship. For all of the runs presented here, the model was run for 5000 generations, with 2000 loci, the population size was 1000, and Nm = 1. The points here represent the sampled values of and HB, with each sample containing 50 diploid individuals. The 95% confidence intervals in LOSITAN are shown in red lines.
The results of the forward-in-time simulations for sampling each of 2 demes (top row) or 5 demes (bottom row) multiple times. This figure demonstrates that sampling a few demes multiple times does not greatly affect the FST–heterozygosity distribution. Thus, if the true population structure of a species of interest includes only a few well-differentiated populations, sampling more localities will not repair the FST–heterozygosity relationship. For all of the runs presented here, the model was run for 5000 generations, with 2000 loci, the population size was 1000, and Nm = 1. The points here represent the sampled values of and HB, with each sample containing 50 diploid individuals. The 95% confidence intervals in LOSITAN are shown in red lines.
The jagged distribution emerges from the forward-in-time simulations over a wide range of sample sizes and numbers of demes. These numerical analyses were run with 2000 loci and a population size of 1000 with Nm = 0.1. The points are the and HB values calculated by LOSITAN and the lines are the 95% confidence intervals generated by LOSITAN. The numbers in the upper left-hand corner of the graph show the proportion of loci designated as under balancing or positive selection, and the overall (“total”) proportion of loci that were significant. The jagged pattern is not reflected in most of the LOSITAN confidence intervals.
The jagged distribution emerges from the forward-in-time simulations over a wide range of sample sizes and numbers of demes. These numerical analyses were run with 2000 loci and a population size of 1000 with Nm = 0.1. The points are the and HB values calculated by LOSITAN and the lines are the 95% confidence intervals generated by LOSITAN. The numbers in the upper left-hand corner of the graph show the proportion of loci designated as under balancing or positive selection, and the overall (“total”) proportion of loci that were significant. The jagged pattern is not reflected in most of the LOSITAN confidence intervals.
The results of forward-in-time simulations analyzed with LOSITAN and fsthet. All panels show numerical analyses run with Nm = 1, a population size of 1000, and 20 samples randomly taken from the demes. The loci marked with red stars experienced directional selection with a selection coefficient indicated at the top of the panel. The blue lines show 95% smoothed quantiles calculated by fsthet and the red dashed lines depict the 95% LOSITAN confidence intervals. The and HB values were calculated by LOSITAN.
The results of forward-in-time simulations analyzed with LOSITAN and fsthet. All panels show numerical analyses run with Nm = 1, a population size of 1000, and 20 samples randomly taken from the demes. The loci marked with red stars experienced directional selection with a selection coefficient indicated at the top of the panel. The blue lines show 95% smoothed quantiles calculated by fsthet and the red dashed lines depict the 95% LOSITAN confidence intervals. The and HB values were calculated by LOSITAN.
One possible solution for minimizing the number of false positives is to use a hierarchical island model in the coalescent simulations as opposed to the infinite alleles model used by FDIST2 (Excoffier et al. 2009). However, if hierarchical population structure is not accommodated correctly, by specifying the number of clusters and the number of populations within each cluster, this technique also produces an excess of false positives (Excoffier et al. 2009). In an empirical study, confounding factors such as isolation by distance can bias the estimation of the number of populations (Meirmans 2012); in such cases, using a hierarchical island model would still yield biased outlier results. Although our approach using smoothed quantiles is not a statistical test, it does not require any assumptions about the underlying population structure and therefore may be more broadly applicable than other outlier detection methods.
False Positives in LOSITAN due to Misleading Confidence Intervals
In most cases of a skewed or inclined distribution, the confidence intervals generated by LOSITAN clearly do not fit the data. Under these circumstances, LOSITAN produces many more false positives than expected. In our forward-in-time simulations, for example, we found that only 27.4% of parameter combinations identified the expected number of false positives (5% at α = 0.05) when the simulated output was analyzed in LOSITAN (Table 1). This finding demonstrates that most parameter combinations investigated in the present study resulted in a large number of loci being designated as targets of selection when they were actually neutral, suggesting that LOSITAN may overestimate the number of selected loci.
Results of forward-in-time simulations analyzed with fsthet and LOSITAN when gene flow is high. These show numerical analyses run with Nm = 10, a population size of 1000, and 20 samples randomly taken from the demes. The loci marked with red stars experienced directional selection (s = 0, 0.01, 0.1, or 0.5, as indicated). The blue lines show 95% smoothed quantiles calculated by fsthet and the red dashed lines depict the 95% LOSITAN confidence intervals. The and HB values were calculated by LOSITAN.
Results of forward-in-time simulations analyzed with fsthet and LOSITAN when gene flow is high. These show numerical analyses run with Nm = 10, a population size of 1000, and 20 samples randomly taken from the demes. The loci marked with red stars experienced directional selection (s = 0, 0.01, 0.1, or 0.5, as indicated). The blue lines show 95% smoothed quantiles calculated by fsthet and the red dashed lines depict the 95% LOSITAN confidence intervals. The and HB values were calculated by LOSITAN.
Conclusions
The unexpected patterns appearing in the empirical literature, the FDIST2 simulations, and our forward-in-time simulations provide multiple lines of evidence that using the FST–heterozygosity relationship to identify loci under selection may not be a valid approach for many common population genetics scenarios. The LOSITAN analysis succeeds only when the migration rate is high and many genetically independent demes are sampled. However, the true number of demes (and their delineations) and migration rate are often unknown variables when sampling natural populations. Our review of the literature suggests that these ideal conditions are not the norm for most population genetic studies. These observed limitations of the FDIST2–LOSITAN approach also call for all articles employing these approaches to include a graph of FST versus HT for their data, either in the main article or as a supplement.
Our R package, fsthet, calculates smoothed quantiles for the observed distribution and therefore overcomes some of the issues that plague LOSITAN, which uses coalescent simulations that rely on many potentially unknown population-level parameter values. Despite the advantage of not assuming a particular distribution to identify outliers, fsthet has 2 drawbacks. First, it is not a test of any particular model, and second it identifies a specific number of outliers from the empirical distribution. The number of outliers will depend on the number of loci in the analysis and the threshold chosen. However, we believe that unnecessary assumptions regarding the FST–heterozygosity distribution should be avoided, because this distribution can change substantially in response to a variety of unknown population parameter values.
In some cases, such as when few demes are sampled or when migration rates are low, the FST–heterozygosity approach is not recommended for pinpointing putative targets of selection because the neutral loci saturate the full distribution of potential FST–heterozygosity values. Instead, we suggest using alternative methods, such Bayesian maximum-likelihood analyses that account for nonindependence of populations (e.g., Beaumont and Balding 2004; Foll and Gaggiotti 2008; Günther and Coop 2013; Duforet-Frebourg et al. 2014) to identify putative targets of selection. Regardless, continued testing of approaches will be necessary as more sophisticated models are developed and as empirical genotype-by-sequencing methods continue to evolve.
Supplementary Material
Supplementary data are available at Journal of Heredity online.
Funding
This work was supported by the National Science Foundation (DEB-1119261 to A.G.J., DEB-1401688 to A.G.J. and S.P.F.) and the National Science Foundation Graduate Research Fellowship (DGE-1252521 to S.P.F.).
Data Availability
The forward-in-time simulation program and fsthet are uploaded to GitHub (https://github.com/spflanagan/fsthet_analysis) and archived on Zenodo (https://zenodo.org/record/376816#.WMVWhPkrKUk). The output from the simulation runs is archived on Dryad (doi:10.5061/dryad.785bn).
Acknowledgments
Six anonymous reviewers provided helpful comments on previous drafts of this manuscript.







