Population Structure Limits Parallel Evolution in Sticklebacks

Abstract Population genetic theory predicts that small effective population sizes (Ne) and restricted gene flow limit the potential for local adaptation. In particular, the probability of evolving similar phenotypes based on shared genetic mechanisms (i.e., parallel evolution), is expected to be reduced. We tested these predictions in a comparative genomic study of two ecologically similar and geographically codistributed stickleback species (viz. Gasterosteus aculeatus and Pungitius pungitius). We found that P. pungitius harbors less genetic diversity and exhibits higher levels of genetic differentiation and isolation-by-distance than G. aculeatus. Conversely, G. aculeatus exhibits a stronger degree of genetic parallelism across freshwater populations than P. pungitius: 2,996 versus 379 single nucleotide polymorphisms located within 26 versus 9 genomic regions show evidence of selection in multiple freshwater populations of G. aculeatus and P. pungitius, respectively. Most regions involved in parallel evolution in G. aculeatus showed increased levels of divergence, suggestive of selection on ancient haplotypes. In contrast, haplotypes involved in freshwater adaptation in P. pungitius were younger. In accordance with theory, the results suggest that connectivity and genetic drift play crucial roles in determining the levels and geographic distribution of standing genetic variation, providing evidence that population subdivision limits local adaptation and therefore also the likelihood of parallel evolution.


Supplementary Tables
Supplementary Table 1 | Information on LD-clusters of three-and nine-spined sticklebacks.
Supplementary Table 2 | Information on the nine-spined stickleback samples used in the study. Figure 1b, c.

Supplementary Information
Supplementary Information 1 | Methodological considerations.

| Complexity reduction using nested three-step linkage disequilibrium network analyses (LDna)
Since allelic states of many pairs of loci can be highly correlated (in linkage disequilibrium, LD), corrections for multiple testingassuming all loci are independentare unnecessarily conservative. Recently, Li et al. 2018 (see also : Santure and Garant 2018) developed a method where correlated sets of loci (LD-clusters) along non-overlapping windows along chromosomes can be grouped together by LD network analyses (LDna; Kemppainen et al. 2015) and summarised by Principal Component Analyses (PCA) prior to Genome Wide Association (GWA)-analyses. Furthermore, in the presence population structuring, LD typically extends across the whole genome such that complexity reduction can be increased further by performing LDna within chromosomes or even at the whole genome level (prior to GWA-analyses). The idea is that for all (genome wide) loci that only reflect population structuring (thus cannot be reliably associated with any of the studied traits due to relatedness) only one test is sufficient, thus substantially reducing the total number of tests needed. The benefit of this, relative to window-based approaches that disregards the correlation structure among loci, is the possibility to separate different sets of correlated loci even though they are interspersed in the genome (Li et al. 2018), as demonstrated in Supplementary Fig. 1a.
Since LD-estimates between all pairs of loci from large population genomic data set (with millions of loci) cannot be considered simultaneously, we adopt a nested three-step approach to perform complexity reduction (Fang et al. 2020;Li et al. 2018) prior to downstream analyses. In LDna step 1 (LDna-1) we recursively identified LD-clusters along chromosomes where at least one locus was connected by LD>0.7 with all other loci in its cluster. These SNPs represent the whole cluster in the subsequent analysis steps (rSNPs; if several were found, one SNPs was randomly chosen; Supplementary Fig. 1a). To reduce computational speed this recursive algorithm was performed within nonoverlapping windows of 1000 bi-allelic SNPs and within those windows, LD was estimated in windows of 100 SNPs with minor allele frequency, mar>0.05, (Fang et al. 2020). Thus, while no SNP could directly be connected to another SNP further than 100 SNPs away, they could nevertheless belong to the same cluster via links to other (intermediate) SNPs, and SNPs from different (non-overlapping) 1000 bps windows can of course become connected in LDna-2. The median proportion of the genetic variation explained by PC1 for such LD-clusters typically exceeds 99.9%.
In the second step (LDna-2; Supplementary Fig. 1b), using the rSNPs from LDna-1, we performed LDna separately for each chromosome ignoring all singleton clusters (SNPs that were not connected to any other locus by at least LD = 0.7), as in Fang et al. 2020. In standard LDna (Kemppainen et al. 2015), all pair-wise LD values are considered simultaneously using single-linkage clustering trees, where each cluster is defined by its weakest LD-value among any pair of loci in the cluster. LD-clusters of interest are defined by two parameters; the minimum number of edges in the cluster (|E|min, which is sensitive to both the connectedness within the cluster as well as the number of loci in a cluster) and a threshold for λ (λlim), which indicates the magnitude of how different LD-clusters are with respect to median LD when considered separately or jointly (Kemppainen et al. 2015).
However, similar cluster solutions that can be obtained by different combinations of |E|min and λlim can also be obtained just by varying |E|min. Since these parameters are redundant, we here considered all branches for a given |E|min as separate LD-clusters regardless of λ (thus removing this parameter altogether) as shown in Supplementary Fig. 1b. Since, for a given value of |E|min, PC1 in "LD-clades" with node depths ≥0.8 (i.e. when the lowest LD value between any two loci in the cluster is ≥0.8) typically explain >>90% of the variation, these were considered as single LD-clusters; such cases increase with lower |E|min as this typically leads to more branching at higher LD levels. In previous implementations of the three-step LDna (Fang et al. 2020), the most interconnected SNP (the SNP with the highest median LD with all other loci in its cluster) in each LD-cluster/branch from LDna-2 was chosen as the rSNP for LDna-3. However, since the genetic variation in a cluster with many loci more efficiently and robustly can be summarised by PC-coordinates (based on all the loci in the cluster), we here opted to base the LDna-3 tree on all pairwise correlation coefficients of PC1 coordinates between the LDna-2 LD-clusters entering LDna-3 ( Supplementary Fig. 1c). The matrix of all pair-wise correlation coefficients KPC1 were nevertheless always correlated with the corresponding matrix of LD KLD based on rSNPs, so this is not likely to substantially affect the final LDna-analysis, except to make it less sensitive to the exact choice of the rSNP.

Data sets
To make a rigorous comparison, the comparative LDna analyses between three-and ninespined sticklebacks were restricted to the Atlantic region. This is because our sampling lacked marine nine-spined stickleback populations from the Pacific regions and because three-spined sticklebacks from the Eastern Pacific region exhibit unusually high (ca. 10 times) genetic parallelism compared to other regions in the world (Fang et al. 2020).
To generate comparable datasets for three-and nine-spined sticklebacks for LDna and downstream analyses, it is important to consider that genome coverage was much higher in nine-than three-spined sticklebacks due to larger proportion of WGS samples in the former species (80.1%) than in the latter species (22.9%). Therefore, firstly, we made a list of sites across the genome for both species for which we had sufficient coverage of highquality reads using ANGSD (-minIndDepth 1, -uniqueOnly 1, -remove_bads 1, -minMapQ 20, -minQ 20, -minInd 25%), resulting in 101,598,319 and 129,356,224 sites in three-and nine-spined stickleback genomes, respectively. Then we retained variable sites using the -SNP_pval flag in ANGSD (-SNP_pval 1e -6 ) and filtered by minor allele frequency (maf ≥ 0.05) using custom R scripts based on the output 'GENO' file. After this, 882,125 SNPs and 1,725,617 SNPs were retained from three-and nine-spined sticklebacks, respectively.
Next we computed the number of SNPs that we would have discovered in nine-spined sticklebacks if we had sequenced the same number of bases as for the three-spined sticklebacks: SNPs9s=SNPs3s/L3s*L9s , where SNPs9s is the number of SNPs to retain in the nine-spined sticklebacks dataset, SNPs3s is the total number of SNPs passing quality filters in the three-spined sticklebacks dataset, and L3s and L9s are the total number of sequenced sites passing quality control in the three and nine-spined sticklebacks datasets, respectively. This resulted in 1,355,325 SNPs in the nine-spined stickleback dataset (101,598,319/129,356,224*1,725,617), which we subsampled from the 1,725,617 SNPs that passed quality filtering. Ultimately, 882,125 SNPs for the three-and 1,355,325 SNPs for the nine-spined stickleback (in form of genotype likelihoods) were used in the downstream comparative LDna analyses.

| Testing associations between "synthetic alleles" from LD-clusters and ecotypes using linear mixed models
Since population demographic events such as range expansions, bottlenecks and population growth greatly increase the variance of test statistics from "Genome-scan" based outlier methods, such methods are prone to inflated rates of false positives as well as lack power when background differentiation due to population structuring is high (Galloway et al. 2020;Hoban et al. 2016;Matthey-Doret and Whitlock 2019). Genetic regions underlying variation in locally adapted traits can be identified directly by linkage mapping or genome-wide association (GWA) approaches (e.g. Robinson et al. 2014;Savolainen et al. 2013;Vinkhuyzen et al. 2013). When studying local adaptation, however, both genome-scans and GWA-analyses estimate the same statistical association, namely that between locally adapted phenotypes and genotypes. Since all locally adapted traits are by definition in some way correlated with habitat, the expectation is that all locally adapted traits deviate from the null distribution in GWA-analyses, even if only one of them is tested (LD caused by local adaptation/parallel evolution; Kemppainen et al. 2015).
Conversely, if ecotype designation/habitat itself is treated as a binary trait, all genomic regions correlated with local adaptation are expected to deviate from the null distribution.
While relatedness in GWA-studies at any level of the population hierarchy (from families to populations) can also lead to false statistical associations between loci and traits of interest (i.e. p-value inflation; Kang et al. 2010;Kang et al. 2008), this has elegantly been solved by linear mixed model (LMM) analyses that successfully correct for the genetic relatedness in the data (Kang et al. 2008). When phenotypic values are spatially auto correlated (i.e. certain phenotypic values are mainly shared among related groups of individuals) this can, however, substantially decrease the power in GWA-studies (Kang et al. 2008). The converse can also be true; when two independently colonized and genetically divergent populations share the same heritable phenotypes, correcting for relatedness could potentially increase power to identify the associated loci (Kang et al. 2008). The main benefit of using GWA analyses to find loci associated with local adaptation is thus the ability to use a linear mixed model analysis framework adopted from GWA-studies to monitor and control for relatedness and other co-factors. Indeed, in a related field, Kemppainen et al. 2017 showed that in artificial selection experiments allele frequency change before and after selection also are prone to false positives due to relatedness, and that this could be solved by LMM accounting for relatedness by analysing survival as a binary trait (e.g., "1" for individuals remaining in the population after selection and "0" for those individuals that died). In Kemppainen et al. 2017, the correlation between -log10(P)-values from LMM analyses and those attained from a null-distribution generated by permutation was close to unity.
Previously, the Euclidean distance between group centroids (along the two first PC-axes and adjusting for the Euclidean distances within groups), known as the cluster separation score CSS has been applied to detect genomic regions associated with marine-freshwater parallelism in three-spined sticklebacks (Jones et al. 2012;Kingman et al. 2020). The significance of CSS can be tested by permutation and it correlates with FST, but with higher resolution when differentiation is strong (Jones et al. 2012). Since PC1-coordinates from the LD-clusters typically explain >>90% of the total variation, the CSS in this study will be simplified to the difference in means of the PC1-coordinates between marine and freshwater individuals. This simplification, however, is likely to lead to significant loss of information when a window-based approach to CSS is applied, i.e. disregarding correlation structure between SNPs, as in e.g. Jones et al. 2012 andKingman et al. 2020, and is thus only applicable in combination with complexity reduction using e.g. LDna.
Similarly to Kemppainen et al. 2017, where the significance of allele frequency differences between groups of individuals was tested for each bi-allelic SNP separately, we here show that the correlation between -log10(P)-values from permutation (performed as in Jones et al. 2012 andKingman et al. 2020) and from LMM-analyses (treating ecotype as a binary trait, but not correcting for any other potentially confounding factors) also is close to unity ( Supplementary Fig. 2). Thus, PC1 coordinates from the LD-clusters can be viewed as "synthetic multi-locus alleles" (SMLAs). Here, we thus adopt a similar approach to Kemppainen et al. 2017 to test for asssociations between the SMLA and ecotype using LMM, which allowed us to correct for potential p-value inflation due to relatedness and other potentially confounding factors. Note that in contrast to the CSS score used in Jones et al. 2012 andKingman et al. 2020, our simplified CSS scoresthat can be replaced by LMM analyses treating ecotype as binary traitdoes not account for within group variances. However, the correlation between -log10(P)-values from our simplified CSS and the CSS used in Jones et al. 2012 andKingman et al. 2020 is high (r 2 =0.97, n=343; this test was performed on all nine-spined stickleback LDna-1/LDna-2 clusters with parameter settings |E|min=20m, SNPmin=20, with 10 5 permutation replicates). Similarly, based on all LDna-1 clusters from the three-spined sticklebacks in this study, the linear correlation between -log10(P) from LMM and their 95% quantile FST (based on the loci from each cluster) was r 2 =0.92 (n=4412). Thus, when not accounting for any confounding factors, testing for the association between SMLAs and ecotype using LMM can replace both CSS (as performed in Jones et al. 2012 andKingman et al. 2020) and FST.

| Restricted maximum likelihood (REML) analyses
Linear mixed models can be used to control for hidden population and/or family structure in the data by including the relatedness among individuals as a random effect. Restricted maximum likelihood (REML)-based approaches have been widely used to evaluate the regression parameters and variance components in such models (Kang et al. 2010;Kang et al. 2008). EMMAX (Kang et al. 2010) estimates the variance components once based on an intercept model and subsequently fixes them to evaluate the effect and statistical significance of the SNPs. This approach is preferred over its predecessor due to its computational efficiency while maintaining both power and the ability to control for false positives (Kang et al. 2010;Kang et al. 2008). Association tests here are based on the coordinates of the first PC extracted from PCA (i.e. the SMLAs) on all loci from a given LDcluster (Li et al. 2018;Fig. 1c Given uncertainties in low coverage whole genome sequence data, we obtained the relationship matrix A (based on all available loci) in a probabilistic framework from individual allele frequencies estimated iteratively based on inferred population structure from genotype likelihoods using the software PCAngsd (Korneliussen et al. 2014).

| Correction for multiple testing by permutation
Although computationally much more intensive, an alternative but potentially less conservative alternative to the "false discovery rate" method to control the family-wise error (FWER) is by permutation that accounts for the correlation structure among the multiple tests (Efron 2010;Joo et al. 2016;Li et al. 2018). This method is particularly useful in combination with LD-based complexity reduction, where the total number of tests can in many cases substantially be lowered, as demonstrated by Li et al. 2018. To correctly account for the among individual correlation structure in the data we generated phenotypic data under the null-hypothesis by drawing phenotypic data from a multivariate normal distribution θ0 and then using EMMAX to calculate the test-statistics on each sample. The resulting empirical distributions of the test statistics were then used to adjust p-values to control the multiplicity as in Li et al. 2018. Since we are testing for associations between loci and ecotype (treated as a binary trait), we treated the phenotypes as threshold traits, with the number of freshwater and marine ecotypes equal to that observed in the real data.

| Genotype likelihood estimation
In the step of quality filtering in ANGSD, we retained only bases with a minimum Phred score of 20 and a minimum mapping quality of 25 (-minQ 20 and -minMapQ 25). We retained sites with less than 20% missing data and a minimum read depth of two for each individual following Fang et al. 2020 to account for low sequence-depth data and computed genotype likelihoods for variants that had a p-value smaller than 1e-6 (-SNP_pval 1e-6). The sex chromosomes were excluded (chromosome 19 [group XIX] in the three-spined stickleback; Kitano et al. 2009;Natri et al. 2013;chromosome 12 [LG 12] in the nine-spined stickleback; Natri et al. 2019;Rastas et al. 2015;Shapiro et al. 2009).

| Genetic diversity and differentiation
For  and θ, site allele frequency likelihoods and the site frequency spectrum (SFS) were estimated for each population in ANGSD. The summary statistics of  and θ for each population were calculated using a custom R scripts from Momigliano et al. 2021. Scripts To calculate FST in each species, genotypes were called in ANGSD (-doVcf 1) with additional filtering on the genotype posterior probability (-postCutoff 0.95). VCFtools v.0.1.15 (Danecek et al. 2011) was used for the downstream SNP filtering, setting maximum missing genotypes to 20% (--max-missing 0.8), a minimum minor allele frequency to 0.05 (--maf 0.05) and minimum distance between any two sites to 30,000 bp (--thin 30,000). We then used this subset of high quality, called genotypes to estimate global FST for each ecotype and the pairwise FST between all populations with the R packages hierfstat (Goudet 2005) and StAMPP (Pembleton et al. 2013), respectively.
For nine-spined sticklebacks, the first calibration was the time to the most recent common

| Genetic diversity and divergence in the marine-freshwater divergent genomic regions
To see whether genomic regions under selection have more ancient origins than neutral regions, we tested whether the absolute genetic divergence (dXY, Nei 1987) in the genomic regions under selection exceeds that in the rest of the genome. All freshwater populations considered are the results of postglacial invasions, and hence mean dXY between freshwater and geographically close marine populations is expected to be low, i.e. to approximate  from the ancestral population (see discussion in Ravinet et al. 2017). If selection is acting on a novel or rare variant, we can expect a high peak in FST but not an increase in dXY, as allelic differentiation would be a result of reduced  in the population experiencing selection (i.e. a selective sweep; Cruickshank and Hahn 2014). In contrast, if selection is acting on ancient haplotypes, a peak in FST is likely to be driven by an increase in dXY, rather than a decrease in  (e.g. Nelson and Cresko 2018). Comparing dXY in regions underlying parallel evolution with the rest of the genome could provide information on the evolutionary origin of the SGV under selection, but such comparison is not straightforward. Like other measures of diversity ( and θ), dXY is strongly affected by the heterogeneity in background selection, recombination rates and mutation rates across the genome (Charlesworth et al. 1993). Since the processes that govern diversity levels within genomes (background selection, mutation rate and recombination rate variation) are conserved between closely related populations (and species), different measures of diversity ( and dXY) are correlated across the genome of closely related populations.
Dutoit et al. 2017 demonstrated this correlation is maintained even after lineage sorting is complete, suggesting shared ancestral variation cannot explain this pattern.
Here we take advantage of this correlation to detect whether genomic regions under selection show excess absolute divergence. We start from the assumption that marine populations within the same regions are proxies of ancestral diversity. For each genomic region involved in parallel evolution we then run a linear model where marine is the predictor of dXY across the genome, including 100 random genomic regions as well as the region under selection. This yields a distribution of residuals (dXY), where a dXY of 0 represents the dXY predicted by the ancestral diversity. This is conceptually similar to Nei's measure of relative divergence -da=dXY-(x y)/2, (Nei and Li 1979)but with the difference that we use  of the marine population as a proxy of ancestral diversity rather than the mean of  between both populations. Furthermore, the residuals are cantered so that positive and negative values represent higher and lower relative divergence than the mean across all regions. We use the random 100 neutral genomic regions to infer the confidence intervals for the neutral distribution, and then compare to dXY of the target region. We use a similar approach to detect whether regions under selection show lower diversity in the freshwater populations than the rest of the genome (i.e. if peaks in FST are caused by selective sweeps). As for dXY, we run a linear regression between freshwater and marine (lm freshwater ~ marine) based on the same 101 genomic regions. We refer to the residuals of the model as . A negative  in the genomic region under selection would be suggestive of a reduction in  relative to the expected difference in  between the two species.
Estimates of genetic diversity ( and dXY) were calculated as follows. from 1000 randomly chosen LDna-1 clusters from the three-and nine spined sticklebacks data sets testing for associations between SMLAs (PC1-coordinates from LD-clusters) and ecotype either by (y-axis) EMMAX, treating ecotype as a binary trait and not including any co-factors or random effects or by (x-axis) permutation (n=10 5 ). In both species, the relationship is close to unity, with slightly larger variance for nine-spined sticklebacks. instance, the most "successful" |E|min settings were |E|min=20 for three-spined but |E|min=10 for nine-spined sticklebacks (left panel). Similarly, the most successful Corth value for three-spined sticklebacks was Corth=10, while this parameter had less influence for ninespined sticklebacks (right panel). Both reflect the fact that outlier regions affect a larger number of correlated loci in three-compared to nine-spined sticklebacks and that a given parameter setting can be more successful (in detecting significant outlier regions) in one species compared to another.   between populations within genetic clades in nine-spined sticklebacks compared to threespined sticklebacks. Genetic differentiation is higher in nine-spined sticklebacks whether we consider freshwater-freshwater, marine-freshwater or marine-marine comparisons.
Note that data for all ecotype contrasts were not available for all clades.   d. e. c.

Marine Populations
Freshwater Populations

Populations in the Comparative SNAPP Analyses
Population ID Population ID in the Fig. 3 Population ID Population ID in the Fig. 3 Fang et al. (2020). Region: matches outlier region in Figure 5 & 6, main text Fst (Q95%) = 95% quantile among loci that mapped to the outlier regions from significant LD-clusters

Range=size of outlier genomic region in Bps
P_min=the lowest p-value among LD-clusters that mapped to this region Subsequent column names should be interpreted as follows: cor: correlation coefficient P: P-value from EMMAX A: relatedness was accounted for unrl: individuals assumed unrelated GC: genomic control was used to account for residual p-value inflation (otherwise permutation was used to assess significance) Note that lineage was always included as co-factor for nine-spined sticklebacks

Supplementary Information 1 | Methodological considerations
It is generally believed that strong genetic drift makes it difficult to detect signatures of parallel evolution (Galloway et al. 2020;Hoban et al. 2016; Matthey-Doret and Whitlock 2019) and therefore the reported differences between three-and nine-spined sticklebacks in parallel evolution could be argued to be an artefact of the inherent difficulty of detecting outlier loci among highly differentiated populations. However, both LDna-complexity reduction as well as using EMMAX to test for associations between SMLAs and ecotype are likely to mitigate the many shortcomings that otherwise characterise e.g. FST-based genome scan approaches (e.g. Bierne et al. 2011;Excoffier et al. 2009;Pritchard and Di Rienzo 2010). In our analyses, EMMAX produced test statistics that were highly correlated with both FST and CSS (we know already that CSS is correlated with FST; Jones et al. 2012), but importantly only when relatedness among individuals or other co-factors were not explicitly accounted for. P-value inflation in our analyses was high (up to λ~2), which was strongly reduced when relatedness was included as a random effect. However, both accounting for relatedness and correcting for p-value inflation (using GC) reduced the power to detect significant associations considerably, such that it was futile to find any outlier genomic regions without also using LDna-complexity reduction which allowed much fewer conservative corrections for multiple testing. Estimating and correcting for p-value inflation caused by relatedness is unfortunately not common practice in evolutionary genetic studies. However, nor is complexity reduction. It is thus possible that the increased power from p-value inflation roughly have counterbalanced the reduced power from unnecessarily conservative corrections for multiplicity in many previous studies. However, without re-analyses of these data sets, it is impossible to know to what extent previously detected outlier regions could in fact be artefacts of relatedness and p-value inflation.
We know that population structuring increases background differentiation in statistics estimating genetic differentiation between populations (Galloway et al. 2020;Hoban et al. 2016; Matthey-Doret and Whitlock 2019) and can also increase p-value inflation in genome wide association (GWA)-analyses when phenotypes are confounded by relatedness (Kang et al. 2010;Kang et al. 2008). There is thus reason to believe that these two observations are not independent, i.e. whenever background differentiation is high, we can also expect p-value inflation due to relatedness. Importantly, however, accounting for relatedness can also increase power in GWA-studies (Kang et al. 2010;Kang et al. 2008).
e.g. when multiple divergent populations segregate for the same causal variants for a given phenotype. This can also be expected when multiple populations in similar habitats display high frequencies of the same genetic variants (i.e. parallel evolution); the more divergent the populations are, the stronger will the contrast be between the neutral genetic background (reflecting relatedness) and the genomic regions under selection. For the outlier regions detected here, the overall effect sizes in three-spined sticklebacks differed when including relatedness as a random effect (0.38) compared to when relatedness was not accounted for (0.28) but remained roughly the same for nine-spined sticklebacks (0.29 vs. 0.27, respectively). Since p-value inflation (that reflect biases due to relatedness) in three-spined sticklebacks was higher (λ=1.95) compared to nine-spined sticklebacks (λ=1.73; when including lineage as co-factor), there is no reason to suspect that our methods lacked power to detect outlier regions in nine-spined sticklebacks across comparable levels of effect sizes in both species. Despite the fact that marine and freshwater individuals were pooled across all geographic regions in our analyses, no outlier region showed high effect sizes across all geographic regions. This demonstrates that, as long as effect sizes are sufficiently high overall, our methods also have sufficient power to detect regional parallelism (in both species). This is no different from detecting significant genotype-phenotype associations in GWA-studies for phenotypes that only segregate in only one or few of the populations included in the study.
While p-values are important, they tell very little about effect sizes which are much more likely to reflect the extent to which a given genomic region is affected by parallel evolution.
While 26 and nine outlier genomic regions were significant in three-and nine-spined sticklebacks respectively, they were to different degrees sensitive to parameter settings and varied considerably in their effect sizes. These are all important metrics to consider when determining how significant a given genomic region harbouring freshwater adapted alleles are from an evolutionary/ecological perspective. However, while we can safely conclude that the marine-freshwater differentiated regions in three-spined sticklebacks outnumber such regions in nine-spined sticklebacks across a wide range of parameter/threshold values and correction methods, more detailed assessments of these outlier regions are outside the scope of this study.