BlockFeST: Bayesian calculation of region-specific FST to detect local adaptation

Pfeifer, Bastian; Lercher, Martin J

doi:10.1093/bioinformatics/bty299

Abstract

Summary

The fixation index F_ST can be used to identify non-neutrally evolving loci from genome-scale SNP data across two or more populations. Recent years have seen the development of sophisticated approaches to estimate F_ST based on Markov-Chain Monte-Carlo simulations. Here, we present a vectorized R implementation of an extension of the widely used BayeScan software for codominant markers, adding the option to group individual SNPs into pre-defined blocks. A typical application of this new approach is the identification of genomic regions, genes, or gene sets containing SNPs that evolved under directional selection.

Availability and implementation

The R implementation of our method, which builds on the powerful population genetics and genomics software PopGenome, is available freely from CRAN.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Genomic analyses that calculate F_ST values frequently aim to separate neutral contributions to F_ST caused by population histories from locus-specific contributions due to non-neutral evolution. However, different populations have different histories and are influenced by different factors that affect mutations, genetic drift, and migration (Kronholm et al., 2010). As a consequence, the neutral distribution of F_ST values strongly depends on the organisms and populations considered. The separation between non-neutral and neutral (population history-based) F_ST contributions can be made explicit in Bayesian approaches that employ Markov-Chain Monte-Carlo (MCMC) methods to estimate F_ST (Beaumont and Balding, 2004; Foll and Gaggiotti, 2008; Villemereuil and Gaggiotti, 2015). The BlockFe_ST method introduced in this paper is based on the work of Beaumont and Balding (2004), which establishes an F_ST based hierarchical Bayesian model to detect loci that evolve non-neutrally. This Bayesian approach uses a logistic regression model to distinguish between locus-specific effects (selection) and population-specific effects (reflecting the common history of all loci). Foll and Gaggiotti, 2008, extended this work through a reversible jump MCMC method (Green, 1995), which calculates an explicit posterior probability for non-neutral evolution at each locus (implemented in BayeScan, http://cmpg.unibe.ch/software/BayeScan). Whereas these previous algorithms allowed the analysis of either haplotypes or of individual single nucleotide polymorphisms (SNPs), we propose to group neighboring SNPs that likely experienced the same evolutionary pressures into blocks, e.g. by gene, by gene set or by sliding window. This joint analysis of co-evolving SNPs increases the statistical power of the approach, allowing us to more reliably detect directional population-specific selection.

2 Materials and methods

To identify non-neutrally evolving blocks of DNA from SNP data, BayeScan (Foll and Gaggiotti, 2008) allows two possible strategies: either to consider each SNP as an independently evolving unit, or to consider haplotypes that combine all SNPs for one block of DNA. Considering each SNP separately may lead to statistical problems due to non-independence, and ignores prior knowledge (the strong linkage of SNPs within one co-evolving block). Conversely, using haplotypes can produce misleading results at moderate to high mutation rates, as each mutation introduces a new haplotype (Hudson, 1992; Riebler et al., 2008). To combine the strengths of both approaches, we use a slightly modified logistic regression model to estimate region-specific F_ST:

log (\frac{F_{S T}^{i j}}{1 - F_{S T}^{i j}}) = α_{I} + β_{j}

(1)

Here, $F_{S T}^{i j}$ measures the differentiation between subpopulation j and the ancestral population at locus i within region I. α_I is a parameter specific to region I (capturing region-specific deviations from neutral evolution), while β_j is a parameter specific to population j (capturing locus-independent effects, i.e. consequences of population history). We consider the likelihood contribution of each SNP separately, but assume that the locus-specific effect α_I is identical for all SNPs located in the same DNA block I. This means that also in the jump model, all SNPs in one block would be classified as evolving either neutrally (⁠ $α_{I} = 0$ ⁠) or non-neutrally (⁠ $α_{I} \neq 0$ ⁠). Note that in this modified model, the ancestral population is still modelled as a Dirichlet distribution that considers the allele frequencies of each SNP independently. Recent studies have shown that the reversible jump model tends to produce false positives, particularly for balancing selection (Lotterhos and Whitlock, 2014; Luu et al., 2017). Thus, we restrict the model to the detection of positive (directional) selection. Moreover, in previous simulations (Pfeifer, 2014), we found that the F_ST values obtained after the initial pilot runs used by BayeScan to tune the reversible-jump model are almost as informative as the corresponding posterior values. Pilot runs are initial MCMC iterations (without the reversible jump model) to estimate the distribution of the model parameters α and β. Thus, we propose a simple sampling scheme to verify significant outlier loci subject to local adaptation based on the distribution of the α_I values observed after the MCMC iterations (approximated through a region-specific normal distribution $N (α_{I})$ ⁠) without using a time consuming reversible jump model for testing the null hypotheses: (i) For each region I, sample two values $x_{I}, y_{I} \sim N (α_{I})$ ⁠, resulting in two distributions of sampled values D_x and D_y across regions. (ii) For each region I, increment its counter c_I if y_I is above the q-quantile for D_x. (iii) Repeat (i–ii) N_iter = 1000 times.

3 Simulation and results

To validate our algorithm and to compare it to alternative approaches, we performed extensive simulations using the MSMS program (Ewing and Hermisson, 2010), which allows coalescent simulations for a structured population under selection. We assume a population with an effective population size of N_e=10 000 that split into two subpopulations 4000 generations ago. After the splitting event, there is no migration between the two populations (island model). To evaluate the performance of the methods to detect selection in cases of variable pairwise population differentiation, we also study a three-population model including an ancestral population (divergence mode) (see Supplementary Material for details and the MSMS calls). We assessed the performance of BlockFe_ST and compared it to those of BayeScan (Foll and Gaggiotti, 2008) applied to individual SNPs, F_ST (Hudson, 1992), and the recently published PCA method implemented in the R-package pcadapt (Luu et al., 2017) (see Supplementary Material). For pcadapt and BayeScan, we used the P-values and calculated the sum of logs for each region to compare it to F_ST and BlockFe_ST.

We explored a wide range of simulation scenarios, varying recombination rates, selection strengths and coalescent times. Specifically, we studied the robustness of the approaches regarding the effect of neutrality eroding the signal of selection (Supplementary Figs S3–S4 and S8–S9). While F_ST and pcadapt are the computationally fastest algorithms (Supplementary Table S1), we found that BlockFe_ST is most powerful and accurate in almost all cases, especially when the signal of local adaptation is not yet eroded by recombination, i.e. in cases of recent positive selection (Fig. 1 and Supplementary Figs S1–S9). For the divergence model, pcadapt and BlockFe_ST perform comparably well. While the measure of precision shows slightly better results for pcadapt in the divergence model simulations, we obtain higher AUC values for BlockFe_ST also here, indicating a better balance between sensitivity and specificity (Fig. 1B and Supplementary Figs S4–S9). BayeScan produces elevated P-values, falsely pointing to balancing selection even when no such signal is present, confirming the results in Lotterhos and Whitlock, 2014. This behaviour highlights the tendency of the jump model to produce false positives especially in cases of hierarchical population structures. Despite its simplicity, F_ST performs well in comparison to the computationally more involved alternatives, with good detection of intermediate signals of positive selection. Overall, we conclude that BlockFe_ST, which groups SNPs into blocks and calculates empirical P-values, provides faster and more reliable detection of region-specific directional selection compared to the standard application of BayeScan, and performs favourably also compared to alternative approaches.

Fig. 1.

$(A) Island model. BlockFeST, pcadapt, FST, and BayeScan are compared in terms of precision, defined as the fraction of true positives recovered in the top 5% of predictions across varying recombination rates. (B) Divergence model. ROC curves for a three-population model, where population 1 and 2 split 4000 generations ago, when directional selection set in, after having split from the ancestral population 8000 ago; here, the recombination rate is set to zero. For a summary of additional simulation results, see Supplementary Figures S1–S9$

Open in new tab Download slide

(A) Island model. BlockFe_ST, pcadapt, F_ST, and BayeScan are compared in terms of precision, defined as the fraction of true positives recovered in the top 5% of predictions across varying recombination rates. (B) Divergence model. ROC curves for a three-population model, where population 1 and 2 split 4000 generations ago, when directional selection set in, after having split from the ancestral population 8000 ago; here, the recombination rate is set to zero. For a summary of additional simulation results, see Supplementary Figures S1–S9

Acknowledgements

We thank Pablo E. Verde, Gregory Ewing, Stefan Habenschuss, Laura Rose and Juliette de Meaux for helpful discussions.

Funding

M.J.L. acknowledges financial support from the German Research Foundation (DFG grants EXC 1028 and CRC 680).

Conflict of Interest: none declared.

References

Beaumont

M.A.

,

Balding

D.J.

(

2004

)

Identifying adaptive genetic divergence among populations from genome scans

.

Mol Ecol

.,

13

,

969

–

980

.

Ewing

M.

,

Hermisson

J.

(

2010

)

MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus

.

Bioinformatics

,

26

,

2064

–

2065

.

Foll

M.

,

Gaggiotti

O.

(

2008

)

A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective

.

Genetics

,

180

,

977

–

993

.

Green

P.J.

(

1995

)

Reversible jump Markov chain Monte Carlo computation and Bayesian model determination

.

Biometrika

,

82

,

711

–

732

.

Google Scholar

Crossref

WorldCat

Hudson

R.R.

et al. (

1992

)

Estimating of levels of gene flow from DNA sequence data

.

Genetics

,

13

,

583

–

589

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Kronholm

I.

et al. (

2010

)

Influence of mutation rate on estimators of genetic differentiation: lessons from Arabidopsis thaliana

.

BMC Genet

.,

11

,

33

.

Lotterhos

K.E.

,

Whitlock

M.C.

(

2014

)

Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests

.

Mol. Ecol

.,

23

,

2178

–

2192

.

Luu

K.

et al. (

2017

)

pcadapt: an R package to perform genome scans for selection based on principal component analysis

.

Mol Ecol Resour

.,

17

,

67

–

77

.

Pfeifer

B.

(

2014

) Whole-genome population genomic analyses. Dissertation, Landesbibliothek der Heinrich-Heine-Universitaet Duesseldorf.

Riebler

A.

et al. (

2008

)

Bayesian variable selection for detecting adaptive genomic differences among populations

.

Genetics

,

178

,

1817

–

1829

.

Villemereuil

P.

,

Gaggiotti

O.

(

2015

)

A new FST-based method to uncover local adaptation using environmental variables

.

Methods Ecol Evol

.,

6

,

1248

–

1258

.

Google Scholar

Crossref

WorldCat

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)

Associate Editor:

Download all slides

Month:	Total Views:
April 2018	25
May 2018	50
June 2018	25
July 2018	13
August 2018	4
September 2018	132
October 2018	58
November 2018	25
December 2018	19
January 2019	24
February 2019	15
March 2019	16
April 2019	52
May 2019	17
June 2019	18
July 2019	27
August 2019	21
September 2019	77
October 2019	99
November 2019	104
December 2019	66
January 2020	82
February 2020	71
March 2020	67
April 2020	65
May 2020	32
June 2020	58
July 2020	68
August 2020	36
September 2020	35
October 2020	53
November 2020	47
December 2020	41
January 2021	29
February 2021	40
March 2021	50
April 2021	23
May 2021	24
June 2021	33
July 2021	31
August 2021	31
September 2021	21
October 2021	40
November 2021	36
December 2021	11
January 2022	29
February 2022	8
March 2022	25
April 2022	30
May 2022	36
June 2022	25
July 2022	33
August 2022	52
September 2022	53
October 2022	39
November 2022	21
December 2022	27
January 2023	15
February 2023	17
March 2023	27
April 2023	24
May 2023	7
June 2023	11
July 2023	8
August 2023	12
September 2023	22
October 2023	15
November 2023	16
December 2023	18
January 2024	25
February 2024	8
March 2024	22
April 2024	19

Article Contents

BlockFe_ST: Bayesian calculation of region-specific F_ST to detect local adaptation

Abstract

1 Introduction

2 Materials and methods

3 Simulation and results

Acknowledgements

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

Article Contents

BlockFeST: Bayesian calculation of region-specific FST to detect local adaptation

Abstract

1 Introduction

2 Materials and methods

3 Simulation and results

Acknowledgements

Funding

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Looking for your next opportunity?

This Feature Is Available To Subscribers Only

BlockFe_ST: Bayesian calculation of region-specific F_ST to detect local adaptation