Genomic Signatures of Recent Adaptation in a Wild Bumblebee

Abstract Environmental changes threaten insect pollinators, creating risks for agriculture and ecosystem stability. Despite their importance, we know little about how wild insects respond to environmental pressures. To understand the genomic bases of adaptation in an ecologically important pollinator, we analyzed genomes of Bombus terrestris bumblebees collected across Great Britain. We reveal extensive genetic diversity within this population, and strong signatures of recent adaptation throughout the genome affecting key processes including neurobiology and wing development. We also discover unusual features of the genome, including a region containing 53 genes that lacks genetic diversity in many bee species, and a horizontal gene transfer from a Wolbachia bacteria. Overall, the genetic diversity we observe and how it is distributed throughout the genome and the population should support the resilience of this important pollinator species to ongoing and future selective pressures. Applying our approach to more species should help understand how they can differ in their adaptive potential, and to develop conservation strategies for those most at risk.

Behavioral experiments and analyses of observation records have shown that pesticide use, habitat fragmentation, emerging diseases, and climatic change threaten insect pollinators including bees (Brown and Paxton 2009;Vanbergen 2013;Ollerton et al. 2014;Goulson et al. 2015). Despite the resulting risks for agricultural yields and for ecosystem stability, we know little about how wild insects may adapt to such environmental pressures. Similarly, we understand relatively little about how genomes are shaped by selection in the wild (Harrisson et al. 2014). If a species has adapted in response to a detrimental environmental pressure, then we should see changes in the alleles of genes involved (Hohenlohe et al. 2010;Ellegren 2014) (fig. 1A). Analyzing genomes of many individuals can identify such changes and reveal the constraints and adaptive potential of species (Chen et al. 2018;van Klink et al. 2020). The resulting knowledge should support conservation efforts and practices (Supple and Shapiro 2018).
The annually reproductive bumblebee Bombus terrestris is ideal for understanding how coping with recent rates of environmental change is possible because, unlike many other pollinators, it has shown little evidence of population decline (Ollerton et al. 2014). Furthermore, because male bumblebees are haploid, their genome sequences are unambiguous and intrinsically phased, providing more analytical power than the diploid genomes of female bumblebees and of many other insects. To understand which bumblebee genes and molecular processes underlie responses to recent selective pressures, we sequenced the genomes of male B. terrestris bumblebees from across Great Britain. We subsequently identified and characterized the genomic regions showing the strongest signs of recent adaptive evolution in this population. Our characterization of the amount and distribution of genetic diversity throughout the bumblebee genome provides measures of the genetic health of this species and highlights genes and processes through which it has recently adapted.
To understand whether population structure constrains adaptation in this species, we performed identity-by-state (IBS) and co-ancestry-based analyses. These analyses indicate that our data set represents one population ( fig. 1C and supplementary figs. S1 and S2, Supplementary Material online). Similarly, although the second principal component correlates with latitude (Pearson's r ¼ 0.8, fig. 1C and supplementary fig. S3, Supplementary Material online), individual principal components explained at most 2.46% of genetic variation. In line with the relative absence of barriers to gene flow in Great Britain, there is sufficient gene flow for these bees to be considered as one panmictic population, implying that new alleles have the potential to readily spread. The weak substructure of British B. terrestris is supported by studies using fewer markers (Schmid-Hempel et al. 2007;Lye et al. 2011;Moreira et al. 2015). Our result also indicates that no subset of our samples has the type of large-scale differentiation that could be expected from a cryptic subspecies.
Selection Is Fine-Tuning Functional Regions throughout the Bumblebee Genome We used two approaches to identify signatures of recent selection in the genome. First, we identified large "hard" sweep regions, where selection on an allele can lead to haplotype fixation (Berry et al. 1991). For this, we identified genomic segments longer than 100,000 nucleotides with significantly lower nucleotide diversity than the rest of the genome (zscore < À2r). We found 90 such segments. Our second approach detected more localized signatures of selection, and "soft" sweeps, where two or more haplotypes are at high frequency. This can occur, for example, when strong selection on a new mutation occurs after a first allele reaches fixation, or when selection favors different alleles in different habitats (Hermisson and Pennings 2005). For this, we determined for each SNP the metric jnS L j ("number of segregating sites by length"), a measure of haplotype homozygosity that is robust to variation in recombination and mutation rates; an jnS L j score greater than 2 is considered evidence of recent selection (Ferrer-Admetlla et al. 2014;Szpiech and Hernandez 2014). The 10,132 SNPs with the highest 1% of jnS L j scores have particularly strong signatures of recent selection (jnS L j>2.56, i.e., տ3 standard deviations from the mean; fig. 2A and supplementary table S2, Supplementary Material online) and are typically within 300 nucleotides of SNPs with low jnS L j scores. This indicates that recombination rapidly breaks down haplotypes in this species and that selection has acted in fine-tuned manners. The SNPs with high jnS L j scores are mostly in genic regions (79%; P < 10 À15 ) and, within these regions, are similarly represented in coding and noncoding sequence (P > 0.05), indicating that selection recently acted on protein function and on gene regulation.
To understand which types of biological functions were under the strongest recent selection pressures, we inspected annotations of genes with the strongest jnS L j scores and performed rank-based analyses of Gene Ontology and InterPro descriptions of all bumblebee genes (Bonferroni adjusted P < 0.05; fig  Each dot represents one SNP; labeled dots represent the SNP with highest jnS L j score for genes of interest, including transcription factors, insecticide susceptibility genes, and a Wolbachia-like gene, with high jnS L j scores. Labels indicate Flybase gene symbols when clear Drosophila orthology exists, otherwise, the NCBI gene symbol is provided. Blue and purple horizontal dashed lines, respectively, indicate the 1st percentile of overall jnS L j scores and 10th percentile of genic jnS L j scores. (B) Distributions of jnS L j scores show that most SNPs are in genic regions, and that most jnS L j scores are consistent with neutral or purifying rather than directional evolution as 96% of SNPs have jnS L j < 2. (C) Diverse Gene Ontology terms are enriched in genes with high jnS L j scores (Àlog10 transformed Bonferroni-adjusted P values). Terms associated with roles in neurology and transcription factor activity are, respectively, highlighted in bold and bold italics. The total number of annotated genes for each term is in parentheses. online). In particular, the gene with the strongest evidence of recent selection is the B. terrestris ortholog to the schnurri gene (jnS L j¼5.14). In Drosophila, this transcription factor regulates embryonic patterning and wing patterning through the Decapentaplegic pathway (Torres-Vazquez et al. 2000). Another transcription factor, the ortholog to the ventral veins lacking gene (vvl), has the ninth highest jnS L j score (jnS L j¼4.54). In Drosophila, vvl is involved in steroid biosynthesis and embryonic brain development (Meier et al. 2006;Danielsen et al. 2014), and intriguingly also interacts with the Decapentaplegic pathway to affect wing imaginal disc development (Certel et al. 2000) and vein patterning (de Celis et al. 1995). These results, together with "wing morphogenesis" being the eighth most overrepresented Gene Ontology description among genes under selection, suggest that there was strong recent selection on wing structure. Such selection could be linked to recent changes to foraging or flight patterns (Memmott et al. 2007;Miller-Struttmann et al. 2015), because climatic changes modified the physical constraints of flying (Corbet et al. 1993), or perhaps in response to pathogens, such as the deformed wing virus, which can cause extensive wing abnormalities in infected individuals (Genersch et al. 2006).

Strong Selection Acting on Genes Involved in Bumblebee Neurobiology
Neurological genes were overrepresented among genes with the highest jnS L j scores ( fig. 2C). In line with this, 4 of the 30 genes with the highest jnS L j scores have potential roles in neurotransmission (gamma-aminobutyric acid receptor alpha-like; jnS L j¼4.44, glutamate receptor ionotropic kainate 2; jnS L j¼4.39), axon guidance (Down Syndrome cell adhesion molecule 2; jnS L j¼4.62), and memory formation (neurotrimin; jnS L j¼4.1). Furthermore, selection on G protein-coupled receptor signaling in B. terrestris mirrors previous analyses on honeybees (Harpur et al. 2014;Wallberg et al. 2014;Avalos et al. 2017). These receptor targets of hormones, pheromones, and neurotransmitters are thus long-term targets of selection in social bees, potentially for roles responding to social or to environmental cues (Hauser et al. 2006). Selection on neurological genes could, for example, be linked to the need to improve complex cognitive and social behaviors of bumblebees (Loukola et al. 2017), for remembering increasingly complex foraging routes due to patchier habitats, or for neurological changes because of exposure to neurotoxins.
Positive Selection on a Gene Horizontally Transferred from Wolbachia We used similarity searches to understand the potential functions of uncharacterized genes among the genes with the twenty highest jnS L j scores. This showed that the gene with the 16th-strongest signature of selection (LOC105666162;  (Papafotiou et al. 2011); based on its amino acid sequence and expression profiles, this gene is a strong candidate for driving mechanisms of cytoplasmic incompatibility (Metcalf et al. 2014). One could speculate that LOC105666162 contributes to the lack of Wolbachia in bumblebees.
To test whether the characteristics of this region are specific to B. terrestris, we identified orthologous regions in another bumblebee species Bombus impatiens and in the honeybee Apis mellifera. The orthologous region in B. impatiens contains 52 of the 53 genes and similarly has lower diversity (p$1.7 Â 10 À4 ) than other regions (genome-wide average p ¼ 1.2 Â 10 À3 ; t df¼18 ¼ À22.8, P < 10 À15 ). Orthology to the honeybee is split between two regions separated by 4.4 Mb, indicating that rearrangements have occurred because the common ancestor of bumblebees and honeybees existed at least 78 Ma. For both regions, honeybee populations had at least 13-fold lower nucleotide diversity than the rest of the genome (region 1: t df¼19.328 ¼ À98.9, P < 10 À15 ; region 2: t df¼15.8 ¼ À65.2, P < 10 À15 ; fig. 3). These patterns indicate that an intrinsic long-standing process is responsible for the low genetic diversity of these regions in bees. Although the regions include genes that are likely under strong purifying selection (supplementary table S7, Supplementary Material online), no particular gene annotation was overrepresented which could help interpretation. Unlike the rest of the genome, the lack of genetic diversity in this large region suggests that bees will have limited ability to adapt to selection pressures involving the genes it contains.

Selection on Potential Insecticide Susceptibility Genes
Selection for resistance to neurotoxic pesticides can lead to changes in expression or sequence of target receptors or detoxification enzymes in insect pests (Ffrench-Constant 2013). Because bumblebees can be exposed to pesticides when foraging on crops, and given the extensively documented detrimental effects of pesticide exposure on bumblebee health (Vanbergen 2013;Goulson et al. 2015), we preliminarily examined signatures of recent selection in genes for which orthologs in other species are known to be involved in responses to insecticide exposure. Four target receptors of insecticides were among the 10% of genes with the highest jnS L j scores: three nicotinic acetylcholine receptor subunits which are targets of neonicotinoid insecticides (nAChR1a, nAChR6a, nAChR7a; jnS L j>3.12 for all; supplementary table S8, Supplementary Material online), and metabotropic glutamate receptor 2 (jnS L j¼3.59), a target of the natural plant toxin L-canavanine (Mitri et al. 2009). Mutations in orthologs of two of the nicotinic acetylcholine receptor subunit genes confer resistance to neonicotinoid exposure in other species (Puinean et al. 2013;Shimada et al. 2020 fig. S7, Supplementary Material online). We also found strong signatures of selection for 42 genes that are differentially expressed in bumblebees after exposure to neonicotinoid pesticides (Bebane et al. 2019;Colgan et al. 2019; supplementary table S9, Supplementary Material online). There was no overlap between these 42 genes and those previously identified as having a role in insecticide resistance. Future research will help pinpoint the reasons for the patterns we observe, and whether some of the recent changes may reduce susceptibility to toxins naturally present in pollen and nectar, or to synthetic pesticides.

Discussion
Human-induced environmental changes add to longstanding ecological and evolutionary challenges faced by wild animals. Identifying potential causes of pollinator declines has to date relied on inferences from laboratory experiments or on correlative associations in the field. Our study takes an important step toward understanding the bases of resilience of an important pollinator species by uncovering signatures left in the organism's genetic "blueprint" in response to selective pressures. The strong signatures of selection we find at loci distributed throughout the B. terrestris genome are consistent with the view that insect pollinators face many different pressures. Environmental pressures likely contributed to recent changes that occurred in B. terrestris genes underlying physiology, neurology, and wing development.
Bumblebees also face intra-and interspecies pressures including, competition for food, habitat, and mates, and pressures from predators, pathogens, and parasites. Large-scale gene expression and functional genomic data sets are only beginning to be produced for bumblebees (L opez-Osorio and Wurm 2020) and will be crucial for disentangling how the specific changes we observed affect phenotypes and fitness. Similarly, historical sampling of museum specimens could help characterize changes over time in morphology, population structure, and allele frequencies.
The fine tuning of adaptive responses in B. terrestris is highlighted by our finding of strong signatures of selective sweeps within few nucleotides of neutrally evolving loci. Several characteristics of this species likely facilitate this fine tuning. Crucially, the high recombination rate (Liu et al. 2017) and social lifestyle of B. terrestris mean that one queen can produce hundreds of haploid males, encompassing a broad diversity of allelic combinations. These males are fully exposed to the environment as they spend weeks foraging and trying to attract a mate (Wolf et al. 2012). Male bees are also subject to haploid selection, which should lead to faster adaptation than in diploid species (Meisel and Connallon 2013;Pracana et al. 2021). Furthermore, the broad gene flow and large population size of B. terrestris enables the maintenance of large amounts of genetic diversity and the rapid spread of adaptive alleles. Although our data would be unable to detect slight changes in population size over the past century, our data and analyses support the absence of major recent population bottlenecks in this species. Future comparisons with sister species including those that are declining will clarify whether B. terrestris may have additionally harbored a generalist genetic toolkit further predisposing it to resilience.
We show that locating recent signatures of selection throughout the genome can indicate which genes and pathways changed for B. terrestris to adapt in response to the pressures it has faced. Furthermore, the amount and distribution of genetic diversity we observed throughout its genome suggest that this bumblebee species maintains an ability to respond to future pressures. Our work in this bumblebee species complements recent efforts in vertebrates and model systems. Future comparative genomic studies with other social and solitary pollinators will improve our ability to disentangle why species differ in their resilience to recent environmental changes. Additionally, scaling up our approach will enable the creation of frameworks for predicting detailed responses to environmental challenges for entire ecological networks. Overall, although insect declines are worrying, we show how at least one common pollinator is adapting.

Materials and Methods
Bumblebee Collection, DNA Extraction, and Sequencing In the summer of 2014, we collected up to two males from each of 28 sites, with each site being >20 km from the nearest neighboring site (fig. 1B). Male B. terrestris (large earth or bufftailed bumblebee) were caught using butterfly nets and transferred into individual 100-ml pots after morphological confirmation of sex and species. Pots were placed into a bag at 4-10 C. Within 2 h, males were rapidly transferred to 2 ml cryotubes and then snap frozen in liquid nitrogen. Subsequent storage was at À80 C.
From each bee, dissected tissue was homogenized in 200 ll of phenol in a 2-ml screw-cap tube (supplementary table S1, Supplementary Material online). Subsequently, DNA was extracted using phenol-chloroform followed by purification with the Sigma GenElute Mammalian Genomic DNA miniprep kit. DNA purity was initially assessed using a NanoDrop spectrophotometer (Thermo Fisher Scientific, United Kingdom) followed by quantification with a Qubit v3 fluorometer (Thermo Fisher Scientific). DNA from each male was fragmented to $550 bp using a Covaris M220 ultrasonicator and fragment size distribution assessed using a TapeStation 2200 (Agilent Technologies, United Kingdom). From each sample, we prepared an individually indexed Illumina TruSeq PCR-free DNA library, which was quantified using qPCR MasterMix (ABI Prism) and primer premix (Kapa Biosystems, United Kingdom). Libraries were pooled in equimolar concentrations and pairs of 125-bp sequences were produced on two lanes of Illumina HiSeq 2500 at Biomedical Research Centre Genomics, London, United Kingdom. Five samples were additionally sequenced on one lane of Illumina HiSeq 2500 at Oxford Genomics, Oxford, United Kingdom.

Quality Assessment and Filtering of Raw Illumina Sequences
We obtained 616 million paired-end reads from the 51 samples we initially collected. Using bowtie2 (v.2.2.5; Langmead and Salzberg 2012) with the parameter "-X 1000," we aligned raw reads to the B. terrestris reference genome (GCF_000214255.1; Sadd et al. 2015). The 422Â cumulative genome coverage provided strong power to detect sites with nucleotide sequence polymorphism. Four males were removed from all biological analyses due to low coverage. A fifth male was only used for the analysis of contaminant sequences (supplementary appendix, Supplementary Material online) because >58.1% of reads from this male lacked similarity to the reference genome. The mean mapped coverage for each of the remaining 46 samples was 11.8Â (min: 7Â; max: 26.7Â). Quality of raw reads was assessed using FastQC (v.0.11.3; https://www.bioinformatics.babraham.ac.uk/projects/fastqc; last accessed June 30, 2015). Illumina adapters were detected and removed using Trimmomatic (v.0.33; Bolger et al. 2014). Using Khmer (v.2.1.1; Crusoe et al. 2015) to first interleave pairs of reads, we removed sequences of low quality (where >25% of the read has a Phred quality score of strictly <20) using the fastx toolkit (v.0.0.14; http://hannonlab.cshl.edu/fastx_toolkit; last accessed November 30, 2015). We used Khmer to remove 31-mers present three or fewer times across the entire data set, as they likely represent technical artifacts or particularly rare variants that we would be unable to analyze. Sequences shorter than 50 bp were removed using seqtk (v.1.0-r82-dirty; https://github.com/lh3/seqtk; last accessed November 30, 2015). The final cleaned data set thus comprised 46 males with a mean coverage of 11.1Â (min 6.7Â; max 24.4Â).
This cleaned data set provides sufficient power to genotype the majority of polymorphic sites because 96.2% of the genome had >1Â coverage in each of the 46 males. Overall, $99% of the reference genome had at least 20Â coverage.

Identification of Polymorphic Sites and Genotyping of Individuals
After mapping cleaned reads to the reference assembly using bowtie2, we called variants using freebayes (v.1.0.2-29-g41c1313; Garrison and Marth 2012) with the following parameters: -report-genotype-likelihood-max -use-mapping quality -genotype-qualities -use-best-n-alleles 4 -haplotypelength 0 -min-base-quality 3 -min-mapping-quality 1 -minalternate-fraction 0.25 -min-coverage 1 -use-reference allele. We first removed the aforementioned five low-coverage individuals as they were each missing >10% of genotype calls, thus retaining data from 46 males. We then removed entire SNPs with low genotype quality scores (-minQ 20) and variants in collapsed repetitive regions (-max-mean-DP 100) using VCFtools (v.0.1.15; Danecek et al. 2011). We removed sites that appeared to be heterozygous, which is impossible in haploids, and all sites with more than two alleles as they also likely represent collapsed regions in the reference genome. To further reduce data set complexity, we used -remove-indels to only consider SNPs. We calculated allele frequencies and retained genotypes only where the rare allele was present in at least two males. Finally, we only considered those SNPs in regions of the genome that are mapped to the 18 linkage groups (representative of chromosomes). Mean nucleotide diversity p was calculated using 10 kb sliding windows with 5 kb overlap using PopGenome (v.2.2.4;Pfeifer et al. 2014).

Evidence of Recent Selective Sweeps
First, we identified regions of the genome with particularly low nucleotide diversity, indicative of "hard" sweeps. Second, to identify potential "soft" selective sweeps, we calculated nS L (Ferrer-Admetlla et al. 2014) for all high confidence SNPs using selscan (v.1.1.0b; Szpiech and Hernandez 2014). This metric is a measure of extended haplotype homozygosity. We normalized all nS L scores against the empirical genomewide distribution using selscan "norm," using default settings. We used the top 1% (jnS L j ! 2.56) absolute score of the nS L metric (jnS L j) for all downstream analyses because jnS L j ! 2 indicates a selective sweep. Normalized jnS L j scores per gene, as well as NCBI RefSeq gene symbol and description, are

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.