Balancing Selection Drives the Maintenance of Genetic Variation in Drosophila Antimicrobial Peptides

Abstract Genes involved in immune defense against pathogens provide some of the most well-known examples of both directional and balancing selection. Antimicrobial peptides (AMPs) are innate immune effector genes, playing a key role in pathogen clearance in many species, including Drosophila. Conflicting lines of evidence have suggested that AMPs may be under directional, balancing, or purifying selection. Here, we use both a linear model and control-gene-based approach to show that balancing selection is an important force shaping AMP diversity in Drosophila. In Drosophila melanogaster, this is most clearly observed in ancestral African populations. Furthermore, the signature of balancing selection is even more striking once background selection has been accounted for. Balancing selection also acts on AMPs in Drosophila mauritiana, an isolated island endemic separated from D. melanogaster by about 4 Myr of evolution. This suggests that balancing selection may be broadly acting to maintain adaptive diversity in Drosophila AMPs, as has been found in other taxa.


Introduction
Pathogens exert strong selective pressures on their hosts, both in terms of individual fitness and the evolutionary trajectory of populations and species (Anderson and May 1981). Coevolutionary dynamics of hosts and pathogens results in continual selection for adaptive improvements in both players, often referred to as a coevolutionary arms race (Paterson et al. 2010;Schulte et al. 2010;Thrall et al. 2012). As a consequence, genes involved in immune defense tend to undergo strong positive selection, such that they are among the fastest evolving genes in the genomes of many hosts (Nielsen et al. 2005;Kosiol et al. 2008;Downing et al. 2009a;Ekblom et al. 2010;McTaggart et al. 2012).
However, resistance mutations may not always become fixed. Balancing selection is the process whereby polymorphism is adaptively maintained over extended timescales, sometimes described as trench-warfare dynamics (Stahl et al. 1999). Several processes are thought to contribute to balancing selection (reviewed by Llaurens et al. [2017]). These include heterozygote advantage, whereby individuals heterozygous at a given locus have a fitness advantage over either homozygote; negative frequency-dependent selection, whereby the benefit of an allele increases the rarer it is in a population; and selection varying in a context-dependent manner, for example, at different spatial or temporal scales, between the sexes, or in the presence or absence of infection. Balancing selection can be detected as an excess of intermediate frequency variants and a region of increased polymorphism around the selected site.
The extent to which selection will impact genetic variation within and around immune genes will depend on a number of factors. These include the form and strength of selection (Kingsolver and Pfennig 2007); the genetic architecture (e.g., dominance, epistasis) of immune traits (Mackay 2001); the timescale upon which selection is acting (Charlesworth 2006); the density, diversity, and virulence of pathogens (Lambrechts et al. 2006); the cost of maintaining resistance alleles in the absence of infection ; effective population size (Charlesworth 2009); the mutation and recombination rates of hosts and pathogens (Gandon and Michalakis 2002); environmental variables (Wolinska and King 2009); and demographic factors such as gene flow and bottlenecks (Brockhurst et al. 2003).
Balancing selection also appears to play a role in the evolution of antimicrobial peptides (AMPs) in some taxa. AMPs are effectors of innate immunity that are strongly induced upon infection (Lemaitre et al. 1997;Tzou et al. 2000). They are often membrane active (Shai 2002;Brogden 2005), with a direct role in killing and/or impeding the growth of pathogens (De Gregorio et al. 2002;Lemaitre and Hoffmann 2007). Balancing selection has been implicated as a driver of AMP evolution in a diverse array of species including birds (Hellgren and Sheldon 2011;Chapman et al. 2016), amphibians (Tennessen and Blouin 2008), fish (Halld orsd ottir and Arnason 2015), mollusks (Gosset et al. 2014), and humans (Cagliani et al. 2008;Hollox and Armour 2008).
The fruit fly, Drosophila melanogaster, is an important model for understanding evolution of the immune system (Hultmark 1993;Hoffmann 2003;Janssens and Beyaert 2003;Dostert et al. 2005;Lamiable et al. 2016). Directional selection on Drosophila immune genes appears to be a relatively widespread phenomenon, especially among antiviral, receptor, and signaling genes (Schlenke and Begun 2003;Obbard et al. 2006Obbard et al. , 2009Clark et al. 2007;Heger and Ponting 2007;Sackton et al. 2007;Hill et al. 2019). In contrast, evidence for balancing selection acting on Drosophila immune genes has been more equivocal. Genome-wide scans by Croze et al. (2016Croze et al. ( , 2017 found little evidence for balancing selection acting on immune genes. In contrast, both single gene and genome-wide analyses of selection have indicated that balancing selection may play an important role in the evolution of AMPs in Drosophila (Hanson et al. 2016;. One striking example is the AMP Diptericin. This AMP is subject to balancing selection in D. melanogaster, likely driven by a tradeoff between immune defense and another life-history trait . The same balanced polymorphism, achieved via a different mutation to the derived allele, is found in Drosophila simulans . Additionally, recent analyses have shown that both spatial and temporal fluctuations are associated with variation in D. melanogaster allele frequencies (Bergland et al. 2014), particularly in immune genes, including AMPs (Bergland et al. 2014;Early et al. 2017;Behrman et al. 2018).
AMPs play a key role in controlling pathogen load and infection outcome (De Gregorio et al. 2002;Lemaitre and Hoffmann 2007), which may be particularly important for insects and other invertebrates that lack an adaptive immune system. Given their direct interaction with pathogens, it is surprising that insect AMPs often do not show signatures of recurrent adaptive substitutions. We hypothesize that AMPs in Drosophila are prone to balancing selection. To test this hypothesis, we examined patterns of nucleotide variation at AMP, immune-, and control-gene loci in four populations of D. melanogaster and one population of Drosophila mauritiana. Using both a linear model and a matched control-genebased approach, with standard population genetic statistics, we searched for molecular evolutionary signatures of selection on AMPs and immune genes. Our results provide evidence that balancing selection is an important driver of AMP evolution.

Genetic Variation across Four D. melanogaster Populations
To determine whether AMPs show signatures of balancing selection, we obtained coding sequence alignments for 13,494 genes (including 35 AMPs and 288 genes putatively involved in immune defense; hereafter: immune genes) (Lack et al. 2016) for four D. melanogaster populations: Zambia (ZI), Rwanda (RG), France (FR), and North Carolina (DGRP) and quantified nucleotide polymorphism (supplementary table S1, Supplementary Material online). Drosophila melanogaster originated in sub-Saharan Africa, expanded into Europe $15-16,000 years ago, and subsequently spread to North America <200 years ago (David and Capy 1988;Li and Stephan 2006;Keller 2007). The ZI and RG lines therefore represent ancestral populations, whereas FR and DGRP are derived populations. For each autosomal gene, we calculated three population genetic statistics: Watterson's h (the number of segregating sites, corrected for sample size), p (pairwise nucleotide diversity), and Tajima's D across all populations, for silent (four-fold degenerate) sites, per after controlling for missing data. We limited our analyses to silent sites to allow us to reduce the possibility that our results were due to relaxed constraint on nonsynonymous variation rather than balancing selection. Relaxed constraint at the protein level should increase nonsynonymous diversity because selection against amino acid changes is lowered (Hartl and Clark 2006;Wang et al. 2016). Excluding nonsynonymous sites therefore allows us to specifically focus on the footprint of balancing selection. We then grouped genes as AMPs, immune genes, and background genes. The mean Tajima's D for AMPs is higher than the mean of background genes in all populations (ZI, À0.284 AMPs vs. À0.874 autosomal average; RG, À0.110 vs. À0.232; FR, À0.064 vs. À0.113; DGRP, À0.041 vs. À0.596, supplementary table S2, Supplementary Material online), consistent with relatively more balancing selection occurring in AMPs. As observed previously (e.g., Glinka et al. 2003;Shapiro et al. 2007), the autosome-wide average for Tajima's D is quite negative in D. melanogaster, which likely reflects a complex demographic history (supplementary fig. S1, Supplementary Material online).
As selection across the genome can be affected by differing levels of mutation and recombination, we next tested for differences in population genetic statistics between AMPs and the autosomal background after controlling for genomic position. We specifically tested whether AMPs have higher values of the three population genetic statistics by employing a linear model with four covariates: gene length, chromosome, chromosomal region (nested in chromosome, explained in more detail in the Materials and Methods), and gene type (AMP or not, nested in chromosomal region and chromosome). This revealed that population genetic measures were elevated for AMPs in ancestral populations (ZI and RG), but not derived populations (DGRP and FR) (table 1 and supplementary table S3 and

AMP-Control Tests for Balancing Selection in Drosophila
Given the apparent differences in selection between AMPs and the background averages described above, we also employed an AMP-control approach to test whether AMPs showed elevated diversity (a signature of balancing selection) in D. melanogaster and D. mauritiana while controlling for local variation in mutation and recombination rates. This approach also allowed us to visualize the differences found in the linear models above. For each AMP, we randomly sampled genes of similar length (coding sequence length 10 times the size of the AMP) and genomic position (within 100,000 bp on either side), calculated statistics for the AMP and control gene, and then calculated the mean difference over the 35 AMP/control comparisons. We repeated this procedure 10,000 times to obtain an empirical distribution of differences ( fig. 1). Each AMP was associated with 8-30 control genes, resulting in each replicate containing a unique set of control genes. In these instances, a positive difference suggests a higher value for AMPs versus the control gene, consistent with balancing selection. Indeed, the differences are primarily positive for both p and Watterson's h for all populations (fig. 1B and C and table 2). For Tajima's D, the differences are positive for Zambia and Rwanda (ancestral populations), consistent with balancing selection, but close to 0 for France and negative for DGRP (derived populations, fig. 1A and table 2). In line with our previous analyses, we found that AMPs had higher Tajima's D in both RG and ZI, but not FR or DGRP. These results were recapitulated when we subsampled AMPs to control for the fact that they often cluster in the genome (see Materials and Methods, supplementary fig. S2, Supplementary Material online). We also examined all non-AMP immune genes using this control gene method and found little evidence of balancing selection in immune genes as a whole, in general concordance with Croze et al. (2017Croze et al. ( , 2016  Accounting for Background Selection Strengthens the Signature of Balancing Selection on Drosophila AMPs Background selection, the removal of neutral variation due to selection against linked deleterious alleles, can influence levels of polymorphism across the genome. Comeron (2014) calculated the observed amount of background selection across the genome in 1,000-bp windows in the D. melanogaster -These linear models include only genes within 100,000 bp and within ten times the size of an antimicrobial peptide. Data is presented as F-statistic/P-value from the linear model, with degrees of freedom (df) denoted in the second header row. P values <0.05 are in bold. AMP refers to AMP nested in region nested in chromosome and region refers to region nested in chromosome. Linear models were run individually for five Drosophila populations: four D. melanogaster populations (DGRP, Drosophila Genetics Reference Panel from North Carolina, USA; FR, France; RG, Rwanda; ZI, Zambia), and one D. mauritiana (D. mau) population. All three statistics (Tajima's D, p and h W ) were calculated on silent (four-fold degenerate) sites only.
Rwanda population. He then correlated silent polymorphism against this measure. Regions with positive residuals (more silent polymorphism than expected based on background selection) were deemed to be under balancing selection, whereas those with negative residuals (less silent polymorphism than expected based on background selection) were deemed to be under directional selection. Two regions that contain AMPs (IM4 and Cecropin) were among the handful of outliers discussed by Comeron (2014) as being under balancing selection, which further motivated us to examine the general pattern for AMPs. We identified all AMPcontaining windows and replotted Comeron's data. This revealed that AMPs tend to fall in regions well above the trend-line (pink points, fig. 2A), indicating they are, in general, evolving in a manner consistent with balancing selection. In contrast, immune genes do not show elevated residuals compared with neighboring genes (teal points, fig. 2A). To further ascertain whether, as a group, AMPs show signatures of balancing selection, we used Comeron's background selection data (Comeron 2014) to fit a linear model as described above but also included Comeron's M1 statistic for background selection for a particular region as a covariate. In this case, AMPs showed significantly elevated silent polymorphism compared with other genes, whether we looked genome wide (F 10, 41533 ¼16.66, P < 0.0001) or focused on 100,000-bp regions of the genome containing AMPs (F 10, 1288 ¼10.59, P < 0.0001). We also found that regions containing AMPs, but not immune genes, were significantly elevated for residuals compared with nearby regions using our resampling approach ( fig. 2B). This supports Comeron's assertion that accounting for background selection improves the ability to detect balancing selection (Comeron 2014) and also supports our previous results showing that AMPs as a group are likely subject to balancing selection.
4.1 Â 10 À4 3.4 Â 10 À4 5.1 Â 10 À4 7.4 Â 10 À4 1.5 Â 10 À6 resampled the difference in these statistics between AMPs and neighboring control genes. We found AMPs have consistently higher values for p, Watterson's h, and Tajima's D than their matched controls ( fig. 1, table 2, and supplementary table S4, Supplementary Material online). For other immune genes, the differences from controls are primarily negative for p, Watterson's h, and Tajima's D, suggesting directional selection may be acting on these genes in D. mauritiana ( fig. 1 and supplementary tables S3 and S4, Supplementary Material online).

Discussion
We find evidence consistent with balancing selection being an important evolutionary driver of AMP genes in Drosophila. This is most clearly observed in ancestral African populations (Zambia and Rwanda). There are several reasons why previous analyses may not have identified these selective forces acting on AMPs. First, signals of selection can be clouded by background selection. We found that the clearest signal for AMP balancing selection in the Rwandan population after using Comeron's (2014) method to account for background selection. Second, previous studies have tended to group immune genes as a single entity when scanning genomes for footprints of selection. Strong directional selection acting on some receptor and signaling immune genes may swamp a subtler signal of balancing selection acting on AMPs. Third, this effect may be exacerbated by the fact that effector genes tend to be smaller (Lemaitre and Hoffmann 2007) than receptor and signaling genes, providing fewer sites and therefore less power to detect any population genetic signature. Relatedly, single gene analyses might lack the power to show such signatures. Fourth, patterns of nucleotide polymorphism are strongly influenced by population demographic history. Our AMPcontrol approach should account for the confounding influences of local mutation and recombination rate variation, gene size, and demography (Garrigan and Hedrick 2003).
The differences between derived (DGRP and France) and ancestral populations (Rwanda and Zambia) were striking. As populations establish in new habitats, they will encounter different pathogen pressures and prevailing environmental conditions. This could dramatically alter which alleles are selectively advantageous. First, bottlenecks may lead to the loss of one or more of the balanced alleles. Furthermore, loss of disadvantageous alleles (e.g., alleles resistant to pathogens not present in the new habitat) likely occurs more rapidly than establishment of new, beneficial polymorphisms (e.g., resistance alleles for newly encountered pathogens). This may explain why we find the strongest evidence for balancing selection on AMPs in ancestral African populations that have been coevolving with their pathogens, under semipredictable conditions, for long time-periods.
An alternate explanation for these differences could be that the quality of genomic data in the derived populations is lower than that for the ancestral populations (supplementary table S1, Supplementary Material online). We therefore limited our analysis to a subset of 114 DGRP lines with the highest quality data (DGPR-HQ, being those lines with fewest Ns and highest coverage, see Materials and Methods). Though we did still did not find strong evidence for balancing selection acting on AMPs in this population (DGRP-HQ in supplementary table S2, Supplementary Material online), both measures of diversity (pairwise diversity and Watterson's theta) further increased in AMPs (but not other immune genes) when compared with the background, supporting an increase in diversity in AMPs compared with other genes. This also serves as a cautionary tale, as including lower quality genomes adds noise (increased standard deviation in resampling, supplementary table S2, Supplementary Material online) that could potentially mask population genetic signals.
It is tempting to look to newly developed methods for detecting balancing selection (e.g. DeGiorgio et al. 2014;Siewert and Voight 2017), but these statistics were developed for detecting the molecular footprints of selection in human populations. Assumptions about the genomic signatures of a balanced polymorphism that work well in humans are not applicable to Drosophila, because the window of linked polymorphism likely to show these signatures is tiny. To state this numerically, DeGiorgio et al. (2014), based on Gao et al. (2015), suggest a window size of 1/q (where q is the population-scaled recombination rate or 4N e r) for observing the signature of a linked balanced polymorphism. For humans, q is about 0.001 so the window size is about 1,000 bp (DeGiorgio et al. 2014). Estimates of q in D. melanogaster are highest in the DGRP population and range from 9.6 to 14.8 per kb for the different chromosomes (Chan et al. 2012). These values correspond to windows of 100 bp or less in D. melanogaster. Given estimates of nucleotide diversity between 0.001 and 0.01 (Lack et al. 2016), we expect less than one segregating site per window, rendering these tests unusable in this species. Recombination is even higher in D. mauritiana (True et al. 1996), and the use of pooled sequencing (Pool-seq) data for this species would further complicate the interpretation of newer selection statistics, due to the fact that low frequency alleles are dropped in Pool-seq data.
We find that, at least in ancestral populations, AMPs tend to evolve in a manner consistent with balancing selection, showing increased diversity but no increase in divergence  relative to other genes. This contrasts with other immune genes which show no such pattern. Why are AMPs different than other immune genes? One characteristic of AMPs is that they interact directly with microbes (Bulet and Stocklin 2005), and, in some cases, AMP sequence is directly linked to the efficacy of bacterial membrane interactions (Schmitt et al. 2016;Franzoi et al. 2017). If particular AMP alleles encode for peptides that are more effective at fighting infection by particular microbes, a fluctuating suite of pathogens in the environment over time or space could lead to balanced polymorphisms. This "specificity hypothesis" suggests that allele frequencies in AMPs should vary spatially or temporally. There is some evidence for both seasonal (Behrman et al. 2018) and spatial (Early et al. 2017) variation in selection pressure on AMPs. However, evidence for AMP specificity against particular pathogens, especially different naturally occurring alleles of the same AMP, is currently rare (but see e.g., Tennessen et al. 2009;Hellgren et al. 2010;Park et al. 2014;. Additionally, the patterns of divergence and polymorphism expected after periods of fluctuating selection are not clear-cut and may complicate the detection of balancing selection (see, e.g., Huerta-Sanchez et al. 2008;Miura et al. 2013;Gossmann et al. 2014).
Alternatively, AMP variation might be maintained because AMP alleles that are more effective against pathogens also tend to carry a higher autoimmune cost. This "autoimmune hypothesis" states that more effective AMP alleles should be common during pathogen epidemics, but decrease in frequency when pathogens are rare. These patterns might also vary spatially and temporally, making the interpretation of these context-dependent patterns more difficult. There is evidence that overexpression of AMPs can have deleterious fitness consequences (Gilliet and Lande 2008;Benachour et al. 2009;Maneerat et al. 2016). However, it seems that if autoimmune costs were important in maintaining variation, we would also see signatures of balancing selection in the IMD and toll pathway signaling genes that control expression of AMPs. Most work suggests that these genes are evolving under directional selection, consistent with an arms race model (Obbard et al. 2006(Obbard et al. , 2009Sackton et al. 2007). Distinguishing between these two hypotheses for the adaptive maintenance of AMP genetic variation will take careful functional analysis.

Polymorphism in Four Populations of D. melanogaster
We downloaded chromosome sequences for the Zambia (ZI, n ¼ 197), Rwanda (RG, n ¼ 27), Drosophila Genetic Reference Panel (DGRP, n ¼ 205), and France (FR, n ¼ 96) populations, available as part of the Drosophila Genome Nexus from http:// www.johnpool.net/genomes.html; last accessed September 10, 2019. (Mackay et al. 2012;Pool et al. 2012). These data were collected as described elsewhere (Pool et al. 2012). Briefly, short read Illumina data were generated for each individual and mapped to the D. melanogaster reference genome version 5.22 using BWA and Stampy. Following mapping and alignment, GATK indel realigner was used to refine short indel alignments and SNPs were called for each individual. A custom reference genome was then generated for each individual based on all SNPs called and the process was repeated. Following a second round of mapping and indel realignment, SNPs called across both rounds were then inserted to create a second, final custom genome for each individual (Pool et al. 2012).
We converted these sequences into FASTA files, per chromosome, for each population. The RG and ZI populations are much higher quality data, the average per base coverage of the raw FASTQ data used to generate the FASTA files is much higher, and the number of ambiguous bases is much lower than the DGRP and FR populations (supplementary table S1, Supplementary Material online). Additionally, we created a subset of 114 DGRP lines (hereafter DGRP-HQ) with high coverage to calculate statistics in DGRP using only high-quality data (at least 25-fold coverage, and at least 75% of sites called in 1,000-bp windows across the genome).
Using annotation 5.57 of the D. melanogaster genome, we extracted FASTA alignments for each gene and extracted silent (four-fold degenerate) sites using a custom Biopython script (personal comm. Yasir Ahmed, May 2018). We then used a custom Python script with the package DendroPy, to find p, Watterson's h, Tajima's D, and the number of segregating sites for the subset of silent sites per gene (Sukumaran and Holder 2010;Ferretti et al. 2012). We categorized genes involved in immune defense (hereafter immune genes) using the designations given at https://www.epfl.ch/labs/lemaitrelab/lemaitre-lab/resources/list-of-drosophila-genes-potentiallyinvolved-in-the-immune-response/; last accessed September 10, 2019. (Lemaitre and Hoffmann 2007). We removed nonautosomal genes from all downstream analyses, because the X chromosome does not harbor any AMPs, and has a different effective population size than autosomes.
We employed a linear model in the form of Y ¼ chromosome þ gene length þ chromosome/region þ chromosome/ region/AMP, where Y refers to the value of a particular statistic for that gene, region refers either to the region of the genome in which the gene is found (200,000-bp width) or a window around the focal AMP, and AMP refers to whether or not each gene was an AMP. Slashes such as in chromosome/region refer to region nested in chromosome. For the analysis of AMP silent polymorphism while accounting for background selection, we employed a linear model in the form of Y ¼ M1 þ region/AMP (Comeron 2014). All linear models included only genes on chromosomes 2L, 2R, and 3R, as these are the only chromosomes harboring AMPs in D. melanogaster and D. mauritiana. ANOVA results were analyzed using the car package (Fox and Weisberg 2011) in R v 3.5.1 (R Core Team 2017). The results of these models are summarized in supplementary table S3, Supplementary Material online.
To control for missing data, we removed all sequences containing over 25% ambiguous bases and recalculated each statistic using DendroPy v 4.4.0 in Python v 2.7.0, accounting for missing data in our calculation (Sukumaran and Holder 2010;Ferretti et al. 2012). We again fitted a linear model to this data, for each AMP and those control genes within 10,000, 50,000, and 100,000 bp of that AMP.
For each population, we then resampled to find the average difference in scores between AMP/immune genes and control genes. Genes were assigned as AMPs or immune genes based on data from Bruno Lemaitre. For each gene in these categories (AMP or immune), we randomly sampled a control gene in a 200,000-bp window centered on the AMP (100,000-bp upstream or downstream), that was no more than ten times larger than the focal gene (defined as coding sequence length measured in base pairs), and not another gene in the given category. We then found the average difference ( D) in each measure for the focal gene (AMP or immune) group and the control group such that D ¼ 1 n X n i¼1 ðX AMP=Immune i À X Control i Þ; where X AMP/Immune represents the chosen AMP/immune gene, X Control represents the randomly sampled control gene, and n accounts for the number of genes in the group. We then resampled 10,000 times to obtain an empirical distribution of the differences. Each AMP or immune gene was associated with between 8 and 47 control genes. With 10,000 resamplings, we expect only one particular combination of control genes to be chosen twice. We tested a range of window sizes (50,000-150,000 bp) and found qualitatively similar results (data not shown). To account for the fact that many AMPs cluster in the genome which might lead to pseudoreplication, we additionally repeated our analysis with a subsampled data set. To this end, we choose a subset of ten AMPs that were at least 5,000 bp from any other included AMPs and repeated the AMP-control analysis described above, with the specific subset of ten AMPs changing randomly per resampling. We employed this method to control for genome-wide variation in recombination rates, mutation rates. Resampling 10,000 times allows for a robust empirical distribution that does not rely on the particular control genes chosen per iteration. We therefore present the distribution of differences as violin plots with the proportion of resamplings that do not overlap zero, analogous to a bootstrap value.

Polymorphism in a Population of D. mauritiana
We downloaded the reference genome, annotation, and mapped BAM file of a population of D. mauritiana sequenced as a pool (Pool-seq) from http://www.popoolation.at/mauriti-ana_genome; last accessed September 10, 2019. (Nolte et al. 2013), and used PoPoolation to calculate p, Watterson's h, and Tajima's D for each gene in this population. We then resampled to find the average difference in scores between AMPs/immune and a control set of genes, as described above.

Data Availability
All data used in this study are publicly available and freely accessible. The D. melanogaster sequence data were obtained from John Pool's Drosophila Genome Nexus (http://www.johnpool.net/genomes.html; last accessed September 10, 2019) in FASTA format. All D. mauritiana data (BAM file, reference genome and gene annotation) were downloaded from http://www.popoolation.at/mauriti-ana_genome; last accessed September 10, 2019.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.