Balancing Selection for Pathogen Resistance Reveals an Intercontinental Signature of Red Queen Coevolution

Abstract The link between long-term host–parasite coevolution and genetic diversity is key to understanding genetic epidemiology and the evolution of resistance. The model of Red Queen host–parasite coevolution posits that high genetic diversity is maintained when rare host resistance variants have a selective advantage, which is believed to be the mechanistic basis for the extraordinarily high levels of diversity at disease-related genes such as the major histocompatibility complex in jawed vertebrates and R-genes in plants. The parasites that drive long-term coevolution are, however, often elusive. Here we present evidence for long-term balancing selection at the phenotypic (variation in resistance) and genomic (resistance locus) level in a particular host–parasite system: the planktonic crustacean Daphnia magna and the bacterium Pasteuria ramosa. The host shows widespread polymorphisms for pathogen resistance regardless of geographic distance, even though there is a clear genome-wide pattern of isolation by distance at other sites. In the genomic region of a previously identified resistance supergene, we observed consistent molecular signals of balancing selection, including higher genetic diversity, older coalescence times, and lower differentiation between populations, which set this region apart from the rest of the genome. We propose that specific long-term coevolution by negative-frequency-dependent selection drives this elevated diversity at the host's resistance loci on an intercontinental scale and provide an example of a direct link between the host’s resistance to a virulent pathogen and the large-scale diversity of its underlying genes.


Introduction
Hosts and parasites engage in specific interactions that are believed to select for and maintain genetic diversity at host resistance genes (Sackton et al. 2007;Ebert and Fields 2020;Radwan et al. 2020). If pathogens evolve to overcome the resistance of common host alleles, rare resistance alleles have a selective advantage until they also become common. This form of time-lagged negative-frequency-dependent selection (NFDS), often referred to as Red Queen coevolution, is believed to increase genetic polymorphism at loci that interact with the antagonist (Charlesworth 2006;Thrall et al. 2015;Rabajante et al. 2016). Indeed, the Red Queen hypothesis has gained so much popular support that regions in host genomes that show elevated genetic diversity are taken as potential indicators of antagonistic coevolution, even when the coevolving antagonists are unknown. The Red Queen model was originally conceived to be a process that acts within populations, but host-parasite interactions undergoing NFDS also shape genetic diversity among populations (reviewed in Ebert and Fields [2020]). Because resistance alleles that migrate into host populations are rare, they may be favored by selection, resulting in a higher effective migration rate than other alleles in the genome (Charlesworth et al. 1997;Thrall et al. 2012;Jousimo et al. 2014;Bolnick and Stutz 2017). Nevertheless, the random loss of genotypes in small populations and strong selection from local parasites can also quickly lead to genetic divergences between neighboring populations (Lively and Dybdahl 2000;Bourgeois et al. 2017). Given this combination of regional and local dynamics, even nearby populations can display high divergence at resistance loci, whereas distant populations may show low divergence (Charlesworth et al. 1997). On large geographic scales, thus, one would expect genomic regions with resistance loci involved in coevolution to display signatures of higher genetic diversity than the rest of the genome, balancing selection, and reduced spatial structure. Evidence for these predictions has been found in the vertebrate MHC loci (Eizaguirre et al. 2012;Kaufman 2018) and in R-genes in plants (Bergelson et al. 2001;Tellier and Brown 2011), although, for both these groups of genes, the functional link between the resistance genes and the long-term coevolving parasites is missing. In other systems, although coevolutionary dynamics between hosts and specific parasites have been demonstrated, the underlying genetics are not known (Thrall et al. 2012;Gibson et al. 2018).
Here we test the hypothesis that host-parasite coevolution causes balancing selection at a host resistance gene cluster in the water flea Daphnia magna, coevolving with the obligate bacterial endoparasite Pasteuria ramosa. In this system, both the host and the parasite have a wide natural distribution covering nearly the entire Holarctic. Infections bear extreme fitness costs for the host (Luijckx et al. 2012). Resistance follows a matching allele model, preventing individual hosts and parasite genotypes from reaching fixation (Luijckx et al. 2013), and displays high diversity within populations (Andras and Ebert 2013). Coevolution has been indicated in this system based on a study of sediment cores showing the temporal dynamics of D. magna-P. ramosa interactions over about three decades (Decaestecker et al. 2007). To test for predicted patterns of genetic diversity within and between populations, we used a panel of D. magna genotypes consisting of single clonal lines collected from 125 populations in Eurasia and North Africa ( fig. 1A), each with information about geographic origin and genome sequences. Notably, for each host genotype we also possessed resistance phenotype data for five parasite genotypes. To test for signatures of balancing selection, we analyzed patterns of diversity at both the phenotypic and genetic level. The latter focused especially on a genomic region in D. magna that explains the most variance in its resistance to Pasteuria (Bento et al. 2017). This region (positions 1,368,860 to 1,506,215 on scaffold00944 of the D. magna reference genome, version 2.4, here called "resistance QTL") contains a supergene that has been found to harbor extremely diverged haplotypes (Bento et al. 2017). Evidence from several sources suggests that this region plays a role in resistance to P. ramosa in natural populations. Genome scans for selection and association show a significant signal for this cluster and its flanking genomic regions across European D. magna populations . The same association signal is found within a single panmictic population in Switzerland (Ameline et al. 2021). It has historically been difficult to establish a functional link between resistance, genetic diversity, and the consequences for coevolution, as the underlying genes of either the coevolving parasites or the host were unknown. Nevertheless, the architecture of resistance to P. ramosa has now been characterized for D. magna (Routtu and Ebert 2015;Bento et al. 2017;Bourgeois et al. 2017;Ameline et al. 2021), making it possible to test directly for a signature of balancing selection at this region and understand how coevolution with a virulent and widespread parasite affects host genetic variability.

Results and Discussion
No Geographical Structure for Resistance Phenotypes Parasite-driven NFDS is expected to result in a geographic mosaic of resistance phenotypes and genotypes across the host's range with no or weak geographic structure as compared with the genetic background (Kaltz and Shykoff 1998;Tellier and Brown 2011;Ebert and Fields 2020). To test this, we investigated whether polymorphism for resistance phenotypes displayed a signal of spatial structure. We isolated five parasite strains (P. ramosa C1, C19, P15, P20, and P21) from natural populations across Europe ) and phenotyped D. magna clones for resistance to these by assessing whether labeled spores attached to the host's foregut (all five parasites) or hindgut (two parasites: P15 and P21) (Duneau et al. 2011) (supplementary table S1, Supplementary Material online). This resulted in seven different resistance phenotypes, which we summarized with 7-letter codes (R for resistance and S for susceptibility for each of the five foregut and two hindgut phenotypes; fig. 1A). Note that this phenotypic assessment covers only a fraction of the total phenotypic variation and should be seen as a sample for the actual diversity in parasites and host resistotypes. Resistance phenotypes were found to be uniformly distributed across the entire study region without a pattern of isolation by distance (IBD; fig. 1A and supplementary table S2, Supplementary Material online). This was further confirmed by a global Distance-based Moran's eigenvector maps (dbMEM) analysis, which did not detect any significant positive spatial correlation in the spatial repartition of the seven resistotypes ( adj R 2 ¼ 0.006, P ¼ 0.068). The same observation held when considering each resistotype independently ( adj R 2 between À0.12 and 0.002, all P > 0.1). To understand the biogeographic context for this absence of a spatial pattern, we compared this analysis to a similar analysis using singlenucleotide polymorphism (SNP) data derived from the genomic sequences of the 125 D. magna clones. We found a strong pattern of IBD for genomic data, where average relatedness between individual host clones decreased with geographic distance (N ¼ 125, Mantel R ¼ À0.56, 1,000,000 permutations, P < 10 À6 ). This pattern is consistent with a previous study of the same Daphnia species (Fields et al. 2015). Moreover, in D. magna, other phenotypic traits show a clear geographic structure (Yampolsky et al. 2014;Seefeldt and Ebert 2019), underscoring that the lack of geographic structure for Pasteuria resistance is unique.
We further examined whether resistance polymorphism (only to Pasteuria C1, C19, P15, and P20). The P21 isolate was isolated only later also held true on the single-population scale. To do so, we obtained resistance phenotypes for D. magna individuals hatched from resting eggs from 23 populations from the Western Palaearctic for which we could successfully phenotype at least five host genotypes-20 of them polymorphic (R or S) for at least one Pasteuria strain. There was no correlation between variation in resistotype frequencies and pairwise distance ( fig. 1C)  . This lack of positive spatial correlation was confirmed by a dbMEM analysis ( adj R 2 < 0). Thus, phenotypic diversity for resistance against Pasteuria infections, which did not show a spatial pattern, contrasted strongly with the genomic background, which was shaped by IBD. This uniform diversity in resistance on a very large geographic scale (entire Palaearctic) coincides with the theory of host-parasite coevolution by balancing selection, which projects that phenotypic diversity is maintained at loci of functional importance for the Balancing Selection in Daphnia magna . doi:10.1093/molbev/msab217 MBE interaction of the antagonists. To our knowledge, this finding has not been shown before for phenotypic traits under coevolution.

Genomic Data Reveal a Signature of Population Structure and Postglacial Expansion
To investigate patterns of genetic diversity and divergence, population structure and history should be taken into account. Using a discriminant analysis on principal components (DAPC; Jombart et al. 2010) based on genotypes (supplementary fig. S1A, Supplementary Material online) for all 125 D. magna clones, we identified three geographic clusters. As the model with three clusters had the lowest Bayesian Information criterion (BIC), we assigned individuals to three putative geographical clusters, which we called, for simplicity, Europeþ, Middle East, and East-Asian (N ¼ 100, 11, and 14, respectively). Our results supported previous studies in revealing substantial divergence between East-Asian samples and the other groups (Fields et al. 2015(Fields et al. , 2018; indeed the East-Asian cluster was clearly separated from other clones by the first discriminant function with the highest eigenvalue ( fig. 2A). In addition, estimates of differentiation measured by F ST over 1-kb windows along the genome (see Materials and Methods) were also substantially higher for the East-Asian cluster (average F ST ¼ 0.32, 0.37, and 0.124 for Europeþ vs. East-Asia, Middle-East vs. East-Asia, and Europeþ vs. Middle-East, respectively, Wilcoxon signed rank tests, all P < 2.2 Â 10 À16 ).  *** *** *** * *** *** *** NS *** *** *** *** All E+ EA ME All E+ EA ME All E+ EA ME The DAPC analysis identified three major groups: Europeþ (Eþ), East-Asia (EA), and Middle-East (ME). (B) Empirical P values for nucleotide diversity in 1-kb windows for all 125 D. magna clones and the three geographic groups. Diversity statistics are ranked in decreasing order to obtain P values, so low P values correspond to high diversity. The resistance supergene region (QTL locus 6 100 kb) is located between the two dotted lines. The supergene itself is masked in gray due to very poor mapping of short reads to this region (positions 1,435,000 to 1,490,000 on scaffold00944). Coordinates correspond to D. magna 2.4 genome. Negative coordinates correspond to a region in the PacBio scaffold that mapped outside the original scaffold00944 (see supplementary fig. S1, Supplementary Material online). (C) Neutrality statistics (over 1-kb windows) in the region around the resistance supergene compared with genome-wide values (excluding scaffolds shorter than 10 kb in genome version 2.4). In all pairwise comparisons, the boxplots on the left and right correspond to the genomic background and the region around the resistance supergene, respectively. For Fu and Li's F and Fu and Li's D, Daphnia similis was used as an outgroup; higher values are associated with frequency spectra skewed toward ancestral variants and alleles at intermediate frequencies, supporting balancing selection. P values were obtained from Wilcoxon rank-sum tests (NS: nonsignificant; *: P < 0.05; ***: P < 0.001). Color codes as in figure 2. (D) Empirical P values for divergence statistics. The upper panels show the F ST , which is expected to be reduced if balancing selection is present, for all three pairwise comparisons among the geographic regions Europeþ, East-Asia, and Middle-East. In that case, F ST values are ranked in increasing order to obtain the empirical P value. The lower panel shows the absolute divergence, d xy , for the same pairs, which is expected to increase if there are ancient polymorphic alleles.
Balancing Selection in Daphnia magna . doi:10.1093/molbev/msab217 MBE estimates and confidence intervals), which coincides with previous studies based on mitochondrial data (Fields et al. 2018). Such demographic events are thought to skew the genome-wide allele frequency spectrum (AFS) toward more rare alleles, whereas balancing selection would maintain alleles at higher frequency. The absence of a strong recent bottleneck suggests that false positive evidence for balancing selection due to demography most likely do not explain the patterns we observed near the resistance QTL.

High Nucleotide Diversity and Skewed Allele Frequency Spectra near the Resistance Locus
To first assess whether the region around the resistance QTL displayed elevated nucleotide diversity, as would be expected under balancing selection (Charlesworth 2006), we improved the quality of scaffold00944 by using a PacBio contig from the same individual used to build the reference genome (supplementary fig. S2, Supplementary Material online), as described in a previous study (Bento et al. 2017). Nevertheless, divergence between variants within the supergene (roughly located between positions 1,435,000 and 1,490,000 on scaffold00944) were so high that alignment of short Illumina reads on the reference was not possible. Because the supergene haplotypes are not homologous and are difficult to assemble due to their repeat richness, typical population genetic approaches that rely on the alignment of diverged haplotypes are impossible for this region (Bento et al. 2017). In all subsequent analyses, thus, we excluded the supergene region to avoid the unreliable mapping of reads and instead focused on its flanking regions. As a result, our divergence measures and other population genetic summary statistics based on SNP variation in these flanking regions strongly underestimate the actual polymorphism in the region of highest diversity. All geographic clusters displayed a considerable increase in nucleotide diversity for 1-kb windows between positions encompassing the resistance QTL and the following 1 Mb (fig. 2B), with most windows in the top 5-10% genome-wide, and peaks in the top 0.1%. For all clones as well as for the three geographic clusters, nucleotide diversity was higher at the resistance region (hereafter defined as the resistance QTL 6100 kb), than the rest of the genome (Wilcoxon rank-sum tests, all P < 2.2 Â 10 À16 ).
We then tested the hypothesis that alleles with intermediate frequencies should be more common in genomic regions under balancing selection than in the genomic background (Charlesworth 2006). Indeed, the resistance region showed an abundance of alleles at intermediate frequencies with a significantly elevated Tajima's D ( fig. 2). Using the closely related species Daphnia similis (Cornetti et al. 2019) as an outgroup to define ancestral alleles, we further found elevated Fu and Li's F and D (Fu and Li 1993), indicating an excess of ancestral polymorphisms at intermediate frequencies ( fig. 2C). This supports the hypothesis that balancing selection acts at the resistance region. We also note that genome-wide values for Tajima's D were generally negative for the Europeþ and East-Asia clusters, and closer to 0 for the Middle-East cluster, consistent with the recent expansion inferred by our demographic analyses and evidence on mitochondrial genomes (Fields et al. 2018). The glacial refugium for the European D. magna was suggested to be in South-Eastern Europe/Middle East (Fields et al. 2018).

Low Relative but High Absolute Measures of Spatial Differentiation
Another hallmark of balancing selection is reduced differentiation at selected loci among populations at large spatial scales. Migrating alleles at loci under balancing selection are likely to be rare upon arrival, giving them an advantage and, as a consequence, increases their effective migration rate. Neutral alleles, on the other hand, would only increase in the recipient population if they hitchhike with alleles under selection (Laine et al. 2011;Thrall et al. 2012;Phillips et al. 2018;Ebert and Fields 2020). In addition, alleles under balancing selection are less likely to go extinct in a given population because they are advantageous when rare. Balancing selection, thus, can be expected to reduce the turnover rate of alleles and to facilitate long-term persistence of polymorphism within populations (Charlesworth 2006;Leffler et al. 2013). Distinct populations would share polymorphisms and show reduced estimates of population differentiation at the genes under selection (Charlesworth 2006;Rico et al. 2015). To test this theory, we estimated genome-wide variation in relative (F ST ) and absolute (d XY ) differentiation using the three geographic clusters determined by the DAPC ( fig. 2A). Introgression and selection are both expected to impact population differentiation and estimates of F ST and d XY in different ways: balancing selection should decrease F ST and increase d XY , whereas recent introgression would reduce both statistics, as it is an absolute measure of divergence that captures the number of sequence differences since the most recent common ancestor (TMRCA) (Cruickshank and Hahn 2014). Our data clearly support balancing selection, showing reduced relative population divergence (significantly lower F ST for Europeþ vs. East-Asia and East-Asia vs. Middle-East comparisons; Wilcoxon rank-sum test, all P < 2.2 Â 10 À16 , fig.  2D), and increased absolute population divergence in the resistance region (higher d XY compared with genome-wide mean [all P < 2.2 Â 10 À16 , fig. 2D]). This suggests that alleles in the resistance region have a higher chance of spreading across populations and being maintained within populations, consistent with an advantage for being rare.

Observed Patterns of Diversity Are Consistent with Simulations under Negative-Frequency-Dependent Selection but Not with Neutrality
To test whether differentiation and diversity statistics deviated significantly from neutral expectations, we generated 10 million coalescent simulations under our demographic model. We then performed a principal component analysis (PCA) on summary statistics ( fig. 3A). The first PC axis (PC1) explains 42% of the total variance, and is clearly correlated with diversity statistics and Tajima's D. The second axis (PC2)  Predicted values for the resistance region are consistent with our previous observations, with high PC1 scores (high diversity and Tajima's D), and low PC2 scores (low F ST ). There is a high density of windows deviating from neutrality in the resistance region ( fig. 3B). This is further confirmed by a scan for balancing selection using the B 0,MAF statistics (Cheng and Degiorgio 2020). The test contrasts allele frequency spectra around a focal SNP to the genome-wide frequency spectrum to estimate the likelihoods of models with and without balancing selection, under a broad range of equilibrium frequencies. We observe clear signals in Europeþ and Middle-East clusters ( fig. 3B), with many regions above the top 1% genome-wide threshold.
To test the conditions under which NFDS could produce the observed patterns of diversity, we ran simulations using the forward-in-time simulator SliM3 on the demographic history estimated from whole-genome data. We simulated 1-kb windows with mutation and recombination rates consistent with current knowledge about D. magna. We varied the fraction of new mutations recruited by selection and the equilibrium frequency at which balanced polymorphisms are maintained. As expected in NFDS simulations, nucleotide diversity, Tajima's D, and d XY were higher than in neutral simulations, whereas F ST was lower ( fig. 3A and supplementary  fig. S3, Supplementary Material online). In scenarios where 0.01% of new mutations were recruited by selection, diversity was generally lower than our observations in the resistance region. However, the skew in the frequency spectrum (Tajima's D) was consistent with observations. A closer match between simulations and observed data occurred in scenarios where 0.1% of new mutations were recruited by selection with an equilibrium frequency of 0.1 ( fig. 3A). For higher equilibrium frequencies, Tajima's D values were much higher than our observations (supplementary fig. S3, Supplementary Material online), consistent with a stronger skew of the AFS toward higher frequencies. This suggests that balanced polymorphisms in the resistance region may not necessarily reach very high frequencies at the geographical scale considered here, which is consistent with a fast tracking of host genotypes by quickly evolving pathogens that rapidly reduce the selective advantage of the most common resistotypes (Decaestecker et al. 2007).
It is important to note that we scaled down effective population sizes and times by 100 in our simulations to ensure fast running times, which limits the maximum strength of selection that we could simulate. We assumed a selective coefficient of 0.005 for newly established mutations under NFDS before scaling parameters. In actual populations, the strength of selection is likely higher, as P. ramosa castrates the host, reducing the residual fitness of the infected female by about 90% ). Such strong selection should counteract the effects of recombination over large genomic intervals, especially given the large effective population sizes considered here. In that case, an even lower fraction of mutations recruited by selection would be enough to generate the patterns of diversity observed in the resistance region. Although it seems clear that neutrality can be rejected, further simulations at multiple spatial scales would be needed to properly compare various NFDS scenarios. Nevertheless, our simulations suggest that even moderate selection and low equilibrium frequencies can lead to a marked increase in diversity in our system.

The Resistance Region Displays an Excess of Ancient Alleles and Older Coalescence Times
Reduced allele turnover during balancing selection implies that alleles have a longer lifespan. Thus, regions under longterm balancing selection should display older coalescence times and higher local effective population sizes (Charlesworth 2006). Our findings of higher absolute divergence (d XY, fig. 2C) in the resistance region suggest that the alleles in this region are already older than alleles in other parts of the genome. We confirmed this by computing ancestral recombination graphs (ARG) (Rasmussen et al. 2014) on a set of three large scaffolds, including the scaffold with the resistance region, retrieving local genealogies at all nonrecombinant blocks. We restricted the analysis to 48 host clones that had fewer than 5% missing genotypes to achieve a high quality of the estimates and to reduce computational burden (we could not handle with the entire genomes of all genotypes). We also included priors on past changes in effective population sizes based on our demographic analyses. The region displaying a local reduction in F ST also displayed longer times to the TMRCA (coalescence time; fig. 4), further supporting long-term balancing selection at this region. This long coalescence time was not driven by a few older haplotypes, since the half coalescence time (HCT; the minimum time at which half of lineages coalesce) was also substantially older than the background ( fig. 4).
Higher diversity near the resistance region may be caused by the reduced influence of linked selection if recombination rates are particularly high (Charlesworth 2013;Cruickshank and Hahn 2014;Burri 2017). While in the upper range when compared with other scaffolds, recombination rates estimated by ARGWeaver in the resistance region are not extreme ( fig. 4 and supplementary fig. S4, Supplementary Material online), suggesting that higher recombination rates and weaker linked selection do not explain the locally high concentration of windows of high diversity.

Genome Scans for Ancient Balancing Selection Pinpoint Genes Involved in Glycolipid and Glycoprotein Synthesis
It is predicted that the flanking regions surrounding a genomic region under balancing selection will show little signature of balancing selection because older polymorphisms give recombination enough time to erode a signal on the flanking regions (Charlesworth 2006). However, our study found clear evidence of signals of balancing selection in the flanking region of the resistance supergene, hinting that further polymorphisms in these flanking regions might be under balancing selection. Furthermore, an earlier resistance mapping study also suggested that genes outside the resistance supergene are linked to phenotypic variation in resistance Bourgeois et al. . doi:10.1093/molbev/msab217 MBE . We therefore conducted genome scans of balancing selection using recently developed b statistics (Siewert and Voight 2017) to investigate at a higher resolution which regions near the resistance supergene displayed the strongest signals of selection. b should be sensitive to alleles at an equilibrium allele frequency between 0.2 and 0.5, covering the range of equilibrium frequencies explored in our simulations. It should also be robust to the lower sample sizes in the Middle-East and East-Asia clusters (Siewert and Voight 2017). We also expect b to be more powerful than other statistics such as NCD (Bitarello et al. 2018), which mostly target alleles with equilibrium frequencies in the 0.3-0.5 range. We found several signals of balancing selection in the region around the resistance supergene, particularly in the Europeþ lineage where our sample size was largest ( fig.  5A). We then extracted genealogies from the ARGWeaver output for nonrecombining blocks that overlapped the three main peaks identified in the targeted region ( fig. 5B). These genealogies displayed very long branches with very old coalescence times ($2 million generations). For two (#1, #2) of the three peaks of beta score close to the resistance QTL, clones from all three geographic regions were represented in both clades. For the third peak (#3), relatedness between non-European clones coincided with the clusters identified by our DAPC analysis, with individuals from the same genetic group clustering together in the phylogeny. This lack of strong geographic structure was common along the resistance region. Correlations between geographic and phylogenetic distance between clones were weaker for hundreds of ARGWeaver trees randomly sampled in the resistance region than for trees sampled in other scaffolds ( fig. 5C, Wilcoxon rank-sum test, P < 4.3 Â 10 À14 ), consistent with the maintenance of ancestral polymorphism in the resistance region.
We then identified which genes near the resistance region were associated with b scores higher than the top 1% genome-wide (table 1). These genes displayed homology with glucosyltransferases, chitinases, and transcription factors, echoing the function of genes previously found in the two haplotypes of the resistance supergene (Bento et al. 2017). Other peaks on the scaffold outside the resistance supergene itself were mostly concentrated between positions 1,820,000 and 2,000,000 on scaffold00944 and overlapped genes with  (table 1), suggesting that the resistance QTL may belong to a larger region recruited by NFDS. We also found genes annotated as digestive enzymes such as trypsins and serine-proteases, an interesting observation given that P. ramosa starts its infection process in the host's oesophagus and hindgut (Duneau et al. 2011;Bento et al. 2020) and attaches to host cells through collagen-like proteins (Mouton et al. 2009;Andras et al. 2020).

Conclusion
The D. magna-P. ramosa system has become a model for the study of antagonistic coevolution in natural population (Mitchell et al. 2004;Decaestecker et al. 2007; Goren and Ben-Ami 2013; Auld et al. 2016). The discovery of a major resistance supergene in the host genome (Routtu and Ebert 2015;Bento et al. 2017) has allowed us to further explore the evolution of this region and to test whether it is under balancing selection, as the model of Red Queen coevolution would predict. Our study demonstrates that the region around this supergene diversity is indeed maintained through balancing selection. Furthermore, variation in this region is ancestral and older than variation in other regions of the genome. For the first time, our analysis links large-scale phenotypic diversity for parasite resistance with the underlying genomic region for a host-parasite system. As in other systems with balancing selection, such as mating types and incompatibility alleles (Joly and Schoen 2011;Roux et al. 2013), we found high phenotypic and genetic diversity combined with the absence of a large-scale geographic pattern. Textbook examples of balancing selection in genes related to immune function typically lack functional evidence of the interaction between host resistance genes and a specific coevolving parasite. Mechanistically, balancing selection at the Pasteuria resistance locus may be maintained by a matching allele resistance matrix that links host and parasite genotypes on a functional level (Luijckx et al. 2013;Metzger et al. 2016). The host locus underlaying this matching allele matrix is part of the supergene (Bento et al 2017), which is the center of the current study. Earlier studies that provide evidence for balancing selection at disease loci in hosts typically only speculate about the coevolving parasite or assume a community of different parasites that vary in space and time (hence diffuse coevolution, see Ebert and Fields 2020). This study tested for balancing selection in a region of the host genome known to interact specifically with the widespread, virulent bacterial parasite Pasteuria ramosa, but not with other parasites (Routtu and Ebert 2015;Krebs et al. 2017;Keller et al. 2019). Specific coevolution-which we believe explains the polymorphisms at the resistance supergene region examined here-is the heart of the Red Queen hypothesis of antagonistic coevolution, as it was originally proposed by Clarke (although not under the name Red Queen) (Clarke 1976;Hamilton 1980) and taken up by others (Hamilton 1980;Frank 1991;Tellier and Brown 2007). The simplicity of this model, combined with the fascinating complexity produced by the intricate interactions between antagonists, have made it frequently cited for antagonistic coevolution and further, for phenomena linked to coevolution, such as the evolution of genetic recombination (Hamilton et al. 1990;Lively 2010). The Daphnia-Pasteuria system is among the few systems where Red Queen model's assumption of specific coevolution is

Middle East
Position  MBE demonstrated and shown with strong evidence (Decaestecker et al. 2007;Duneau et al. 2011;Luijckx et al. 2011Luijckx et al. , 2013. Our finding of long-term balancing selection on the Pasteuria resistance locus here further reinforces this key prediction of the Red Queen model for specific coevolution, leaving a strong impact on the genome of the host.

Spatial Variability in Resistance Phenotypes
For each of the 125 D. magna clones, seven resistance phenotypes (resistotypes) were obtained using the attachment test (Duneau et al. 2011). Resistance (failure of the parasite to attach to the host cuticle) was coded as "R," and susceptibility (attachment), as "S" (Andras and Ebert 2013;Luijckx et al. 2013, Bento et al. 2020. Resistotypes are defined by the R-S sequence for the seven tests each host clone underwent (at least three replicates per host-parasite combination). Pairwise phenotypic distance between individuals was coded as 0 when resistotypes were the same and 1 when they differed. Pairwise genetic differences between individuals were estimated from genomic data (see below) using the relatedness function in VCFTOOLS v0.1.12b (Danecek et al. 2011), which computes the Ajk statistics (Yang et al. 2010). This statistic should vary between 0 (for pairs of unrelated individuals) and 1 (for an individual with itself). Resistotype frequencies were calculated for 23 population samples (supplementary table S5, Supplementary Material online). Distance measures, including all phenotypes, were estimated as a Euclidean distance, using each resistotype as a distinct dimension. IBD was assessed by Mantel tests with the ecodist package in R (v3.6.3) (Dray and Dufour 2007). dbMEM were also used to assess the geographical variables influencing resistotypes composition at the different spatial scales in our study (Legendre et al. 2015). Resistotypes presence/absence or abundance data were first Hellinger transformed to avoid overweighting rare resistotypes and significant linear trends were removed, using the adespatial package in R.

Whole-Genome Resequencing
Genomic DNA was extracted from the 125 D. magna and one D. similis clone (three times selfed) as in Fields et al. (2015) (see supplementary table S6, Supplementary Material online for details). Individuals were treated with antibiotics to evacuate their guts and reduce DNA from microbiota and food following the protocol of Duki c et al. (2016). DNA was extracted using an isopropanol precipitation protocol. Paired-end 125 cycle sequencing was performed by the NOTE.-For some uncharacterized proteins, a protein-protein BLAST search was performed at https://blast.ncbi.nlm.nih.gov/Blast.cgi to identify possible homologs. In those cases, we report the percentage of coverage, identity, and the species in which the homolog was found. For each gene, we highlight whether it was found in the original resistance QTL (excluding the supergene), in the region around the resistance supergene (QTL 6 100 kb), or elsewhere on scaffold00944. For each candidate, we also indicate the maximum value for the B 0,MAF statistics composite likelihood ratio in each of the three geographic groups (see also fig. 3).
Quantitative Genomics Facility service platform at the Department of Biosystem Science and Engineering (D-BSSE, ETH) in Basel, Switzerland, on an Illumina HiSeq 2000. Read quality was assessed with FastQCv0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc, last accessed August 9, 2021), and Trimmomatic (v0.32) (Bolger et al. 2014) was subsequently used to remove low quality bases, sequencing adapter contamination and systematic base calling errors. Sequences were aligned using BWA MEM (v0.7.15) on the D. magna genome assembly (v. 2.4) (Li et al. 2009). We improved the quality of this reference for the resistance region by replacing a part of the original scaffold944 (length ¼ 2,137,955 bp) with a PacBio contig from the same Xinb3 reference (supplementary fig. S1, Supplementary Material online) as described in a previous study (Bento et al. 2017); however, only minor differences between the analyses using the old and the updated reference were observed. We converted coordinates for the PacBio contig into coordinates on scaffold00944 by carrying a BLAST (v2.6.0) search analysis between the two sequences (Altschul et al. 1997). BAM alignment files were filtered for quality, and PCR duplicates were removed using PICARD tools (v 2.0.1, http://broadinstitute. github.io/picard/, last accessed August 9, 2021). SNP calling was performed simultaneously on all samples using freebayes (v. 0.9.15-1). Freebayes is a haplotype caller, which automatically performs indels realignment and base quality recalibration. VCF files were then filtered using VCFTOOLS v0.1.12b (Danecek et al. 2011) to include SNPs with a minimum quality of twenty, a minimum genotype quality of 30, a minimum depth of coverage of 8X/genotype, and a mean maximum sequencing depth of 70Â. We removed polymorphic indels and homopolymers before using the vcfallelicprimitive script in vcflib (https://github.com/vcflib, last accessed August 9, 2021) to convert the haploid calls into pointwise SNPs. Only SNPs that passed filters in at least 90% of samples were included in subsequent analyses (no more than 10% missing data). For simplicity, contigs and scaffolds shorter than 10 kb were excluded from analyses of the genomic background (see supplementary table S6, Supplementary Material online for quality statistics after filtering). Poor mapping could lead to an excess of false-positive polymorphisms and an excess of heterozygotes in the resistance region. We further examined six quality statistics for the whole genome and the resistance region: the mapping quality of the reference and alternate alleles, the proportion of reference and alternate alleles supported by properly paired-ends reads, the ratio between depth of coverage at heterozygous sites normalized by individual depth of coverage, and sequencing depth. No substantial differences were observed between the resistance region and the rest of the genome (supplementary fig. S5, Supplementary Material online).

Structure and Descriptive Statistics
To characterize population structure, we used DAPC in the R package adegenet (v2.1.2) to perform a clustering analysis (Jombart et al. 2010) on a set of 8978 SNPs with no missing data; these were thinned every 1,000 bp to limit the effects of linkage and of variation in SNP density (supplementary fig. S4, Supplementary Material online). DAPC first decomposes the variance in the data set into principal components (PC), then performs a discriminant analysis on these PC to identify the most likely genetic clusters. We selected the clustering model with the highest support using BIC and retained 14 PC that explained about 21.4% of the total variance and two of the linear discriminants. These numbers were chosen through a cross-validation procedure that suggested perfect assignation to clusters with a 0% mean-square error (Jombart and Collins 2015). Tajima's D, F ST and d xy were calculated for nonoverlapping 1-kb windows using the R package PopGenome (v2.2.5) (Pfeifer et al. 2014), and D. similis was used as an outgroup for computing Fu and Li's F, and D. This windows length was chosen to ensure independence between windows, based on the rapid decay in linkage disequilibrium over 1,000 bp in Daphnia genome (supplementary fig. S4, Supplementary Material online). Windows with less than 5 segregating sites were excluded.

Demographic Model
We fitted a demographic model on SNP data using the likelihood algorithm implemented in fastsimcoal2.6 (Excoffier and Foll 2011). The model consisted of one ancestral population that split into three with gene flow, and allowed one population size change after each split to reflect the recent postglacial expansion in D. magna (Fields et al. 2018). The three populations corresponded to the three geographical clusters identified by the DAPC. To obtain accurate spectra and limit the impact of missing data, we used a subset of 48 clones with less than 5% missing data (supplementary table  S7, Supplementary Material online), covering the whole species range as well as common resistotypes. Migration rates, along with current effective population sizes and time since divergence between populations, were estimated from the joint folded AFS with 30 independent runs, and included 2,458,902 SNPs with no missing data. We estimated the total number of callable sites with the coverage tool in BEDTOOLS v2.25.0 to exclude genomic intervals covered at less than 10Â depth in each single individual (Quinlan and Hall 2010). Each run used 40 cycles of likelihood optimization, with 100,000 coalescent simulations per cycle. We present the results from the run with the highest likelihood. Time in years and effective population size were obtained by assuming a mutation rate of 8.96 Â 10 À9 substitution/generation (Ho et al. 2020) and ten generations (asexual and sexual) per year (Haag et al. 2009). The same procedure was applied to 100 bootstrapped frequency spectra to obtain confidence intervals for all parameters.

Forward-in-Time Simulations
To understand how diversity statistics in 1-kb windows may be affected by demography, variable proportion of mutations under balancing selection, and equilibrium frequency, we performed simulations using the forward-in-time simulator MBE assumed a mutation rate of 8.96 Â 10 À9 /generation (sexual þ asexual combined), and a recombination rate of 6.78 Â 10 À8 /sexual generation (Duki c et al. 2016), equivalent to 6.78 Â 10 À9 /generation (sexual þ asexual combined). For scenarios with balancing selection, 0.01% or 0.1% of new mutations were under NFDS. At equilibrium frequency, the selective coefficient s was equal to 0 and varied, so that s ¼ 0.005 * (f eq À f obs )/f eq , where f eq is the equilibrium frequency, and f obs the frequency of the allele at a given generation in a given population. This results in a dynamic where s approaches 0.005 as f obs approaches 0, and -0.005 as f obs approaches 2 * f eq . To shorten run times, we scaled all parameters inferred by fastsimcoal2.6 by a factor of 100: migration, mutation and recombination rates were multiplied by 100, whereas effective population sizes, times in generation, and selection coefficients were divided by the same factor. This scaling maintains constant parameters that control mutation-selection-drift balance, such as Nl, Nm, Nr, and Ns, with l the mutation rate, m the migration rate, r the recombination rate, s the selection coefficient, and N is the effective population size. Simulations were run without any demographic event for 10,000 generations (after scaling) to ensure that mutationselection-drift balance was achieved. We ran 1,000 simulations for each combination of parameters, producing a VCF file for each; summary statistics were computed with PopGenome.

Test for Neutrality and Composite Likelihood Ratio Test for Balancing Selection
To test for deviation from neutrality, we generated 10 million coalescent simulations with fastsimcoal without scaling parameters and converted the outputs into VCF files to obtain statistics with PopGenome. Divergence and diversity statistics were summarized through a PCA using the prcomp function in R. We then predicted PCA scores for windows in the resistance region and for SLiM3 simulations. Envelopes containing 95% of points for each category were obtained using the locfit package in R (https://cran.r-project.org/web/ packages/locfit/index.html). Deviation from neutrality was estimated for each 1-kb window in the genome by counting the proportion of simulations with a higher score on the first PC axis. The resulting P values were Bonferroni-corrected. We also calculated the B 0,MAF statistics (Cheng and Degiorgio 2020) for each of the three geographical groups identified by DAPC. The statistics does not require specifying a window's size. Allele count data were extracted from the VCF file using VCFTOOLS.

Ancestral Recombination Graphs and Alleles Age
We conducted coalescent analyses using ARGweaver (Rasmussen et al. 2014;Hubisz et al. 2020) on the same 48 clones with less than 5% missing data that we used in the demographic analyses (supplementary table S7, Supplementary Material online). ARGWeaver estimates local recombination rates and time since coalescence along the genome by reconstructing genealogies along the genome as well as changes in their branching due to recombination events ARG. To limit computation time, we focused on three scaffolds larger than 2 Mb (PacBio contig þ end of scaf-fold00944, scaffold00512, and scaffold00024) that belonged to distinct linkage groups. We used the VCF file as an input, which makes ARGWeaver estimate the phase for each diploid genome. We used the same mutation and recombination rates as those used in demographic inference and simulations (see above). We also allowed changes in effective population size over time, using results from our fastsimcoal2.6 inference as a prior. We set the number of time points at which coalescence events could happen at 10 and set the maximum coalescence time at 5 million generations. Because we are mostly interested in ancient balancing selection, we also set the -delta parameter at 0.00001 so that coalescence events were less biased toward recent times than with the default value. The algorithm was run over 6,000 iterations and the MCMC chain sampled every 30 iterations. Observation of the likelihood values showed that convergence was achieved after 2,000 MCMC iterations, which were discarded as burn-in. We then extracted time since TMRCA for each nonrecombining block, the minimal time since coalescence for half of the samples, the recombination rate, and the total length of genealogies. To obtain statistics over 1-kb windows, we averaged estimates across nonrecombining blocks using the package regioneR (v1.20.0) (Gel et al. 2015).

Refined Scans for Balancing Selection
For each SNP we computed b score (Siewert and Voight 2017), a statistic that identifies SNPs of allele clusters that segregate at similar frequencies, a pattern associated with long-term balancing selection. The length of the windows we examined around each given SNP was chosen using the formula provided in Siewert and Voight (2017). The distribution of haplotypes sizes is exponential with rate parameter T * q, with T being the time since balancing selection and q the recombination rate. Assuming T ¼ 3 Â 10 6 generations (which is about ten times older than the average coalescence time retrieved by ARGWeaver) and q ¼ 6.78 Â 10 À9 /generation, 95% of haplotypes flanking a selected site should be shorter than 147 bp. We used a window size of 125 bp on each side of each focal SNP (for a total size of 250 bp, optionw 250), which, assuming 10 generations/year, should guarantee the detection of events that occurred in the last 300,000 years. Because alleles with equilibrium frequencies below 0.1 are more likely to be erased by drift, the statistic was not reported for SNPs at frequencies beneath this threshold (option -m 0.1). We performed analyses within each of the three clusters identified by DAPC to minimize confounding effects of population structure and regional adaptation and included all 125 clones. For some candidate regions, we extracted genealogies from the ARGWeaver output that overlapped with the SNP with the highest b score, sampling a random genealogy from the post burn-in MCMC iterations. We also sampled 5,000 random genealogies across the scaffolds, estimated pairwise phylogenetic distances between all pairs of samples using the R package ape, and performed Mantel test between these distance matrices and the matrix of geographical distances between samples. These analyses Bourgeois et al. . doi:10.1093/molbev/msab217 MBE were conducted using the R packages ape (Paradis et al. 2004) and ecodist.

Identification of Candidate Genes
We identified a set of strong candidates for balancing selection by first selecting SNPs and genomic regions in the top 1% for b scores (threshold estimated using all scaffolds larger than 10 kb). A mappability score was estimated using GenMap (Pockrandt et al. 2020) (v1.0.2), with a score of 1 indicating no repetitive sequence at a given position. We replaced regions from scaffold00944 covered by the improved PacBio scaffold. We filtered out regions that had overlapping, repetitive content, that is, sequences of at least 125 bp (125mers, length of a single Illumina read) and scores <1.
We allowed for up to four mismatches between repeated 125-mers. To further eliminate possible issues with copynumber variants that could artificially inflate diversity, we performed a one-tailed test for an excess of heterozygotes in all 125 individuals and removed regions where SNPs harbored P values lower than 1 Â 10 À4 . Windows satisfying these conditions, and genes overlapping them, were extracted using BEDTOOLS (Quinlan and Hall 2010). We also extracted the highest B 0,MAF value at each candidate gene using BEDTOOLS.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.