-
PDF
- Split View
-
Views
-
Cite
Cite
Martin Kapun, Daniel K. Fabian, Jérôme Goudet, Thomas Flatt, Genomic Evidence for Adaptive Inversion Clines in Drosophila melanogaster, Molecular Biology and Evolution, Volume 33, Issue 5, May 2016, Pages 1317–1336, https://doi.org/10.1093/molbev/msw016
Close - Share Icon Share
Abstract
Clines in chromosomal inversion polymorphisms—presumably driven by climatic gradients—are common but there is surprisingly little evidence for selection acting on them. Here we address this long-standing issue in Drosophila melanogaster by using diagnostic single nucleotide polymorphism (SNP) markers to estimate inversion frequencies from 28 whole-genome Pool-seq samples collected from 10 populations along the North American east coast. Inversions In(3L)P, In(3R)Mo, and In(3R)Payne showed clear latitudinal clines, and for In(2L)t, In(2R)NS, and In(3R)Payne the steepness of the clinal slopes changed between summer and fall. Consistent with an effect of seasonality on inversion frequencies, we detected small but stable seasonal fluctuations of In(2R)NS and In(3R)Payne in a temperate Pennsylvanian population over 4 years. In support of spatially varying selection, we observed that the cline in In(3R)Payne has remained stable for >40 years and that the frequencies of In(2L)t and In(3R)Payne are strongly correlated with climatic factors that vary latitudinally, independent of population structure. To test whether these patterns are adaptive, we compared the amount of genetic differentiation of inversions versus neutral SNPs and found that the clines in In(2L)t and In(3R)Payne are maintained nonneutrally and independent of admixture. We also identified numerous clinal inversion-associated SNPs, many of which exhibit parallel differentiation along the Australian cline and reside in genes known to affect fitness-related traits. Together, our results provide strong evidence that inversion clines are maintained by spatially—and perhaps also temporally—varying selection. We interpret our data in light of current hypotheses about how inversions are established and maintained.
Introduction
Inversions are common structural mutations which result in the reversal of gene order in the corresponding genomic region (Sturtevant 1921). Because inversions suppress recombination when in heterozygous state (Sturtevant and Beadle 1936), they have long been thought to be important for evolutionary processes: In speciation because they act as a genetic barrier to gene flow by favoring the accumulation of reproductive isolation genes; in sex chromosome evolution; in meiotic drive; and in adaptive evolution, for example, because they might represent “coadapted gene complexes” (epistatic combinations of beneficial loci), or because they might capture and protect locally adapted loci from recombination with maladaptive immigrant haplotypes (Dobzhansky 1937, 1950, 1970; Noor et al. 2001; Rieseberg 2001; Navarro and Barton 2003; Schaeffer et al. 2003; Hoffmann et al. 2004; Kirkpatrick and Barton 2006; Hoffmann and Rieseberg 2008; Kirkpatrick 2010; Faria et al. 2011; Yeaman 2013).
Indeed, a large body of literature supports a major role for inversions in adaptation. Beginning with the pioneering field surveys and laboratory experiments carried out by Theodosius Dobzhansky and his school in the 1930s and 1940s in Drosophila (Dobzhansky 1937, 1943, 1947a, 1950, 1970), many cases of inversion polymorphism have been found that are consistent with strong natural selection, including steep frequency clines, correlations with environmental (climatic) factors, predictable seasonal changes in inversion frequencies, evidence of epistatic selection, and so forth (Dobzhansky 1943; Dobzhansky and Epling 1944; Krimbas and Powell 1992; Powell 1997; Andolfatto et al. 2001; De Jong and Bochdanovits 2003; Schaeffer et al. 2003; Hoffmann et al. 2004; Hoffmann and Rieseberg 2008; Schaeffer 2008; Rane et al. 2015). The notion that selection can drive changes in inversion frequencies is also supported by a handful of population cage and experimental evolution studies in Drosophila (Wright and Dobzhansky 1946; Dobzhansky 1947b; Dobzhansky 1948; Inoue 1979a; García-Vázquez and Sánchez-Refusta 1988; Kapun et al. 2014). Moreover, a recent study has reported that a common 900-kb inversion polymorphism in humans is under positive selection (Stefansson et al. 2005).
Importantly, an adaptive role of inversions is also underscored by associations between inversion polymorphisms and fitness-related traits. In Drosophila, different inversions have been associated with developmental time, egg-to-adult survival, various size-related traits, fecundity and fertility, stress resistance (to cold, heat, starvation), and lifespan (Sperlich and Pfriem 1986; Hoffmann et al. 2004; Hoffmann and Weeks 2007; Hoffmann and Rieseberg 2008; and references therein). A prominent example is the inversion polymorphism In(3R)Payne in Drosophilamelanogaster which has been linked to clinal variation in body size using genetic association methods (Weeks et al. 2002; Rako et al. 2006; Kennington et al. 2007), a relationship also supported by quantitative trait locus (QTL) mapping (Gockel et al. 2002; Calboli et al. 2003).
Despite these important findings, however, conclusive evidence for a direct role of inversion in adaptation remains scarce in most systems (Hoffmann et al. 2004; Kirkpatrick and Barton 2006; Hoffmann and Rieseberg 2008; Kirkpatrick and Kern 2012); for some notable exceptions see Schaeffer et al. (2003, 2008), White et al. (2007), Ayala et al. (2011), Lowry and Willis (2010), Joron et al. (2011), and Cheng et al. (2012). Although clines are widely considered to be the result of selection (Endler 1977) and can have major effects on fitness components (see above), whether spatially varying (clinal) selection shapes inversions is poorly understood. Similarly, surprisingly little is known about the mechanisms of selection acting on inversions (Kirkpatrick and Barton 2006; Kirkpatrick 2010; Guerrero et al. 2012). Three basic types of selective mechanisms can be distinguished (Kirkpatrick and Barton 2006; Hoffmann and Rieseberg 2008; Kirkpatrick 2010). First, selection can act directly on the inversion due to positive effects of the chromosomal lesion, for example, caused by a beneficial mutation at the breakpoint. Second, an inversion can induce structural problems with meiosis, causing deleterious effects. Third, selection can act indirectly: A new inversion might capture locally adapted alleles and, by suppressing recombination, protect them from maladaptive gene flow (“local adaptation” without epistasis), or it might keep a combination of epistatically favored alleles together (“coadaptation,” local adaptation with epistasis). In principle, these alternatives can be distinguished empirically (Guerrero et al. 2012), but appropriate DNA sequence data to do so are still largely lacking (Cheng et al. 2012; Rane et al. 2015).
An additional complication is that nonadaptive factors such as population structure and demography can also generate clinality, which can hamper adaptive inferences (Endler 1977; Kao et al. 2015; Polechová and Barton 2015). For instance, Bergland et al. (2016) have reported that clines in D. melanogaster can be confounded by admixture and secondary contact with ancestral populations, illustrating the importance of distinguishing demographic versus selective sources of clinality.
Several fundamental but largely unresolved questions can thus be asked about the adaptive nature of inversions: (1) How are clinal gradients of inversion polymorphisms maintained? Are they shaped by selection, or are they due to demography? (2) How persistent are clines in inversion frequencies? (3) Do inversion frequencies change seasonally? If yes, do seasonal changes parallel clinal changes? (4) Which environmental—presumably selective—actors drive the spatio-temporal distribution of inversions? (5) What are the patterns of genetic variation within inversions, and what might they tell us about the mechanisms of selection?
Here we try to shed light on these questions by studying clinal inversions in the fruit fly D.melanogaster, a well-established model system for investigating clinality (David and Bocquet 1975; David and Capy 1988; Lemeunier and Aulard 1992; De Jong and Bochdanovits 2003; Hoffmann and Weeks 2007; Schmidt and Paaby 2008; Fabian et al. 2012, 2015; Kapun et al. 2014; Klepsatel et al. 2014; Adrion et al. 2015). We capitalize on a comprehensive set of whole-genome resequencing data from the well-known North American cline, a broad north–south gradient running along the east coast and within several hundred miles of the eastern seaboard.
The North American latitudinal cline is of particular interest for our purposes, for four reasons. First, four large common cosmopolitan inversions (In(2L)t, In(2R)NS, In(3L)P, In(3R)Payne), as well as several rare cosmopolitan inversions, vary clinally along the North American east coast, with previous records going back to ∼40 years (Mettler et al. 1977; Stalker 1980; Knibb 1982; Lemeunier and Aulard 1992). Second, populations situated along this cline are differentiated for major fitness-related traits, including body size, fecundity, diapause, lifespan, and stress resistance (Coyne and Beecham 1987; Schmidt et al. 2005; Schmidt and Paaby 2008), with the trait clines qualitatively matching the inversion clines. It is thus tempting to speculate that inversion clines might underlie, at least partly, clinal patterns of trait differentiation (De Jong and Bochdanovits 2003; Hoffmann et al. 2004; Fabian et al. 2012). Third, many aspects of the North American cline are naturally replicated and qualitatively mirrored by a latitudinal cline running along the east coast of Australia (Hoffmann and Weeks 2007), thus providing an ideal parallel system for comparison. Fourth, several recent studies have provided genomic descriptions of the North American and Australian clines (Turner et al. 2008; Kolaczkowski et al. 2011; Fabian et al. 2012; Bergland et al. 2014; Reinhardt et al. 2014; Zhao et al. 2015); for instance, in previous work we have used next-generation sequencing data to study clinal inversions in North America (Fabian et al. 2012; Kapun et al. 2014), but the spatio-temporal resolution of our previous sampling and analysis was limited.
In this study, we examine the spatio-temporal dynamics of inversions by analyzing the largest genomic data set available for the North American cline to date, based on whole-genome sequencing of pools of individuals (Pool-seq) (Schlötterer et al. 2014). Our data set consists of 28 Pool-seq samples collected from 10 populations spanning the entire cline, from southern Florida to Maine, including temporal (summer and fall) samples, collected by the Drosophila Real Time Evolution Consortium (Dros-RTEC; 12 samples; unpublished); Bergland et al. (2014) (14 samples); and Fabian et al. (2012) (2 samples) (supplementary table S1, Supplementary Material online). By applying diagnostic, inversion-specific single nucleotide polymorphism (SNP) markers (Kapun et al. 2014) to these data, we estimate the frequencies of seven cosmopolitan inversions (In(2L)t, In(2R)NS, In(3L)P, In(3R)C, In(3R)K, In(3R)Mo, In(3R)Payne) (Lemeunier and Aulard 1992) and track their spatio-temporal dynamics. We complement our analysis with genomic data from the Australian cline (Reinhardt et al. 2014), and by integrating records of inversion frequencies from nonsequenced North American populations based on cytological karyotyping, spanning ∼40 years of observation.
We had six specific objectives. (1) We aimed to quantify the extent of clinality of inversions along the North American east coast: By comparing our results with previous records we address whether the clines have been maintained over time, presumably due to spatially varying selection. (2) We asked whether inversions undergo seasonal changes, potentially indicating the action of temporally varying selection (Dobzhansky 1943; Knibb 1986; Sanchez-Refusta et al. 1990), and whether local seasonal fluctuations might mirror clinal changes in inversion frequency across latitude (Rhomberg and Singh 1988; Lemeunier and Aulard 1992). (3) To understand the selective factors that might shape inversion clines, we aimed to identify environmental variables that predict changes in inversion frequencies. (4) To determine whether inversion clines are maintained by selection or nonadaptive demographic factors, we tested whether genetic differentiation in inversion frequencies is likely due to neutrality and/or admixture. (5) Using genome-wide regression analysis, we tested for enrichment of clinal and seasonal SNPs within inversions and identified SNPs that are presumably in linkage disequilibrium (LD) with them. Finally, (6), we aimed to interpret our results in light of current hypotheses that seek to explain how inversions are established and maintained (Kirkpatrick and Barton 2006; Kirkpatrick 2010).
Results and Discussion
Clinal and Seasonal Changes in Inversion Frequencies
Latitudinal Clinality
We first examined the clinality of seven polymorphic inversions (In(2L)t, In(2R)NS, In(3L)P, In(3R)C, In(3R)K, In(3R)Mo, In(3R)Payne) (Lemeunier and Aulard 1992) along the North American east coast (fig. 1A–C, supplementary fig. S1A–D and Supplementary Data, Supplementary Material online). With the exception of the common cosmopolitan inversions In(2L)t and In(3R)Payne, which occurred at frequencies >20%, the other inversions segregated at lower frequencies (fig. 1A–C and supplementary fig. S1A–D, Supplementary Material online). In particular, the rare cosmopolitan inversions In(3R)C and In(3R)K exhibited frequencies <10%, or were completely absent in most populations (supplementary fig. S1B and C and Supplementary Data, Supplementary Material online), similar to earlier records (Mettler et al. 1977). Average inversion frequency differences between the clinal endpoints (Florida, Maine) ranged between 2% for In(3R)C and 39% for In(3R)Payne (supplementary table S2, Supplementary Data online).
Clines of In(2L)t, In(3L)P, and In(3R)Payne along the North American east coast, between seasons and sampling decades, and as compared with neutral and admixture expectations. (A–C) Clines and seasonal phase clines. Inversion frequency estimates for summer (black) and fall (blue) collections along the North American east coast (supplementary table S1, Supplementary Material online). Regression lines shown in red are based on lumping sampling seasons (supplementary table S3, Supplementary Material online). (D–F) Temporal stability of clines. Combined inversion frequency estimates across four decades of observation, using data from Mettler et al. (1977) (and references/sources cited therein; see Materials and Methods) (black), Sezgin et al. (2004) (blue), and this study (red; supplementary table S1, Supplementary Material online). Also see supplementary table S4, Supplementary Material online. (G–I) Clines versus neutral expectation. Inversion frequencies averaged across seasons for each sampling locale (red) and average frequencies of 9,996 putatively neutral, genome-wide SNPs located in short introns (black) (conditioned to have zero frequency in the northernmost population, except for In(3R)Mo for which frequencies were conditioned to be zero in the southernmost population) (supplementary table S8, Supplementary Material online). (J–L) Clines versus admixture expectation. Inversion frequency estimates averaged across seasons for each sampling locale (red) as compared with inversion frequencies expected under admixture from Africa and Europe (black) (supplementary table S9, Supplementary Material online). For a corresponding figure for In(2R)NS, In(3R)C, In(3R)K, and In(3R)Mo, see supplementary figure S1, Supplementary Material online. All curves shown in (A–L) represent logistic curves based on parameters from binomial GLMs. Note that while we plot logistic curves in (D–F), clinal stability was tested using homogeneity of slopes tests in ANCOVA, which is based on linear regression. Binomial standard errors for inversion frequency estimates are given in supplementary table S1, Supplementary Material online.
Consistent with—but not proof of—spatially varying selection, we detected significant negative correlations with latitude for In(3L)P and In(3R)Payne (fig. 1B and C and supplementary table S3, Supplementary Material online), in good agreement with previous observations from the North American cline (Mettler et al. 1977; Knibb 1982; Fabian et al. 2012; Kapun et al. 2014) and the Australian cline (Knibb et al. 1981; Knibb 1982). The perhaps most parsimonious interpretation is that the clines in In(3L)P and In(3R)Payne are maintained by selection, especially in view of the fact that their directionality is reversed between the two continents, as expected from the inverted climatic gradients across latitude in the northern and southern hemisphere; for a possible alternative scenario based on admixture, see Bergland et al. (2016) and below.
For the rare cosmopolitan inversion In(3R)Mo, we found a positive association with latitude, confirming the findings of Kapun et al. (2014) (supplementary table S3, Supplementary Material online). Interestingly, this inversion was very rare and nonclinal in North America ∼40 years ago (Langley et al. 1977; Mettler et al. 1977), but has recently been found to have increased in frequency up to ∼18% in Raleigh (North Carolina) (Langley et al. 2012), suggesting a very rapid change in frequency along the North American east coast (Kapun et al. 2014). This strong—as of yet unexplained—change in the clinality of In(3R)Mo deserves further study; whether it has been caused by a recent shift in selection pressure and/or demography is unclear. Our previous observation that In(3R)Mo has increased—from 5% to 25%—under conditions of cold adaptation in an experimental evolution experiment is consistent with the directionality of this cline and the idea that this inversion can respond to climatic selection in a short amount of time (Kapun et al. 2014). The other two rare cosmopolitan inversions, In(3R)C and In(3R)K, were clearly nonclinal in our analysis (supplementary fig. S1B and C, Supplementary Material online), which matches well with previous data (Mettler et al. 1977).
Seasonal Phase Clines
In contrast to older records (Mettler et al. 1977; Knibb 1982), we did not observe a main effect of latitude for In(2L)t and In(2R)NS. Instead, we found that the steepness of the clinal slopes of these inversions changes between summer and fall: The slope for In(2L)t was uncorrelated with latitude in summer (fig. 1A and supplementary table S3, Supplementary Material online) but negatively correlated in fall, whereas for In(2R)NS the slope was positive in summer but negative in fall (supplementary fig. S1A and Supplementary Data, Supplementary Material online). A similar effect was seen for In(3R)Payne, in addition to a strong effect of latitude itself, with the clinal slope being less negative in summer than in fall (fig. 1C and supplementary table S3, Supplementary Material online).
These patterns are reminiscent of “seasonal phase clines” whereby each local population undergoes seasonal change, but the onset of the seasonal cycle varies latitudinally (Roff 1980; Rhomberg and Singh 1988). Our results, especially those for In(2R)NS—for which we see a seasonal reversal of the sign of the clinal slope (supplementary fig. S1A, Supplementary Material online)—support the idea that northern (standard) arrangements might be favored during colder periods (fall-to-summer), whereas southern (inverted) arrangements might be favored during warmer periods (summer-to-fall), suggesting that local seasonal changes in frequency might mirror those across latitude (Lemeunier and Aulard 1992).
Effects consistent with such a scenario have been found by Stalker (1980) who tracked inversion frequencies in D. melanogaster in Missouri and Texas across years. More recently, the same phenomenon has been reported by Cogni et al. (2015) for SNPs residing in 46 metabolic genes: Across the North American cline, northern alleles are favored in spring but decrease in frequency at the expense of southern alleles from spring to fall, with this effect being stronger in northern populations than southern populations. Although this specific observation is unlikely to be driven by inversions (Cogni et al. 2015), these data do not exclude the possibility that inversions cycle seasonally.
Our finding that the clinal slopes for In(2L)t and In(3R)Payne are more shallow in summer than fall might also be explained by differences in voltinism across latitude (Levy et al. 2015); as compared with oligovoltine northern populations, the growing season in Florida is much longer and populations are polyvoltine, thus possibly allowing inverted arrangements more time to increase in frequency.
Two important caveats must be made, however: First, confirming the patterns found here will require additional temporal samples in the future; second, the seasonal changes in clinal slopes and inversion frequencies we have observed are small. On average, phase cline frequency differences between seasonal samples collected from the same location and year ranged between 1% for In(3L)P and In(3R)C and 5% for In(2L)t (supplementary table S2, Supplementary Material online). Thus, clinal frequency changes were much larger than seasonal phase cline shifts: The percent difference between clinal changes and seasonal phase cline shifts ranged from 3% for In(3R)K and 44% for In(2R)NS to ∼100% or greater for In(2L)t, In(3L)P, In(3R)C, In(3R)Mo, and In(3R)Payne (supplementary table S2, Supplementary Material online).
Local Seasonality
Although our inference regarding seasonal changes in clinality is somewhat limited by the relatively small number of temporal samples, a strong prediction can be made from the seasonal phase cline model, namely that the existence of seasonal phase clines requires “local” seasonal changes in inversion frequencies (Rhomberg and Singh 1988).
We tested this prediction by analyzing local seasonal changes in inversion frequencies in a temperate population from Linvilla Orchards (Media, PA), sampled 10 times during 4 consecutive years (2009–2012; 4 summer samples, 6 fall samples; supplementary table S1, Supplementary Material online). For the majority of the seven inversions, frequencies fluctuated erratically over time (fig. 2). However, for two of the three inversions that appeared to exhibit a seasonal phase cline, In(2R)NS and In(3R)Payne, we detected significant seasonal fluctuations across years, with the frequencies on average increasing from summer-to-fall but declining from fall-to-summer (fig. 2 and supplementary table S3Supplementary Material online). In particular for In(2R)NS, these local fluctuations qualitatively parallel the pattern seen for the seasonal phase cline (Rhomberg and Singh 1988; Lemeunier and Aulard 1992; Cogni et al. 2015).
Local seasonal fluctuations of inversion frequencies in a temperate orchard population in Pennsylvania across 4 years of observation. Seasonal fluctuations of inversion frequencies in the Linvilla Orchards population (Media, PA) between 2009 and 2012. Summer collections shown in red, fall samples in black. Asterisks denote significant effects of local seasonality: *P < 0.05; **P < 0.01; ***P < 0.001. Also see figure 3 and supplementary table S3, Supplementary Material online. Binomial standard errors for inversion frequency estimates are given in supplementary table S1, Supplementary Material online.
Although these results might be expected under the seasonal phase cline model, the caveat remains that the temporal fluctuations we have observed are relatively small; in fact, they were not detected in a previous study of the same population based on fewer temporal samples (Bergland et al. 2014). Seasonal changes between summer and fall ranged on average from 0% for In(3L)P and In(3R)C, 2% for In(3R)Payne, to 3% for In(2R)NS; Cohen’s standardized effect size d (Cohen 1988) ranged from small (d < 0.5; In(3L)P, In(3R)C, In(3R)Mo) to intermediate (d > 0.5; In(3R)K, In(3R)Payne) to large (d > 0.8; In(2R)NS) (supplementary table S2, Supplementary Material online). Despite these relatively small effects, however, three arguments suggest that the seasonal changes might be real, at least for In(3R)Payne and In(2R)NS. The first is that pervasive seasonal changes in allele frequencies, although not in inversion frequencies, have already been reported for the Linvilla Orchards population (Cogni et al. 2013, 2015; Bergland et al. 2014). Second, flies from this population exhibit predictable seasonal changes in several life history and stress resistance traits (Behrman et al. 2015). Third, seasonal fluctuations of inversions are rather common in several Drosophila species, including D. melanogaster (Dobzhansky 1948, 1970; Dobzhansky and Ayala 1973; Stalker 1980; Sperlich and Pfriem 1986; Krimbas and Powell 1992; Rodríguez-Trelles et al. 1996). Importantly, seasonal changes have already previously been documented for In(2R)NS in Japanese and Australian populations (Inoue 1979b; Knibb 1986) and for In(3R)Payne in Spain, Egypt, and Japan (Inoue 1979b; Masry 1981; Sanchez-Refusta et al. 1990).
Although we cannot rule out that the seasonal changes we have observed are due to drift (e.g., caused by cyclic population booms and busts) or migration from neighboring populations, genome-wide analysis of the same Pennsylvanian population over three consecutive years as well as simulation models suggest that seasonal changes in SNP frequencies in that population are unlikely due to drift, migration, or recolonization (Bergland et al. 2014). Thus, although our data are not conclusive and the observed effects are small, they might be consistent with inversion polymorphisms being maintained by temporally varying selection (Dobzhansky 1943, 1970; Lewontin 1974; Endler 1986; Krimbas and Powell 1992).
Temporal Stability of Clines
Inversion polymorphisms can remain remarkably stable over time: In Drosophilapseudoobscura, the frequencies of many inversions have remained unchanged for >40 years in 48 North American populations (Anderson et al 1991). Similarly, in D. melanogaster, long-term records from Australasia indicate that the clinal frequencies of In(2L)t and In(3L)P have remained temporally stable, suggesting that they are maintained by spatially varying selection (Knibb et al. 1981; Knibb 1982; Anderson et al. 1987; Umina et al. 2005; Kennington and Hoffmann 2013). In contrast, the steep latitudinal cline of In(3R)Payne along the Australian east coast has shifted in position (latitudinal intercept) across a time span of 20 years, presumably due to changes in climatic conditions (Anderson et al. 2005; Umina et al. 2005). Similar shifts have been found in Drosophila robusta and Drosophila subobscura (Levitan 2003; Balanyà et al. 2006; Rodríguez-Trelles et al. 2013). Much less is known, however, about the temporal stability of inversion clines along the North American east coast, except for qualitative observations (Sezgin et al. 2004; Kapun et al. 2014).
To test whether inversion clines in North America have remained temporally stable, we combined our data with older estimates from the literature, with the total records spanning ∼40 years (Mukai et al. 1974; Stalker 1976; Langley et al. 1977; Mettler et al. 1977; Mukai and Voelker 1977; Sezgin et al. 2004). We focused on In(2L)t, In(3L)P, and In(3R)Payne, for which we had sufficient data to examine temporal dynamics and to allow comparisons with data from Australasia. For all three inversions, and averaging across all estimates, we detected a highly significant negative association with latitude (fig. 1D–F, supplementary table S4, Supplementary Material online), in agreement with previous data from North America and Australasia (Mettler et al. 1977; Knibb et al. 1981; Knibb 1982; Anderson et al. 1987; Umina et al. 2005; Kennington and Hoffmann 2013). For In(2L)t and In(3L)P we found evidence that their clines have changed across decades, unlike the situation in Australia: The cline in In(2L)t has shifted upwards in latitude (intercept) over time (significant effect of sampling “decade”) but with no change in slope (nonsignificant effect of “latitude × decade”) (fig. 1D and supplementary table S4, Supplementary Material online), whereas the cline in In(3L)P has shifted downwards and become more shallow (fig. 1E and supplementary table S4, Supplementary Material online). In sharp contrast, the steep latitudinal cline of In(3R)Payne in North America has remained invariant across ∼40 years, without any changes in clinal slope or intercept (fig. 1F and supplementary table S4, Supplementary Material online). Thus, unlike the situation in Australia (Anderson et al. 2005; Umina et al. 2005), the North American cline in In(3R)Payne might be maintained by persistent spatially varying selection (Endler 1986), either independent of recent changes in climatic conditions in North America or due to potential differences in climate change between the two continents. In(2L)t and In(3L)P, on the other hand, might be more sensitive to environmental change or subject to different selection pressures than In(3R)Payne.
Effects of Climatic Factors on Inversion Frequencies
Inversion clines are often strongly correlated with climatic factors that covary with latitude, most prominently with temperature, but also with rainfall and humidity (Krimbas and Powell 1992). Presumably these factors represent—directly or indirectly—the selective agents underlying clinal adaptation (Knibb 1982; Inoue et al. 1984; Knibb 1986; Anderson et al. 1987; Krimbas and Powell 1992; Bradshaw and Holzapfel 2006; Balanyà et al. 2009; Fabian et al. 2015). The best evidence comes from D. subobscura, where inversion frequencies have been found to shift rapidly and predictably in response to environmental gradients when this ancestrally European species colonized new areas (e.g., North and South America) or when local climatic conditions changed (Krimbas 1992; Menozzi and Krimbas 1992; Balanyà et al. 2006,, 2009; Rodríguez-Trelles et al. 2013). Similarly, in D. melanogaster, the clinal frequencies of In(2L)t, In(2R)NS, In(3L)P, and In(3R)Payne in North America, Asia, and Australasia have been found to be correlated with maximum/minimum temperature and maximum/minimum rainfall, with the details depending on the inversion (Knibb 1982). However, still little is known about the environmental (selective) factors that underlie inversion clines (Lemeunier and Aulard 1992).
Given its tight relationship with latitude, temperature is often thought to be the dominant factor in shaping clines; however, inversion clines often seem to be affected by multiple climatic factors (Knibb 1982; Krimbas and Powell 1992; Menozzi and Krimbas 1992; Hoffmann et al. 2003; Umina et al. 2005; Hoffmann and Weeks 2007). For example, a combination of climatic variables seems to underlie the temporal shift of the In(3R)Payne cline in eastern Australia, because no single factor could fully account for the results (Umina et al. 2005). Similarly, north–south clines in D. subobscura are best explained by a combination of climatic predictors, suggesting that the selective agent is related to temperature but not temperature itself (Krimbas 1992; Menozzi and Krimbas 1992). Likewise, thermal experimental evolution experiments in D. subobscura and D. melanogaster have failed to replicate clinal patterns observed in natural populations, indicating that temperature is unlikely to be the sole selective factor driving clinality (Santos et al. 2004; Kellermann et al. 2015); for a critical discussion see Huey and Rosenzweig (2009).
Preliminary analyses of our data based on single regressions of inversion frequencies against four climatic predictors (minimum temperature of the coldest month; maximum temperature of the hottest month; average precipitation of the driest month; and average precipitation of the wettest month; Knibb 1982) suggested that inversion frequencies are affected by multiple climatic factors (not shown). We thus decided to analyze the joint effects of multiple climatic variables on inversion frequencies, thereby accounting for their potential intercorrelations (multicollinearity). To do so, we applied principal component analysis (PCA) to all 19 variables from WorldClim (Hijmans et al. 2005), followed by linear regression of inversion frequencies against the principal components (PCs). We restricted our analysis to the first three PCs which together explained >80% of the total variance. PC1 explained 51.53% of the variance in the climate data and was highly correlated (r ≥ 0.90 or r ≤ −0.90) with 10 of the 19 variables (supplementary table S5, Supplementary Material online), demonstrating the nonindependence of predictors. Seven of the 10 variables are related to temperature, while the remaining three are related to precipitation (supplementary table S5, Supplementary Material online). PC1 showed a strongly negative correlation with latitude (r = 0.88, P < 0.001); in contrast, PC2 and PC3 only explained 20.94% and 13.12% of the variance, respectively, and were not correlated with latitude. Accordingly, the frequencies of In(2L)t, In(3L)P, and In(3R)Payne were strongly positively correlated with PC1, but not with PC2 and PC3 (supplementary table S6, Supplementary Material online). Our results indicate that inversion clines are likely driven by an interplay of factors related to temperature and precipitation; at the same time, they illustrate the difficulty in isolating potentially causal selective factors by statistical analysis alone. A closer look at the loadings of PC1 suggests that PC1 is overall positively correlated with measures of temperature and precipitation, but negatively correlated with temperature seasonality and temperature annual range (supplementary table S5, Supplementary Material online). Thus, the frequencies of inverted arrangements for In(2L)t, In(3L)P, and In(3R)Payne tend to covary positively with most measures of temperature and precipitation, whereas temperature dispersion (range) and seasonality appear to favor higher frequencies of the standard arrangements (supplementary table S6, Supplementary Material online).
These data are in qualitatively good agreement with those of Knibb (1982), who—for a similar latitudinal range in North America—generally detected positive relationships between inversion frequencies and climatic predictors, in particular for annual maximum temperature, minimum temperature, and minimum rainfall; the same general pattern was found for Australasia and Asia. However, when comparing climate-inversion associations across the three geographic areas, none of the inversion frequencies were associated with the same climatic factor in all three regions. Climatic correlations were overall highest in Australasia, intermediate in North America, and lowest in Asia: This heterogeneity might be due to the fact that the samples from the three regions differed in terms of their proximity to the equator and in terms of mean values and ranges of climatic variables (Knibb 1982). Clearly, future work will be required to better understand these patterns and to what extent they are consistent or not across broad geographic areas.
A potential problem with classical regression is that associations between genetic variants and environmental variables can suffer from high false positive rates because effects of predictors on variant frequencies might be confounded by hidden population structure and/or isolation by distance (IBD) (Meirmans 2012; Frichot et al. 2013). We thus accounted for population structure by applying latent factor mixed models (LFMMs) (Frichot et al. 2013) to a combined data set consisting of inversion-specific SNP markers (Kapun et al. 2014) and a genome-wide panel of ∼10,000 putatively neutral SNPs located in small introns (Clemente and Vogl 2012). We detected positive LFMM associations with PC1 for In(2L)t and In(3R)Payne, but failed to confirm the effects on In(3L)P seen with classical regression (supplementary table S6, Supplementary Material online), perhaps due to population structure affecting this inversion. In contrast, we did not find any significant LFMM associations between climatic predictors and the frequencies of neutral markers (supplementary table S6, Supplementary Material online). Climatic factors related to temperature and rainfall thus robustly predict the frequencies of In(2L)t and In(3R)Payne, independent of population structure. The fact that neutral SNPs do not exhibit these associations suggests that the examined climatic predictors—or other variables correlated with them—are causally related to the selective agent(s) underlying clinality of In(2L)t and In(3R)Payne. These findings, especially the negative result for In(3L)P, underscore the importance of accounting for population structure and demography when analyzing the effects of environmental variables on the distribution of genetic variants (Frichot et al. 2013).
We also tested the effects of climatic variables (monthly temperature minima, maxima, precipitation) on seasonal changes in inversion frequencies in the temperate Pennsylvanian population by using generalized estimating equations (GEEs) to account for temporal autocorrelations. For In(2R)NS, we found negative correlations between seasonal frequencies and both temperature minima and maxima across years (supplementary table S7, Supplementary Material online). Remarkably, temporal changes in frequency are in almost perfect antiphase relative to changes in temperature (fig. 3): In fall the postsummer frequency of In(2R)NS is high but temperatures have dropped already, whereas in summer the postwinter/-spring frequency is still low while temperatures have already reached a high. This might support the idea that In(2R)NS exhibits a seasonal phase cline, whereby in temperate locales selection favors the northern (standard) arrangement during the colder period but the southern (inverted) arrangement during the warmer period (Lemeunier and Aulard 1992; Cogni et al. 2015).
Local seasonal changes in In(2R)NS as compared with changes in temperature. Seasonal fluctuations of In(2R)NS in the Linvilla Orchards population (Media, PA) across four consecutive years, 2009–2012 (summer: red dots, fall: black dots), contrasted with changes in monthly average minimum (blue) and maximum (green) temperatures. Also see figure 2 and supplementary table S3, Supplementary Material online. Binomial standard errors for inversion frequency estimates are given in supplementary table S1, Supplementary Material online.
Adaptive versus Demographic Effects on the Distribution of Inversions
Clinality versus Neutrality
So far we have provided three pieces of evidence suggesting that several of the seven inversions might be subject to selection: (1) Clinality (including stability of the In(3R)Payne cline) and/or the existence of seasonal phase clines; (2) local seasonality; and (3) strong correlations with multiple climatic factors related to temperature and precipitation, independent of population structure. However, formally demonstrating an effect of selection requires that inversions behave at odds with neutral expectations (Ayala et al. 2011; Guerrero et al. 2012). We therefore tested whether genetic differentiation in inversion frequencies differs from neutrality by comparing inversion-specific SNPs and a genome-wide panel of ∼10,000 presumably neutral SNPs in short introns outside inversions (Clemente and Vogl 2012).
On average, the frequencies of neutral SNPs were not significantly correlated with latitude (average correlation coefficient r = 0.18; average P >0.05) (fig. 1G–I, supplementary fig. S1E–H and Supplementary Data, Supplementary Material online). Consistent with previous work (Caracristi and Schlötterer 2003), this implies that IBD for neutral SNPs is rather small and indicates that gene flow may be sufficiently strong to homogenize neutral variation across the North American cline. For inversion-specific SNP markers, on the other hand, frequencies were negatively correlated with latitude for In(2L)t, In(3L)P, and In(3R)Payne (fig. 1G–I and supplementary table S8, Supplementary Material online). To test for differences between clinal inversion patterns and neutrality, we compared the bootstrapped distributions of regression slopes between inversion-specific SNPs and neutral SNPs. For In(2L)t and In(3R)Payne—but not for In(3L)P—the distribution of clinal slopes departed clearly from neutral expectations (supplementary fig. S2 and Supplementary Data, Supplementary Material online). As shown in figure 1, the clinal slopes of In(2L)t and In(3R)Payne are much steeper than the neutral cline (fig. 1G and I), whereas for In(3L)P the inversion and neutral clines parallel each other (fig. 1H). Interestingly, using several neutrality tests (Tajima, Fu and Li, Hudson-Kreitman-Aguadé test [HKA]), Hasson and Eanes (1996) also failed to detect deviations from neutrality for In(3L)P. Our results thus corroborate the hypothesis that the clines in In(2L)t and In(3R)Payne are subject to spatially varying selection, which also agrees with our LFMM analysis above.
Clinality versus Admixture
Two recent studies suggest that admixture along the North American east coast contributes to clinality, either independent of or coincident with clinal selection, due to secondary contact of North American populations with ancestral African and European populations (Kao et al. 2015; Bergland et al. 2016 also see Caracristi and Schlötterer 2003; Duchen et al. 2013). Specifically, Bergland et al. (2016) found that the proportion of African ancestry is negatively correlated with latitude, whereas the proportion of European admixture is positively correlated with latitude, thus generating an “ancestry cline.” This prompted us to ask whether such a demographic process might be able to explain the clinality of inversions in our data.
Our analysis is necessarily preliminary: The true sources of influx of European and African variants into North America remain unknown, and the same is true for the details of the demographic history of North American populations, for example, the question of whether immigrating genotypes were subject to bottlenecks (Kao et al. 2015; Bergland et al. 2016). Based on estimates of 1) inversion frequencies from populations in Africa (the ancestral origin of D. melanogaster and many of its inversions; Aulard et al. 2002; Corbett-Detig and Hartl 2012) and Europe and 2) ancestry and admixture for six populations along the North American cline (Bergland et al. 2016), we calculated inversion frequencies expected under admixture (supplementary table S9, Supplementary Material online). To determine whether there are differences in slope between observed and expected clinal gradients, we used homogeneity-of-slopes tests in analysis of covariance (ANCOVA) (details not shown).
For most inversions, the observed slopes could not be distinguished from the slopes expected under admixture (fig. 1J–L and supplementary fig. S1I–L, Supplementary Material online), highlighting that accounting for admixture is important when analyzing clinal patterns (Kao et al. 2015; Bergland et al. 2016). In contrast, for In(2L)t, In(3L)P, and In(3R)Payne, we found significant differences (P < 0.05) between observed and expected clinal slopes (fig. 1K–L). The difference was particularly pronounced for In(3R)Payne: In Florida, its frequency was considerably higher than expected, while in Maine its frequency was markedly lower than expected under admixture (fig. 1L).
Thus, our preliminary analysis suggests that admixture contributes little—if anything—to the clines in In(2L)t, In(3L)P, and In(3R)Payne, but that this demographic process might make an important contribution to the clinality of several other inversions (Bergland et al. 2016).
Genetic Variation Associated with Inversions
Enrichment of Clinal and Seasonal SNPs within Inversions
To identify inversion-associated SNPs that underlie clinality and/or seasonality, we used genome-wide binomial generalized linear models (GLMs) to test for enrichment of the top 0.1% of the most strongly clinal and/or seasonal SNPs within inversions (fig. 4 and supplementary tables S10 and S11, Supplementary Material online).
Genome-wide distribution of SNPs associated with clinality and/or seasonality. Manhattan plots showing the genome-wide distribution of −log10 transformed P values based on SNP-wise GLMs testing for associations of allele frequencies with latitude, season, and the latitude by season interaction. The bottom subfigure shows GLM results for the effect of local seasonality in the temperate Linvilla Orchards population (Media, PA). The top 0.1% of the most significant SNPs for each main effect is shown in red. Inversion breakpoints for each chromosomal arm are indicated by black boxes, with the four partially overlapping inversions on 3R being delineated as follows: In(3R)C, dashed line; In(3R)K, dotted line; In(3R)Mo, solid line; In(3R)Payne, dashed-dotted line. For details of clinal and/or seasonal enrichment of SNPs within inversions see supplementary table S10, Supplementary Material online; for a corresponding list of inversion-associated genes, see supplementary table S11, Supplementary Material online.
We found highly significant overrepresentation of clinal SNPs on chromosomal arm 3R (700 of 720 clinal SNPs ≈ 97%, P < 0.001) and within the region spanned by In(3R)Payne (568/720 SNPs ≈ 79%, P < 0.001) (fig. 4 and supplementary tables S10 and S11, Supplementary Material online), similar to our previous findings based on three populations along the North American cline (Florida, Pennsylvania, Maine) (Fabian et al. 2012). Specifically, clinally varying candidate SNPs were overrepresented within In(3R)Payne, but not in the regions of the partially overlapping inversions In(3R)K and In(3R)Mo (fig. 4 and supplementary table S10, Supplementary Material online). We also detected enrichment in In(3R)C; however, because this arrangement partially overlaps with In(3R)Payne, we cannot exclude that this is due to their overlap. In general, we failed to detect enrichment of clinal SNPs within any inversions other than In(3R)Payne (and/or In(3R)C).
The strong and extremely localized overrepresentation of clinal SNPs within the region spanned by In(3R)Payne deviates markedly from random expectations: While part of this enrichment might be due to strong LD and genetic draft within the inversion, our finding indicates that in North American D. melanogaster In(3R)Payne is “the” dominant driver of clinality (Fabian et al. 2012). This also agrees well with the situation in Australia where In(3R)Payne exhibits a strong, parallel latitudinal cline; has undergone a recent upward shift in clinal frequencies in response to climate change; contains a large number of clinal variants; is strongly correlated with the body size cline; and has indeed been shown to affect body size (Weeks et al. 2002; Rako et al. 2006; Kennington et al. 2007; Kolaczkowski et al. 2011; Fabian et al. 2012; Reinhardt et al. 2014; Rane et al. 2015). Although recent evidence indicates that both the North American and the Australian cline are subject to admixture from Africa and Europe (even though the pattern appears to be weaker for Australia; Bergland et al. 2016), thus potentially explaining part of the clinal parallelism between the two continents, our results suggest that the North American cline in In(3R)Payne is not due to admixture.
We next examined seasonally varying SNPs within inversions across all genomic data for the entire cline. We detected significant but numerically weak enrichment of seasonal SNPs within In(3L)P and In(3R)Payne, as well as within In(3R)C, In(3R)K, and In(3R)Mo (fig. 4 and supplementary tables S10 and S11, Supplementary Material online). For the region spanned by In(3R)Payne (and its partly overlapping inversions), but not for In(3L)P, this matches well with our finding of local seasonal changes. Similarly, SNPs that represent the latitude by season interaction were enriched within In(2L)t, In(2R)NS, In(3R)Payne, In(3R)K, and In(3R)Mo (fig. 4 and supplementary tables S10 and S11, Supplementary Material online), in support of the existence of seasonal phase clines. Finally, when we restricted our analysis to the temperate orchard population from Pennsylvania, we found enrichment of locally fluctuating seasonal SNPs in In(2L)t and In(3R)Payne, but not for In(2R)NS (supplementary table S10, Supplementary Material online). At least for In(3R)Payne, this is consistent with our observation that this arrangement undergoes local seasonal changes in frequency. Again, we note that we cannot properly distinguish among In(3R)C, In(3R)K, In(3R)Mo, and In(3R)Payne in these analyses, due to their partial overlap. Bergland et al. (2014) also found marginal enrichment of seasonal SNPs in the region spanned by In(3R)Mo. Together with our data on seasonal phase clines and local frequency changes, our analysis lends further credence to the hypothesis that several inversions fluctuate seasonally in North American D. melanogaster, across the entire cline as well as locally.
Genome-Wide Associations with Inversions
To identify SNPs that are statistically associated with inversions and thus presumably in LD with them, we applied genome-wide linear regression to all populations along the North American cline, but excluding the sample from Raleigh (North Carolina; based on the DGRP lines; see Materials and Methods; see below).
Using this genome-wide association study (GWAS) approach, we identified many significant inversion-associated candidates for every arrangement, in the following ascending order: In(3R)C (only 6 SNPs) < In(3R)K < In(2L)t < In(3R)Mo < In(2R)NS < In(3L)P < In(3R)Payne (3,823 SNPs) (fig. 5, supplementary fig. S3 and Supplementary Data, Supplementary Material online). Despite large differences in the number of candidate SNPs across arrangements, we observed overrepresentation within the putative breakpoints for every inversion (supplementary table S12, Supplementary Material online), with many SNPs clearly tracing out the breakpoints, as visible from the Manhattan plots in figure 5 and supplementary figure S5, Supplementary Material online.
Genome-wide associations between SNPs and inversions In(2L)t, In(3L)P, and In(3R)Payne. Manhattan plots showing the distribution of candidate SNPs according to their −log10P value, based on genome-wide SNP-wise linear regressions with inversion frequencies for In(2L)t, In(3L)P, and In(3R)Payne. SNPs significant after Bonferroni correction (α′ = 3.1 × 10−8 = 0.05/1,574,911 tests) are shown in red. Inversion breakpoints are indicated by black boxes; overlapping inversions on 3R are delineated as follows: In(3R)C, dashed line; In(3R)K, dotted line; In(3R)Mo, solid line; In(3R)Payne, dashed-dotted line. For details of SNP enrichment within these inversions, see supplementary table S12, Supplementary Material online; for a corresponding list of inversion-associated genes, see supplementary table S13, Supplementary Material online; corresponding Manhattan plots for In(2R)NS, In(3R)C, In(3R)K, and In(3R)Mo are shown in supplementary figure S3, Supplementary Material online.
We also tested for an unequal distribution of candidate loci within each inverted region (as compared with randomly drawn, equally sized SNP sets outside the inversion) and, more specifically, within the inversion breakpoint regions (within a distance of 25 kb from each breakpoint, relative to SNP sets in the center of the inversion). Within the breakpoint regions, where suppression of gene flux is supposed to be maximal (Navarro et al. 1997,, 2000), we detected an excess of candidates for In(2L)t, In(3L)P, and In(3R)Payne, but not for In(2R)NS, In(3R)C, In(3R)Mo, and In(3R)K (supplementary tables S12 and S13, Supplementary Material online).
To independently test for associations between GWAS-based candidate SNPs and inversions, we used the sample based on the DGRP lines from Raleigh. Inversion-specific candidate SNPs were significantly more differentiated between standard and inverted karyotypes than randomly drawn noncandidate SNPs from within the inverted region (Student’s t-test; P < 0.001 for In(2L)t, In(2R)NS, In(3L)P, In(3R)Mo, In(3R)Payne; P < 0.05 for In(3R)K; supplementary table S14, Supplementary Material online). This confirms that the candidate SNPs identified in our GWAS are inversion specific.
Several studies have reported matching patterns of gene or SNP differentiation between the North American and Australian clines, indicating that many loci might be subject to parallel clinal selection along the two gradients (Turner et al. 2008; Kolaczkowski et al. 2011; Fabian et al. 2012; Reinhardt et al. 2014). We were therefore interested in testing whether the inversion-specific SNPs identified above are also polymorphic between the endpoints of the Australian cline, using the data of Reinhardt et al. (2014) and focusing on In(2L)t, In(3L)P, and In(3R)Payne.
Remarkably, 97.8% for In(2L)t, 90.9% for In(3R)Payne, and 69.24% for In(3L)P of all North American inversion-specific SNPs were also polymorphic and clinally differentiated in the Australian dataset (supplementary table S14, Supplementary Material online). For all three inversions in the Australian data, North American candidates were more differentiated than random sets of SNPs from within each inversion (supplementary table S14, Supplementary Material online). As shown above, it is improbable that the North American clines for these inversions are caused by admixture; thus, because for these arrangements the great majority of inversion-specific SNPs is shared between the North American and Australian clines, this matching pattern is most likely driven by parallel effects of selection (including genetic draft) across both clines (Fabian et al. 2012; Reinhardt et al. 2014).
Candidate Genes Associated with Inversions
Although we failed to find significant enrichment of gene ontology terms (not shown), we identified SNPs in numerous inversion-associated genes with effects on fitness-related traits known from studies of laboratory mutants and transgenes, including loci affecting growth, body size, germline development and oogenesis, receptivity upon mating, reproductive diapause, stress resistance, and lifespan (supplementary tables S11 and S13, Supplementary Material online; see flybase.org for details of gene function and original source references) (De Jong and Bochdanovits 2003; Flatt and Heyland 2011; Flatt et al. 2013). Many of these candidates have been previously identified as being strongly clinal in genomic analyses of both the North American and Australian cline (Kolaczkowski et al. 2011; Fabian et al. 2012) (supplementary tables S11 and S13, Supplementary Material online).
For example, we found several candidate genes known to be involved in regulating size-related traits (e.g., Eip63E, foxo, Hmgcr, InR, Imp-L2, Orct2, path, Stat92) (supplementary tables S11 and S13, Supplementary Material online) (Kolaczkowski et al. 2011; Fabian et al. 2012), which is interesting because body size is strongly clinal across latitudinal gradients on multiple continents and has been linked to In(3R)Payne (Coyne and Beecham 1987; James and Partridge 1995; James et al. 1995; Weeks et al. 2002; De Jong and Bochdanovits 2003; Rako et al. 2006; Klepsatel et al. 2014; Fabian et al. 2015). Indeed, given that QTL and association mapping efforts have mapped clinal size differences to the region spanned by In(3R)Payne (Gockel et al. 2002; Calboli et al. 2003; Rako et al. 2006; Kennington et al. 2007), it is noteworthy that this region harbors multiple genes with functional effects on growth and size (De Jong and Bochdanovits 2003; Fabian et al. 2012) (supplementary tables S11 and S13, Supplementary Material online).
Similarly, we also identified genes involved in the determination of adult lifespan, another trait known to be clinal (Mitrovski and Hoffmann 2001; Schmidt and Paaby 2008; Sgrò et al. 2013; Fabian et al. 2015), including, for example, cher, cnc, cpo, foxo, Imp-L2, InR, mld, and pnt (supplementary tables S11 and S13, Supplementary Material online). For the majority of these life-history candidate genes, however, nothing is known about the functional effects of naturally occurring polymorphisms upon fitness-related traits (De Jong and Bochdanovits 2003; Flatt and Schmidt 2009; Paaby and Schmidt 2009; Flatt et al. 2013).
Two notable exceptions are insulin-like receptor (InR) and couch potato (cpo), both located in the region spanned by In(3R)Payne. InR is well known from mutant studies to have pleiotropic effects on various fitness traits, including developmental time, body size, ovarian development, lifespan, and stress resistance (Tatar et al. 2001). Interestingly, Paaby et al. (2010,, 2014) have identified a clinal insertion–deletion (indel) polymorphism in InR which exhibits multifarious effects on many of these traits. In the case of cpo, a single amino acid polymorphism has been shown to underlie clinal variation in the propensity of reproductive diapause along the North American east coast (Schmidt et al. 2008; Cogni et al. 2013), a plastic and pleiotropic trait syndrome, affecting ovarian development, fecundity, stress resistance, and lifespan and thought to represent an overwintering adaptation of temperate populations (Schmidt et al. 2005; Schmidt and Paaby 2008; Schmidt 2011; Flatt et al. 2013). Although the specific life-history polymorphisms in InR and cpo discussed here are not in LD with In(3R)Payne (Schmidt et al. 2008; Paaby et al. 2010,, 2014), other inversion-associated alleles at these loci might have—as of yet unknown—functionally important effects on fitness-related traits.
Distinguishing Mechanisms of Inversion Evolution
Above we have provided evidence suggesting that In(2L)t and In(3R)Payne are subject to spatially varying selection, but can we say something from our data about the selective mechanisms that might be at work? Several hypotheses have been proposed to explain the establishment and maintenance of chromosomal rearrangements (Dobzhansky 1970; Kirkpatrick and Barton 2006; Hoffmann and Rieseberg 2008; Kirkpatrick 2010; and references therein). A simplified classification of these mechanisms distinguishes three types of selection: 1) Local adaptation without epistasis, whereby a new inversion captures an advantageous haplotype (i.e., two or more locally adapted alleles) and protects them from maladaptive gene flow (Kirkpatrick and Barton 2006); 2) local adaptation with epistasis (coadaptation), whereby an inversion captures and keeps together a set of locally adapted, epistatically interacting alleles (Dobzhansky 1970); and 3) direct selection for the inversion, with the inversion generating an adaptive mutation at the breakpoint. A key difference between (1) and (2) versus (3) is that we expect the same patterns of divergence at the breakpoints under all three models, but additional peaks of divergence within the inversion if there exist locally adapted alleles under models (1) and (2) (Guerrero et al. 2012).
The patterns of differentiation we see for In(2L)t and In(3R)Payne are consistent with scenarios (1) and (2), but not with model (3): For both rearrangements—especially for In(3R)Payne —we observe multiple, strongly clinally varying SNPs distributed within the body of the inversion (fig. 5). Moreover, our analysis of candidate genes associated with these inversions suggests that multiple loci might be functionally important for inversion evolution. Alternatively, the observed patterns might be explained by a combination of mechanisms (1) or (2), and (3). In contrast, the pattern for In(3L)P deviates markedly from that seen for In(2L)t and In(3R)Payne: We see sharp differentiation at the breakpoints but practically no divergence inside the inversion (fig. 5). This is either consistent with model (1) or, alternatively, with the expectation that neutral divergence between arrangements is high at the breakpoints but decays toward the center of the inversion (Navarro et al. 1997,, 2000; Andolfatto et al. 2001). We tentatively favor neutrality as the more likely explanation because the observed pattern is in good agreement with our other analyses above and with previous data by Hasson and Eanes (1996) suggesting that In(3L)P evolves neutrally.
Our data for In(2L)t and In(3R)Payne—but not for In(3L)P—are thus compatible with either local adaptation mechanism (1) or (2). Distinguishing these models is challenging as it requires analyzing details of haplotype structure and LD using phased sequence data. Although long-range LD among strongly differentiated alleles within the inversion might speak in favor of epistatic selection (Schaeffer et al. 2003), tests of the coadaptation model will ultimately necessitate direct fitness measures of alleles to demonstrate positive epistasis. However, not all aspects of our data are consistent with simple versions of local adaptation mechanisms. First, both models predict that selected inversions would go to near-fixation or fixation (Kirkpatrick and Barton 2006). This is clearly inconsistent with the currently observed frequency clines of In(2L)t and In(3R)Payne (also see discussion in Rane et al. 2015): In both North America and Australia, the frequencies of these inversions are higher at one cline end than the other, with the standard arrangement being favored at high latitudes, and they tend to be intermediate, that is, not close to fixation. Second, a strict formulation of local adaptation models predicts that the allelic content of an inverted arrangement might differ significantly among distinct populations, owing to different local conditions (Schaeffer et al. 2003). Because the majority of the North American inversion-specific SNPs we have identified are also polymorphic and clinal in Australia, our data do not support this prediction.
A perhaps more likely scenario is that In(2L)t and In(3R)Payne might have captured locally adapted alleles in their ancestral habitats, subsequently invaded other parts of the world, and that similar environmental gradients then led to parallel clinal assortment of these arrangements across multiple continents (Rane et al. 2015). Inverted arrangements might thus carry ancestral alleles that have “preadapted” them to subtropical/tropical conditions elsewhere, whereas standard arrangements might be favord by temperate, seasonal conditions. Although our analysis suggests that the clinal slopes of In(2L)t and In(3R)Payne are unlikely to be explained by admixture, an attractive alternative is that clinal selection and assortment might go hand in hand with admixture from Africa and Europe (Bergland et al. 2016). Future comparisons of inversion-associated African and European alleles with those analyzed here might be able to corroborate or refute this scenario.
Conclusions
Chromosomal inversions are often thought to play a major role in adaptation but the genic targets of selection they might carry or how they are maintained by selection remains poorly understood (Dobzhansky 1937, 1970; Hoffmann and Rieseberg 2008; Kirkpatrick 2010; Kirkpatrick and Kern 2012). To address this fundamental issue, we have analyzed seven cosmopolitan inversion polymorphisms in D. melanogaster, using the largest whole-genome data set for the well-known North American latitudinal cline available to date. Our genomic analyses provide multiple lines of plausible evidence that several inversion polymorphisms in D. melanogaster are maintained by spatially and/or temporally varying selection (table 1).
Summary of Results
| . | In(2L)t . | In(2R)NS . | In(3L)P . | In(3R)Payne . | In(3R)Mo . | In(3R)C . | In(3R)K . |
|---|---|---|---|---|---|---|---|
| Latitudinal clinality? | No | No | Yes | Yes | Yes | No | No |
| Seasonal phase cline? | Yes | Yes | No | Yes | No | No | No |
| Local seasonality? | No | Yes | No | Yes | No | No | No |
| Temporal stability of cline? | No (up) | — | No (down) | Yes | — | — | — |
| Climatic effects? (w/o population structure?) | Yes (yes) | No | Yes (no) | Yes (yes) | No | No | No |
| Seasonal climatic effects? | No | Yes | No | No | No | No | No |
| Nonneutrality? | Yes | No | No | Yes | No | No | No |
| Nonadmixture? | Yes | No | Yes | Yes | No | No | No |
| Latitudinal SNP enrichment? | No | No | No | Yes | No | Yes | No |
| Seasonal enrichment? | No | No | Yes | Yes | Yes | Yes | Yes |
| Latitudinal × seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Local seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Parallel differentiation of SNPs along Australian cline? | Yes | No | Yes | Yes | No | No | No |
| Known life history genes? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| . | In(2L)t . | In(2R)NS . | In(3L)P . | In(3R)Payne . | In(3R)Mo . | In(3R)C . | In(3R)K . |
|---|---|---|---|---|---|---|---|
| Latitudinal clinality? | No | No | Yes | Yes | Yes | No | No |
| Seasonal phase cline? | Yes | Yes | No | Yes | No | No | No |
| Local seasonality? | No | Yes | No | Yes | No | No | No |
| Temporal stability of cline? | No (up) | — | No (down) | Yes | — | — | — |
| Climatic effects? (w/o population structure?) | Yes (yes) | No | Yes (no) | Yes (yes) | No | No | No |
| Seasonal climatic effects? | No | Yes | No | No | No | No | No |
| Nonneutrality? | Yes | No | No | Yes | No | No | No |
| Nonadmixture? | Yes | No | Yes | Yes | No | No | No |
| Latitudinal SNP enrichment? | No | No | No | Yes | No | Yes | No |
| Seasonal enrichment? | No | No | Yes | Yes | Yes | Yes | Yes |
| Latitudinal × seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Local seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Parallel differentiation of SNPs along Australian cline? | Yes | No | Yes | Yes | No | No | No |
| Known life history genes? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Summary of Results
| . | In(2L)t . | In(2R)NS . | In(3L)P . | In(3R)Payne . | In(3R)Mo . | In(3R)C . | In(3R)K . |
|---|---|---|---|---|---|---|---|
| Latitudinal clinality? | No | No | Yes | Yes | Yes | No | No |
| Seasonal phase cline? | Yes | Yes | No | Yes | No | No | No |
| Local seasonality? | No | Yes | No | Yes | No | No | No |
| Temporal stability of cline? | No (up) | — | No (down) | Yes | — | — | — |
| Climatic effects? (w/o population structure?) | Yes (yes) | No | Yes (no) | Yes (yes) | No | No | No |
| Seasonal climatic effects? | No | Yes | No | No | No | No | No |
| Nonneutrality? | Yes | No | No | Yes | No | No | No |
| Nonadmixture? | Yes | No | Yes | Yes | No | No | No |
| Latitudinal SNP enrichment? | No | No | No | Yes | No | Yes | No |
| Seasonal enrichment? | No | No | Yes | Yes | Yes | Yes | Yes |
| Latitudinal × seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Local seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Parallel differentiation of SNPs along Australian cline? | Yes | No | Yes | Yes | No | No | No |
| Known life history genes? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| . | In(2L)t . | In(2R)NS . | In(3L)P . | In(3R)Payne . | In(3R)Mo . | In(3R)C . | In(3R)K . |
|---|---|---|---|---|---|---|---|
| Latitudinal clinality? | No | No | Yes | Yes | Yes | No | No |
| Seasonal phase cline? | Yes | Yes | No | Yes | No | No | No |
| Local seasonality? | No | Yes | No | Yes | No | No | No |
| Temporal stability of cline? | No (up) | — | No (down) | Yes | — | — | — |
| Climatic effects? (w/o population structure?) | Yes (yes) | No | Yes (no) | Yes (yes) | No | No | No |
| Seasonal climatic effects? | No | Yes | No | No | No | No | No |
| Nonneutrality? | Yes | No | No | Yes | No | No | No |
| Nonadmixture? | Yes | No | Yes | Yes | No | No | No |
| Latitudinal SNP enrichment? | No | No | No | Yes | No | Yes | No |
| Seasonal enrichment? | No | No | Yes | Yes | Yes | Yes | Yes |
| Latitudinal × seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Local seasonal enrichment? | Yes | Yes | No | Yes | Yes | No | Yes |
| Parallel differentiation of SNPs along Australian cline? | Yes | No | Yes | Yes | No | No | No |
| Known life history genes? | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Our case for an adaptive role of inversions is strongest for In(2L)t and In(3R)Payne and rests upon five principal arguments: 1) clinality and/or seasonal phase clinality; 2) associations between climatic predictors and inversions independent of population structure; 3) nonneutrality of clinal slopes; 4) departure of clinal patterns from those expected under admixture; and 5) patterns of divergence within inversions compatible with local adaptation (table 1; also see figs. 1 and 5).
For several inversions we also find tentative evidence for seasonal dynamics (table 1; also see figs. 2 and 3). Here the case is most compelling for In(2R)NS. The clinal slope of this arrangement reverses in sign between summer and fall across latitude, suggesting the presence of a seasonal phase cline (Rhomberg and Singh 1988). Moreover, locally, in a temperate Pennsylvanian orchard, this inversion exhibits minor but predictable fluctuations across time that are in almost perfect antiphase with seasonal changes in temperature (fig. 3). The probably most convincing explanation of this pattern is that in temperate locales the northern (standard) karyotype is selected during cold periods, whereas the southern (inverted) karyotype is favored during warm periods (Lemeunier and Aulard 1992; Cogni et al. 2015).
Together, our results therefore strongly suggest that several inversion polymorphisms in this system are adaptive, being maintained by spatially—and maybe also temporally—varying selection.
Materials and Methods
Fly Populations and Samples
We analyzed 28 Pool-seq samples from 10 populations along the North American cline, from Florida to Maine, collected by the Dros-RTEC (12 unpublished samples; https://sites.sas.upenn.edu/paul-schmidt-lab/pages/opportunities, last accessed February 1, 2016), from Bergland et al. (2014) (14 samples) and from Fabian et al. (2012) (2 samples; see supplementary table S1, Supplementary Material online). For 9 of the 10 North American populations, samples were based on pools of wild-caught individuals; one sample from Raleigh (North Carolina) was generated by resequencing a pool consisting of 1 male from each of the 92 Drosophila Genetic Reference Panel (DGRP) inbred lines; see Zhu et al. (2012) and Bergland et al. (2014) for details of this sample and Mackay et al. (2012) for information on the DGRP. For several populations, samples were available from both the summer and fall season (in some cases for multiple years); the most intensively sampled population from Linvilla Orchards (Media, PA, USA) was sampled ten times (four summer samples, six fall samples) during 4 consecutive years (2009–2012; supplementary table S1, Supplementary Material online; also see Bergland et al. 2014). For a description of general methods used for sample preparation, sequencing, and bioinformatics of the Dros-RTEC samples, see Bergland et al. (2014). In addition, we used data from populations in Australia (Queensland, Tasmania: Reinhardt et al. 2014), Europe (Italy, Portugal: Orozco-terWengel et al. 2012, Bastide et al. 2013; Austria, Spain: Dros-RTEC: See supplementary table S1, Supplementary Material online), and Africa (Drosophila Genome NEXUS; http://www.johnpool.net/genomes.html, last accessed February 1, 2016; Lack et al. 2015; four populations with ≥10 sequenced individuals of known karyotype: Cameroon [CO], Gabon [GA], Rwanda [RG], Zambia [ZI]). For details, see supplementary table S1, Supplementary Material online.
Mapping of Sequencing Data
Reads were mapped using the methods described in Bergland et al. (2014); for consistency, published reads from North America (Fabian et al. 2012), Australia (Reinhardt et al. 2014), and Europe (Orozco-terWengel et al. 2012; Bastide et al. 2013) were remapped using the same method. We used the following modifications: Reads were filtered for a base quality of 18; trimmed to a minimum read length of 50 with PoPoolation (Kofler et al. 2011); and mapped to a “hologenome” reference consisting of the genomes of D. melanogaster (v.5.40) and microbial symbionts (http://bergmanlab.ls.manchester.ac.uk/?p=2033). Mapped reads were filtered for a minimum mapping quality of 20 using “samtools” (v.0.1.19) prior to synchronizing with SNPs from the Dros-RTEC data set using custom software. A modified version of the bioinformatics pipeline of Bastide et al. (2013) was used to identify D. melanogaster-specific reads in Pool-seq data collected in Italy that were contaminated with Drosophilasimulans DNA. To estimate the frequency of D. simulans reads in the Pool-seq data from Bolzano (Italy), we used a data set with SNPs that are divergent between D. melanogaster and D. simulans (not shown). Next, we generated a combined reference sequence consisting of the genomes of D. melanogaster and D. simulans and deconvoluted the species-specific reads from the Italian data set via competitive mapping against the combined references using the mem algorithm of bwa v.0.7.7 (Li 2013). Reads mapping uniquely to the D. simulans genome were removed from the data set using custom software.
Estimation of Inversion Frequencies
For each Pool-seq sample we estimated the frequencies of seven cosmopolitan inversion polymorphisms (In(2L)t, In(2R)NS, In(3L)P, In(3R)C, In(3R)K, In(3R)Mo, In(3R)Payne; Lemeunier and Aulard 1992) by applying a panel of 400 inversion-specific diagnostic SNP markers (Kapun et al. 2014). Frequencies were estimated from average frequencies of inversion-specific alleles by cumulating allele counts and dividing them by the cumulative coverages across all marker positions (Kapun et al. 2014). Inversion frequency estimates are given in supplementary table S1, Supplementary Material online. For each sample, we calculated binomial standard errors for inversion frequencies based on the average coverage across all diagnostic, inversions-specific SNP markers (supplementary table S1, Supplementary Material online).
General Methods for Downstream Analyses
Here we provide an overview of general statistical methods used in our downstream analyses. First, to examine variation in the frequency of a given inversion i (fi) we either applied 1) GLMs using maximum-likelihood estimation; 2) GEEs using quasi-likelihood estimation (Halekoh and Højsgaard 2007; Bergland et al. 2014) (with both approaches assuming a binomial error structure); or 3) general linear (regression; ANCOVA) models, assuming a normal (Gaussian) error structure and using standard least squares estimation. We employed these modeling methods to analyze the effects of clinality and/or seasonality by using either one or both of the following independent variables: “Latitude” (l) as a continuous predictor and/or “sampling season” (s) as a nominal fixed factor with two levels (“summer”: Sampling dates before 1 September; “fall”: After 1 September; see supplementary table S1, Supplementary Material online); with their interaction being denoted as l × s. The binomial GLMs were used to account for sampling noise in allele frequency estimates (e.g., variance in read depth among samples); GEEs were used to account for the structured, time-series nature of temporal (seasonal) changes in inversion frequencies (cf. Bergland et al. 2014); and regression or ANCOVA models were used to specifically test for effects on (linear) clinal slopes and/or to test the homogeneity-of-slopes assumption. Second, clinal curves (inversion frequency against latitude) in figure 1 and supplementary figure S1 (Supplementary Material online) represent logistic (logit) curves based on parameters from the binomial GLMs (see above). Third, to test for overrepresentation of clinal and/or inversion-associated SNPs, or to test for significant pairwise differentiation between populations (SNP-wise FST) relative to randomly drawn SNPs, we used random resampling with replacement (bootstrapping). We obtained an empirical null distribution by generating 10,000 randomly drawn SNP sets of a size equal to that of the candidate set. Candidate SNPs were considered to be overrepresented relative to chance, or to be more strongly differentiated than randomly drawn SNPs, if more than 95% of all 10,000 tests (Fisher’s exact or Wilcoxon rank-sum tests) resulted in P values < 0.05. Fourth, generally we used a significance threshold of P < α = 0.05; however, because in several analyses we tested specific hypotheses for “each” of the seven inversions, we also report significant P values after Bonferroni correction across the seven inversions (α′ = 0.05/7 = 0.0071; highlighted in green in the supplementary tables). Similarly, we applied Bonferroni correction when performing genome-wide association of SNPs with inversions (see below).
Clinal and Seasonal Changes in Inversion Frequencies
To investigate the effects of clinality and seasonality on inversion frequencies, we applied a binomial GLM to each inversion: fi = l + s + l × s + ϵi, where ϵi denotes the error term. The l × s interaction term tests whether the effects of latitude and seasonality are interdependent.
To specifically test for local seasonal fluctuations in inversion frequencies, we restricted our analysis to the Linvilla Orchards (Media, PA) population, for which we had the highest number of seasonal samples. We used GEEs implemented in the R package geepack (Halekoh and Højsgaard 2007) to fit the following model with binomial error structure (Bergland et al. 2014): fi = s + id + ϵi, where id denotes the random effect of “sample identity,” with the levels representing the different Pool-seq samples from Linvilla Orchards (supplementary table S1, Supplementary Material online). We note that using generalized linear mixed models gave results that were qualitatively identical to those from GEEs (not shown).
To compare the effects of clinality (latitude), seasonal phase clinality, and local seasonality, we estimated effect sizes for 1) the whole cline using the clinal endpoints (i.e., Florida versus Maine); 2) the seasonal phase cline (i.e., between seasons for a given population); and 3) local differences between seasonal samples at Linvilla Ochards (Media, PA) by calculating average inversion frequency differences. To compare the relative magnitudes of these effects we calculated the percentage difference of frequency changes relative to the changes between clinal endpoints; in addition, for the effects of local seasonality at Linvilla Ochards, we estimated the effect sizes of mean seasonal inversion frequency differences by calculating Cohen’s standardized effect size d (Cohen 1988).
We also aimed to examine the temporal stability of inversion clines by comparing our genomic data with previous estimates of inversion frequencies based on karyotyping or polymerase chain reaction markers, with the combined data spanning approximately four decades (Mukai et al. 1974; Stalker 1976; Langley et al. 1977; Mettler et al. 1977; Mukai and Voelker 1977). We focused on the three common cosmopolitan inversions In(2L)t, In(3L)P, and In(3R)Payne for which we had sufficient data to examine temporal dynamics. Population samples (inversion frequency estimates) were grouped according to the approximate decades in which they were sampled (“1970s”: 1969–1973; “2000s”: 1997–2003; “2010s”: 2008–2012) and analyzed using ANCOVA: “arcsine squareroot” (fi) = l + d + l × d + ϵi, where d is a nominal fixed factor referring to the effect of sampling decade and where the l × d interaction represents a homogeneity-of-slopes test.
Estimates from six locations were excluded from this analysis: Orlando, Lake Placid, Lake Wales, and Merritt Island (all FL); Mexico City (Mexico); and San Juan (Puerto Rico). We excluded data from Mexico and Puerto Rico (Mettler et al. 1977) because these locations fell outside the latitudinal range of the Dros-RTEC data set (supplementary table S1, Supplementary Material online); Orlando was excluded because there were no records for In(3L)P and In(3R)Payne for this population (Mettler et al. 1977); and estimates from Lake Placid, Lake Wales (Mettler et al. 1977), and Merrit Island (Sezgin et al. 2004) were excluded because the frequency of In(3R)Payne has been found to be unusually high at these locations (∼0.7–0.8) and because no recent records exist for comparison. Importantly, however, we note that including these three estimates did not qualitatively change our results for the temporal stability of In(3R)Payne (not shown).
Effects of Climatic Factors on Inversion Frequencies
To examine how climatic factors affect inversion frequencies across the North American cline, we obtained environmental data from WorldClim (http://www.worldclim.org/, last accessed February 1, 2016) (Hijmans et al. 2005). We employed the R package “raster” (Hijmans and van Etten 2012) and custom software to associate inversion frequencies with 19 climatic variables (spatial resolution: 2.5° × 2.5° latitude by longitude), using coordinates nearest to the sampling locations (supplementary table S1, Supplementary Material online). Because the WorldClim data are based on annual averages, we calculated annual mean inversion frequencies for each location (population), averaging across seasonal estimates.
To account for intercorrelations among climatic variables and to examine their joint effects on inversion frequencies, we performed PCA of all 19 z-transformed climatic variables available from the WorldClim data set using “FactoMineR” in R (Lê et al. 2008) and used PCs 1, 2, and 3 as individual predictors for downstream analysis. To analyze the effect of each PC on the frequency of each inversion i, we used linear regression: arcsine squareroot (fi) = z (PC)+ ϵi, where z represents the standard (z) score. In addition, because climatic effects on inversions might be confounded by population structure, we analyzed the effect of each PC using LFMMs, a method that allows estimating associations between genetic variation and environmental factors while simultaneously inferring and accounting for background levels of population structure (Frichot et al. 2013). We created a combined data set consisting of inversion-specific SNPs for all seven cosmopolitan inversions (see above; Kapun et al. 2014) and 9,996 putatively neutral, genome-wide SNPs located in small introns (<60 base pairs) (Clemente and Vogl 2012) outside the inversions. For populations sampled over time we averaged allele frequencies across seasons and years. We chose the number of latent factors K (representing approximately the number of distinct groups or clusters) by iteratively calculating SNP-wise tests with K ranging from 1 to 10. A value of K = 2 produced the best fit between the empirical cumulative distribution function of P values and the uniform expectation (not shown). Because the number of neutral SNPs exceeded the number of inversion-specific SNPs by a relatively large margin, we applied bootstrapping; we then used the bootstrapped samples to test for differences between the z-score distributions of neutral versus inversion-specific SNPs with Wilcoxon rank-sum tests.
We next examined the effects of climatic factors on seasonal changes in inversion frequencies in Linvilla Orchards (Media, PA) across the 4-year sampling range (2009–2012). Daily estimates of temperature minima and maxima and precipitation were obtained from a weather station in ∼45 km distance (Seabrook Farms, NJ; http://climate.psu.edu/data/ida, last accessed February 1, 2016); from these data we calculated z-transformed monthly averages for each sampling month and year (supplementary table S1, Supplementary Material online). For each inversion i, seasonal frequency estimates across the 4-year period were analyzed with the following GEE model: fi,seasonal = z (climatic predictor) + id + ϵi, where id refers to the random effect of sample identity (supplementary table S1, Supplementary Material online).
Adaptive versus Demographic Effects on the Distribution of Inversions
To distinguish adaptive from demographic effects (e.g., due to drift) on the distribution of inversions, we tested whether differentiation in inversion frequencies differs from neutrality by comparing inversion-specific and putatively neutral, genome-wide SNPs in short introns (see above). We deliberately used a genome-wide panel of intronic SNPs in this analysis because chromosome-specific intronic SNPs could be in LD with a particular inversion located on the same chromosomal arm.
To match the pattern of major cosmopolitan inversions whose frequencies decrease with latitude, we conditioned biallelic neutral SNPs to decrease from south to north; in contrast, for In(3R)Mo, whose frequency increases with latitude, neutral SNPs were conditioned to increase from south to north. For each SNP in both SNP data sets, we calculated a linear regression between arcsine squareroot-transformed allele frequencies and latitude as the predictor. From these data we generated empirical distributions of regression slopes. To account for the different numbers of inversion-specific versus neutral SNPs, we applied bootstrapping and then used Wilcoxon rank-sum tests to test for differences in the distributions of bootstrapped slopes between the two groups of SNPs.
To investigate whether admixture might explain the clinality of inversions in our data, we compared our observed inversion frequencies with frequency estimates expected under admixture from Africa and Europe of six populations along the North American cline (Bergland et al. 2014; 2016). Frequencies expected under admixture were approximated by estimating inversion frequencies from four African and four European populations (see above) and by multiplying them with previously estimated chromosome-specific proportions of admixture for each North American location (Bergland et al. 2016) as follows: fi,j,expected = fi,Africa × Adj,Africa + fi,Europe × (1−Adj,Africa), where fi,j,expected represents the expected frequency under admixture of the ith inversion in the jth North American population; fi,Africa and fi,Europe refer to the average frequency of the ith inversion in Africa and Europe, respectively; Adj,Africa denotes the estimated proportion of African admixture for the jth North American population; and Adj,Europe denotes the estimated proportion of European admixture for the jth North American population (Adj,Europe = 1−Adj,Africa). To test for differences in observed versus expected clinal slopes we used ANCOVA: arcsine squareroot (fi) = l + ad + l × ad + ϵi, where ad is a nominal fixed factor referring to the effect of observed versus expected admixture. The l × ad interaction represents a homogeneity-of-slopes test; a significant effect of the interaction implies that the observed clinal slope differs from the slope expected under admixture. Because the true population sources of admixture in North America remain unknown, our admixture analysis must be taken as being provisional.
Enrichment of Clinal and Seasonal SNPs within Inversions
We next asked whether clinally and/or seasonally varying SNPs might be enriched within the breakpoint boundaries of the seven inversions examined. We tested for genome-wide effects of clinality and/or seasonality on SNP frequencies with SNP-wise binomial GLMs by fitting both l and s and their interaction to the sample frequencies of each biallelic SNP. We restricted this analysis to high-confidence SNPs, defined by a minimum allele frequency >0.05 (averaging across samples), a sample-wise minimum minor allele count of 2, a minor coverage of 10×, and a maximum coverage not exceeding the top 2% (98th percentile) of coverage for each population.
To define inversion-associated SNPs which might be differentiated due to spatially and/or temporally varying selection, either directly or indirectly due to genetic hitchhiking (“genetic draft”), SNP-wise P values were subjected to a stringent empirical outlier approach (Fabian et al. 2012): Only SNPs falling into the upper 0.1% tail of the P value distribution were considered to represent candidates.
Overrepresentation of clinal and/or seasonal candidate SNPs within all nonoverlapping inversions was tested by using bootstrapping and Fisher’s exact tests. For the partially overlapping inversions on 3R (In(3R)C, In(3R)K, In(3R)Mo, and In(3R)Payne), we tested for overrepresentation of SNPs by considering only nonoverlapping regions; note that for overlapping regions we were unable to distinguish among these arrangements.
Genome-Wide Associations with Inversions
Because the nonphased nature of Pool-seq data did not allow us to investigate haplotype structure and LD (Schlötterer et al. 2014), we aimed to identify SNPs that are statistically associated with inversions and thus likely in strong (statistical and/or physical) LD with them, using a GWAS approach.
To correlate genome-wide SNP frequencies with inversion frequencies, we used linear regressions; only SNPs from tests with P < 3.14 × 10−8 (Bonferroni-corrected α′ = 0.05/1,574,911 tests) were considered to be associated with a given inversion. The sample from Raleigh (supplementary table S1, Supplementary Material online) was excluded from this analysis in order to have an independent data set for testing associations between candidate SNPs and inversions (see below). We used bootstrapping, combined with Fisher’s exact tests, to detect overrepresentation of inversion-specific candidate SNPs 1) within the region spanned by the corresponding inversion as compared with randomly drawn, equally sized SNP sets outside the inversion and 2) within the breakpoint regions, that is, within a distance of 25 kb to each breakpoint, relative to randomly drawn, equally sized SNP sets located in the center of the inversion, away from the breakpoint regions.
To independently test for associations between candidate SNPs and inversions, we used the Pool-seq sample from Raleigh (based on the DGRP lines; Zhu et al. 2012; Bergland et al. 2014), and source information from the original DGRP data (Mackay et al. 2012; Huang et al. 2014). Based on the known karyotypes of the DGRP lines, we established two groups (inverted versus noninverted) and estimated differentiation between them by calculating SNP-wise FST. To test for an excess of FST values for inversion-specific candidate SNPs, we performed bootstrapping combined with Wilcoxon rank-sum tests.
To examine whether SNPs that are significantly associated with inversions in North America are also polymorphic in Australia and whether they show parallel differentiation across both clines, we calculated SNP-wise FST values for the endpoints of the Australian cline, that is, for the comparison Queensland, Tasmania (Reinhardt et al. 2014), and compared FST distributions of candidate SNPs by means of bootstrapping and Wilcoxon rank-sum tests. We focused our analysis on In(2L)t, In(3L)P, and In(3R)Payne; only SNPs polymorphic in Australian populations were considered.
Acknowledgments
We thank A. Bergland, P. Schmidt, S. Laurent, and J. Polechová for discussion and two reviewers and the Associate Editor for helpful comments on the manuscript. We are grateful to the members of the Dros-RTEC consortium (led by A. Bergland, D. Petrov, P. Schmidt) who have collected the samples used here, especially the members of the Schmidt, Petrov, Lazzaro, Pool, and Gonzalez laboratories (supplementary table S1, Supplementary Material online). We also acknowledge the support of the Department of Ecology and Evolution and the Vital-IT bioinformatics core facility at the University of Lausanne. This work was supported by the Swiss National Science Foundation (grant number PP00P3_133641 to T.F). The unpublished raw sequence data used in this study are available from NCBI SRA (BioProject PRJNA308584#).
References
Author notes
Associate editor: John Parsch




