Evolution of Chromosomal Inversions across an Avian Radiation

Abstract Chromosomal inversions are structural mutations that can play a prominent role in adaptation and speciation. Inversions segregating across species boundaries (trans-species inversions) are often taken as evidence for ancient balancing selection or adaptive introgression, but can also be due to incomplete lineage sorting. Using whole-genome resequencing data from 18 populations of 11 recognized munia species in the genus Lonchura (N = 176 individuals), we identify four large para- and pericentric inversions ranging in size from 4 to 20 Mb. All four inversions cosegregate across multiple species and predate the numerous speciation events associated with the rapid radiation of this clade across the prehistoric Sahul (Australia, New Guinea) and Bismarck Archipelago. Using coalescent theory, we infer that trans-specificity is improbable for neutrally segregating variation despite substantial incomplete lineage sorting characterizing this young radiation. Instead, the maintenance of all three autosomal inversions (chr1, chr5, and chr6) is best explained by selection acting along ecogeographic clines not observed for the collinear parts of the genome. In addition, the sex chromosome inversion largely aligns with species boundaries and shows signatures of repeated positive selection for both alleles. This study provides evidence for trans-species inversion polymorphisms involved in both adaptation and speciation. It further highlights the importance of informing selection inference using a null model of neutral evolution derived from the collinear part of the genome.


LD between inversions
The similar inversion frequency clines shared across all chromosomes may suggest co-adapted gene complexes between inversions (epistatic interactions) that would lead to linkage disequilibrium (LD) between them.We tested this by estimating all pairwise correlation coefficients between inversions (genotypes coded as 0, 1, 2 copies of the minor allele) within each of the 18 populations and then summarizing the Pearson correlation coefficients in separate meta-analyses for each chromosome.None of the overall correlation coefficients was significant, indicating no LD between inversions (Table S6).However, such LD could only arise through selection within a single generation and not through admixture.Thus, with only ten individuals per population, the power to detect any such LD is low.

Taxonomic and biological implications of the Lonchura spectabilis split across major radiations
Based on genome-wide data, the two Lonchura spectabilis subspecies we sampled were placed into separate radiations on Sahul and the Bismarck Archipelago, with the population from New Guinea (Madang) sharing genome-wide genetic ancestry with the sympatric L. castaneothorax.However, there are at least five narrow outlier regions at which Madang L. castaneothorax and L. spectabilis are fixed or nearly fixed for divergent alternative alleles and at which the two spectabilis populations (Madang and New Britain) share similar alleles.These five outlier regions include two that overlap genes known to underlie coloration phenotypes (IGFS11 (Eom, et al. 2012), SOX10 (Dutton, et al. 2001;Hubbard, et al. 2010;Gunnarsson, et al. 2011;Domyan, et al. 2014;Fariello, et al. 2014); see Figure 5 in Stryjewski and Sorenson 2017).The same pattern is observed for the inversion on chromosome chrZ: L. castaneothorax and L. spectabilis from Madang are fixed for alternative states of the inversion, whereas Madang L. spectabilis (and L. caniceps from Central Province, PNG) share the ancestral allele along with all the Bismarck Archipelago populations.The inversion includes at least two known avian color gene (SLC45A2 (Gunnarsson, et al. 2007;Xu, et al. 2013;Domyan, et al. 2014;Campagna, et al. 2017), FST (Toews, et al. 2016)).
Subspecies from New Britain (L.s. spectabilis, Sclater 1879) and from New Guinea (L.s. mayri , Hartert 1930;L. s. gajduseki, Diamond 1967;L. s. wahgiensis, Mayr and Gilliard 1952;L. s. sepikensis, Jonkers and Roersma 1990) have been grouped into the same species, Lonchura spectabilis, solely on the basis of plumage color pattern similarity (Jonkers and Roersma 1990).Whereas genome-wide data suggest that L. spectabilis is not a monophyletic group, but instead consists of at least two species that evolved a similar plumage coloration convergently (compare photos of Lonchura s. spectabilis and the phenotypically almost identical L. s. mayri in Jonkers and Roersma 1990), it is also possible that patterns of allele sharing in the outlier regions noted above provide evidence for a shared genetic basis for their similar phenotypes.Interestingly, at another known color gene (KITLG ;Miller, et al. 2007;Sulem, et al. 2007;Fariello, et al. 2014;Guenther, et al. 2014), L. caniceps is the only Sahul population fixed for the allele common in all the Bismarck Archipelago populations, whereas Madang L. castaneothorax and L. spectabilis share a different allele at this gene.This gene may underlie a difference in bill color, as L. caniceps and all Bismarck Archipelago populations (including New Britain spectabilis) have black bills while all other Sahul populations (including Madang spectabilis) have blue bills.Thus, the genomes of these species comprise regions with different evolutionary histories, making taxonomic determinations particularly challenging.Depending on the weight given to genome-wide patterns of divergence versus the specific regions responsible for phenotypic differences among species, L. spectabilis populations from New Britain and Sahul may warrant reclassification into separate species.

Neutral expectation of trans-specific shared polymorphism
Let  =  1 / 2 be the size ratio between the two populations with effective sizes of  1 and  2 diploid individuals.We assume that the two populations were constant in size since their split  ⋅ 2 1 generations ago from a joint ancestral population of effective size   .We assume that two gametes were sampled from each of the two present populations and calculate for a given locus the probability that a mutation occurred and that we observed both the derived and the ancestral allele in both population samples.A necessary condition for this is that -back in time -the lineages in each population neither coalesce nor are hit by a mutation in the last  time units of 2 1 generations.The probability of this condition is  −⋅(1++2) , where /2 is the mutation rate per 2 1 generations.If this condition is fulfilled, another necessary condition is that the next event that happens back in time is that one lineage  that is ancestral to a sample from population 1 and one lineage  that is ancestral to a sample from population 2 coalesce.As 4 out of 6 possible pairs of lineages could be this pair (, ), the probability of this event is (4/)/(6/ + 2) = 4/(6 + 2 ⋅ ), where  =   / 1 and Na is the ancestral population size.For the next event there are two possibilities.One possibility is that a mutation occurs on the joint ancestral lineage of  and  and the three remaining lineages then coalesce without any further mutations.The probability of this is /2/(3/ + 3 ⋅ /2) ⋅ 3//(3/ + 3 ⋅ /2) ⋅ 1//(1/ + ).The other possibility is that the two other lineages that are ancestral to samples from populations 1 and 2 coalesce and that a mutation then happens on one of the two ancestral lineages before the ultimate coalescence takes place.The probability of this is 1//(3/ + 3 ⋅ /2) ⋅ /(1/ + ) ⋅ 1//(1/ + ).Putting it all together, we obtain the following probability for observing a polymorphism that is shared in both population samples For comparison, we now calculate the expected total number of polymorphisms of any kind in the sample, which is /2 times the expected total length of the genealogy of the sample.The expected total length is the sum of the expected lengths of the branches in the two present populations and in the ancestral population.For the latter, we need to consider the possibilities that all four lineages trace back into the ancestral population or that one or two coalescent events have reduced the number or lineages to three or two.
For the expected length of branches in population 1, we need to account for the possibilities that the lineages have not coalesced in population 1 and that they may have coalesced at some time s < τ.Thus, we obtain For Population 2 we obtain analogously The probabilities that 4, 3 or 2 lineages make it into the ancestral population are , respectively.Conditioned on the cases of 4, 3 or 2 lineages, the expected total length of the genealogy in the ancestral population is 2 ⋅ + 1� and 2, respectively.
Putting all terms together we obtain the following for the expected number of polymorphisms: The ratio of eq1 and eq2 then describes the probability of observing the polymorphism in both populations relative to all possibilities of observing the mutation.
We obtain the following upper bound for eq2 this by neglecting that the possibility that the ancestral lineages can coalesce within the derived populations: Dividing the right-hand side of eq1 by eq3 (and assuming that  is small) we obtain the following lower bound for the fraction of joint polymorphisms among all polymorphisms in the four sequences:    S3 | Linkage disequilibrium networks obtained using all SNPs on chromosomes chr1, chr5, chr6 and chrZ.On each chromosome, the inversion cluster is highlighted in purple.The clusters on chr1, chr6 and chrZ were also visible when analyzing genome-wide SNP data but to aid illustration we restricted the SNP set to each respective chromosome for the diagrams presented here.Fig. S5 | Neighbor-joining trees using dxy-estimates from the inverted segments on chromosomes (A) chr1, (B) chr5, (C) chr6 and (D) chrZ.Filled symbols mark groups of individuals belonging to the same population homozygous for the derived allele, open symbols those groups homozygous for the ancestral allele.Asterisks mark groups of individuals that come from the same population but are homozygous for the ancestral and derived inversion type.We considered only groups of individuals with more than four haplotypes of the same inversion type in the dxy-estimation and tree-building procedure.

Fig. S6 | Conceptualization of shared allelic variation between ancestral and derived inversion types
and the outgroup species.Gene flow between arrangements occurs in the form of gene conversion and occasional double crossovers, thereby homogenizing allelic variation between arrangements and resulting in some shared polymorphism between the derived inversion type and the outgroup species.SNP alleles are indicated with 0 (ancestral) and 1 (derived) and their font size represents the relative numbers of shared polymorphisms.

Fig. S7 |
The percentage of shared variants of all ingroup Lonchura finches with (A) the Bengalese finch (Lonchura striata) and (B) the zebra finch (Taeniopygia guttata) across chromosomes chr1, chr5, chr6 and chrZ.The position of the inversions is highlighted in light yellow and the position of centromeres in turquoise.As a yellow line, the percentage of shared SNPs with the putative derived inversion type and in purple with the ancestral type is shown.On chromosomes chr1 and chr6, the number of shared SNPs drops markedly in the derived inversion type.On chromosome chr5, the effect is less pronounced and on chrZ, the number of shared SNPs drops in both types, consistent with multiple (nested) inversions and /or selection on both types.S12 | Ancestral state reconstruction of the inversion on chr1 using the polymorphism parsimony method on the dxy-tree.Purple: ancestral allele, yellow: derived allele.Leaf nodes display the allele frequencies within populations, internal nodes depict whether the inversion was polymorphic (both colors) or fixed for the ancestral or derived allele (purple and yellow, respectively).Fig. S13 | Ancestral state reconstruction of the inversion on chr5 using the polymorphism parsimony method on the dxy-tree.Purple: ancestral allele, yellow: derived allele.Leaf nodes display the allele frequencies within populations, internal nodes depict whether the inversion was polymorphic (both colors) or fixed for the ancestral or derived allele (purple and yellow, respectively).Fig. S14 | Ancestral state reconstruction of the inversion on chr6 using the polymorphism parsimony method on the dxy-tree.Purple: ancestral allele, yellow: derived allele.Leaf nodes display the allele frequencies within populations, internal nodes depict whether the inversion was polymorphic (both colors) or fixed for the ancestral or derived allele (purple and yellow, respectively).Fig. S15 | Ancestral state reconstruction of the inversion on chrZ using the polymorphism parsimony method on the dxy-tree.Purple: ancestral allele, yellow: derived allele.Leaf nodes display the allele frequencies within populations, internal nodes depict whether the inversion was polymorphic (both colors) or fixed for the ancestral or derived allele (purple and yellow, respectively).

Fig. S2 |
Fig. S2 | Admixture analysis using K=2 to K=19 clusters.Depicted is the major cluster across 10 replicate runs per K.

Fig.
Fig.S3| Linkage disequilibrium networks obtained using all SNPs on chromosomes chr1, chr5, chr6 and chrZ.On each chromosome, the inversion cluster is highlighted in purple.The clusters on chr1, chr6 and chrZ were also visible when analyzing genome-wide SNP data but to aid illustration we restricted the SNP set to each respective chromosome for the diagrams presented here.

Fig. S4 |
Fig. S4 | Principal component analyses including Sahul populations only and all variants in the inverted regions on chromosomes chr1, chr5, chr6 and chrZ.

Fig. S8 |
Fig. S8 | Overview on the different population genetic estimates derived from individuals belonging to the same population and homozygous for one or the other inversion type on chromosome chr1.On the x-and y-axis, estimates from the collinear and inverted part of the chromosome are shown, respectively.Filled symbols mark groups of individuals belonging to the same population homozygous for the derived allele, open symbols those groups homozygous for the ancestral allele.The dashed line indicates the identity line.

Fig. S9 |
Fig. S9 | Overview on the different population genetic estimates derived from individuals belonging to the same population and homozygous for one or the other inversion type on chromosome chr5.On the x-and y-axis, estimates from the collinear and inverted part of the chromosome are shown, respectively.Filled symbols mark groups of individuals belonging to the same population homozygous for the derived allele, open symbols those groups homozygous for the ancestral allele.The dashed line indicates the identity line.

Fig. S10 |
Fig. S10 | Overview on the different population genetic estimates derived from individuals belonging to the same population and homozygous for one or the other inversion type on chromosome chr6.On the x-and y-axis, estimates from the collinear and inverted part of the chromosome are shown, respectively.Filled symbols mark groups of individuals belonging to the same population homozygous for the derived allele, open symbols those groups homozygous for the ancestral allele.The dashed line indicates the identity line.

Fig. S11 |
Fig. S11 | Overview on the different population genetic estimates derived from individuals belonging to the same population and homozygous for one or the other inversion type on chromosome chrZ.On the x-and y-axis, estimates from the collinear and inverted part of the chromosome are shown, respectively.Filled symbols mark groups of individuals belonging to the same population homozygous for the derived allele, open symbols those groups homozygous for the ancestral allele.The dashed line indicates the identity line.

Fig.
Fig.S12| Ancestral state reconstruction of the inversion on chr1 using the polymorphism parsimony method on the dxy-tree.Purple: ancestral allele, yellow: derived allele.Leaf nodes display the allele frequencies within populations, internal nodes depict whether the inversion was polymorphic (both colors) or fixed for the ancestral or derived allele (purple and yellow, respectively).

Fig. S16 |
Fig.S16| Isolation-by-distance of the four inversions on (A) chr1, (B) chr5, (C) chr6 and (D) chrZ (red) and the collinear parts of the respective chromosomes (blue).On all four chromosomes, IBD is more pronounced in the inverted than the collinear regions of a chromosome, suggesting ecological selection along the latitudinal gradient acting on the inversions.Data points are not independent and confidence intervals thus anticonservative.

Fig. S17 |
Fig.S17| Isolation-by-distance (IBD, left column) and isolation-by-ecology (IBE, right column) on chr5.The upper row displays IBD and IBE for the inversion, the bottom row for the collinear part of the chromosome.In the center, differences in the two ecological variables -precipitation (blue) and temperature (red) -are shown in relation to geographic distance (eco-spatial autocorrelation; Shafer and Wolf 2013).In the IBE plots, precipitation is used as the ecological variable.

Fig. S18 |
Fig.S18| Isolation-by-distance (IBD, left column) and isolation-by-ecology (IBE, right column) on chr6.The upper row displays IBD and IBE for the inversion, the bottom row for the collinear part of the chromosome.In the center, differences in the two ecological variables -precipitation (blue) and temperature (red) -are shown in relation to geographic distance (eco-spatial autocorrelation; Shafer and Wolf 2013).In the IBE plots, temperature is used as the ecological variable.

Fig. S19 |
Fig.S19| Isolation-by-distance (IBD, left column) and isolation-by-ecology (IBE, right column) on chrZ.The upper row displays IBD and IBE for the inversion, the bottom row for the collinear part of the chromosome.In the center, differences in the two ecological variables -precipitation (blue) and temperature (red) -are shown in relation to geographic distance (eco-spatial autocorrelation; Shafer and Wolf 2013).In the IBE plots, precipitation is used as the ecological variable.

Table S2 |
Clusters of high linkage disequilibrium identified using network analyses (LDna;Kemppainen, et al. 2015) and ddRAD-seq data.The positions of high LD clusters correspond to the positions of high FST (cf.Tab.S1).

Table S4 |
Multiple regression on distance matrices (MRM) results.MRM was fitted using either FST of the inversion (Estimate and P-value[FST]) or the frequency of the ancestral inversion type (Estimate and P-value [ITF]).P-value[SNPs]was derived in comparison to the randomly drawn SNPs from the collinear parts of the chromosomes and was Bonferroni corrected for the four chromosomes.Geodesic is the geographic distance effect.Significant effects are printed in bold.

Table S5 |
Spatially-explicit mixed-effects model estimates using allele frequency of the ancestral inversion allele as the dependent variable and precipitation and temperature as fixed covariates.We controlled for species identity by fitting it as a random intercept term and fitted the geographic coordinates (Universal Transverse Mercator) of the sampling locations as a second random effect.