Structural genomic variation and migratory behavior in a wild songbird

Abstract Structural variants (SVs) are a major source of genetic variation; and descriptions in natural populations and connections with phenotypic traits are beginning to accumulate in the literature. We integrated advances in genomic sequencing and animal tracking to begin filling this knowledge gap in the Eurasian blackcap. Specifically, we (a) characterized the genome-wide distribution, frequency, and overall fitness effects of SVs using haplotype-resolved assemblies for 79 birds, and (b) used these SVs to study the genetics of seasonal migration. We detected >15 K SVs. Many SVs overlapped repetitive regions and exhibited evidence of purifying selection suggesting they have overall deleterious effects on fitness. We used estimates of genomic differentiation to identify SVs exhibiting evidence of selection in blackcaps with different migratory strategies. Insertions and deletions dominated the SVs we identified and were associated with genes that are either directly (e.g., regulatory motifs that maintain circadian rhythms) or indirectly (e.g., through immune response) related to migration. We also broke migration down into individual traits (direction, distance, and timing) using existing tracking data and tested if genetic variation at the SVs we identified could account for phenotypic variation at these traits. This was only the case for 1 trait—direction—and 1 specific SV (a deletion on chromosome 27) accounted for much of this variation. Our results highlight the evolutionary importance of SVs in natural populations and provide insight into the genetic basis of seasonal migration.


Introduction
Structural variants (SVs) include duplications, deletions, transpositions, and inversions.Existing research suggests that these variants represent a major source of genetic variation and could have important fitness consequences (Collins et al., 2020;Cridland et al., 2013;Qin et al., 2021).For example, SVs can have deleterious effects on fitness, disrupting functional features of the genome (e.g., exons) and/or suppressing recombination (Alonge et al., 2020;Hämälä et al., 2021;Hirsch & Springer, 2017).Suppressed recombination can lower effective population size at the local genomic level, reducing the efficiency of purifying selection (Charlesworth et al., 2009).There is growing evidence SVs could also have the opposite effect, facilitating adaptation and speciation (Chakraborty et al., 2019;Kirkpatrick & Barton 2006;Todesco et al., 2020;Weischenfeldt et al., 2013).For example, reductions in recombination can allow co-adapted alleles at separate loci to segregate together, sheltering alleles that underlie adaptive phenotypic traits.SV breakpoints can also serve as adaptive mutations themselves (Villoutreix et al., 2021).If these traits are important for maintaining reproductive isolation and gene flow with other populations is occurring, these SVs could also facilitate speciation (Kirkpatrick & Barton, 2006;Noor et al., 2001;Schwander et al., 2014).
Despite their potential fitness effects, data on the genomewide distribution of SVs and their frequency within populations is largely lacking.This dearth of knowledge is especially true in natural populations of nonmodel organisms and is related in large part to technological limitations.Specifically, advances in sequencing technology have made it possible to obtain genome-wide data from nonmodel organisms, but existing work is often limited to short (~150 bp) sequencing reads.SVs are often larger than these reads and can be highly repetitive, making them difficult to assemble (Chakraborty et al., 2018;Huddleston & Eichler, 2016;Peona et al., 2021;Weissensteiner et al., 2020).A complete understanding of SVs will require contiguous genomes from multiple individuals where repetitive regions of the genome have been assembled accurately (Alkan et al., 2011;Huddleston et al., 2017;Lutgen et al., 2020).Linked reads are one technology that can help meet this need.They use molecular barcoding to preserve long-range sequencing information.Here we used this technology to identify SVs in a natural population of European blackcap.We have matching data on the migratory behavior of each bird, allowing us to gain inference into the genetics of seasonal migration as well.
Seasonal migration is the yearly long-distance movement of individuals between their breeding and wintering grounds.Successful migration requires the integration of several behavioral, physiological, and morphological traits (Dingle, 2006;Piersma, 2011).Decades of research has shown that there is a genetic basis to many of these traits, but the actual identity of genes underlying them remains largely unknown.Genes controlling the circadian clock have been linked to some traits, but unbiased, genome-wide studies have only recently been applied to this question (Delmore & Liedvogel, 2016;Justen & Delmore, 2022;Liedvogel et al., 2011;Merlin & Liedvogel, 2019).Most genome-wide studies are limited to population-level comparisons, estimating genomic differentiation between populations that differ in one or more migratory traits (Delmore et al., 2015;Delmore et al., 2020a;Lundberg et al., 2017;Toews et al., 2019;von Rönn et al., 2016).These comparisons are valuable for identifying genomic regions under positive selection (i.e., areas of elevated differentiation), but caution is needed when interpreting their results as other processes can also elevate differentiation, including background selection and selection unrelated to the trait of interest (Barrett & Hoekstra, 2011;Cruickshank & Hahn, 2014;Noor & Bennett, 2009).A complete understanding of migration genetics will require complementary work at the individual level, including genome-wide association studies (GWAS) connecting specific genomic regions to individual migratory traits.GWAS will not only tell us about the genetics of individual migratory traits, but will also help us understand how these traits are modulated and integrated at the molecular level.Given their potential to shelter co-adapted alleles, SVs may be especially important for this integration.Indeed, there is already evidence that SVs underlie migratory traits in two avian systems; separate inversions on chromosome 1 underlie migratory orientation in willow warblers and wing shape in common quails (Caballero-López et al., 2022;Lundberg et al., 2017Lundberg et al., , 2023;;Sanchez-Donoso et al., 2022).
The Eurasian blackcap is found throughout much of Europe, northern Africa and central Asia (Figure 1).This species exhibits considerable variation in migratory behavior-resident and migratory populations exist and among migrants, three main orientations have been described (northwest [NW], southwest [SW], and southeast [SE] on fall migration) (Cramp & Brooks, 1992;Delmore et al., 2020b) (Figure 1).SE and SW migrants form a migratory divide in central Europe.Birds at the center of this divide orient in intermediate, southern (S) directions (Delmore et al., 2020b).Additional differences in the distance, timing, speed, and duration of both fall and spring migration have also been documented in migrants (Delmore et al., 2020b) (Figure 1).Researchers have capitalized on variation in the migratory behavior of blackcaps to study the genetics of migration for decades, including experimental and quantitative genetics approaches showing there is a strong genetic basis to migratory traits (Berthold et al., 1992;Helbig, 1991;Pulido & Berthold, 2010).Genetic surveys indicate that this variation arose recently and has not resulted in substantial, genome-wide differentiation (Delmore et al., 2020a;Mettler et al., 2013;Pérez-Tris et al., 2004).These genetic surveys include a recent study that used whole genome resequencing data to identify eight small genomic regions under positive selection in migrants that orient in different directions (four, three, and one in the NW, SE, and SW groups, respectively).The former study was limited to single nucleotide polymorphisms (SNPs) and population-level comparisons using individually resequenced birds distant from the contact zone with population-averaged phenotypes (Delmore et al., 2020a).
Here, we used linked reads to generate haplotype-resolved de novo assemblies for 79 individual blackcaps with individual-based phenotype characterization, including NW, SW, and SE migrants and individuals from the migratory divide in central Europe (Figure 1).We called SVs using these de novo assemblies and had three main objectives: (a) characterize SVs in natural populations, including their genome-wide distribution and overall fitness effects, (b) test for genome-wide population differentiation using these variants, and (c) use complementary population-and individual-level analyses to study the genetics of seasonal migration, including (a) local estimates of genomic differentiation to identify regions under selection and (b) GWAS to test if these regions are linked to specific migratory traits.All of the individuals used in the present study were tracked with light-level geolocators (Delmore et al., 2020b).Accordingly, we have individual-level phenotype data to run GWAS on multiple migratory traits, including direction, distance, the location of wintering grounds, and both the duration and timing of fall and spring migration.

Results and discussion
We constructed de novo genome assemblies for 79 blackcaps using 10X Genomics linked-read technology (Supplementary Table S1).Final assemblies averaged 999.3 Mb in size and included an average of 1,710 scaffolds.Average scaffold and contig N50 sizes were 10.2 Mb and 106.2 kb, respectively, and >91% of the universally conserved single-copy benchmark (BUSCO) genes were present in these assemblies (Supplementary Table S1).Given considerable uncertainty associated with calling SVs (Chander et al., 2019), we used these assemblies and three separate pipelines to genotype birds, limiting our analysis to SVs called in two or more pipelines.Following these criteria, we identified between 9,246 and 12,585 SVs per individual.Note, additional information on pipelines used to genotype birds can be found in the Methods, but briefly, two pipelines started with the assembly of de novo genomes (two pseudohaplotypes) that were subsequently aligned to the blackcap reference genome, and one pipeline aligned reads directly to the reference.All three pipelines included heterozygous and alternate homozygous genotype calls.After merging data from all individuals and filtering out variants with minor allele frequencies <0.05, our final dataset comprised 15,764 SVs (Supplementary Table S1 includes data on how many genotypes were called for each pipeline individually).

Characterizing SVs and examining their overall fitness effects
We started our analysis by examining the distribution of SVs across the genome, counting the number of SVs in nonoverlapping windows of 200 kb.We found an average of three variants per window, with some windows harboring much larger numbers of SVs, especially on microchromosomes, where densities exceeded 15 SVs/window (Figure 2A).Similar enrichment of SVs on microchromsomes has been reported in blackcaps (on an independent Illumina WGS dataset, Bascón-Cardozo et al. 2022) and other birds and may be related to increased rates of recombination and/or enrichment of TEs on these chromosomes (e.g., Peona et al., 2022).Guanine Cytosine (GC) content is also higher on microchromosomes (Spearman's rank correlation between GC content in 200 kb windows and chromosome size = −0.67 [p < .0001])and there is an overall positive relationship between GC content and the density of SVs independent of micro versus macrochromosomes (Spearman's rank correlation between GC content and SV density in 200 kb windows = 0.10 [p < .0001]).
Deletions were the most numerous type of SV (n = 9341), followed by insertions (n = 6393), inversions (n = 24), tandem duplications (n = 4), and translocations (n = 1).There was a bias towards smaller SVs among deletions and insertions, with median sizes of 245 and 265 bp in each class, respectively (size range of 58-65,285 bp for deletions and 52-15,209 bp for insertions; Figure 2B).The tandem duplications were a similar size (median 272 bp; range 144-357 bp), but the median size of inversions was approximately 10 times larger (2,485 bp; range 365-7,774 bp).The size of the single translocation was 10,456 bp.
We used minor allele frequency spectra (AFS) to examine the overall fitness effects of SVs using all individuals, starting with a comparison of variant types and including an AFS for SNPs called using the same sequencing data for comparison.AFS of all SV types were skewed towards rare alleles when compared to the AFS for SNPs (Figure 2D), indicating that alleles at SVs segregate at lower frequencies than alleles at SNPs and are under stronger purifying selection (i.e., are more deleterious than alleles at SNPs).This finding is supported by AFS constructed for SVs that overlap functional and nonfunctional elements in the genome, with SVs overlapping exons being more skewed towards rare alleles than variants overlapping any other feature (Figure 2E).This pattern is not as striking as the difference between SV types and SNPs, but both results are in line with theoretical predictions that SVs are often deleterious.This is also empirically supported by results from other species using a similar approach to quantify overall fitness effects associated with SVs (e.g., Drosophila, Zichner et al., 2013;cacao trees, Hamala et al. 2021;European crows, Weissensteiner et al., 2020;Lycaeides butterflies, Zhang et al. 2023).Population structure can also skew AFS, but previous genomic work on the blackcap and results below reveal limited structure in this species (Delmore et al., 2020a;Mettler et al., 2013;Pérez-Tris et al., 2004).Note, the presence of SVs can affect the quality of SNP calls and subsequent AFS but AFS built using SNPs proximate to SVs in our dataset are similar to the AFS obtained using genome-wide SNPs (Supplementary Figure S1).SVs frequently occurred in repetitive regions of the genome.Specifically, we annotated SVs using a repeat library manually curated for the Eurasian blackcap (Bascón-Cardozo et al., 2022;Bours et al., 2022).Thirty-seven percent of the variants overlapped one or more repetitive elements in this library.Close to half of the repeats that overlapped these variants were simple repeats (47.8%); one quarter overlapped long terminal repeats (LTR) retrotransposons (25.5%).The rest were long interspersed nuclear elements/chicken repeat 1 (LINE/CR1) retrotransposons (17.5%), low complexity repeats (8.2%), and to a much smaller degree DNA transposons (0.79%) and short interspersed nuclear elements (SINE) retrotransposons (0.24%; Figure 2F; similar overlap proportions were noted for each SV type alone as well [e.g., focusing on just inversions the division was 41.6%, 23.2%, 27.2%, 6.4%, and 1.5%, respectively]).The AFS for LINE/CR1 retrotransposons was skewed towards rare alleles suggesting they have the strongest deleterious effects and/or have recently been expanding.Comparable proportions of repeat elements were reported in other songbird genomes, suggesting that transposable elements (especially LTR and LINE/CR1) are highly active in this group of organisms (Ficedula flycatchers, Suh et al., 2018;European crows, Weissensteiner et al., 2020).This overlap between TEs and SVs could also account for the skew towards rare alleles noted in our study (Figure 2D; Bergman & Bensasson 2007;Horvath et al., 2022).Note, summary statistics for AFS confirmed the patterns we documented as well.For example, Tajima's D was lowest for LINE/CR1 compared to other repeat classes (−1.04 vs. −0.81 to −0.61 for the remaining classes).
Only a small number of inversions were identified in our dataset, but they exhibited molecular signatures expected of this variant type.For example, we estimated linkage disequilibrium (LD) in inversions and a control set of colinear regions (same sizes and number).As expected, LD dropped off rapidly in colinear regions (at ~2,000 bp).However, this was not the case in inversions, with SVs continuing to exhibit elevated levels of LD as distances between variants increased beyond 2,000 bp (4,000 bp and beyond) (Figure 2c).

Genome-wide levels of population differentiation at SVs
Consistent with previous genetic surveys using molecular tools to characterize population structure of European blackcaps, we found little evidence for population structure using SVs.Previous studies using marker-based approaches, such as mitochondrial haplotypes, microsatellites, but also genome-wide SNP-based approaches clearly show that population structure among medium-distance migrants with distinct migratory orientation is very low (Delmore et al., 2020a;Mettler et al., 2013;Pérez-Tris et al., 2004).We assigned birds from the present study to NW, SW, SE, or intermediate (S) groups using their vector between breeding and wintering locations to characterize autumn migratory direction (Supplementary Table S1) and tested if we could recover population structure using SVs.Specifically, we summarized genetic variation at SVs using a principal component analysis (PCA).We limited this analysis to autosomal chromosomes because our dataset comprises both males and females and the sex chromosomes accounted for a large amount of documented variation (Supplementary Figure S2A).Once the sex chromosomes were excluded, only the first principal component (PC) explained a significant proportion of the variation present in the dataset (p = .0033,eigenvalue = 1.17).Birds did not cluster based on breeding or wintering location (Supplementary Figure S2B).This lack of structure was true even when contrasting birds with breeding locations furthest away from the contact zone (e.g., the Netherlands vs. Austria) (Supplementary Figure S2C).And genome-wide estimates of F ST support this lack of genetic variation (average weighted values of F ST ranged from 0.021 between SW and NW and 0.031 between SE and NW).Combined with previous genetic surveys, these results indicate that differences in migration do not generate strong genome-wide differentiation at SVs or any other genetic marker examined in the blackcap thus far (Delmore et al., 2020b).

Local genomic patterns of differentiation
Low levels of genetic differentiation between populations that exhibit distinct differences in phenotypic traits are ideal for work on the genetic basis of phenotypic traits, as genomic regions that underlie these traits should standout against the backdrop of limited differentiation.Accordingly, we used a series of analyses aimed at identifying genomic regions that underlie migratory traits in blackcaps.We started with population branch statistics (PBS) (Yi et al., 2010), an F ST -based statistic that can be applied to comparisons with more than two groups and identifies allele frequency differences specific to each group.We limited our analysis to SVs in NW, SW, and SE migrants, excluding birds exhibiting intermediate (S) orientations to contrast the most extreme phenotypes.Among SW and SE migrant, we limited our analysis to birds that bred in a geographically confined area across the migratory divide in Austria (i.e., excluded birds breeding in the Netherlands) to minimize any potential confounding effects of even small amounts of population structure.
PBS was lowest for SW birds, and very few variants stood out against baseline levels in this group (average PBS in SW = 0.009; 0.013 in both NW and SE) (Figure 3).Both NW and SE migrating birds had several SVs that stood out against baseline levels of PBS (Figure 3).Variants exhibiting extreme values of PBS may be under positive selection and important for encoding variation in migratory behavior of these birds.Accordingly, we extracted genes that overlapped SVs in the top 5% of the PBS distribution for each phenotype (PBS > 0.06 [SW], 0.10 [NW], and 0.10 [SE]; 172 [SW], 155 [NW], and 162 [SE] genes, respectively; only 1/24 inversions fell in the top 5% of any PBS distributions with the remaining loci being deletions or insertions) and ran a functional enrichment analysis, looking for gene ontology (GO) terms, biological pathways and regulatory motifs that were overrepresented in these genes.Note, caution should be used when interpreting these results as many additional processes beyond selection can also elevate genomic differentiation (Barrett & Hoekstra, 2011;Cruickshank & Hahn, 2014;Noor & Bennett, 2009).
Regulatory motifs are sequences of DNA that are bound by transcription factors.Ebox is a regulatory motif that was enriched in NW migrants.The Ebox motif is often bound by basic helixloop-helix (bHLH) transcription factors and is of significance for migration as several genes that regulate the circadian clock are bHLH transcription factors that bind Ebox motifs (Cassone, 2014).In a response to changes in photoperiod, the circadian clock entrains circadian (possibly also circannual) rhythms as well as initiates migratory behavior (Gwinner, 1996;Visser et al., 2010).Interestingly, we documented similar enrichment of Ebox motifs in our previous genomic survey of blackcaps, using SNPs and PBS to identify genomic regions under selection in the same migratory phenotypes (Delmore et al., 2020a).No functional enrichment was found in the SW migrants.
One GO term was enriched in the SE migrants: "MAPK cascade" (mitogen-activated protein kinase cascade).This cascade is highly conserved across vertebrates and important for translating extracellular signals to intracellular responses.Of particular relevance to seasonal migration, MAPK cascades facilitate learning and memory, consolidating learning following specific behaviors and eliciting memory formation (Atkins et al., 1998;Day & Sweatt, 2011;Thomas & Huganir, 2004).MAPK cascades are also important for mounting immune responses in many vertebrates.Considerable research has focused on the relationship between immunity and migration, with several studies suggesting that migrants suppress their immune system on migration, allowing them to allocate more resources to this costly life history event.It has also been noted that migrants with different routes and wintering sites likely encounter different parasites throughout their annual cycle; local adaptation associated with this variation may also drive changes in the immune system (Altizer et al., 2011;Eikenaar et al., 2018;Møller & Erritzøe, 1998).

Genome-wide association analyses and individual migratory traits
The former analyses used local estimates of differentiation to identify genomic regions that distinguish the main migratory phenotypes present in blackcaps.In this last set of analyses, we broke migration behavior down into distinct traits and examined the genetic basis of each one with GWAS.We used all birds in these analyses (i.e., added birds exhibiting intermediate [S] orientations back in to the analysis along with those breeding in the Netherlands) and focused on seven traits: direction (the vector orientation between breeding and wintering sites), distance (direct connection between breeding and wintering sites, in km), wintering location (longitude), and both the duration (days) and timing (date when birds reached the halfway point between breeding and wintering sites) of fall and spring migration.
We started our analyses by estimating PVE (the proportion of phenotype variation explained by genetic variation in our SV set) for each trait.We used Bayesian sparse linear mixed models (BSLMMs) for these analyses (Zhou et al., 2013).BSLMMs use an markov chain monte carlo (MCMC) algorithm to fit all variants to the phenotype simultaneously and control for population structure with a kinship matrix.We ran 20 million MCMC steps, extracting parameter values every 10,000 steps.Mean values of PVE across these steps ranged from 0.45 to 0.82 (direction = 0.73 ± 0.27 [SD], distance = 0.45 ± 0.29, winter longitude = 0.82 ± 0.28, fall timing = 0.47 ± 0.28, fall duration = 0.50 ± 0.30, spring timing = 0.77 ± 0.29, and spring duration 0.46 ± 0.29).These values are relatively high, suggesting our SVs capture a good amount of the variation present in the migratory traits we measured.Standard deviations around these means, however, are quite wide and indicate we have limited precision in these estimates.Accordingly, we ran a complementary set of analyses obtaining polygenic scores (PGS) for each individual and trait.Specifically, we randomly masked phenotypic values for a subset of individuals and tested if we could predict these missing values with the remaining dataset.Migratory orientation was the only trait where predicted and actual phenotypic values were correlated (Figure 4A; r = 0.22, F 1,77 = 3.92, p = .05,all p-values for remaining traits > .11).Combined, results from PVE and PGS analyses suggest that even though PVE values were relatively high for all traits, we can only be confident that our SVs capture sufficient variation in migratory direction.This does not necessarily mean there is no genetic basis to the remaining traits; rather, future work using larger sample sizes and additional variants (e.g., SNPs) may be needed to explain variation in these traits.For now, we focus our remaining analyses on migratory direction.
Beyond PVEs, BSLMMs also estimate posterior inclusion probabilities (PIPs) for each variant.PIPs represent the proportion of MCMC iterations where a variant has a nonzero effect size.One variant stood out against the rest in our BSLMM for migration direction-a 710 bp deletion on chromosome 27 that had a PIP of 0.28 (Figure 4B).Birds Figure 3. Signatures of selection.Population branch statistics estimated for birds that migrated along northwest, southwest, or southeast (SE) routes from their breeding grounds.Structural variant circled in black in the SE panel was also identified in our genome-wide association analyses (Fig. 4).The 5% cutoff used for enrichment analyses is shown with a dotted line for each phenotype.homozygous for the reference allele oriented in directions that were further east (Figure 4C).This variant also stood out in our analysis of genomic differentiation, exhibiting an elevated estimate of PBS in SE migrants (Figure 3).SE migrants were nearly fixed for the reference allele at this locus.The relationship between migratory orientation and genotypes at this locus remained even when we limited the dataset to S migrants (the group with the most variation in migratory direction) (F 2,16 = 8.84, p = .003)suggesting variation at this SV is related to orientation, not just fixation in SE migrants.Combined, these findings suggest that this variant is under selection in SE migrants and helps control migratory orientation across blackcaps.Concerning the actual identity/effect of this variant, it occurs in an intron of T cell receptor alpha chain (TRA).TRA helps T cells respond to specific antigens in their cellular environment (Göbel et al., 1994) and this finding continues to support a connection between immune response and migration in blackcaps; recall, the functional category "MAPK cascade" was enriched in SE migrating birds.This pathway plays a role in memory and learning and is also important for mounting immune responses in many vertebrates.Together, these findings add support for an important role of immunity in the context of migration behavior, for example, related to the fact that birds using different migratory routes are challenged with different environments throughout their annual cycle, and local adaptation associated with this variation may facilitate changes in the immune system.

Conclusion
We conducted an extensive study of SVs in natural population of migratory songbirds, using de novo assembled genomes to genotype 79 individually phenotyped birds at thousands of SVs.We found evidence for purifying selection on SVs, suggesting they have an overall deleterious effect on fitness also supporting previous work on SVs.We also documented considerable overlap between SVs and transposable elements, suggesting transposable elements comprise a large proportion of SVs and genetic variation in the genome.We did not find evidence for genome-wide population differentiation between blackcaps with different migratory strategies, but our individual-based phenotypic characterization indicates local genetic variation at SVs does account for a large proportion of the phenotypic variation observed in specific migratory traits.
Seasonal migration is a complex behavior that comprises many traits.SVs like inversions are strong candidates for capturing loci that underlie complex behaviors and evidence from other systems has connected inversions with migration (e.g., an inversion on chromosome 1 underlies migratory direction in willow warblers) (Caballero-López et al., 2022;Lundberg et al., 2017).We did not make such a connection here; we only identified a small number of small inversions suggesting there are several paths for the evolution of complex traits like migration.LD was reduced in these inversions suggesting they are suppressing recombination, but they did not exhibit signatures of selection and were not linked to any of our focal migratory traits.Blackcaps only began to diverge recently (30,000 years ago (Delmore et al., 2020a)) and seasonal migration is highly dynamic in this species (e.g., the NW population only recently [in the last 70 years] started to growing in size) (Cramp & Brooks, 1992).Accordingly, it is possible that inversions will capture genetic variation underlying migratory behavior in the future, but that is not currently the case.Inversions are Figure 4. Results from genome-wide association analysis using migration direction during autumn.(A) Relationship between original values of migratory orientation and those predicted by cross-validation procedure (p = .05).(B) Posterior inclusion probabilities for all variants, highlighting the deletion on chromosome 27 that exhibited elevated population branch statistics in southeast migrating birds (also highlighted in Fig. 3).(C) Relationship between genotypes at the deletion on chromosome 27 and migration direction (p < .0001).Individuals are colored by their migratory phenotype.
only one type of SV; deletions, insertions, as well as translocations or duplications can also drastically alter phenotypes with important evolutionary consequences, such as the text book example of industrial melanism that is the darker morph of the peppered moth (Hof et al., 2016), or as an example within the songbirds, plumage color divergence between hooded and carrion crows (Corvus corone cornix, C.c. corone, respectively) has been linked to a LTR retrotransposon insertions in crows, where hooded crows are homozygous for the insertion (Weissensteiner et al., 2020).
Future work using additional long-read technologies (e.g., PacBio HiFi and HiC) may uncover additional variants that could be connected to migration in the system but for now, we conclude that seasonal migration has a highly polygenic basis in blackcaps.Beyond the deletion on chromosome 27, none of the SVs identified here stood out in our analyses of selection or GWAS.We reported similar findings in a previous study using an independent dataset of genome-wide SNP data (Delmore et al., 2020b).Interestingly, the Ebox regulatory motif identified in the NW migrants here was also identified in enrichment analyses with population-averaged phenotypes based on SNP data and could reveal a mechanism through which multiple migratory traits could be controlled by a similar mechanism.The Ebox motif is bound by transcription factors that regulate circadian rhythms in birds.Circadian rhythms are important for migration (e.g., songbirds like blackcaps switch from diurnal to nocturnal behavior on migration and circadian rhythms likely entrain circannual rhythms which are important for migratory timing).Perhaps seasonal migration in blackcaps is regulated by a small number of transcription factors that affect expression at multiple genes.In a general context, we see recurrent pathways and functional categories connected to migration in many systems, including immunity, circadian rhythm regulation, learning, and memory.Although the actual genes under selection do not seem to match in an across species comparison, we would assume that the adaptation of the central pathways needs to be optimized to and constrained by the migratory niche of each species or population, that is, each species adapt their phenotype in a specific way, fitting to its ecological demands, which could explain consistency in general regulatory pathways despite the apparent lack of commonly identified genes.

Sampling and phenotypic analysis
We included data from 79 blackcaps in the present study.A subset of these birds were captured using mist nets on the breeding grounds in Austria (n = 45) and the Netherlands (n = 16); the remaining birds were captured on the wintering grounds in the UK (n = 18; Supplementary Table S1).We obtained blood samples from each bird and fitted them with light-level geolocators using leg-loop backpack harnesses.Light-level geolocators record light intensity data at specific time intervals.These data are stored until the devices are retrieved at which point light intensity data are converted to day length and time of local midday and used to estimate daily longitude and latitude (McKinnon & Love, 2018).
We describe methods used to analyze light-level geolocator data in full in Delmore et al. (2020b).Of relevance for the present study, we categorized birds into four broad phenotypic classes (migrating NW, SW, SE, and S in fall) using their wintering locations.For birds wintering north of 37.5° N, we considered those west of 5° E to be SW migrants, those east of 20° E to be SE migrants, and those between 5 and 20° E to have intermediate southerly (S) routes.For birds wintering south of 37.5° N, we used a cut-off of 0° instead of 5° E to distinguish SW from S because these longer routes require less of a westerly component to reach the same longitude.
We estimated migration direction and distance by fitting a rhumb line between their breeding and wintering sites.We estimated timing by identifying the shortest distance route (i.e., a great circle routes) between their breeding and wintering sites and determining the date when birds reached 50% of the way between these sites.Duration was estimated as the number of days it took each bird to travel from early (30%) to late (70%) migration stages and speed as migration distance divided by duration.

Assemblies and variant calling
High molecular weight DNA was extracted from blood samples, 10X Chromium libraries were constructed and sequenced using Illumina technology (150 bp, paired end) by Novogene (Hongkong).The mean molecule length of resulting libraries was 32,657 bp (range 10,062-54,206 bp) and sequencing reached a mean coverage of 55X (range 7-74X; based on reads aligned to reference by LongRanger [see below]; Supplementary Table S1).
We called SVs using three pipelines.The first two pipelines relied on de novo assemblies of reads.Specifically, we assembled reads into two parallel pseudohaplotypes (phased contigs and scaffolds) with Supernova2 (Weisenfeld et al., 2017) and used two different approaches to align these pseudohaplotypes to the blackcap reference genome (Ishigohoka et al., 2021) and call genotypes in relation to the reference genome: (a) MUmmer4 (Marçais et al., 2018) for alignment and MUM&Co (-g 1080000000 -b) (Hämälä et al., 2021) to call genotypes and (b) Minimap2 (Li, 2018) for alignment and SVIM-Asm (diploid, tandem duplications as insertions and interspersed duplications as insertions) (Heller & Vingron, 2021) for genotype calling.We aligned 10X reads directly to the blackcap reference for the third pipeline.We used LongRanger wgs for alignment (average mapping rate of 87%) and the same program to genotype SVs (--vcfmode gatk) (Marks et al., 2019).
Once SVs were genotyped, we used a series of filters to identify high-quality variants.Starting at the level of individuals, we limited the dataset to variants identified by at least two callers and had matching genotypes.We used SURVIVOR (settings 1000 2 0 0 0 50) (Jeffares et al., 2017) and a custom R script to conduct this filtering.Variants with strings of >10Ns were also removed to reduce potential errors caused by contig scaffolding.We generated a multi-individual vcf (i.e., merged variants across individuals) using SURVIVOR (settings 1000 4 0 0 0 50) and limited our analyses to SVs with maf > 0.05 using vcftools (Danecek et al., 2011).We focused on five types of SVs: insertions (sequence inserted into query), deletions (sequence deleted from the query), tandem duplications (sequence duplicated in the query), inversions (sequence with reversed orientation), and translocations (sequence moved between chromosomes).
Following Hamala et al. (2021), we chose a random set of 50 SVs for visual validation in IGV.We used alignments from LongRanger and confirmed the presence of all but one of these SVs (see examples in Supplementary Figure S3, including one of the main variants identified in our subsequent analyses), suggesting false positives are rare in our dataset and likely related to the stringent filtering we applied.

Population genetics and GWAS
We used AFS to examine the overall fitness effects of SVs.We constructed these AFS using vcf2sfs in R (Liu et al., 2018) and included all SVs (i.e., not excluding those with MAF < 0.05 at this stage).Following Hämälä et al. (2021), we combined insertions and deletions into the same category (INDEL) for this analysis because these variant types cannot be defined based on the reference and this could affect inferences related to fitness.We used scripts from Hämälä et al. (2021) to estimate LD (squared Pearson's correlation coefficients) between arrangements at inversions and a random set of colinear regions with the same distribution of sizes as inversions.
We used a PCA to examine genome-wide patterns of genomic differentiation.This analysis was conducted using smartpca (EIGENSOFT version 5.0) after standardizing loci to have equal variance.
We used estimates of PBS to identify SVs exhibiting signatures of selection.PBS is similar to F ST but can be used with more than two populations and identifies selection specific to one population.We estimated this parameter in two steps, calculating F ST between NW, SW, and SE migrants using the estimator derived by Hudson (Hudson et al., 1992) and scripts from Hämälä et al., (2021).These estimates of F ST were then converted to PBS following (Zhan et al., 2014) (T = log transformed estimates of F ST , example is for SE population): (T SE-NW + T SE-SW − T NW-SW )/2.Raw estimates of F ST can be found in Supplementary Figure S4.
We used two different programs to look for functional enrichment at genes overlapping SVs showing evidence for selection (i.e., with PBS values in the top 5% of the distribution): (a) BINGO (Maere et al., 2005) to look for enrichment of specific GO categories and (b) go:Profiler (Raudvere et al., 2019) to look for enrichment in additional functional including biological pathways, regulatory motifs of transcription factors and microRNAs and protein-protein interactions.We used a custom annotation for the blackcap in BINGO and an annotation for the chicken in go:Profiler.
GWAS were run as BSLMMs in Genome-wide efficient mixed model association algorithm (GEMMA) (Zhou et al., 2013).BSLMMs are adaptive models that include linear mixed models (LMM) and Bayesian variable selection regression as special cases and that learn the genetic architecture from the data.These models are run separately for each phenotype but allele frequencies at all variants are considered together and included as the predictor variable.A kinship matrix is also included to control for factors that influence the phenotype and are correlated with genotypes (e.g., population structure).We ran four independent chains for each BSLMM, with a burn-in of 5 million steps and a subsequent 20 million MCMC steps (sampling every 1,000 steps).We report one hyperparameter from this model (PVE: the proportion of variance in phenotypes explained by all SVs, also called chip heritability) and focus on two variant-specific parameters: PIPs and (β, variant effects).We calculated genetic correlations between traits by identifying SNPs with PIP > 0.01 and correlated model-averaged estimates of β (β weighted by their PIPs) (Comeault et al., 2014;Gompert et al., 2019).In order to facilitate comparisons across traits and limit the effects of outliers, we normal quantile-transformed all of our phenotypic traits before running these analyses.We also regressed each trait against sex to remove the effects of this variable on migratory traits.
We used a cross-validation procedure to obtain polygenic scores for each individual (and trait) (Gompert et al., 2019;Villoutreix et al., 2022).Specifically, we masked the phenotype of 25% of the sampled individuals and reran BSLMMs using the remaining individuals and same parameters as the original BSLMM (with only one MCMC chain and the "predict Suh-1" plugin in GEMMA).We repeated this procedure four times for each trait, obtaining predicted values (polygenic scores) for each individual and used a linear model to estimate correlation between predicted values and the original phenotype of each bird.

Figure 2 .
Figure 2. Structural variants and their fitness effects.(A) Density (number/200 kb window) along the genome and (B) size distributions for indels (insertions and deletions) including dotted line for median size.(C) Average decay of linkage disequilibrium as a function of physical distance in major and minor inversion homozygotes and collinear regions.Loess curves and their standard errors are shown.Allele frequency spectrums for (D) each type of structural variant and (E) those overlapping different repeat elements in the genome.