Chromosome Fusion Affects Genetic Diversity and Evolutionary Turnover of Functional Loci but Consistently Depends on Chromosome Size

Abstract Major changes in chromosome number and structure are linked to a series of evolutionary phenomena, including intrinsic barriers to gene flow or suppression of recombination due to chromosomal rearrangements. However, chromosome rearrangements can also affect the fundamental dynamics of molecular evolution within populations by changing relationships between linked loci and altering rates of recombination. Here, we build chromosome-level assembly Eueides isabella and, together with a recent chromosome-level assembly of Dryas iulia, examine the evolutionary consequences of multiple chromosome fusions in Heliconius butterflies. These assemblies pinpoint fusion points on 10 of the 20 autosomal chromosomes and reveal striking differences in the characteristics of fused and unfused chromosomes. The ten smallest autosomes in D. iulia and E. isabella, which have each fused to a longer chromosome in Heliconius, have higher repeat and GC content, and longer introns than predicted by their chromosome length. When fused, these characteristics change to become more in line with chromosome length. The fusions also led to reduced diversity, which likely reflects increased background selection and selection against introgression between diverging populations, following a reduction in per-base recombination rate. We further show that chromosome size and fusion impact turnover rates of functional loci at a macroevolutionary scale. Together these results provide further evidence that chromosome fusion in Heliconius likely had dramatic effects on population level processes shaping rates of neutral and adaptive divergence. These effects may have impacted patterns of diversification in Heliconius, a classic example of an adaptive radiation.


Introduction
Structural changes in the genome can be an important factor for speciation (Feulner and De-Kayne 2017), population divergence, and adaptation (De Storme and Mason 2014). Although studies of structural evolution often focus on small to mediumscale structural variants, such as inversions or translocations, large-scale structural changes, like chromosome fusion, can have substantial and immediate evolutionary impacts. The consequences of chromosome fusions occur via two main effects. First, fusions can disrupt meiotic sorting of chromosomes in heterozygous individuals or result in imbalanced gametes, both of which can lead to reproductively isolated "chromosomal races" (Dobzhansky and White 1977;Hauffe and Searle 1998;de Vos et al. 2020). Second, fusion events reduce the number of unlinked DNA molecules, which results in less independence among loci. This lower per chromosome recombination rate may be favored in certain circumstances. For example, fusions may aid adaptation if recombination is reduced between coadapted alleles at multiple loci (Guerrero and Kirkpatrick 2014;Veller et al. 2019Veller et al. , 2020. A beneficial alteration of the recombination landscape might explain why chromosomal fusions could become fixed despite initial deleterious impacts on meiosis in species with monocentric chromosomes (Lunt et al. 2014;Fradin et al. 2017).
The altered recombination rate associated with chromosome fusions can have permanent downstream consequences, affecting both the efficacy of purifying selection (Charlesworth 1990) and the impact of selection on linked sites, which consequently determine levels of genetic diversity (Cutter and Payseur 2013;Corbett-Detig et al. 2015). Indeed, although the effect is modest, genome-wide nucleotide diversity in 38 butterfly species was found to be correlated with chromosome number (Mackintosh et al. 2019). Although neutral DNA is highly susceptible to nonselective evolutionary processes such as variation in recombination rate, functional DNA elements (e.g., genes, enhancers, promoters, etc.) may be more constrained due to selective forces. Turnover of functional DNA (e.g., gain or loss of conserved cis-regulatory elements [CREs]) is strongly associated with divergence time between taxa (Villar et al. 2015;Lewis et al. 2016;Lewis and Reed 2019), but also mechanisms and processes that implicate selection, such as epistasis, gene pleiotropy, adaptive population divergence, and positive selection from environmental pressures (Kim et al. 2016;Van Belleghem et al. 2017;Concha et al. 2019;Lewis et al. , 2020Moest et al. 2020). Thus, the extent to which changes in recombination rate might alter functional DNA evolution remains relatively unknown, and prior studies seemingly support the potential for both strong and weak effects.
In Heliconius butterflies, 10 pairs of the 31 ancestral chromosomes are thought to have fused, resulting in 21 chromosomes, whereas the ancestral chromosome number is retained in their sister genus Eueides and most other outgroup genera within Heliconiini . The impact of these chromosome fusions on the recombination landscape may be especially pronounced in Heliconius due to their holocentric chromosomes where multiple kinetochores, rather than a single centromere, may favor the escape of some of the deleterious effects of chromosome fusions (Mandrioli and Manicardi 2020). There also appears to be a strict rule of one crossover per chromosome per meiosis in this clade . Fused chromosomes should therefore have a reduced recombination rate per base pair (bp) compared with their ancestral, unfused progenitors. In H. melpomene, genetic diversity is strongly and negatively correlated with chromosome length (Martin et al. 2016), and longer chromosomes show reduced levels of introgression between Heliconius species (Edelman et al. 2019;Martin et al. 2019). Therefore, the fusion events that produced the ten longest chromosomes in Heliconius likely altered the evolutionary trajectories of the ten pairs of shorter progenitor chromosomes involved. Indeed,  hypothesized that the altered genome structure may contribute to the elevated speciation rate in Heliconius (Kozak et al. 2015). Here, we aimed to test whether increases in chromosome length have altered levels of nucleotide diversity and rates of evolutionary turnover in Heliconius. We combine our newly available Dryas iulia genome assembly (Lewis et al. 2021), with a chromosome-level assembly of Eueides isabella. We then compare these genomes, which have the ancestral chromosome number of 31, with the genomes of H. melpomene and H. erato. Our findings show that the ten smallest chromosomes in Dryas and Euiedes, which are fused in Heliconius, have a distinct genomic and gene-structural composition. Chromosome fusions are associated with a dramatic change in recombination and background selection. Finally, we show that at a macroevolutionary time scale this had a significant effect on DNA content putatively under selection.

Results and Discussion
Chromosome-Level Assembly of the Eueides isabella Genome and Serial Chromosome Fusion at the Origin of Heliconius Using a combination of high coverage long (Pacific Biosciences reads) and short-read data, we generated a highly complete de novo assembly for E. isabella (supplementary fig. S1, Supplementary Material online). Final chromosome-level scaffolding brought the contiguity to 14.7 Mb with a final genome size of $440 Mb, and very minimal N/X characters (<55 kb) (supplementary figs. S1ÀS4, Supplementary Material online). Genome sequence composition was highly similar between E. isabella, D. iulia, and the H. melpomene and H. erato reference genomes (supplementary table S1, Supplementary Material online). Genome completeness, as measured using the BUSCO database (insect_db9: 1658 gene), was high, with 99.3%, of genes being complete (single þ duplicated) and only 0.5% (9/1658) of genes missing. Again, extremely similar results were found for D. iulia and the two Heliconius genomes (supplementary table S1 and fig. S5, Supplementary Material online). Therefore, these species represent four of the highest quality lepidopteran genomes.
Our E. isabella assembly recovered the 31 expected chromosomes in 31 contiguous sequences. E. isabella chromosomes displayed a highly conserved synteny with D. iulia and few putative inversions (mostly at the end of chromosomes) or translocations ( fig. 1; supplementary figs. S6ÀS8, Supplementary Material online). This allowed us to identify the ancestral-chromosome fusion points in the two Heliconius species with a high degree of accuracy, all of which were close to their previously estimated locations . We also combined multiple annotation methods to predict 26,555 protein-coding genes in the E. isabella genome (supplementary table S1, Supplementary Material online). This estimate is slightly higher than the other Heliconiini genomes, probably due to a higher degree of gene model fission as indicated by the lower distribution of completeness (supplementary fig. S9, Supplementary Material online). Repetitive elements account for 29.80% of the E. isabella genome, which is slightly less than in D. iulia and H. erato (31.74% and 33.02%, respectively), but significantly more than H. melpomene (25.90%; supplementary table S1 and fig. S10, Supplementary Material online) and, in line with other Heliconiini genomes, total interspersed repeats constitute the largest fraction of repetitive elements (23.91%).
Chromosome Fusion Leads to a Change in the Spatial Distribution of Repetitive Elements Comparative analyses of chromosome composition in Lepidoptera has shown that the distribution of repeats, GC content, and gene content do not follow a consistent pattern across chromosome length (Ahola et al. 2014). We explored these features in our assemblies to test for an effect of Cicconardi et al. . doi:10.1093/molbev/msab185 MBE chromosome fusions on chromosome composition. We found that across all four species, chromosome lengths are highly correlated with repeat content (P < 0.0001; Pearson's q ¼ 0.89; supplementary fig. S11, Supplementary Material online). All chromosomes in D. iulia, E. isabella, and the unfused chromosomes in Heliconius have higher GC and repetitive element frequencies toward the chromosome ends. In contrast, coding region density shows little variation across iulia and E. isabella, although the increase is mostly absent in Heliconius spp. (fig. 2b). If CG-rich and repetitive element accumulations in the chromosome's tails were retained after chromosome fusions, then fusion events should produce a W-shaped distribution with peaks centered around the fusion point that reflect the remnants of the ancestral chromosomes. Although a small peak is present at the fusion point (0.5 of the x-axis), this increase is substantially lower 2. (a) GC content (%), repeat content (%) and coding density within 100 kb nonoverlapping windows in not fused chromosomes (NFC) of the four species (showed in color). (b) Same as (a) but for Heliconius fused chromosomes and their homologous in Dryas iulia and Eueides isabella. In both cases, GC and repeat distributions are accumulating at the chromosome tails, whereas coding density seems to be slightly denser in the central part of chromosomes. For fused chromosomes in Heliconius spp. fusion points is placed at 0.5 of the x-axis. (c) Violin plots of the different chromosome types, showing how and where size, CG content (%) and repetitive elements (%) are different. Short-fused chromosomes (SFC) appear to be smaller and with higher GC and repeat content compared with not-fused (NFC) and long-fused (LFC) chromosomes, which, in turn, to not appear significantly different. ***Wilcoxon rank-sum test "less"; P values <0.001.  (fig. 2b). This suggests selection acted to remove repetitive elements in the central part of chromosomes following the fusion events, whereas the difference in magnitude between Heliconius spp. and non-Heliconius spp. is likely due to the overall loss of repetitive elements in Heliconius spp., an observation previously described by Ray et al. (2019). Comparing the ten smallest Heliconiini chromosomes with their homologues in Melitaea cinxia, it seems clear that chromosome lengths have been highly stable over this long evolutionary time period. Our data therefore seem to contradict the notion that holocentric chromosomes have uniform distributions of GC, repeats and gene content across chromosomes (Grbi c et al. 2011;Mandrioli and Manicardi 2020), and contrasts with monocentric chromosomes where these features are compartmentalized to GC-rich and GCpoor regions (Bernardi et al. 1985). Whether the uneven distributions we observe are favored due to meiotic pairing or some other selective driver remains to be tested.

Chromosomes That Were Fused in Heliconius Are Small and Have Distinct Nucleotide Compositions
Previous comparisons between Bombyx mori, M. cinxia, and H. melpomene suggested that lepidopteran chromosomal fusion events involved the shortest chromosomes (Ahola et al. 2014). These tend to have high rates of chromosomal evolution and rearrangement, and high proportions of repetitive elements (see above). These factors might explain the disproportionate involvement of short chromosomes in fusions in Heliconius (Ahola et al. 2014). We used our chromosome-level assemblies of E. isabella and D. iulia to test whether this pattern is consistent in a shallower phylogenetic framework, and to identify how chromosome fusion events have shaped Heliconius genomes.
To test the effects of chromosome fusions on Heliconius genomes, we split the fused chromosomes in Heliconius into their corresponding homologous chromosomes in E. isabella and D. iulia. These were categorized as: 1) not-fused chromosomes (NFC), homologous chromosomes that are compositionally conserved within Heliconiini; 2) long-fused chromosomes (LFC), the longer split chromosome sequence in Heliconius and its ancestral homolog; and 3) short-fused chromosomes (SFC), the shorter split chromosome sequence in Heliconius and its ancestral homolog. Across all species, SFCs are the smallest chromosomes (Wilcoxon rank-sum test "less"; P 4.5 Â 10 À08 ), even in D. iulia and E. isabella (Wilcoxon P < 0.01). Heliconius melpomene has smaller chromosomes overall (Mandrioli and Manicardi 2020) compared with the two outgroups (Wilcoxon P 1.3 Â 10 À11 ), but in H. erato only the fused chromosomes, both LFC and SFC, differ in size from D. iulia and E. isabella Within D. iulia and E. isabella, SFCs (which are not fused in these species) displayed a different composition from other chromosomes, with significantly greater GC and repetitive element content in SFCs (Wilcoxon rank-sum test "greater"; P 6.7 Â 10 À06 ) ( fig. 2c). The relationship between chromosome size and repetitive elements across species and chromosome types (NFC, LFC, and SFC) also shows striking differences. Our analysis revealed significant shifts in the distribution of the four different types of repetitive elements. Retroelements and total interspersed repeats seem to be significantly reduced in Heliconius compared with E. isabella and D. iulia, mainly in SFCs; whereas rolling circles and DNA transposons seem to be reduced specifically in E. isabella (supplementary fig. S11, Supplementary Material online). This indicates that rates of repetitive element expansion/contraction vary predictably between chromosome types. SFCs show particularly high interspecific variation in chromosome composition, with D. iulia having the highest repeat density, as reflected having the smallest chromosomes for a given repeat content (chr. size $ repetitive content, intercept ¼ 0.23 6 0.11; vs. other Heliconiini: P-adjusted 0.0018), followed by E. isabella and H. erato (0.31 6 0.10 and 0.39 6 0.07, respectively), and finally H. melpomene (0.53 6 0.04; P-adjusted 8.2 Â 10 À06 ) ( fig. 2d; supplementary table S2, Supplementary Material online). Hence, the more typical composition of SFCs in Heliconius is explained by SFCs shifting in composition following the fusion events to match those of the larger LFC and NFC chromosomes.
As recently reported in millipedes (Qu et al. 2020), genomes with more repetitive elements show significant expansions in intron size. We explored this effect in Heliconius by looking at the composition of annotated gene models in SFCs. Although transcript, CDS, exon, and UTR lengths show highly similar distributions across species, intron lengths differ between both species and chromosome types. Dryas iulia has significantly longer intron sizes (Wilcoxon rank-sum test "greater"; P < 2.2 Â 10 À16 ; mean: 2,688 bp), followed by E. isabella (Wilcoxon rank-sum test "greater"; P < 2.2 Â 10 À16 ; mean: 2,335 bp), H. erato (Wilcoxon rank-sum test "greater"; P < 2.2 Â 10 À16 ; mean: 1,770 bp), and H. melpomene (mean: 972 bp) (supplementary fig. S15, Supplementary Material online). We also found that, within species, intron length for genes on SFCs are significantly longer than for LFCs or NFCs (Wilcoxon ranksum test "greater"; P 2.63 Â 10 À14 ) (supplementary fig. S15, Supplementary Material online). This pattern is highly similar to that of repeat content, supporting evidence of a correlation between repeat content and intron length (Qu et al. 2020). Indeed, SFC introns contain significantly higher amounts of transposable elements (TE) than would be predicted by their length (P 6.7 Â 10 À09 ; intercept for intergenic regions 0.79 6 0.03; intercept for intronic regions 0.67 6 0.02). In contrast, repeat content for introns and intergenic regions was not significantly different in NFCs and LFCs. For D. iulia and E. isabella, the slope of the relationship between repeat content and introns or intergenic regions showed no significant difference (P ! 0.15). The y-intercept (elevation along the y-axis), however, did indicate a significant change, with intronic intercept $50% lower than intergenic regions (P < 2.2 Â 10 À16 ; supplementary table S4, Supplementary Material online), reflecting the higher amount of repetitive elements in Dryas. Overall, these findings show how repetitive element content Chromosome Fusion Affects Genetic Diversity and Evolutionary Turnover . doi:10.1093/molbev/msab185 MBE influence not only genome size but also the gene structure itself. Why introns are more affected by repetitive elements is not clear. It is possible that purifying selection may be stronger on intergenic regions, perhaps to avoid the disruption of regulatory elements in relatively compact genomes. Although regulatory elements do occur in introns, they are substantially less common in our ATAC-Seq data set (1 per 7,924 bp of intronic sequences, compared with 1 per 2,861bp of intergenic sequence). Alternatively, the presence of TE in intronic regions may favor exonization, the acquisition of new exons from nonprotein-coding, primarily intronic, DNA sequences (Sela et al. 2010;Schmitz and Brosius 2011).

Chromosome Fusions Caused Reductions in Recombination Rate and Levels of Diversity
Although the fused chromosomes in Heliconius tend to be much longer than the unfused chromosomes (by $40-60% in H. erato and H. melpomene, respectively), genetic linkage maps Van Belleghem et al. 2017) reveal that map lengths are consistently close to 50 centimorgan (cM) for all chromosomes, with fused chromosome maps only marginally longer than unfused (by 5% in H. melpomene and 7% in H. erato) (supplementary fig. S18, Supplementary Material online). This implies an average of one crossover per pair of bivalents per meiosis, regardless of chromosome length . If this trend was present in the 31 ancestral chromosomes, this would imply that fused chromosomes, especially SFCs, have experienced a dramatic decrease in their per-base recombination rate. For example, ancestral chromosome 31, an SFC forming part of chromosome 6 in Heliconius, accounts for just 6 cM of the total 48 cM (supplementary fig. S18, Supplementary Material online). We therefore hypothesized that the decrease in recombination rate would result in a decrease in genetic diversity for the SFCs and LFSc, as recombination rate determines the extent to which the processes of background selection and genetic hitchhiking tend to reduce diversity at linked sites (Cutter and Payseur 2013;Campos and Charlesworth 2019). As a proxy for the level of neutral genetic diversity in each of the four species, we computed nucleotide diversity (p) at 4-fold degenerate third codon positions (hereafter p 4D ) using resequenced individuals (see supplementary table S5, Supplementary Material online for sample information and accession numbers). There is a strong negative relationship between chromosome length and p 4D in D. iulia (R 2 ¼ 0.506, S19, Supplementary Material online), consistent with lower perbase recombination rates on longer chromosomes resulting in greater levels of background selection and/or hitchhiking (Martin et al. 2016). This trend was not seen in E. isabella, which generally shows low genetic diversity across all chromosomes. It is possible that a reduction in effective population size has reduced the efficacy of background selection in E. isabella such that chromosome-level recombination rate is no longer a good predictor of diversity. We therefore focused on D. iulia as representative of the ancestral state for comparison with the Heliconius species.
Relative levels of diversity tend to be reduced on both the LFCs and SFCs in Heliconius compared with their unfused homologues in D. iulia ( fig. 3a and b). The difference is most pronounced for the SFCs, which tend to have experienced the greatest decrease in recombination rate following the fusion events. Indeed, assuming map lengths of 50 cM for D. iulia chromosomes, we can estimate recombination rates for each chromosome before and after the fusions. This estimated reduction in recombination rate is a strong predictor of the reduction in relative levels of diversity (Spearman's q ¼ 0.757, P ¼ 3 Â 10 À6 for H. melpomene; q ¼ 0.662, P ¼ 1 Â 10 À4 for H. erato; fig. 3c and d).
Our findings demonstrate that the ten fusion events in the ancestor of all Heliconius species resulted in a dramatic change in the fate of the chromosomes involved. First, the reduced effective population size caused by increased background selection likely also leads to a reduction in the efficacy of selection. Adaptive evolution may be further impeded by HillÀRobertson interference between tightly linked selected loci (Hill and Robertson 1966;Barton 1995). Evidence for less efficient selection in genomic regions of reduced recombination rate has been reported in Drosophila (Presgraves 2005;Haddrill et al. 2007;MacKay et al. 2012). Given that the gene complement of each fused chromosome in Heliconius is largely unchanged from that in outgroup species, we hypothesize that the fusions may have resulted in a systematic shift in the relative importance of certain genes for adaptation. A second side-effect of reduced recombination rate in the fused Heliconius chromosomes is an increase in barriers against introgression between hybridizing species (Edelman et al. 2019;Martin et al. 2019). This occurs because introgressed tracts take longer to break down in regions of lower recombination rate and are consequently more rapidly purged by selection. This raises the intriguing possibility that the fusion events caused some regions of the genome to become less permeable to gene flow .
Combined with our assessment of TE abundance, our findings contrast with the widespread trend in other organisms for repetitive elements to be more common in regions of low recombination rate (reviewed by Kent et al. 2017). Instead, the ten smallest lepidopteran chromosomes have the highest recombination rates and the highest repeat content, and their repeat content appears to have decreased following the fusions events in Heliconius that lowered their recombination rates. It is generally thought that more efficient selection in regions of higher recombination rate should limit the spread of TEs. We therefore hypothesize that the efficiency of selection is not the only factor at play. For example, the rate of TE insertion may be greater on these small chromosomes, outweighing the effect of more efficient selection against them.
Chromosome Fusions Alter the Rate of Turnover in Functional DNA Finally, we sought to assess how the ten chromosome fusions in Heliconius have affected loci putatively under selection. Although it is clear that chromosome fusions can drive a decrease in neutral nucleotide diversity, the extent to which the recombination landscape alters the evolution of We therefore assessed evolutionary turnover of accessible chromatin and protein coding exons to test whether the fusion of Heliconius chromosomes had a similar effect on functional loci as we observed at neutral sites. We used DNA sequence of CREs and gene exons from D. iulia to set between the inferred ancestor and the extant species H. melpomene (c) and H. erato (d) plotted as a function of the relative decrease in recombination rate following the ten fusion events. The ancestral relative change in diversity and relative decrease in recombination rate are calculated assuming that the ancestral chromosomes had lengths and levels of diversity equal to those in D. iulia, and recombination map lengths of 50 cM each (see main text). Points further to the right indicate chromosomes that have experienced a greater decrease in recombination rate. Note that in H. melpomene the NFCs show negative values on the x-axis, indicating an overall increase in recombination rate, due to the genomesize reduction in H. melpomene. Nonetheless, the trend that fused chromosomes (especially SFCs) have experienced the greatest decrease in relative recombination rate and relative diversity is present in both H. melpomene and H. erato. A fitted linear regression line is included for convenience.
Chromosome Fusion Affects Genetic Diversity and Evolutionary Turnover . doi:10.1093/molbev/msab185 MBE a baseline for functional DNA turnover in the Heliconiini lineage. As expected, divergence time strongly affected conservation of both functional categories. We observed a similar degree of conservation between annotated D. iulia functional DNA and the E. isabella and Heliconius genomes, which represent 26 million years of sequence divergence ( fig. 4a). Conservation continued to decrease with divergence time in comparisons against M. cinxia (65 million years diverged) and Danaus plexippus (85 million years diverged). For each species comparison, however, conservation of both exons and CREs displayed a strong correlation with chromosome length (Pearson's q ¼ 0.42-0.78). The correlation between functional DNA conservation and chromosome length decayed slightly with increased divergence time, most notably in exons, which showed a higher overall correlation (e.g., q ¼ 0.78 for H. melpomone DNA) at lower divergence times. Thus, although some additional forces may affect functional DNA turnover, these results confirm that functional DNA elements remain subject to the same effects of recombination rate as neutral loci over macroevolutionary time.
To determine the extent to which chromosome fusions, and change in recombination rate, may have altered the turnover of functional DNA in Heliconius, we performed the same DNA conservation analysis using CREs and exons from H. erato. Testing the complete H. erato chromosomes allowed us to determine the effect of changes in recombination rate on functional DNA content over the past 11À19 million years postfusions. In contrast to our analysis of D. iulia functional DNA, conserved H. erato DNA showed a fairly strong correlation with chromosome length against the H. melpomene genome (Exons: q ¼ 0.62, CREs: q ¼ 0.75), but a substantial loss of association between chromosome length and DNA conservation between H. erato and more distantly related species with 31 chromosomes such as E. isabella (Exons: q ¼ 0.26, CREs: q ¼ 0.40).
We next examined the relationship between DNA sequence length and functional DNA turnover in the ancestral chromosome segments of the ten H. erato chromosomes that resulted from chromosome fusions. Functional DNA sequence conservation for these chromosome segments displayed a weaker correlation with sequence length against the H. melpomene assembly, but substantially greater correlation against the unfused heliconiine species and M. cinxia assemblies. In more diverged comparisons, correlation coefficients fell between the full H. erato chromosome values (supplementary fig. S20, Supplementary Material online, Pearson's q ¼ 0.41-0.74) and those from our analysis of D. iulia functional DNA. The effect size (slope of the relationship) of chromosome length on functional DNA turnover remained similar to that observed for the complete H. erato chromosomes. Together, these results suggest Heliconius may be at an intermediate stage where the evolutionary response of functional DNA turnover to changes in recombination rate is ongoing.
The reduced correlation between chromosome length and functional DNA conservation suggests that the relatively recent change in recombination rate in Heliconius may not have impacted functional DNA turnover to the same extent as stable recombination rates over a much longer period of time. To test this more explicitly, we performed multiple linear regression of D. iulia ( fig. 4c) and H. erato (fig. 4d) functional elements to determine the effect of chromosome length relative to divergence time. Multiple regression models for both D. iulia and H. erato functional elements accounted for the substantial majority of variance in functional DNA conservation (r 2 ¼ 0.81-0.95). There was, however, a marked and significant (paired t-test, P < 0.01) difference in the effect size of chromosome length between regression models of D. iulia and H. erato functional DNA. The impact of chromosome length and, by extension, recombination rate on DNA conservation was approximately seven times greater in D. iulia than for H. erato. The contrast between D. iulia and H. erato exon and CRE turnover thus suggests that recombination rate has a significant effect on DNA content putatively under selection, but that changes in recombination rate are likely more impactful on average rates of change across chromosomes over long periods of macroevolution than over shorter divergence times within genera. This impact may not be homogenous across the chromosome, however, and local changes in recombination rate may be important over microevolutionary timescales.

Conclusions
Our analyses of four high-quality chromosome-level assemblies for D. iulia, E. isabella, H. erato and H. melpomene reveal important insights into the evolutionary impact of chromosome fusion. We have shown that Nymphalidae chromosomes differ significantly by size, both in terms of nucleotide composition and gene structure. Specifically, the ten short ancestral chromosomes had the highest repetitive element content, which possibly made them more likely to fuse with longer, more stable chromosomes, as occurred in the ancestor of Heliconius. This is likely a general property of small chromosomes, at least in Lepidoptera, as similar conclusions were drawn by comparative analyses of the M. cinxia genome (Ahola et al. 2014). Interestingly, both whole genomes and individual chromosomes with higher repetitive element content have longer introns on average, which appears to be driven by a higher density of repeats in introns compared with intergenic regions-a feature that, in theory, could also lead to the formation of new functional exons. We further show that the ten fusion events resulted in a dramatic change in the fate of the chromosomes involved, reducing the effective population size by increasing background selection. Finally, we explored whether chromosome fusions may have also changed rates of evolutionary turnover of accessible chromatin and protein coding exons and found that both exon and CRE conservation displayed a strong correlation with chromosome length, confirming that functional DNA elements are subject to the same effects of recombination rate over macroevolutionary time. These findings, and the genomic resources we provide, are not only important to understand how genomic architecture impacted lepidopteran evolution, but also to understand how the karyotype can affect species evolution at a broader biological scale.

DNA and RNA Extraction and Sequencing
Individuals of E. isabella were collected from partially inbred commercial stocks (Costa Rica Entomological Supplies, Alajuela, Costa Rica). High-quality, high-molecular-weight genomic DNA was extracted from late stages of pupae, dissecting up 100 mg of tissue, mainly thorax and wing imaginal disk, frozen in liquid nitrogen and homogenized in 9.2 ml buffer G2 (Qiagen Midi Prep Kit) adding 19 ml of RNAseA. The samples were then transferred in a 15 ml tube adding 0.2 ml of Protease K and incubated at $50 C for 2 h. Samples were transferred to Genomic tip and processed with Qiagen Midi Prep Kit (Qiagen, Valencia, CA) following the manufacturer's instructions. DNA was then precipitated using 2 ml 70% EtOH and dissolved in water.
From the same stocks, RNA was extracted separately from six adult tissue (four wings; three heads; four antennae, legs and mouth parts; thorax; abdomen 1-3, abdomen 4-6), and five tissue parts from early ommochrome stage of the pupae (head and mouth parts; wings, antennae and legs; thorax; abdomen 1-3, abdomen 4-6). Each tissue was frozen in liquid nitrogen and quickly homogenized in 500 ml Trizol, adding the remaining 500 ml Trizol at the end of the homogenization. Phase separation was performed adding 200 ml of cold chloroform. The upper phase was then transfer to RNeasy Mini spin column and processed with Qiagen RNeasy Mini Prep Kit (Qiagen, Valencia, CA) and DNAse purification using the Turbo DNA-free kit (Life Technologies, Carlsbad, CA) following the manufacturer's instructions. All the extractions were finally pooled keeping the same final RNA concentration from all samples.
The Pacific Bioscience (PacBio) data were then sequenced at the Centre for Genomic Research, University of Liverpool using PacBio sequel SMRT cell (2.0 chemistry), whereas
To further error correct for indels we used short Illumina reads for E. isabella (Kozak et al. 2021) with Pilon v.1.23 (Walker et al. 2014) for five iterations. Assemblies were then processed with Purge Haplotigs v.20191008 (purge -a 85) (Roach et al. 2018) in order to remove haplocontigs, artificially duplicated genomic regions due to heterozygosity. To correct for mis-assemblies Polar Star (https://github.com/phasegenomics/polar_star) was employed. This algorithm calculates read depth of aligned PacBio reads to the assembly at each base. Read depth is then smoothed in a 100 bp sliding window, and regions of high, low, and normal depth are merged. Low read depth outliers are identified, and contigs are broken at each such location. Contigs were then rescaffolded using P_RNA_scaffolder (Zhu et al. 2018), which uses information from RNAseq mapping, and LRScaf v.1.1.5 (Qin et al. 2018), which uses information from long-reads. The resulting gaps were then filled using PacBio reads applying LR_Gapcloser v.1.1 (Xu et al. 2019). After the introduction of this new PacBio information, we repeated the previous polishing procedure using five iterations of Pilon, plus three more iterations with Illumina RNA-seq data to correct indels only. Before the chromosome-level scaffolding, we used synteny maps implemented with BLAST (Camacho et al. 2009) and ALLMAPS (Tang et al. 2015) to identify duplicated regions at the end of scaffolds, trimming them away. Illumina RAD-seq data from  were downloaded and aligned to the scaffolds of E. isabella assembly using STAR v.2.7.2c (Dobin et al. 2013) to generate the linkage map for the final scaffolding step. Duplicate reads were removed using the Picard v.2.0.1 MarkDuplicates tool (http://broadinstitute.github.io/picard). Indels were subsequently realigned using the RealignerTargetCreator and IndelRealigner tools from GATK v.3.8-1-0-gf15c1c3ef (Depristo et al. 2011). Individual SNPs were called using mpileup in SAMtools (Li et al. 2009) in combination with pileupParser2 and pileup2posterior modules from Lep-MAP v.3 (Rastas 2017 Transcriptome Annotation RNA-seq raw read data were filtered using Trimmomatic v.0.39 (Bolger et al. 2014) (ILLUMINACLIP:$ILLUMINACLIP: 2:30:10; SLIDINGWINDOW: 5:10; MINLEN: 100). RNA-Seq data were used as training data input to ab initio and de novo predictors and for direct alignments, and were mapped to the genome using STAR v.2.7.2c aligner (alignIntronMax ¼ 500,000; alignSJoverhangMin ¼ 10). The resultant sorted BAM files were used as training for the BRAKER v.2.1.5 pipeline (Hoff et al. 2016), which combines GeneMark-ES Suite v.4.30 (Lomsadze et al. 2005) andAUGUSTUS v.3.3.3 (Stanke et al. 2008), along with the masked genomes, generated with RepeatMasker v.4.1.0 (Smit et al. 2013) using the Lepidoptera database and protein alignments from closely related, or model species (Drosophila melanogaster, B. mori, Bicyclus anynana, D. plexippus, H. erato, and H. melpomene) using Exonerate v.2.4.0 (Slater and Birney 2005). BRAKER is an automated pipeline to predict genes that uses iterative training of AUGUSTUS to generate initial gene predictions. GeneMark-predicted genes are filtered and provided for AUGUSTUS training, followed by AUGUSTUS prediction, integrating the RNA-Seq and protein alignment information, to generate the final gene predictions.
To generate the de novo transcriptome assemblies, quality filtered reads assembled using Trinity v.2.10.0 (Grabherr et al. 2011) and generated transcript were subsequently aligned to the genome using Minimap2 v.2.17-r974-dirty (Li 2018). To generate the ab initio transcriptomes, reads were realigned to the genome using STAR with the same parameters with Braker predicted splice sites, and assembled using Stringtie v.2.1.3b (Pertea et al. 2015) and Cufflinks v.2.2.1 (Trapnell et al. 2010(Trapnell et al. , 2012 for each sample. The assemblies derived from all samples within each program were merged using Stringtiemerge. All such generated transcriptomes (Trinity, BRAKER, StringTie, and Cufflinks) were merged used with STAR to remap once again all raw reads in order to evaluate all detectable splice-sites with Portcullis v.1.1.2 (Mapleson et al. 2018). Transcripts from ab initio and de novo assemblies (Trinity, StringTie, and Cufflinks) without supported splicejunctions were therefore filtered out, and only transcripts with unique intron chain were retained from all annotations.

Chromosome Analysis
Chromosome mapping was carried out using orthologous genes between D. iulia, E. Isabella, H. erato and H. melpomene to define the level of gene conservation and translocations among chromosomes. Synteny maps were implemented with BLAST and ALLMAPS using Minimap2 to map H. erato and H. melpomene loci to the D. iulia and E. isabella assemblies. This let us identify the putative junctions between the two fused chromosome pairs. The midpoints between the flanking orthologs of different homologous chromosomes were used to split the fused chromosomes in H. erato and H. melpomene in two, recreating the ancestral chromosome set. Chromosome mappings are illustrated using Circos (Krzywinski et al. 2009).
We explored the scaling relationship between chromosome size and repetitive elements across species and chromosome types (NFC, LFC and SFC) using SMATR v.3.4-8 (Warton et al. 2012), together with a WilcoxonÀMannÀWhitney rank sum test as implemented in the R function WILCOX.TEST (http://www.r-project.org) v.3 to test for differences in the relationships between chromosome size, GC content, and repetitive element content among chromosome types (chromosomes that were fused/ unfused in Heliconius). In both cases, we adopted a stringent P-value threshold of 0.005 (Benjamin et al. 2017). We also calculated GC, repetitive element and coding-sequence (CDS) content within nonoverlapping 500 kb sliding windows for the chromosomes of all the species, with the order and orientation of the chromosomes determined based on synteny to H. melpomene. We used the annotation and an in house python script to extract strand specific intronic regions (BED format), whereas to extract strand specific intergenic regions the initial annotation was split into plus and minus strands and BEDTools complement v.2.29.0 (Quinlan and Hall 2010) used to generate intronic and intergenic regions. For both annotations BEDTools getfasta were used to extract their relative sequences analyzed with RepeatMasker. Their relative scaling coefficients and intercepts were subsequently analyzed with SMATR as reported above.

Analysis of Genetic Diversity
To investigate whether chromosome fusions are associated with long-term shifts in levels of genetic diversity, we analyzed genome resequencing data from three to six additional individuals of each species (see supplementary table S1, Supplementary Material online for sample details and accession numbers). For newly sequenced individuals, DNA extraction was performed using the Qiagen DNeasy Blood and Tissue kit and libraries were prepared following a low-input Nextera-based DNA library prep protocol (skim sequencing) at the Cornell Institute of Technology Core Facility. Libraries were sequenced on a HiSeq4000. Reads were aligned to their respective reference assemblies using BWA MEM version 0.7.17 (Li 2013). Processing and sorting of SAM and BAM files was performed using SAMtools version 1.9 (Li et al. 2009), and duplicate reads were removed using Picard MarkDuplicates version 2.21.1 (https://broadinstitute.github.io/picard/). For genotyping we targeted only coding sites, as we were specifically interested in comparing levels of diversity among 4-fold degenerate third codon positions, which provide the most reliable comparison of relative diversity at nearly neutrally evolving sites in the genome. We therefore first defined the set of nonoverlapping coding intervals in the genome based on the annotations and then used bcftools call version 1.9 (Li 2011) with the -m option (https://samtools.github.io/ bcftools/call-m.pdf) to call genotypes for this subset of sites. Only individual genotypes with read depth >10 and genotype quality (GQ) >30 were considered, except in D. iulia, where lower coverage sequencing meant that these thresholds had to be decreased to 5 and 20, respectively.
Four-fold degenerate sites ("4D sites" hereafter) were defined as third codon positions at which a substitution to any other base would not alter the encoded amino acid. This condition was evaluated by considering not only the codon sequence in the reference assembly, but also any variants observed in the resequenced individuals, and codons with more than one valiant site were automatically discarded, resulting in the most conservative possible set of 4D sites. These steps were implemented using the script codingSiteTypes.py available at (https://github.com/simonhmartin/genomics_general). Nucleotide diversity was computed in nonoverlapping windows of 500 4D sites each, using the script popgenWindows.py (https://github.com/ simonhmartin/genomics_general). In D. iulia, because three of the re-sequenced individuals were from a small Key Largo population that has reduced diversity, we instead used pairwise diversity between one Key Largo individual and individual used for genome assembly (Lewis et al. 2021).

Analysis of Functional DNA Turnover between Species
To determine the extent to which changes in per base pair recombination rate may have alter functional DNA turnover between species, we tested for sequence conservation of cisregulatory loci and gene exons in species with and without the ten chromosome fusion events. CREs were derived from ATAC-seq data for midpupal wing tissue from H. erato and D. iulia (Lewis and Reed 2019; Lewis et al. 2021) ATAC-seq data were processed as previously described  and peaks were called using F-seq (Boyle et al. 2008). To reduce discrepancies between genome assembly annotations, which may have different degrees of gene fragmentation, we used annotated exons from H. erato and D. iulia to test for gene coding elements (Lewis et al. 2021). A custom script was then used to perform a reciprocal best-hit BLAST search with an e-value threshold of e À10 on ATAC-seq peaks and gene exons to identify conserved functional DNA sequences. In this analysis, 30% conservation would indicate that 30% of the source genome elements tested were found by our BLAST search in the target genome assembly. Peak and exon sets were split into files for each of the 30 (D. iulia) and 20 (H. erato) chromosomes. These data sets were then BLASTed against the H. erato lativitta (Lewis et al. 2016

Data Availability
The sequence data from this study have been submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject numbers PRJNA685624, PRJNA691346, and PRJNA691348 and PRJNA732066 (https://www.ncbi.nlm.nih.gov/bioproject/). The Eueides isabella genome assembly has been submitted to the NCBI (Submission ID: SUB8748660). All accession codes of deposited and retrieved data are provided in supplementary table S5, Supplementary Material online.