Origin and Consequences of Chromosomal Inversions in the virilis Group of Drosophila

Abstract In Drosophila, large variations in rearrangement rate have been reported among different lineages and among Muller’s elements. Nevertheless, the mechanisms that are involved in the generation of inversions, their increase in frequency, as well as their impact on the genome are not completely understood. This is in part due to the lack of comparative studies on species distantly related to Drosophila melanogaster. Therefore, we sequenced and assembled the genomes of two species of the virilis phylad (Drosophila novamexicana [15010-1031.00] and Drosophila americana [SF12]), which are diverging from D. melanogaster for more than 40 Myr. Based on these data, we identified the precise location of six novel inversion breakpoints. A molecular characterization provided clear evidence that DAIBAM (a miniature inverted–repeat transposable element) was involved in the generation of eight out of the nine inversions identified. In contrast to what has been previously reported for D. melanogaster and close relatives, ectopic recombination is thus the prevalent mechanism of generating inversions in species of the virilis phylad. Using pool-sequencing data for three populations of D. americana, we also show that common polymorphic inversions create a high degree of genetic differentiation between populations for chromosomes X, 4, and 5 over large physical distances. We did not find statistically significant differences in expression levels between D. americana (SF12) and D. novamexicana (15010-1031.00) strains for the three genes surveyed (CG9588, Fig 4, and fab1) flanking three inversion breakpoints.


Introduction
Chromosomal inversions are common in many groups of animals, although the production of nonfunctional gametes during meiosis in individuals heterozygous for such rearrangements may lead to reduction in fertility (Powell 1997). In Drosophila, inversions are widespread because recombination is generally suppressed in males, and in females aberrant recombinant products preferentially contribute to the polar body nurse cells. Therefore, the expected reduction in fertility is not so obvious in the Drosophila genus (Powell 1997), making this group of species ideal to study the mechanisms that are involved in the generation of inversions, their increase in frequency, as well as their impact on the genome.
Chromosomal inversions can be generated by chromosomal breakage and erroneous repair of the free ends by nonhomologous end-joining (Sonoda et al. 2006), and this is the prevalent mechanism observed in Drosophila melanogaster and its close relatives (Ranz et al. 2007). However, chromosomal inversions can also be generated by ectopic recombination (or nonallelic homologous recombination) between multiple copies of DNA sequences present in the same chromosome in opposite directions. These DNA sequences can be transposable elements (TEs) (Kupiec and Petes 1988;Lim and Simmons 1994;Delprat et al. 2009;Rius et al. 2013), segmental duplications, or short repeat sequences (Richards et al. 2005;Caceres et al. 2007;Coulibaly et al. 2007). In Drosophila, TEs were implicated in the generation of chromosomal inversions in Drosophila buzzatii (Caceres et al. 1999;Casals et al. 2003;Delprat et al. 2009), Drosophila americana (Evans et al. 2007;Fonseca et al. 2012), Drosophila virilis (Fonseca et al. 2012), as well as in Drosophila mojavensis and Drosophila uniseta (Rius et al. 2013).
Large variations in chromosomal inversions rate have been reported among different lineages of Drosophila (Powell 1997; Bartolome and Charlesworth 2006;Papaceit et al. 2006;Gonzalez et al. 2007;Ranz et al. 2007;Bhutkar et al. 2008) and between Muller's elements (Powell 1997;Vieira et al. 1997;Papaceit et al. 2006). Differences in mutation rate and fitness effects of chromosomal inversions have mostly been proposed as explanations for the observed discrepancies. Chromosomal inversions may increase in frequency in populations due to direct mutational effects associated with their breakpoints (the "position effect" hypothesis) (Sperlich and Pfreim 1986). According to Guillen and Ruiz (2012), this is a likely effect given the high gene density and compact structure of Drosophila genomes. The analysis of the genes around inversion breakpoints may, thus, reveal the targets of selection (Guillen and Ruiz 2012). Nevertheless, chromosomal inversions reduce recombination to some extent around inversion breakpoints (Navarro et al. 1997;Evans et al. 2007) which may also help keeping alleles together at loci with epistatic effects on fitness (the "coadaptation" hypothesis) (Dobzhansky 1970). Moreover, inversions may capture locally adapted sets of genes protecting them from recombination with immigrant chromosomes (Kirkpatrick and Barton 2006;Kirkpatrick 2010), which may facilitate population differentiation and ultimately lead to speciation (Noor et al. 2001;Feder et al. 2011;McGaugh and Noor 2012).
Our current understanding of the molecular basis of the generation and evolution of chromosomal inversions is still limited, mostly due to the lack of extensive comparative studies involving distantly related species. Species of the virilis group are diverging from D. melanogaster for at least 40 Myr (Morales-Hojas and Vieira 2012), and thus are suitable to test the generality of the observations made for D. melanogaster and closely related species. This group is divided into two large phylads, namely the montana and the virilis phylads, and consists of 13 recognized species (Morales-Hojas et al. 2011). Species of the virilis phylad (D. virilis, Drosophila lummei, Drosophila novamexicana, and D. americana) have been diverging for <4.1 Myr but show contrasting patterns regarding inversions. There are 14 chromosomal inversions in species of the virilis phylad located on Muller's elements A (Xa, Xb, Xc, and Xd), B (4a, 4 b, 4c, and 4d), C (5a and 5b), D (3a), and E (2a, 2 b, and 2c) (Hsu 1952;Throckmorton 1982) (fig. 1). Inversions Xa, Xb, and 2a are fixed between D. americana/ D. novamexicana and D. virilis (Throckmorton 1982). Inversions Xc, 2 b, 2c, 3a, 4a, and 5 b that are apparently fixed between D. virilis and D. novamexicana are polymorphic in D. americana. Inversions Xc, 2 b, 4a, and 4 b are present at different frequencies in D. americana northwestern, central, and southern populations and all have estimated frequencies higher than 5% in this species as a whole or among populations. Inversions Xd, 2c, 3a, and 4c show estimated frequencies lower than 5% and are present in northwestern D. americana populations only (Hsu 1952). Throckmorton (1982) reported the presence of an additional polymorphic inversion (4d) in this species that is probably rare because it was not identified by Hsu (1952). Inversion Xc overlaps the distal part of Xb inversion, and the small inversion Xd is included both in Xb and Xc. Inversion 4 b is a small inverted segment completely included in the 4a inversion, and it is only found in 4a inverted chromosomes. The other inversions are nonoverlapping when they co-occur (Hsu 1952). Based on a large number of strains, no polymorphic inversions have been described for D. virilis (Hsu 1952). No polymorphic inversions have been reported for D. novamexicana as well, and a single  [Fonseca et al. 2012] and 2a [Throckmorton 1982]). Inversion Xb (green) occurred before the split between D. lummei and the americana complex (D. novamexicana and D. americana), and it is fixed in the americana complex (Hsu 1952). Inversion 4a (orange) occurred before the split between D. lummei and the americana complex, it is polymorphic in D. americana, and it is fixed in D. novamexicana (Hsu 1952). Inversion 5a (gray) occurred before the split between D. lummei and the americana complex, it is polymorphic in D. americana, but it is absent in D. novamexicana (Hsu 1952). Inversions showed in red occurred after the split between D. lummei and the americana complex, and they are polymorphic in D. americana and fixed in D. novamexicana (Hsu 1952). All the other inversions (black) occurred in the D. americana lineage, and the underlined inversions have frequencies below 5% (Hsu 1952). Inversion 4d was reported by Throckmorton (1982) but was not identified by Hsu (1952). The age estimates are based on Spicer and Bell (2002), Caletka and McAllister (2004), Morales-Hojas et al. (2006), Vieira et al. (2006) polymorphic inversion has been reported for D. lummei, although in both cases very few strains have been analyzed (Hsu 1952;Throckmorton 1982). Inversions Xb, 4a, and 5a are present in D. lummei (Throckmorton 1982) and thus must be older than the D. lummei/D. americana lineage split (2.9 Myr) (Morales-Hojas et al. 2011). All inversions mentioned here are younger than the common ancestor of virilis and the other species that lived 4.1 Ma (Morales-Hojas et al. 2011). The Xc inversion is in between 0.27 and 1.6 Myr old (Spicer and Bell 2002;Caletka and McAllister 2004;Vieira et al. 2006;Morales-Hojas et al. 2008. The DAIBAM miniature inverted-repeat transposable element (MITE) has been shown to be involved in the origin of D. americana polymorphic inversions 4a (Evans et al. 2007) and 5a (Fonseca et al. 2012), as well as the Xa inversion that is fixed between D. virilis and D. americana (Fonseca et al. 2012). Therefore, DAIBAM is responsible for at least 20% of the chromosomal inversions that are observed within and between species of the virilis phylad (Fonseca et al. 2012). These observations suggest that ectopic recombination between TEs may be the prevalent mechanism underlying inversion generation in this group of species. This mechanism is very different from that reported in D. melanogaster and close relatives and might be the cause of the observed discrepancies regarding chromosomal inversion rates in the virilis phylad. Nevertheless, only few chromosomal inversions in this phylad have been characterized in detail. The characterization of the remaining inversions will contribute to a more comprehensive understanding of the origin, rise to high frequency, and consequences of inversions at the genomic scale. Therefore, we sequenced the genome of a D. americana strain from the south of the distribution (SF12), as well as the genome of one D. novamexicana strain (15010-1031.00). In addition, we obtained pool-sequencing (pool-seq) data for northwestern, central, and southern populations of D. americana to characterize the impact of chromosomal rearrangements on differentiation patterns at the genome level.

Materials and Methods
Genome Sequencing, Assembly, and Inversion Breakpoint Identification Genomic DNA was isolated from one set of ten females for both D. americana (strain: SF12, established with a single female collected in the summer of 2010 in Saint Francisville, LA) and D. novamexicana (strain: 15010-1031.00, obtained from the Tucson stock center in 1995 and kept in the lab since then, originally collected in Colorado) using a phenol-chloroform extraction protocol. DNA integrity was checked by gel electrophoresis. Library preparation and sequencing was performed at the Transcriptome Analysis Laboratory of the University of Gö ttingen. Briefly, 1 mg of genomic DNA was quantified using Quanti Flour (Promega) and diluted in 100 ml of TE Buffer. DNA was sonicated to 350 bp using the Nano NGS-Bioruptor (Diagenode). Library preparation was performed using 100 ng of sonicated DNA using the TruSeq DNA Nano Library Preparation Kit from . Libraries' sizes were evaluated using the Bioanalyzer 2100 (Agilent), and the quantities were estimated using the Quanti Flour (Promega). Libraries were diluted to 8 pM and sequenced with the MiSeq System (MS-102-2003) using the 500 cycle kit and v3 Reagents (2 Â 250 bp).
The quality of the obtained draft genomes compares well with the previously published D. americana (H5 and W11) draft genomes (supplementary table 1, Supplementary Material online) that were estimated to cover about 80% of the D. americana euchromatic genome (Fonseca et al. 2013). Further support for the comparable quality of these draft genomes comes from the annotation based on the D. virilis 20,302 coding sequences (CDS) that are encoded by 13,374 genes (http://flybase.org FB2017_05). Using these CDS as reference, and the Splign-Compart pipeline as implemented in BDBM (http://sing.ei.uvigo.es/BDBM/index.html), we were able to annotate 14,487 complete CDS in D. novamexicana (15010-1031.00) as well as 13,329, 13,878, and 12,007 complete CDS in the D. americana (SF12, H5, and W11) genomes, respectively (supplementary table 2, Supplementary Material online). The completeness of the D. novamexicana (15010-1031.00) and D. americana (SF12) genomes assembled here is thus similar to that of the D. americana (H5 and W11) draft genomes. Under the assumption that all CDS could be annotated this way, this analysis suggests that the genomes considered here cover between 59% and 71% of the genome. We also assessed assembly completeness for D. novamexicana (15010-1031.00) and D. americana (SF12) with BUSCO v3.0.2 (Simão et al. 2015;Waterhouse et al. 2017) employing NCBI-BLAST v2.6.0 (Camacho et al. 2009), HMMER v3.1b2 (Eddy 2011), and AUGUSTUS v3.2.2 (Keller et al. 2011). Default settings and the Diptera OrthoDB v9 database (Zdobnov et al. 2017) were used. A total of 2,799 BUSCO groups were searched; 97.0% and 92.3% of the BUSCOs were complete and only 0.3% and 0.4% were duplicated in D. novamexicana (15010-1031.00) and D. americana (SF12) assemblies, respectively. The percentage of fragmented BUSCOs was 2.0% and 6.1%, while only 1.0% and 1.6% of the BUSCOs were missing in D. novamexicana (15010-1031.00) and D. americana (SF12) assemblies, respectively.
Inversion breakpoints were identified using an alignment approach. The expected karyotypes of D. novamexicana (15010-1031.00) and D. americana (SF12) are Xbc; 2bc; 3a; 4a; 5b; 6 and Xb; 2; 3; 4; 5a; 6, respectively. As the expected karyotype of D. virilis is Xa; 2a; 3; 4; 5; 6, we used the assembly of D. novamexicana (15010-1031.00) to locate inversions Xc, 2 b, 2c, 3a, 4a, and 5 b, the assembly of D. americana (SF12) to locate inversion 5a, and both assemblies to locate inversions Xa, Xb, and 2a. The scaffolds obtained for D. novamexicana (15010-1031.00) and D. americana (SF12) were ordered using the program Mauve Contig Mover implemented in Mauve v2.4.0 (Rissman et al. 2009) with the D. virilis sequence of each chromosome (Clark et al. 2007) as references. These chromosomes were obtained by using the D. virilis anchored scaffolds (Schaeffer et al. 2008). Moreover, we relocated the region corresponding to the beginning of scaffold 12875 from position 1 to 1.7 Mb on the 5th chromosome based on information provided by Schaeffer et al. (2008). Then, the alignment was manually inspected and the contigs which aligned in two different regions of the same chromosome were identified. The presence of a breakpoint was further confirmed by performing local BlastN (Camacho et al. 2009) or BlastN of these contigs against the D. virilis genome (http://flybase.org/FB2017_05) and the sequences annotated (supplementary files 1-10, Supplementary Material online).

Pool-seq and Population Differentiation Analyses
To estimate differentiation levels between different populations of D. americana, we selected a total of 70 strains from different geographical origins in the United States. We used 25 isofemale lines from the northwest, 23 from the center, and 22 from the south of the distribution. The northwestern lines were established with flies collected at the end of July 2008 in Freemont, Nebraska, while the lines from the center were established with flies collected in the summer of 2004 in Howell Island and Lake Wappapelo, MO. The southern strains were established with flies collected in the late spring of 2005 in Corney Bayou and Cat Island, LA, as well as in the summer of 2010 at Pearl River, MS by Bryant McAllister (Iowa University, Iowa City, IA) who kindly sent us the strains. In addition to these lines, we also used strains established with flies collected in the summer of 2010 in Saint Francisville, LA. We used two females of each strain to prepare three independent pools (North, Center, and South) containing flies from northwestern, central, and southern populations, respectively. The genomic DNA was isolated using a phenolchloroform extraction protocol. DNA integrity was checked by gel electrophoresis. Library preparation and sequencing was performed at the Beijing Genomics Institute. Illumina pairedend libraries with an average fragment size of 300 bp were generated following the instructions of the Illumina Paired-End Sample Preparation Guide (Illumina, San Diego, CA) and sequenced (2 Â 101 bp). We started the analysis with 12 FASTQ files (four for each population) totaling 49,903,200, 58,574,986, and 45,780,837 paired-end reads for the samples from the north, center, and south, respectively. The raw reads are available at ENA ERR2610693, ERR2610694, and ERR2610695. Quality checks and appropriate read trimming were performed as described above.
In order to obtain a reference genome to map the reads, we reordered all D. americana (SF12) contigs using Mauve Contig Mover implemented in Mauve v2.4.0 (Rissman et al. 2009) and each of the estimated ancestral chromosomes with D. virilis sequences. The estimated ancestral chromosomes were obtained by using the D. virilis anchored scaffolds (Schaeffer et al. 2008), as described above, and by reordering the segments corresponding to inversions Xa and 2a which occurred in the lineage leading to D. virilis (see Results; supplementary figs. 1 and 4, Supplementary Material online). After reordering, we identified the contigs aligning with each chromosome and removed the others. Then, these contigs were concatenated using Unipro UGENE (Okonechnikov et al. 2012) to get the estimated D. americana (SF12) chromosomes. We introduced 100 Ns between putative adjacent contigs, to preserve the original information obtained from the assembly.
Reads obtained for each of the three populations were mapped against the reference genome using Bowtie2 v2.2.5 (Langmead and Salzberg 2012) using default settings. The overall alignment rates for northwestern, central and southern populations were 80%, 81% and 79%, respectively. We have further mapped the reads using BWA v0.7.12 (Li and Durbin 2009) with default settings, because it has been reported that the intersection of more than one mapping algorithm efficiently reduces the amount of false positives (Kofler et al. 2016). Ambiguously mapped reads (quality below 20) and putative optical duplicates were removed with SAMtools v1.3.1 ). Index files were created for the reference genome as well as for the bam files using PICARD tools v2.1.1 (http://broadinstitute. github.io/picard/), and variants were called for each of the six chromosomes and populations using GATK HaplotypeCaller v3.4.46 (Van der Auwera et al. 2013) with default settings. The SNP data were extracted from the raw *.vcf files and we applied hard filtering criteria according to GATK developer's guidelines (Van der Auwera et al. 2013). All monomorphic and triallelic SNPs were removed, thus only biallelic SNPs were further analyzed. Depth of coverage analysis was conducted using GATK DepthOfCoverage v3.4.46 (Van der Auwera et al. 2013) for each population. The distributions of coverage were close to normal, and the average values for north, center and south were 47X, 56X and 41X, respectively. For all populations, more than 94% of the sites show coverage values above 15Â. To avoid individual resampling, for each population, we used a coverage interval which included 68.2% of the total amount of sites around the mean ([33-65Â], [40-75Â] and [31-55Â], for northwestern, central and southern populations, respectively). All variants showing depth of coverage equal to one among the three populations were discarded. Information about chromosomal location, quality scores, as well as the frequencies of the reference and alternative variants for each population and chromosome was retained. The data obtained with the two alignment algorithms (Bowtie2 and BWA) were intersected, and only those SNPs identified using both tools were used for further analyses. The frequencies obtained for each polymorphic site were used to estimate total and within-population heterozygosity values as well as F ST using classical formulas as implemented in Popoolation2 (Kofler et al. 2011) and described by Hartl and Clark (1997). These values were further averaged in 100-kb nonoverlapping sliding windows to estimate average population differentiation values along chromosomal arms for all pairwise comparisons. The values within the inverted segments were reordered accordingly to get the differentiation levels in inverted chromosomes.

Gene Expression
To test whether the presence of inversions might affect the expression levels of at least some genes flanking the breakpoints, we performed qPCR for the orthologs of D. melanogaster PI31, Fig4, CG9588, and fab1 in both D. americana (SF12) and D. novamexicana (15010-1031.00) strains. These genes flank inversions 5 b (PI31), 4a (Fig4), and 2c (CG9588), while gene fab1 is located <4 kb from inversion 5a (see Results for further details about the rationale behind gene selection).
We used 1-day-old female flies, because there is no easy way to distinguish males from females during earlier life stages of these species and we wanted to avoid confounding effects caused by a mixture of males and females. The flies were reared under 12L:12D cycles at 25 C (D. americana [SF12] and D. novamexicana [15010-1031.00]) to avoid putative effects of temperature and/or photoperiod in gene expression. Three independent sets of three female flies for each of the six strains were snap-frozen and stored at À80 C for further mRNA extraction. Total RNA was isolated from whole bodies using TRIzol Reagent (Invitrogen, Spain) according to the manufacturer's instructions and treated with Turbo DNAfree kit (Life technologies, Carlsbad, CA). The purity and concentration of each extracted sample were measured with NanoDrop ND-1000 spectrophotometer (NanoDrop, Thermo Scientific, Portugal), and RNA integrity was checked using Experion platform (Bio-Rad, Portugal; all the samples had RNA Quality Indicator (RQI) values above 8.5). cDNA was synthesized by reverse transcription of 1.0 lg of RNA of each sample with SuperScript III First-Strand Synthesis SuperMix for qRT-PCR (Invitrogen, Spain) using random primers. Reactions where template was not added and reactions with RNA that was not reverse transcribed were performed to confirm the absence of genomic DNA contamination. Specific primers were designed for PI31 (5 0 -AACTCCGACTTCTTTTACG-3 0 and 5 0 -ATTTGTAGT GCTTGGTCAG-3 0 ), Fig4 (5CACAGCGACCTCCACATC-3 0 and 5 0 -AGGGTGGGAGTCAAGTCA-3 0 ), CG9588 (5 0 -GCTCCATCAACGAAAATAAC-3 0 and 5 0 -GGGACAAGT ATCAAATCC-3 0 ), and Fab1 (5 0 -GAGCAAAGGCAACAATC GT-3 0 and 5 0 -ACTGTGGAGGCGAGCATC-3 0 ) orthologs based on the assembly of the D. novamexicana (15010-1031.00) and D. americana (SF12, H5, and W11) genomes (see Results). RpL32 was used as the reference gene (5 0 -ACAACAGAGTGCGTCGTC-3 0 and 5 0 -ATCTCCTTG CGTTTCTTC-3 0 ) ( Reis et al. 2015). Primers' efficiencies were measured using serial dilutions of cDNA, and they were all in the 90-100% range. All qPCR reactions were carried out in 10 ml final volume solution containing 5 ml of iTaq SYBR green supermix (Bio-Rad), 3.75 ml of pure water, 0.125 ml of forward and reverse primers (10 mM), and 1 ml of cDNA. The reactions were performed on a Bio-Rad iCycler with the following program: 3 min at 95 C; 40 cycles of 30 s at 94 C, 30 s at 56 C and 30 s at 72 C followed by a standard melt curve. The threshold cycle (CT) values of technical replicates did not differ by more than 0.5 cycles in any case. Differences in expression levels were calculated using the 2 ÀDCT method (Livak and Schmittgen 2001;Schmittgen and Livak 2008).

A Clear DAIBAM Signature Is Found at Proximal and Distal Breakpoints of Nearly All Characterized Inversions
We used an alignment approach to identify inversion breakpoints. The contigs obtained for D. novamexicana (15010-1031.00) and D. americana (SF12) genomes were reordered based on the D. virilis sequence for each chromosome (Clark et al. 2007) as references. Then, the contigs aligning in two different regions of the same chromosome were identified and the presence of a breakpoint was further confirmed by BlastN against the D. virilis genome (http://flybase.org/ FB2017_05).
The precise chromosomal location of the newly identified breakpoints of the Xb, Xc, 2a, 2 b, 2c and 5 b inversions, as well as the previously identified breakpoints (Evans et al. 2007;Fonseca et al. 2012 Tetris that shares the final portions of its TIRs with DAIBAM and thus we cannot confidently say that this is a clear signature of a DAIBAM element, or that the inversion originated through ectopic recombination between two TE copies.

Drosophila americana Polymorphic Inversions Create High Differentiation Levels between Populations over Large Physical Distances
To address the impact of polymorphic inversions on variability patterns between populations, we performed whole-genome sequencing of pools of individuals from the northwestern, center, and south of D. americana distribution in the United States. The D. americana (SF12) genome was used as reference after reordering the contigs based on the D. We only observed highly differentiated regions on the X, 4th, and 5th chromosomes (Muller's elements A, B, and C, respectively) which are those carrying inversions (fig. 4). These differentiation levels were accompanied by an increase of the frequency of the reference allele in the southern population and a decrease in the northwestern one ( fig. 4). Moreover, chromosome diversity, estimated by heterozygosity levels, was reduced in these regions and the effect was more pronounced in populations showing the inversions (fig. 4). The second, third, and sixth chromosomes (Muller's elements E, D, and F, respectively) showed similar average levels of differentiation along the entire chromosomes irrespective of the pair of populations being compared (fig. 4). These differentiation levels were caused by similar frequencies and chromosomal diversity levels along the entire chromosomes among populations what should reflect the average levels observed in D. americana populations under free historical recombination.
The D. americana X chromosome (Muller's element A) shows two polymorphic inversions (Xc and Xd), but inversion Xd is limited to northwestern populations and is present at a very low frequency (6%). Inversion Xc is fixed in northwestern populations and is almost absent in southern populations (7.5% frequency). Its frequency shows a north-south cline that has been extensively documented (Vieira et al. 2001McAllister 2002;Reis et al. 2008). The observed patterns of differentiation between populations suggest that the presence of inversion Xc strongly suppresses recombination in 64% of the chromosome. The differentiation levels dropped with increased distance to the breakpoints, and this effect was more pronounced for the distal breakpoint ( fig. 5A). The differentiation levels also dropped toward the centromere and the most telomeric one-third of the chromosome likely undergoes free recombination, because it presented the same differentiation levels as the chromosomes not carrying inversions ( fig. 4).
The 4th chromosome (Muller's element B) shows four polymorphic inversions (4a, 4 b, 4c, and 4d). Inversion 4c is present in northwestern populations only, and at a low frequency (6.1%), while 4d must be extremely rare because it was not observed by Hsu (1952) based on a large sample size. The estimated frequencies in northwestern, central, and southern populations are 83.3%, 19.6%, and 3.8% for inversion 4a and 68.2%, 19.6%, and 3.8% for inversion 4 b, respectively. Inversion 4 b mostly co-occurs with inversion 4a (Hsu 1952). The population differentiation analysis showed that the telomeric one-third of the chromosome was the only portion that freely recombines in heterokaryotypes (figs. 4 and 5B). We also observed a reduction of the differentiation levels with increased distance to the breakpoints, and the effect was more pronounced in the distal breakpoint ( fig. 5B). These levels were caused by an increase in the frequency of the reference variant in the southern population and a reduction in the northwestern one, as it was observed for the Xc inversion ( fig. 4). The broader pattern of high differentiation might be caused by the co-occurrence of the 4 b inversion ( fig. 5B), which apparently creates a region of reduced heterozygosity within the 4a inverted segment (fig. 4).
The D. americana 5th chromosome (Muller's element C) shows two polymorphic inversions (5a and 5b). Inversion 5 b is fixed in northwestern populations, while inversion 5a is almost fixed in the southern populations (91% frequency) (Hsu 1952). Moreover, in the center of the distribution, inversions 5a and 5 b have an estimated frequency of 61% and 39%, respectively. Therefore, the frequency difference of inversion 5a and 5 b was higher between northwestern and southern populations than between the central and southern populations, which were reflected in the differentiation patterns observed ( fig. 5C). Together, inversions 5a and 5 b likely suppress recombination in at least 80% of the chromosome, but the differentiation levels were remarkably lower than those observed for the X and 4th chromosomes. Moreover, the differentiation levels were lower in the middle of the 5a inversion ( fig. 5C). The diversity levels observed for the 5th chromosome in the southern population were lower than those observed for northwestern and central populations ( fig. 4), and the reduction of the frequency of the reference allele in the northwestern population did not lead to a reduction in the diversity levels ( fig. 4). This result coupled with the lower overall reduction in differentiation levels suggest that the 5 b inversion is not totally fixed in the northwestern population.

The Effect of Inversions on Genes Flanking the Breakpoints
The detailed analysis of the distance from DAIBAM insertions to transcription start sites (TSS), and depending on the transcription orientation of the gene being analyzed, to the start or stop codons, can give insight into whether the expression of genes flanking the inversion breakpoints could be affected by the presence of a given inversion.
We found many cases where the DAIBAM insertion is located <100 bp away from the TSS meaning that the insertion could disrupt the enhancers and other regulatory elements of genes such as elF2B-B, Csat, CG8206, CG15896, Asta-R1, CG11755, Invadolysin, ncRNA GJ26151, Lkb1, mura, RnpS1, Fig4, vls, and CG12134 (table 1). The TSS of one out of the four putative transcripts of PI31 is disrupted by the proximal breakpoint of inversion 5 b, and in the case of Fig4 the DAIBAM insertion located at the proximal breakpoint of inversion 4a disrupts the TSS and is located 20 bp from the start codon (table 1). Therefore, these two genes represent good candidates to test the "position effect" hypothesis (Sperlich and Pfreim 1986). According to the interactome data available in FlyBase (http://flybase.org), CG9588 interacts with PI31 (5b) and its gene locus flanks the proximal breakpoint of inversion 2c (table 1). Moreover, there is also evidence for an interaction between Fig4 (4a) and fab1 (http://flybase. org) of which the gene locus is located <4 kb away from the proximal breakpoint of inversion 5a. Thus, we also included CG9588 and fab1 to evaluate their expression levels between D. americana (SF12) and D. novamexicana (15010-1031.00). We were not able to evaluate the expression of PI31, because the expression levels were too low to be quantified in a proper manner. Moreover, we could not find statistically significant differences in expression levels between D. americana (SF12) and D. novamexicana (15010-1031.00) strains for the three genes surveyed (CG9588, Fig4, and fab1) (fig. 6). Although the result for CG9588 is significant on its own at the 0.05 level, it is not listwise (after Bonferroni correction).

Discussion
The D. americana (SF12) and D. novamexicana (15010-1031.00) draft genomes generated in this study allowed the molecular characterization of the breakpoints of six new inversions (Xb, Xc, 2a, 2 b, 2c, and 5b) as well as those of inversions 4a (Evans et al. 2007), Xa and 5a (Fonseca et al. 2012) already characterized. Therefore, only those of inversions Xd, 3a, 4 b, 4c, and 4d remain to be characterized in the virilis phylad. Inversions Xd, 4c, and 4d are not present in either D. americana (SF12, from the south of the distribution) or in D. novamexicana (15010-1031.00 strain), and inversion 4 b is extremely rare in southern populations and it is absent in D. novamexicana (Hsu 1952). It is thus not surprising that inversion 4 b is not present in D. americana (SF12) genome. Nevertheless, inversion 3a is present in all D. novamexicana strains examined (Hsu 1952), but we failed to characterize its breakpoints most likely because there was no D. novamexicana (15010-1030.00) contig spanning the breakpoints. A similar result was observed for inversion 4a (supplementary file 8, Supplementary Material online).
The MITE element DAIBAM is clearly present in eight breakpoints of the nine characterized inversions, suggesting that this element was fundamental in the generation of most inversions observed in the virilis phylad. These results resemble those of Delprat et al. (2009) that showed unequivocally that ectopic recombination between Galileo TEs generated three The differentiation levels drop with increased distance to the breakpoints. Differentiation levels were measured by F ST for every pairwise comparison between northwestern, central, and southern populations of D. americana for the X (A), 4th (B), and 5th (C) chromosomes (red-Northwest v. South; black-Northwest v. Center, and green-Center v. South). The position of the inversions present in these chromosomes was estimated based on D. americana (SF12) reference genome after performing BlastN of the genes flanking the breakpoints. The hypothetical position of the 4b inversion (hyp4b), as well as the historical inversions frequencies were estimated based on data provided by Hsu (1952). polymorphic D. buzzatii inversions. DAIBAM can only be found in species of the virilis group (D. virilis, D. novamexicana, and D. americana), and thus could be a relatively recent invader (Fonseca et al. 2012). Penelope elements have repeatedly invaded and reinvaded species of the D. virilis group (Morales-Hojas et al. 2006), and Polyphemus may have repeatedly invaded D. virilis as well (Blumenstiel 2014). Moreover, Penelope is able to promote the comobilization of other unrelated TEs (Vieira et al. 1998), and thus it is conceivable that the repeated invasion of the virilis group by Penelope led to an increase of TE activity and, as a consequence, of chromosomal inversions. Dias et al. (2014) found a nonautonomous foldback transposon, named Tetris, present in D. virilis and D. americana, that shares the final portions of its TIRs with DAIBAM. It is thus likely, that both elements are mobilized by the same unknown autonomous TE. Tetris may have contributed to the shaping of the genomes of D. virilis and D. americana, by providing internal tandem repeats that served as building blocks for the amplification of satellite DNA arrays.
The results presented here show that ectopic recombination between TEs (Kupiec and Petes 1988;Lim and Simmons 1994;Delprat et al. 2009) is the prevalent mechanism involved in the generation of inversions in the virilis phylad. This is in contrast with what happens in D. melanogaster and its close relatives where nonhomologous end-joining repair (Sonoda et al. 2006) is the prevalent mechanism (Ranz et al. 2007). Our pool-seq data showed that inverted chromosomes show higher differentiation levels along the inverted segment when compared with regions not physically affected by the inversions. This is the likely reason why when analyzing genes located around inversion breakpoints, sometimes individuals cluster by the presence of the inversion rather than by known species relationships (Morales-Hojas et al. 2011;Fuller et al. 2018). One possible explanation for the observed differentiation patterns is the eventual disruption of genome regions that are more prone to recombination by crossing-over ("sensitive sites"; Corbett-Detig 2016) by inversion breakpoints. It has been suggested that inversions containing breakpoints affecting these regions in D. melanogaster have a high chance of reaching high frequencies and eventually go to fixation (Corbett-Detig 2016, and references therein). We showed here that the differentiation levels were remarkably higher around the breakpoints and dropped with increased distance to the breakpoints. This effect was most pronounced for inversions Xc and 4a. It has been proposed that the regions within large inversions can pair in heterokaryotypes, and even if the frequency of crossing-over events is reduced, there is still gene conversion (reviewed in Andolfatto et al. 2001). Thus, the differentiation patterns within the inverted segments between D. americana populations may be explained by gene conversion over the last million years. Our observations in D. americana regarding the impact of inversions on the recombination landscape are thus compatible with the hypothesis that inversions locally capture adapted sets of genes, protecting them from recombination with immigrant chromosomes (Kirkpatrick and Barton 2006;Kirkpatrick 2010). It should be noted, however, that the differentiation levels within inverted segments are still higher than those expected under free historical recombination and we observed intermittent peaks of high differentiation ( fig. 5). This pattern of highly differentiated regions in the presence of gene conversion was also observed by Fuller et al. (2017), and it is suggestive of adaptive selection. Guillen and Ruiz (2012) have argued that given the high gene density and compact structure of Drosophila genomes, it is likely that genes around inversion breakpoints could be the targets of selection responsible for the increase in frequency and maintenance of inversions in natural populations. Nevertheless, differences in genome-wide expression levels evaluated by RNA-seq between different chromosomal arrangements in D. pseudoobscura identified only one differentially expressed gene associated with an inversion breakpoint (Fuller et al. 2016). However, there are cases of gene disruption in Drosophila, mice, and plants (see references in Fuller et al. [2016]). Additionally, Puig et al. (2004) observed dramatic changes in gene expression in a gene located close to an inversion breakpoint in D. buzzatii. They have found that a Galileo insertion 12 bp away from the stop codon leads to a reduction in gene expression. Here, we show that the presence of inversions does not significantly affect expression of the breakpoints flanking genes Fig4, fab1, and CG9588. However, we evaluated expression differences in 1-day-old adult females. It is conceivable that inversions may influence modular enhancers and, thus may only affect gene expression in a specific tissue or at a specific time point. Therefore, we cannot rule out the possibility of positional effects of inversions on gene expression ("position effect" hypothesis [Sperlich and Pfreim 1986]). As such, we cannot also rule out that the interactions between genes flanking different inversions (Fig4 [4a] and fab1 [5a] [Botelho et al. 2008], as well as PI31 [5b] and CG9588 [2c]); see the interactome data available in FlyBase (http://flybase.org) are maintaining these inversions at different frequencies in natural populations by keeping together alleles at loci with epistatic effects on fitness ("coadaptation" hypothesis [Dobzhansky 1970]).
Further analysis of genes flanking the inversion breakpoints can give insight into the nature of the selective forces that maintain the north-south inversion clines in D. americana. In a genome-wide survey, 580 out of 11,664 genes have been identified as being involved in heat nociception (or heat perception) (Neely et al. 2011). Of the 36 genes that surround the breakpoints shown here (table 1), 23 genes have been included in this genome-wide survey, and five of them are heat nociception candidates, including gene CG9588. This observation could indicate selection for high temperature avoidance associated with the inversions. This is in agreement with the observation made by Sillero et al. (2014) that avoidance plays an important role in thermoregulation in D. americana. Moreover, it has been shown that D. virilis, D. lummei, D. novamexicana, and southern populations of D. americana have different basal and inducible heat tolerances (Garbuz et al. 2003) at least partially due to variation in Hsp70 gene family copy number (Garbuz et al. 2002). Temperature is also the environmental factor that contributes most to explain the ecological niche distribution of D. americana (Sillero et al. 2014). It is, thus, conceivable that at least a fraction of the chromosomal inversions could be associated with gene variants that promote different susceptibilities to temperature.
In D. americana, there is also a large amount of variation regarding traits such as diapause incidence (Reis et al. 2015), cold resistance , abdominal size , developmental time (Fonseca et al. 2013), body color (Wittkopp et al. 2011), and lifespan (Fonseca et al. 2013). Therefore, the very different frequency of common D. americana polymorphic inversions on chromosomes X, 4, and 5 and associated differentiation over large physical distances also mean that very likely the molecular basis of many north-south phenotypic differences is located on these chromosomes.
In conclusion, species of the virilis phylad represent an excellent model to contrast the findings obtained for D. melanogaster and close relatives regarding the origin, rise to high frequency, and consequences of inversions. The information provided in this work will be extremely useful for many studies addressing the role of chromosomal inversions as a source of genomic variability explaining phenotypic differences and local adaptation.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.