Genome Diversity and Divergence in Drosophila mauritiana: Multiple Signatures of Faster X Evolution

Drosophila mauritiana is an Indian Ocean island endemic species that diverged from its two sister species, Drosophila simulans and Drosophila sechellia, approximately 240,000 years ago. Multiple forms of incomplete reproductive isolation have evolved among these species, including sexual, gametic, ecological, and intrinsic postzygotic barriers, with crosses among all three species conforming to Haldane’s rule: F1 hybrid males are sterile and F1 hybrid females are fertile. Extensive genetic resources and the fertility of hybrid females have made D. mauritiana, in particular, an important model for speciation genetics. Analyses between D. mauritiana and both of its siblings have shown that the X chromosome makes a disproportionate contribution to hybrid male sterility. But why the X plays a special role in the evolution of hybrid sterility in these, and other, species remains an unsolved problem. To complement functional genetic analyses, we have investigated the population genomics of D. mauritiana, giving special attention to differences between the X and the autosomes. We present a de novo genome assembly of D. mauritiana annotated with RNAseq data and a whole-genome analysis of polymorphism and divergence from ten individuals. Our analyses show that, relative to the autosomes, the X chromosome has reduced nucleotide diversity but elevated nucleotide divergence; an excess of recurrent adaptive evolution at its protein-coding genes; an excess of recent, strong selective sweeps; and a large excess of satellite DNA. Interestingly, one of two centimorgan-scale selective sweeps on the D. mauritiana X chromosome spans a region containing two sex-ratio meiotic drive elements and a high concentration of satellite DNA. Furthermore, genes with roles in reproduction and chromosome biology are enriched among genes that have histories of recurrent adaptive protein evolution. 
Together, these genome-wide analyses suggest that genetic conflict and frequent positive natural selection on the X chromosome have shaped the molecular evolutionary history of D. mauritiana, refining our understanding of the possible causes of the large X-effect in speciation.

Takara LA Taq polymerase following the manufacturer's instructions with a 5-minute extension time. PCR products were cleaned prior to sequencing using ExoSAP-IT (USB). We used previously published PCR primers for the Dox and MDox regions, which amplify the complete genes as well as flanking sequence (Tao, Araripe, et al. 2007; Tao, Masly, et al. 2007; Kingan, et al. 2010; see supplementary table S14). Internal sequencing primers were designed using Primer3Plus (Untergasser, et al. 2007) and Amplify (Engels 2005); sequences are available upon request. Sequencing was performed on an ABI3730 capillary sequencer according to the manufacturer's protocols; sequences were edited using Sequencher v.4.8 (Gene Codes Corp.).
Many of the haplotypes from the Dox region carry a series of identical or nearly identical copies of a 359-bp satellite DNA repeat element (Tao, Araripe, et al. 2007) that is related to the satellite block Zhr (Ferree and Barbash 2009); these repeats complicated the assembly of contigs. We digested the amplified PCR fragments with BslI and compared the restriction fragment profile with that predicted from our contig to ensure that the correct number of repeat elements was included in our alignments. We obtained 2× sequencing coverage for all samples, using forward and reverse reads when possible; in many cases, however, the lack of unique priming sites caused by the tandemly arrayed repeat elements allowed only 2× coverage in a single direction.
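The restriction check described above can be illustrated with a small in silico digest. The following is a minimal sketch and not the procedure used in this study: it assumes the commonly listed BslI recognition site CCNNNNNNNGG with the top-strand cut after the seventh base of the site, treats the PCR product as a linear molecule, and uses a hypothetical function name (bsli_fragment_lengths). The cut position should be confirmed against the enzyme supplier's documentation before use.

```python
import re

# Assumption: BslI recognizes CCNNNNNNNGG and cuts the top strand after the
# seventh base of the site (CCNNNNN^NNGG). Verify with the enzyme supplier.
# The lookahead lets overlapping sites be found as well.
BSLI_SITE = re.compile(r"(?=CC[ACGT]{7}GG)")
BSLI_CUT_OFFSET = 7


def bsli_fragment_lengths(seq):
    """Return predicted fragment lengths (5' to 3') for a linear sequence.

    Hypothetical helper for illustration only. The site pattern is its own
    reverse complement, so scanning one strand finds every site.
    """
    seq = seq.upper()
    cuts = [m.start() + BSLI_CUT_OFFSET for m in BSLI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequence with a single site; not real Dox or MDox sequence.
    toy = "A" * 10 + "CCAAAAAAAGG" + "T" * 10
    print(bsli_fragment_lengths(toy))
```

Under these assumptions, candidate contigs that differ in the number of tandem repeat copies predict different sets of fragment lengths, which is what makes the comparison with the observed gel profile informative.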
Alignments of sequences for each gene region, as well as between homologous regions of Dox and MDox, were performed by eye using annotated sequence elements as "anchors" for the alignment. In some cases, the bl2seq program of BLAST was used for pairwise alignments (Tatusova and Madden 1999). To ensure a proper alignment of each region, we performed a phylogenetic analysis of the 359-bp satellite elements for each gene to assign homology among repeats. For each locus, we extracted the repeat sequences from each sampled isofemale line and aligned the repeats by eye.
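Distance-based tree methods such as neighbor joining start from a table of pairwise distances among the aligned repeat copies. The sketch below shows one simple way such a table could be computed, using the uncorrected proportion of differing sites (p-distance). The distance measure, the gap handling, and the function names (p_distance, distance_table) are assumptions made for illustration; the text does not state which distance was used.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns with a gap in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def distance_table(repeats):
    """Map each unordered pair of repeat names to its p-distance."""
    items = sorted(repeats.items())
    return {
        (n1, n2): p_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(items, 2)
    }


if __name__ == "__main__":
    # Toy aligned repeat copies with hypothetical names; not real data.
    toy = {"copy_a": "ACGT-A", "copy_b": "ACGA-A", "copy_c": "ACGTTA"}
    for pair, d in distance_table(toy).items():
        print(pair, round(d, 3))
```

Copies separated by small distances are the ones a tree would tend to group together, which is the information used to decide which repeats to align with one another.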

Results
At the MDox region, we sample 26 chromosomes and compile a 5,526-bp alignment of the sequences (supplementary table S16). We infer that MDox is fixed in D. mauritiana because all 26 samples contain the gene insertion. We observe copy number variation in the 359-bp satellite elements that flank the MDox gene: copy number ranges from one to four tandemly arrayed repeats, with most samples having two elements (supplementary fig. S3). Supplementary fig. S3b shows the neighbor-joining tree that was used to create an accurate alignment of the sequenced region (we obtained similar trees using parsimony; not shown). In addition to the D. mauritiana lines sampled in this study, we include 69 previously sequenced D. simulans samples (Kingan, et al. 2010). We observe four clusters of repeat sequences with more than 75% bootstrap support and define these as MDox repeat types one through four.
At the Dox region, we sample the same 26 D. mauritiana chromosomes and compile an 8,503-bp alignment. The ancestral Dox [null] allele, which lacks the Dox gene insertion, is more common than the derived allele and is found in 20 of the 26 sampled chromosomes (~77%) (Kingan, et al. 2010). One of the sampled strains (R10) has a ~3.2-kb deletion that includes part of the Dox gene. Furthermore, we observe extensive copy number variation in the 359-bp satellite in D. mauritiana, with the number of elements ranging from four to seven (supplementary fig. S2). Supplementary fig. S2b also shows the neighbor-joining tree that was used to assemble an accurate alignment of the Dox region for our 26 D. mauritiana samples, as well as our previously collected D. simulans samples (Kingan, et al. 2010). We observe five clusters of elements with more than 56% bootstrap support and define these as Dox repeat types one through five. (No quantitative inferences are made about the evolution of repeat types from these trees; they are used only to compile the best alignment of the sequence data by clustering the most closely related repeat elements.)

Table S4
The 13 strains of Drosophila used in this study. Also included are the median read depth for each aligned position and the percent of mau12 reference genome covered in each of the ten lines of

Table S14
Identity of satellite DNA repeats present in block X.1 (see fig. 3 from main text).

Table S15
The sequences of PCR primers used to amplify the Winters sex-ratio genes in D. mauritiana.

Repeat elements for each sample are numbered in order (e.g., "R18 rep 4" and "R6 rep 2" are homologous and are both type 4, shown as blue). Repeat elements that contain the MDox insertion were concatenated (i.e., the MDox insertion was excised) and are denoted with an asterisk.