Assessing Genomic Admixture between Cryptic Plutella Moth Species following Secondary Contact

Abstract Cryptic species are genetically distinct taxa without obvious variation in morphology and are occasionally discovered using molecular or sequence data sets of populations previously thought to be a single species. The world-wide Brassica pest, Plutella xylostella (diamondback moth), has been a problematic insect in Australia since 1882, yet a morphologically cryptic species with apparent endemism (P. australiana) was only recognized in 2013. Plutella xylostella and P. australiana are able to hybridize under laboratory conditions, and it was unknown whether introgression of adaptive traits could occur in the field to improve fitness and potentially increase pressure on agriculture. Phylogenetic reconstruction of 29 nuclear genomes confirmed P. xylostella and P. australiana are divergent, and molecular dating with 13 mitochondrial genes estimated a common Plutella ancestor 1.96 ± 0.175 Ma. Sympatric Australian populations and allopatric Hawaiian P. xylostella populations were used to test whether neutral or adaptive introgression had occurred between the two Australian species. We used three approaches to test for genomic admixture in empirical and simulated data sets including 1) the f3 statistic at the level of the population, 2) pairwise comparisons of Nei’s absolute genetic divergence (dXY) between populations, and 3) changes in phylogenetic branch lengths between individuals across 50-kb genomic windows. These complementary approaches all supported reproductive isolation of the Plutella species in Australia, despite their ability to hybridize. Finally, we highlight the most divergent genomic regions between the two cryptic Plutella species and find they contain genes involved with processes including digestion, detoxification, and DNA binding.


Introduction
Cryptic species lack conspicuous variation in visible traits, yet can show high levels of ecological, behavioral, and genetic divergence, particularly when they arise in allopatry (Stuart et al. 2006;Bickford et al. 2007;Pfenninger and Schwenk 2007). Morphological resemblance of two or more distinct species can occur when environmental pressures maintain phenotypes or cause convergence, and through introgression of traits by interspecies hybridization (Bickford et al. 2007). Consequently, cryptic species are often overlooked, leading to both underestimates of species richness and overestimates of their geographic range (Stuart et al. 2006;Vod a et al. 2015).
Reproductive barriers can maintain boundaries between sympatric congeneric animal species (cryptic or noncryptic) using a range of isolating mechanisms such as olfaction, pheromone cues, and mating calls (Jones and Hamilton 1998;Andersson et al. 2007), host plant preference or mating timing (H€ anniger et al. 2017), and endosymbiont infection (Shoemaker et al. 1999;Bordenstein et al. 2001). Although these factors can impose reproductive isolation barriers and restrict hybridization, assortative mating does not always occur (Mallet et al. 2007). Interspecific hybridization of two species within the same genera has been found to occur at similar rates across the animal kingdom, after taxonomic groups are adjusted for species richness (Schwenk et al. 2008). While hybridization between related species has been well documented, the process of distinguishing between adaptive introgression and regions of historic population structure has been challenging (Martin et al. 2015).
Closely related allopatric or sympatric species without gene flow should exhibit genetic divergence across the genome, whereas species with gene flow should show lower levels of divergence across broad regions relative to the frequency of interbreeding and how recently it occurred. Detecting hybridization is possible through the use of informal statistical tests on genetic variation, including principle component analysis (Patterson et al. 2006) and Bayesian STRUCTURE model analysis (Pritchard et al. 2000). While these tests can provide results indicative of admixture, they cannot distinguish between introgression, interlineage sorting, or homoplasic genetic drift. Patterson et al. (2012) formalized statistical approaches to estimate admixture based on allele frequencies across multiple populations, namely the f3 and f4 statistics (Dstatistic), which assess the likelihood of hybridization. The f4 statistic has identified introgression between sympatric Heliconius butterfly species (Martin et al. 2013;Zhang et al. 2016) and hominids (Patterson et al. 2012), as allele frequencies across these genomes did not always agree with the expected species tree, or neutral drift.
Hybridization and introgression of genetic variation from a donor species into a recipient can have adaptive advantages. The transfer of advantageous preadapted alleles from one species into another removes the reliance of new traits arising though mutation in the recipient. Examples include the transfer of rodenticide resistance between mice (Song et al. 2011), coat color alleles among jackrabbits and hares (Jones et al. 2018), aposematic wing patterns in Heliconius butterflies (Mav arez et al. 2006;Pardo-Diaz et al. 2012;Wallbank et al. 2016) and insecticide resistance genes in Anopholes mosquitoes (Lee et al. 2013;Norris et al. 2015).
The diamondback moth, Plutella xylostella (L.) (Lepidoptera: Plutellidae), is the most destructive pest of Brassicaceous agricultural crops, including broccoli, cabbage, and canola (Furlong et al. 2013). They are able to cause en masse defoliation, malformed, and improper plant growth (Zalucki et al. 2012), and often develop resistance to insecticides making pest control an ongoing challenge. Plutella xylostella were first documented in Australia in the 1880s (Tyron 1889), yet an endemic and phenotypically cryptic species, P. australiana (Landry and Hebert), was only recently identified through high divergence of mitochondrial COI barcode sequences (8.6%) and morphologically distinct genitalia (Landry and Hebert 2013). The discovery was surprising, as P. australiana was not detected in previous molecular studies of P. xylostella yet, is dispersed across eastern Australia Delgado and Cook 2009).
Insecticide susceptibility appears to limit P. australiana's pest potential among cultivated brassica crops, however, introgression of insecticide resistance loci from P. xylostella could have serious consequences for agriculture. Plutella xylostella and P. australiana can hybridize in experimental laboratory crosses, despite their contrasting infection rates of endosymbiotic Wolbachia (Ward and Baxter 2017;Perry et al. 2018), which are known to cause reproductive incompatibility in some cases (Sasaki and Ishikawa 2000;Duplouy et al. 2013). Wolbachia infection is fixed among P. australiana yet extremely low in Australian P. xylostella (1.5%). Although the strength of reproductive barriers in the field is unknown, limited numbers of SNP markers widely dispersed across the nuclear genome previously identified genetic structure between sympatric populations of P. xylostella and P. australiana (Perry et al. 2018). Due to P. australiana's apparent endemism and the relatively recent invasion of P. xylostella into Australia, we assessed the capacity for sympatric Australian Plutella species to exchange beneficial traits through disassortative mating and introgression in the field through analyzing whole genomes.  À155.636), and reared for one generation. Genomic DNA purification was performed using phenol extractions, treated with RNaseA, precipitated with ethanol, and resuspended in TE buffer (10 mM Tris, 0.1 mM EDTA). Species identification was performed using a PCR-RFLP diagnostic assay of the mitochondrial COI gene (Perry et al. 2018). Genome sequencing was performed using the Illumina HiSeq2500 or NextSeq platforms at the Australian Genome Research Facility and the Australian Cancer Research Facility.

Processing Genome Sequence Data
Summary statistics of Illumina sequence reads were generated with FastQC (Andrews 2010) and visualized using the R package ngsReports (Ward et al. 2018). Trimmomatic v 0.32 (Bolger et al. 2014) was used with the parameters (TRAILING: 15 SLIDINGWINDOW: 4: 15) to trim adapter, quality filter, and retain paired reads. The P. xylostella reference genome (You et al. 2013) was downloaded from NCBI (GCA_000330985.1). Stampy v1.0.21 (Lunter and Goodson 2011) was used to align the paired reads to the reference with the parameters (-gatkcigarworkaround, -substitutionrate ¼ 0.01) which produced Sequence Alignment/Map (SAM) files that were converted to binary format (BAM) and indexed then sorted using SAMtools v1.2 (Li et al. 2009). PCR and optical duplicates were removed using Picard Tools v1.61 (http:// broadinstitute.github.io/picard/). BAM summary statistics including average read depth per site called, coverage of the genome, percent missing data, total number of reads and read quality were generated using SAMtools v1.2.

Genotype Variant Calling
Variant calling was performed using the Genome Analysis ToolKit (GATK) v3.3 (DePristo et al. 2011). GATK: HaplotypeCaller was used to generate gVCF records, containing variant and invariant sites across the genome, on a per sample basis. The HaplotypeCaller parameter heterozygosity (likelihood of a site being nonreference) for each species was estimated by SAMtools v1.2, indicating P. xylostella from Hawaii was most similar to the reference genome (heterozygosity: P. australiana ¼ 0.0497; P. xylostella Australia ¼ 0.0348; P. xylostella Hawaii ¼ 0.0272). Individual gVCF records were combined using GATK: Genotype GVCF and filtered using BCFtools (Li et al. 2009) to a minimum individual depth greater than five reads per base with no greater than 40% of samples missing genotypes at any one site.
Nei's mean intrapopulation nucleotide diversity, p, (Nei and Li 1979) was calculated using egglib (De Mita and Siol 2012). The mean and standard error in p and jackknifing was performed using the R package bootstrap (Canty and Ripley 2017). Pairwise F ST and Tajima's D was calculated across 50kb windows using VCFtools (Danecek et al. 2011) and minimum distances between populations (km's) determined with http://www.movable-type.co.uk/scripts/latlong.html, last accessed October 25, 2018.

Phylogenetic Reconstruction of Plutella Mitochondrial and Nuclear Genomes
All quality filtered variant and invariant sites called against the mitochondrial reference genome (GenBank KM023645) were extracted using BCFtools and converted to a FASTA alignment using the R programming language. Maximum likelihood phylogenetic inference using the nuclear genome consensus, heterozygous sites were replaced with IUPAC ambiguity codes, alignment was performed with exaML (Kozlov et al. 2015) with GTRþGAMMA bootstrap resampling (n ¼ 100; GTRþGAMMA) was then carried out using RAxML v8.2.4 to provide node confidence. The phylogeny was then rooted using the midpoint method in FigTree (v1.4.3, http://tree.bio. ed.ac.uk/software/figtree).

De Novo Assembly of Mitochondrial Genomes and Plutella Split Time Estimates
De novo assembly of Plutella mitochondrial genomes was performed using NOVOPlasty v2.6.3 (Dierckxsens et al. 2017). A sequence read that mapped to the P. xylostella mitochondrial COI gene was used as the seed to initiate assembly. Genomes circularized by NOVOPlasty were then annotated through homology to the P. xylostella mitochondrial reference gene annotation (GenBank KM023645) with Geneious v10.0.6. Potential misassemblies were investigated by mapping individual raw reads to the appropriate de novo assembly on a per sample basis using BWA-MEM (Li 2013). Mapped reads were then used as fragments in Pilon (Walker et al. 2014) to correct the assembly. The sample with the greatest total length (15,962 bp), Paus ACT14.1, was used to produce a reference for the mitochondrial genome of Plutella australiana (Genbank accession MG787473.1).
The mitochondrial split time between P. xylostella and P. australiana was estimated using 13 mitochondrial protein coding genes extracted from 20 Plutella samples with circularized genomes plus Prays oleae (accession no. NC_025948.1) and Leucoptera malifoliella (accession no. JN790955.1). Nucleotide alignments were made for each gene using MAFFT (Katoh et al. 2002), substitution models were determined using JModelTest2 (Darriba et al. 2012) and alignments were then imported into BEAUTi (Drummond et al. 2012). We set the clock model to strict with 0.0177 substitutions Myr À1 according to Papadopoulou et al. (2010). Substitution models were unlinked to allow each sequence to coalesce independently with the Yule speciation model. MCMC sampling was carried out over 1000000 trees sampling every 1000 using BEAST2 v 2.4.7 (Bouckaert et al. 2014). Sampled trees from the chain were checked using Tracer v 1.6 (Rambaut et al. 2018) to determine burn in. Densitree was then used to superimpose MCMC trees to determine the internal node height ranges.

Data Simulation
Coalescent local trees with a total chromosomal length of 25 Mb were simulated for 24 individuals, including eight samples from an outgroup (O) and two ingroups (I 1 and I 2 ) using the Markovian Coalescent Simulator, MaCS (Chen et al. 2009). A coalescent model for the most recent common ancestor of I 1 and I 2 was set to 0.4 Â 4 N generations ago and the root to 1.5 Â 4 N generations ago, providing the topology ((I 1 , I 2 ), O). Simulated divergence was determined using mean d XY values from Plutella samples (see fig. 4). Two approaches were used to simulate introgression events from I 2 to O or from O to I 2 . First, Introgression was simulated as a single en masse admixture event at 0.01 Â 4 N generations ago with admixture frequencies (f) of f ¼ 0, 0.05, 0.1, 0.2, and 0.3. Second, introgression was simulated over five distinct breakdowns in assortative mating (0.01, 0.008, 0.006, 0.004, and 0.002 x 4 N generations ago) and f ¼ 0, 0.05, 0.1, 0.2, and 0.3. Each simulation was carried out with a constant population recombination rate (4Nr) of 0.001. Sequences were generated from the coalescent trees using SeqGen (Rambaut and Grassly 1997) with the Hasegawa-Kishino-Yano substitution model (Hasegawa et al. 1985) and a branch scaling factor of 0.01.

Admixture
The F3-Statistic A formal test for admixture was calculated using the three population test, the f-statistic (f3) (Reich et al. 2009;Patterson et al. 2012). Three possible combinations of tip structures were assessed with the ingroups and outgroup namely f3(I 1 , I 2 ; O), f3(I 1 , O; I 2 ), f3(I 2 , O; I 1 ,). Cases without introgression are expected to return positive f3 values while negative values indicate introgression has occurred from a donor to a recipient population, forming an intermediate ancestor of both source populations. Block jack-knife F3 estimation was carried out using PopStats (https://github.com/pontussk/popstats).

Absolute Divergence (d XY )
Nei's absolute divergence, d XY , was used to calculate the mean number of nucleotide differences between two populations across nonoverlapping 50-kb windows with egglib_sliding_windows.py (https://github.com/johnomics). Comparisons of d XY were made first with simulated data sets, using the five admixture frequencies (f ¼ 0, 0.05, 0.1, 0.2, and 0.3) between I 2 and O, O and I 2 and I 1 and I 2 . The d XY values were summarized by transforming them into density plots to visualize the distribution and frequency across the simulated genome. This provided expected d XY patterns under a range of admixture frequencies. Average d XY was then calculated between 1) P. australiana (O) and Australian P. xylostella (I 2 ) individuals, 2) P. australiana (O) and Hawaiian P. xylostella (I 1 ) and 3) Hawaiian P. xylostella (I 1 ) and Australian P. xylostella (I 2 ). Histograms were plotted after setting the maximum value to 1 using the geom_density function in ggplot2 (Wickham 2009).

Tree-Tip Distance Proportions
Maximum likelihood phylogenetic reconstruction was performed with RAxML (Stamatakis 2014) using nonoverlapping 50-kb genomic windows generated with the python script genoToSeq.py (https://github.com/ simonhmartin). Phylogenies of empirical data each used four individuals, including one P. xylostella and one P. australiana individual from sympatric Australia populations (SA14, ACT14, or ACT15), and two P. xylostella individuals from Hawaii (HO13.1 and HH13.2). Each tree was then converted to a distance matrix using APE (Paradis et al. 2004) and pairwise distances between tips were determined in the R programming language using the equation, where d a is the branch distance between the tree-tip of an Australian P. xylostella individual (PxA) and a P. australiana individual (Pa) and d b is the averaged branch distance between two Hawaiian P. xylostella (Pxyl HO13.1 and Pxyl HH13.2) and a P. australiana individual. Two individuals from Hawaii were used to reduce bias from this ingroup source. The Proportion ab values for each 50-kb window were expected to be $0.5 if the phylogeny was concordant with the species tree. Values much >0.5 indicate genomic windows more similar between P. australiana and Hawaiian P. xylostella, while values much <0.5 indicate genomic windows more similar between P. australiana and Australian P. xylostella and are candidate admixed regions. Only genomic windows with >20% of sites genotyped were analyzed. For comparison, simulated data from 24 individuals and five admixture frequencies (f ¼ 0, 0.05, 0.1, 0.2, and 0.3) described earlier was divided into 50-kb windows (n ¼ 500). Each 50-kb window was then subdivided into 64 separate alignments containing four simulated samples; the same two I 1 individuals in each case, (reflecting the use of the same two P. xylostella samples from Hawaii in the empirical data) and nonredundant pairs of I 2 and O individuals. Four tip unrooted maximum likelihood phylogenies were produced for each alignment using RAxML, then Proportion ab calculated and plotted using a bin width of 0.05.

Analysis of Discordant Tree-Tip Distances
After plotting tree-tip distance proportions, the tails of each distribution was investigated for symmetry by counting the number of 50-kb windows above or below each mean at three thresholds (mean 6 0.05, 0.10, and 0.15). Windows below the mean (mean -0.15) were further investigated by calculating d XY across 10-kb windows, sliding by 2 kb. These genomic regions indicate greater similarity between P. australiana and Australian P. xylostella than the average and d XY plots were visually inspected for signs of introgression.
Identification of Divergent Genomic Windows between P. australiana and P. xylostella Both F ST and d XY were calculated across aligned 50-kb genomic windows between all P. xylostella samples (from Australia plus Hawaii) and P. australiana. Annotated protein coding genes were extracted from the most divergent 1% of 50-kb windows for each statistic and BLAST against the DBM gene list available from DBM-DB (Tang et al. 2014). To identify their molecular function, InterPro and UniProt annotations were obtained for each BLAST hit.

Alignment of Plutella Species to the Reference Genome
The genomes of 29 Plutella samples were sequenced using short read Illumina platforms, including eight P. xylostella from Hawaii, eight P. xylostella from Australia, and 13 P. australiana. Samples from Australia were classified into three populations based on collection location and year for analysis (ACT2014, ACT2015, SA2014). A single P. australiana individual from Richmond, NSW, was also sequenced (supplementary table S1, Supplementary Material online). Resequenced genomes were mapped to the $393 Mb P. xylostella reference genome (You et al. 2013), but just 170 Mb of non-N bases were retained after stringent quality filtering. Sequence coverage across the 170 Mb alignment ranged from 9-to 25-fold per individual and $70% of these sites were genotyped in P. australiana samples compared with $92% for Australian and Hawaiian populations of P. xylostella (table 1).
The highest levels of nucleotide diversity were observed within Hawaiian P. xylostella samples (table 1). However, endemic P. australiana populations showed higher levels of nucleotide diversity than Australian P. xylostella, which may have undergone a population genetic bottleneck when colonization occurred. Mutation-drift equilibrium of these populations was determined using Tajima's D (D T ). Plutella xylostella collected from Australia were under equilibrium (D T 95% CI ¼ À0.6046375 to þ0.9435148) whereas those collected from Hawaii showed largely negative values (D T 95% CI ¼ À1.88 to À0.039) which may be the result of a recent population size expansion or higher than expected abundance of rare alleles. The frequency of rare alleles in P. australiana was also common, although the D T 95% confidence interval overlapped with zero (D T 95% CI ¼ À1.18 to þ 0.42).
Pairwise comparisons between populations and species were then used to assess genetic structure with F ST . The three Australian P. xylostella populations showed no genetic structure between geographic location (SA vs. ACT) or year (2014 vs. 2015) (combined average of F ST ¼0.003 6 0.003), as has been previously reported with microsatellite data ). However, much higher levels of differentiation were observed when compared with Hawaiian P. xylostella, supporting the expectation of genetic isolation (average of F ST ¼0.108 6 0.01). The average pairwise F ST values were slightly lower between P. australiana and Hawaiian P. xylostella (F ST ¼0.501 6 0.002) than P. australiana and Australian P. xylostella (F ST ¼0.532 6 0.013) (table 2).

Phylogenetic Inference of Plutella Species
A maximum likelihood phylogeny using $170 Mb of the nuclear genome showed two clear Plutella species groups with deep divergence between species. Plutella xylostella from Hawaii and Australia formed reciprocally monophyletic sister clades with 100% bootstrap support while P. australiana genomes formed a single clade, although generally had lower levels of internal branch support ( fig. 1). Branch distances were shorter between the internal nodes of P. australiana and Hawaiian P. xylostella than Australian P. xylostella, suggesting the two P. xylostella clades have diverged substantially since their most recent common ancestor.

Plutella australiana Mitochondrial Genome and Dating
We carried out de novo assembly and annotation of the P. australiana mitochondrial genome which has a total length of 15,962 bp (GenBank accession MG787473.1) compared with 16,014 bp of P. xylostella (Dai, Zhu, Qian, et al. 2016). Using sequence homology to the P. xylostella mitochondrial genome we annotated two rRNAs, 13 protein coding mitochondrial genes and 22 t-RNA, which showed a conserved gene order for Lepidopteran mitochondrial genomes (Dai, Zhu, Zhao, et al. 2016). The nucleotide sequence of 13 protein coding mitochondrial genes from 22 Plutella samples were then used to estimate the mitochondrial split time between P. xylostella and P. australiana at 1.96 Ma (95% confidence interval 6 0.175 Myr, fig. 2

Assessing Admixture between Australian Plutella Species
The F3-Statistic A formal test for genomic admixture was calculated using the three-population f-statistic (f3), first with simulated data sets to assess the level of sensitivity we could reasonably achieve, and second with empirical data. Simulated introgression frequencies of f ¼ 0.0, 0.05, 0.1, 0.2, and 0.3 were applied from a donor to a recipient. Introgression from ingroup 2 (I 2 ) into the outgroup (O) increased similarity between these groups, yet also reduced genetic differences between the outgroup and ingroup 1 (I 1 ). Despite the outgroup becoming more similar to I 2 , O still contained a large proportion of divergent loci which tends to confound the f3 statistic making negativity difficult to achieve, even with high levels of introgression (Peter 2016). Consequently, this approach failed to indicate shared ancestry through a negative f3-statistic ( fig. 3A). Next, introgression from O into I 2 was simulated, to assess sensitivity of introgression from P. australiana into Australian P. xylostella. Negative values were detected for mixing frequencies of !20% (f ¼ 0.2), indicating high rates of recent hybridization are required to detect introgression using the f3-statistic ( fig. 3B). Interestingly, spreading the total proportion of introgression to five equidistant time-points along the branch did not increase the detectability of admixture (supplementary fig.  S1, Supplementary Material online). This suggests the f3 is more dependent on the admixture frequency than the divergence between discordant and concordant regions. Applying the f3-statistic to empirical data failed to identify negativity in any tip order between Australian P. xylostella and P. australiana ( fig. 3C). Results for the f3-statistic were lower when assessing introgression between Hawaiian P. xylostella and P. australiana than from between the two sympatric Australian species, consistent with the nuclear phylogeny showing Hawaiian samples are more similar to P. australiana.  A lower f3(Pxyl Hawaii, Pau.; Pxyl Australia) value was estimated for SA 2014 than ACT 2014 and ACT 2015, which may be due to differences in nucleotide diversity between the Australian populations (table 1), as f3 is decreased proportional to the frequency of minor alleles in the target population. As f3 did not detect recent admixture events with two closely related ingroup tips, further tests were used to investigate introgression using smaller genomic windows.

Absolute Divergence (d XY )
Nei's measure of absolute divergence (d XY ) (Nei 1987) was used to compare genetic similarity between populations using 50-kb genomic windows for both simulated and empirical data sets. In all cases, population wide comparisons of d XY were performed between; 1) two ingroup populations (I 1 and I 2 ), 2) ingroup 1 and the outgroup (I 1 and O), and 3) ingroup 2 and the outgroup (I 2 and O). Comparions returing values approaching zero indicate high levels of similarity and a recent allelic split time. Low d XY values are expected between ingroup samples (I 1 and I 2 ), or in cases where introgression may be occurring between an ingroup and outgorup.
Absolute divergence in simulated populations was calculated for each 50-kb window (n ¼ 500), again for admixture occuring at f ¼ 0.0, 0.05, 0.1, 0.2, and 0.3. The distribution of d XY values obtained for each comparison were plotted as histograms normalized for density by rescaling such that the maxima of the distribution is 1 ( fig. 4 and supplementary figs. S2 and S3, Supplementary Material online). Introgression either from I 2 into O or from O into I 2 produced a decrease in absolute divergence across the genome, providing a benchmark for comparisons with empirical data. Admixture in the direction O to I 2 provided a much clearer genome wide signal than the reverse direction, (I 2 to O) indicating it would be easier to detect introgression from P. australiana into Australian P. xylostella than the reverse.
Based on the whole genome phylogeny ( fig. 1), we expected mean d XY between P. australiana and Hawaiian P. xylostella to be slightly lower than Australian P. xylostella. The d XY distribution of empirical 50-kb windowed data provided no support of widespread introgression ( fig. 4C), as values comparing the P. australiana outgroup with either P. xylsotella from Hawaii (I 1 ) or Australia (I 2 ) did not deviate from their expected values (table 3). This suggests concordance with the whole genome tree topology.

Phylogenetic Tree-Tip Distance Proportions
The f3-statistic and d XY were both used to test for introgression within populations. Next, we used the tree-tip distance proportion to assess whether evidence for introgression could be detected between individual sample pairs. Simulated genomes with introgression from I 2 into O, or O into I 2 at the rates f ¼ 0.0, 0.05, 0.1, 0.2, and 0.3 were divided into 50-kb sequence alignments, as described earlier. Further subdivision was then performed so each 50-kb window contained just four sequences; two ingroup 1, one ingroup 2, and one outgroup sequence. This was repeated 64 times for each 50-kb window, then maximum likelihood phylogenic reconstruction performed for each alignment. Based on the whole-genome topology, we expected the outgroup to be a similar distance from both ingroup 1 and ingroup 2, unless introgression had occurred and shortened the distance between samples.
A proportion of the branch distance between I 2 and O (d a ) and I 1 and O (d b ) was then calculated for each phylogeny using equation 1, normalizing values within the range 0-1. Tree-tip distance proportions are presented as histograms to graph the distribution (x axis), and normalized density (y axis). Although introgression from I 2 into O ( fig. 5A and supplementary fig. S4, Supplementary Material online) and from O into I 2 (fig. 5B) were both detected using distance proportions, patterns did vary based on the direction of admixture. Clearer signals of admixture were evident in the direction O to I 2 (supplementary table S4, Supplementary Material online), as this made ingroup 2 less similar to ingroup 1. Given I 1 and I 2 recently split, admixture from I 2 into O is also expected to make the outgroup more similar to I 1 , decreasing detectability. Simulating introgression over five equaly spaced events was effective at detecting admixture using tree-tip distance proportions (supplementary fig. S5, Supplementary Material online).
This method was then applied to empirical data to identify genomic windows that were discordant with the species tree. Similar to the simulated data sets, genomic alignments were divided into nonoverlapping 50-kb contiguous windows, then further subdivided into alignments of one P. xylostella and one P. australiana individual from sympatric Australian populations, plus two consistant P. xylostella from Hawaii. This produced 32 different sample combinations for each 50-kb window, including eight combinations from ACT 2014, eight from ACT 2015 and 16 from SA 2014. An average of 7276 (6293) maximum likelihood phylogenies were then produced for each of the 32 sample combinations to identify potential admixture that was not fixed in the population.  . 1). Under a widespread admixture hypothesis windows with distance proportions below the mean (P. australiana closer to P. xylostella from Australia) should be much more frequent than above. The number of windows above and below three distances from the mean (0.05, 0.1, 0.15) was similar (supplementary table S6, Supplementary Material online) suggesting no clear evidence to support widespread admixture within P. xylostella and P. australiana individuals from three sympatric populations.
Despite lack of support for widespread hybridization and genome-wide introgression, we further investigated the tails I 2 ), which did not reach negative values. For comparison, the f3-statistic for (I 2 , O; I 1 ) and (I 1 , I 2 ; O) were plotted with circles and triangles, respectively. (B) Simulated admixture from O into I 2 did produced a significant f3 statistic at a mixing frequency >0.2, as indicated by the values <0 f3(I 1 , O; I 2 ). (C) The f3statistic was then applied to empirical data, testing for admixture in three possible scenarios between Plutella xylostella from Hawaii (P.x. H, I 1 ), P. xylostella from Australia (P.x. A, I 2 ) and P. australiana (P.a., O). This suggests that, as no f3 values were <0, if admixture was occurring it could not be detected using this method. of the tree-tip distance proportions at a distance 0.15 below the mean for each Australian population. These scaffolds (n ¼ 21) have the shortest branch lengths between P. australiana and sympatric Australian P. xylostella, relative to the branch length proprtions between P. australiana and Hawaiian P. xylostella (supplementary table S7, Supplementary Material online). Sliding window d XY was performed on each of these scaffolds with 10-kb windows (sliding by 2 kb), revealing just one region on scaffold KB207303.1 where Australian P. xylostella and P. australiana are more similar than between Hawaiian and Australian P. xylostella.

Genomic Regions with High Interspecies Divergence
The two Plutella species investigated in this study have been shown to have contrasting biologies and pest potential (Perry et al. 2018), and although they can hybridize in laboratory crosses, we found no evidence for widespread admixutre among wild samples. This prompted us to ask which 50-kb genomic regions are most divergent between these species, and what kinds of genes do they encode? First, absolute divergence (d XY ) between all P. xylostella and all P. australiana individuals was used to identify the top 1% most divergent genomic windows (supplementary table S8, Supplementary Material online). These included fifty-one 50-kb windows dispersed across 41 unique scaffolds and showed 33-61% greater absolute divergence than the genome-wide average (d XY ¼0.0369). Second, the top 1% of genomic windows showing highest divergence in nucleotide diversity (F ST ) were also identified, showing values 70-110% higher than the mean (F ST ¼0.356). These two estimates of divergence only detected one 50-kb region common to both d XY and

Discussion
The discovery of cryptic species can often be inadvertent and arise from sequencing mitochondrial or nuclear amplicons (Stuart et al. 2006;Landry and Hebert 2013), as well as whole genomes (Janzen et al. 2017). The fortuitous identification of Plutella australiana was unexpected and raised initial concern over its pest status and whether specific management practices were required. Plutella australiana populations collected from across southern Australia (Perry et al. 2018) and Australian P. xylostella ) lack genetic structure, showing these species are highly mobile. Adaptive introgression of advantageous traits from one of these species into the other could potentially spread across the Australian continent. Despite high levels of movement, we sampled from sites where Plutella populations co-occur to attempt to detect either historical admixture or very recent hybridization. The physical genome size of P. xylostella is estimated at 339 Mb (Baxter 2011) while the reference genome assembly is 393 Mb (You et al. 2013) and includes sequencing gaps totaling $50 Mb. After aligning all resequenced genomes to the P. xylostella reference, only 170 Mb was retained in this analysis, which is likely to be caused in part by the sequence gaps and also high levels of genetic diversity (You et al. 2013). Mapping P. australiana sequence reads to the P. xylostella genome is affected by mapping bias, as the most divergent loci will not map to this reference. This causes all branches to be shortend toward the reference, underestimating the diveregence between P. australiana and P. xylostella in the whole genome phylogeny. However, the introduction of this bias is unavoidable as the only reference geneome within the superfamily Yponomeutoidea is currenlty Plutella xylostella.
Plutella australiana were more similar to P. xylostella samples from Hawaii than Australia, based on shorter phylogenetic branch lengths for nuclear genomes and subsequent tree-tip distance proportions, lower F ST values and lower f3statistics. A better understanding of migration or transport routes enabling P. xylostella to colonize the world would  (6293) branch distance ratio calculations. The branch leading to ingoup 1 was standardized using the same two Hawaiian P. xylostella individuals for each of these comparisons. help explain why this is the case. Several studies have found Australian P. xylostella mtDNA genomes have very low levels of diversity (Saw et al. 2006;Juric et al. 2017;Perry et al. 2018), which is indicative of a population bottleneck (and other factors), while we found Hawaiian mtDNA genomes to be quite diverse. This suggests the Hawaiian Islands may have been colonized by a larger founding population, or multiple, independent invasions while Australia may have simply been colonized by a derived population of P. xylostella (Juric et al. 2017).
Mitochondrial diversity between the two Plutella species was originally found to be $8.2%, based on sequencing COI amplicons (Landry and Hebert 2013), although the level of diversity across all thirteen protein coding genes is less (4.95%). This level of diversity was not sufficient to result in complete reproductive isolation between the two sister species when reared in the laboratory (Perry et al. 2018). Using the 13 mitochondrial genes, we estimated the split time of P. xylostella and P. australiana to be $1.96 Myr. To date, P. australiana has only been detected in Australia, yet this relatively recent split questions whether P. australiana did evolve within Australia. This would require a considerable migratory event some 2 Mya from the ancestral Plutella source population to Australia, and no further migration. Future molecular screening of P. xylostella may identify cryptic P. australiana in other countries.
Phylogenies of genes or genomic windows can deviatie from an expected consensus topology or species tree and can be used to identify genomic regions that may be of biological interst. For example, genomic regions subject to incomplete lineage sorting (Scally et al. 2012), horizontal gene transfer (Moran and Jarvik 2010) and adaptive introgression (Wallbank et al. 2016) all produce discordant phylogenies. Despite simulated data detecteting minor levels of introgression using phylogenetic tree-tip distances across the genome, we found few discordant distances between individual Plutella samples across 50-kb genomic windows. The methods used here were not sufficient to reject small regions of decreased d XY between Australian Plutella, which may be signals of past admixture. Future work into the evolutionary history of Plutella moths and sequencing outgroup genomes of Plutella species, will enable further analysis of these regions using the D statistic and f d (Martin et al. 2015).
The most divergent genomic windows between the two Plutella species identified using d XY or F ST showed little evidence for current selection and may potentially contain genes that underwent selection after speciation. These genes may reflect different abilities to evade host plant defenses or host plant preference, as many are involved with digestion and detoxification. Using absolute genetic divergence (d XY ) to identify the most divergent genomic windows between P. australiana and P. xylostella may also be highlighting loci that are highly polymorphic or rapidly evolving. Further understanding of Plutella biology including mating timing, evolutionary history, host plant preference and behavious may provide further insight into these divergent loci.
Plutella australiana and P. xylostella are likely to have been in secondary contact in Australia for over 125 years (>1000 generations). Despite this, we found no support for widespread admixture, and although we cannot predict the amount of time these species have spent in geographic isolation, strong reproductive barriers are apparent in the field. Furthermore, P. xylostella and P. australiana will be a useful system to investigate the genetic basis of biological differences between cryptic species from an agricultural perspective.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.