A Highly Contiguous Genome Assembly of a Polyphagous Predatory Mite Stratiolaelaps scimitus (Womersley) (Acari: Laelapidae)

Abstract
As a polyphagous soil-dwelling predatory mite, Stratiolaelaps scimitus (Womersley) (Acari: Laelapidae), formerly known as Stratiolaelaps miles (Berlese), is native to the Northern Hemisphere and preys on soil invertebrates, including fungus gnats, springtails, thrips nymphs, nematodes, and other mite species. Already mass-produced and commercialized in North America, Europe, Oceania, and China, S. scimitus is highly likely to be introduced to other countries and regions as a biocontrol agent against edaphic pests in the near future. Such introductions, however, can lead to unexpected genetic changes within populations of biological control agents, which might decrease the efficacy of pest management or increase the risks to local environments. To better understand the genetic basis of its biology and behavior, we sequenced and assembled the draft genome of S. scimitus using the PacBio Sequel II platform. We generated ∼150× (64.81 Gb) of PacBio long reads with an average read length of 12.60 kb. Reads longer than 5 kb were assembled into contigs, yielding a final assembly of 158 contigs with an N50 length of 7.66 Mb that captured 93.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set (n = 1,066). We identified repetitive elements accounting for 16.39% (69.91 Mb) of the genome, 1,686 noncoding RNAs, and 13,305 protein-coding genes, which achieved 95.8% BUSCO completeness. Combined analyses of gene family evolution and functional enrichment of Gene Ontology terms and pathways showed that 135 families experienced significant expansions, mainly involving digestion, detoxification, immunity, and venom. Major expansions of detoxification enzymes, that is, P450s and carboxylesterases, suggest a possible genetic mechanism underlying polyphagy and ecological adaptation. Our high-quality genome assembly and annotation provide new insights into the evolutionary biology, soil ecology, and biological control of predatory mites.


Introduction
The family Laelapidae comprises a multitude of morphologically and behaviorally diverse mesostigmatic mites that are free living or associated with arthropods, mammals, or birds (Lindquist et al. 2009). Some laelapid mites, especially in the subfamily Hypoaspidinae, show important potential as biological control agents against agricultural pests; one example is Stratiolaelaps scimitus (Womersley). This species has often been confused with S. miles (Berlese) and has been distinguished from the latter based on mitochondrial DNA sequences and morphological analysis (Walter and Campbell 2003; Yan et al. 2020). In nature, S. scimitus is broadly distributed throughout the Holarctic, and it is widely marketed for pest management in mushroom houses and greenhouse production systems (Walter and Campbell 2003; Xie et al. 2018). S. scimitus preys on a range of pests, including soil-pupating western flower thrips (Thysanoptera: Thripidae) in greenhouses, springtails (Collembola) and fungus gnats (Diptera: Sciaridae) in mushroom production facilities, and bulb mites (Acari: Astigmata) on lilies (Cabrera et al. 2005). Recently, S. scimitus has been studied as a potential biocontrol agent against other edaphic pests, such as Bradysia odoriphaga Yang and Zhang (Diptera: Sciaridae) (Zhou et al. 2018), revealing its potential for use against soil-inhabiting pests of agricultural importance. High-quality genomes can facilitate studies of biology, evolutionary biology, and the molecular mechanisms of adaptation to environmental change. To date, 32 Acari genomes, including 21 mites and 11 ticks (Ixodidae), have been publicly released (NCBI, accessed December 16, 2020). Most mite genomes are small (<200 Mb), but 14 of the 21 mite genomes are of poor assembly quality, that is, with >10,000 scaffolds and an N50 length <100 kb.
Among them, only two predatory mite genomes are available: Galendromus occidentalis (western predatory mite) (Hoy et al. 2016) and Dinothrombium tinctorium (Dong et al. 2018; Zhang et al. 2019). Here, we present a de novo genome assembly of S. scimitus using Pacific Biosciences (PacBio) single-molecule real-time long reads, annotate its repeats, protein-coding genes, and noncoding RNAs (ncRNAs), and compare gene family evolution across the main Chelicerata clades, with a focus on rapidly evolving families.

Sample Collection and Sequencing
The parthenogenetic monoisolate of S. scimitus used for sequencing was collected from topsoil under bamboo at Shandong Agricultural University, Taian, Shandong, China (36.114°N, 117.064°E) in May 2017, and was reared for more than 23 generations in our laboratory. A total of 100, 100, and 1,200 females were used for Illumina whole-genome, Illumina transcriptome, and PacBio sequencing, respectively. Genomic DNA/RNA extraction, library preparation, and sequencing were carried out at Berry Genomics (Beijing, China). Libraries with insert sizes of 20 kb and 350 bp were constructed for the PacBio Sequel II and Illumina NovaSeq 6000 platforms, respectively. Quality control of raw Illumina data was performed using the BBTools suite v38.49 (Bushnell 2014): duplicates were removed using "clumpify.sh"; both read ends were trimmed to Q20, reads shorter than 15 bp or with >5 Ns were discarded, poly-A/G/C tails of at least 10 bp were trimmed, and overlapping paired reads were corrected using "bbduk.sh".
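The Illumina filtering rules listed above can be illustrated with a minimal pure-Python sketch. It is not the BBTools implementation: quality trimming here simply clips low-quality bases from each end, poly-tail trimming inspects only the 3' end, deduplication is by exact sequence match, correction of overlapping pairs is not modeled, and all function names are hypothetical. Quality strings are assumed to be Phred+33 encoded.

```python
from __future__ import annotations

# Hypothetical parameters mirroring the thresholds described in the text.
MIN_QUAL = 20   # trim both read ends to Q20
MIN_LEN = 15    # discard reads shorter than 15 bp
MAX_N = 5       # discard reads with more than 5 Ns
MIN_POLY = 10   # trim poly-A/G/C tails of at least 10 bp


def phred(ch: str, offset: int = 33) -> int:
    """Convert one FASTQ quality character (Phred+33) to a score."""
    return ord(ch) - offset


def deduplicate(reads: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first copy of each exact duplicate sequence."""
    seen = set()
    unique = []
    for seq, qual in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((seq, qual))
    return unique


def quality_trim(seq: str, qual: str, min_q: int = MIN_QUAL) -> tuple[str, str]:
    """Clip bases scoring below min_q from the 5' and 3' ends."""
    start, end = 0, len(seq)
    while start < end and phred(qual[start]) < min_q:
        start += 1
    while end > start and phred(qual[end - 1]) < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def poly_tail_trim(seq: str, qual: str, bases: str = "AGC",
                   min_len: int = MIN_POLY) -> tuple[str, str]:
    """Remove a 3' homopolymer tail of A, G, or C that is >= min_len bp."""
    for base in bases:
        tail = len(seq) - len(seq.rstrip(base))
        if tail >= min_len:
            return seq[:-tail], qual[:-tail]
    return seq, qual


def clean_read(seq: str, qual: str) -> tuple[str, str] | None:
    """Trim a read, then apply the length and N-count filters."""
    seq, qual = quality_trim(seq, qual)
    seq, qual = poly_tail_trim(seq, qual)
    if len(seq) < MIN_LEN or seq.upper().count("N") > MAX_N:
        return None
    return seq, qual
```

Trimming is applied before the length filter so that reads shortened below 15 bp by trimming are also discarded.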

Genome Assembly
We performed a genome survey based on short-read k-mer distributions using GenomeScope v1.0.0 (Vurture et al. 2017): k-mer frequencies were estimated for 21-mers using khist.sh (part of the BBTools suite), and maximum k-mer coverage cutoffs were set to 1,000 and 5,000.
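As a rough illustration of a k-mer genome survey, the sketch below counts canonical 21-mers, builds a coverage histogram similar in spirit to khist.sh output, and estimates genome size as the total number of k-mers divided by the main peak depth. This peak-based estimate is a simplification assumed for illustration only; GenomeScope instead fits a mixture model to the histogram that also accounts for heterozygosity, duplication, and sequencing errors. Function names and the min_cov/max_cov defaults are hypothetical, with max_cov standing in for a maximum coverage cutoff.

```python
from collections import Counter

VALID = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k: int = 21) -> Counter:
    """Count canonical k-mers across reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts: Counter) -> dict:
    """Map coverage depth -> number of distinct k-mers observed at that depth."""
    return dict(sorted(Counter(counts.values()).items()))


def estimate_genome_size(hist: dict, min_cov: int = 2, max_cov: int = 1000) -> float:
    """Naive estimate: total k-mers within [min_cov, max_cov] / peak depth."""
    usable = {cov: n for cov, n in hist.items() if min_cov <= cov <= max_cov}
    peak = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total = sum(cov * n for cov, n in usable.items())
    return total / peak
```

Excluding depths below min_cov drops most error k-mers, and excluding depths above max_cov drops high-copy repeats, which is the role a maximum coverage cutoff plays in a survey.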

Results and Discussion
Genome Assembly and Annotation
Our final draft assembly comprised 158 scaffolds/contigs totaling 426.50 Mb, with a scaffold/contig N50 length of 7.66 Mb, a longest sequence of 31.29 Mb, and a GC content of 45.85%. It has the highest contiguity when compared with five publicly available Mesostigmata genomes. The assembly size was nearly identical to the genome survey estimates. BUSCO assessment against the arthropod set (n = 1,066) revealed high completeness and very low redundancy of our assembly: 93.1% […]

Fig. 1. Phylogeny, orthologs, and gene family evolution. (a) Dating tree with node values representing the number of expanded, contracted, and rapidly evolving families. "1:1:1" represents shared single-copy genes, "N:N:N" multicopy genes shared by all species, "Mesostigmata" shared orthologs unique to Mesostigmata, "Others" unclassified orthologs, and "Unassigned" genes that could not be assigned to any gene family.

[…] were 7,870.13, 372.35, and 1,105.66 bp, respectively. BUSCO assessment in protein mode ("-m prot") identified 95.8% complete, 2.3% complete and duplicated, 0.9% fragmented, and 3.3% missing BUSCO genes (table 1), indicating high completeness of our predicted gene set. BUSCO completeness of the predicted genes (95.8%) was slightly higher than that of the genome assembly (93.1%). This suggests that the Augustus-based gene prediction run by the BUSCO pipeline in genome mode, with its default fly gene model parameters, may capture complete genes less effectively. Diamond searches aligned 11,687 (87.84%) genes to UniProt proteins. InterProScan and eggNOG functional annotations assigned protein domains to 10,248 (77.02%) genes, together with 8,943 GO terms, 7,147 KEGG KO terms, 2,576 Enzyme Codes, 4,422 KEGG and 4,083 Reactome pathways, and 9,453 COG categories.

Phylogeny
In total, 183,669 genes (89.1%) were clustered into 18,319 orthogroups (gene families).
Among them, 3,161 families were shared by all 11 species, of which 399 were single-copy families; 916 families and 6,309 orthologs were unique to the five Mesostigmata species (fig. 1a). For S. scimitus, 12,274 (92.25%) genes were assigned to 9,296 orthogroups; 571 families and 3,672 genes were species specific.
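The orthogroup categories in fig. 1a can be expressed as simple rules over per-species gene counts. The sketch below is an assumption about how such categories might be assigned, since the text gives neither the exact rules nor the orthology software's output format: an orthogroup present in every species is "1:1:1" if it is single copy everywhere and "N:N:N" otherwise, one present only in the Mesostigmata species is "Mesostigmata", one present only in the focal species is species specific, and anything else is "Others". Species codes and the input dictionary layout are hypothetical.

```python
from collections import Counter


def classify(counts: dict, all_species: list, mesostigmata: set, focal: str) -> str:
    """Assign one orthogroup to a category from its per-species gene counts."""
    present = {sp for sp, n in counts.items() if n > 0}
    if present == set(all_species):
        # Shared by every species: single copy everywhere, or not.
        if all(counts[sp] == 1 for sp in all_species):
            return "1:1:1"
        return "N:N:N"
    if present == {focal}:
        return "species-specific"
    if present == set(mesostigmata):
        return "Mesostigmata"
    return "Others"


def summarize(orthogroups: dict, all_species: list, mesostigmata: set, focal: str):
    """Count families and member genes per category across all orthogroups."""
    families, genes = Counter(), Counter()
    for counts in orthogroups.values():
        category = classify(counts, all_species, mesostigmata, focal)
        families[category] += 1
        genes[category] += sum(counts.values())
    return families, genes
```

Genes that fall into no orthogroup ("Unassigned") never appear in the input and would be counted separately.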
Thirty-nine single-copy loci were removed by the IQ-TREE "symtest" prior to formal phylogenetic analyses. Phylogenetic reconstruction based on the remaining 360 single-copy loci revealed that S. scimitus clustered with the other four Mesostigmata species and was sister to Varroa (Varroidae) rather than to Tropilaelaps mercedesae (Laelapidae), calling into question the current classification of Varroidae and Laelapidae. Given that many members of both families are parasites associated with honeybees, Varroidae may be nested within Laelapidae. Mesostigmata, Dermanyssoidea, and S. scimitus originated in the early Cretaceous (132-142 Ma), early Paleocene (65-70 Ma), and middle Eocene (44-48 Ma), respectively (fig. 1a). The emergence of these Parasitiformes mites may be related to the widespread occurrence of reptiles, birds, mammals, and insects since the Cretaceous.

Gene Family Evolution
We identified 221 rapidly evolving gene families using CAFE, of which 135 and 86 experienced significant expansions and contractions, respectively (fig. 1a). The 25 largest expanded families are shown in figure 1b. Many of them are related to dietary digestion and detoxification, such as cytochrome P450, ABC transporter, carboxylesterase, trypsin, cathepsin L, long-chain fatty acid transport protein, lipase, thyroglobulin, salivary protein, and salivary gland protein. These expansions may help explain the broad diet of this predatory mite. The largest expanded family, cytochrome P450, likely plays an important role in digestion and detoxification through its contributions to xenobiotic metabolism, insecticide resistance, and odorant or pheromone metabolism (Feyereisen 2005). Interestingly, toxin-related proteins, that is, members of the cysteine-rich secretory protein family inferred from Pfam annotations, may aid predation by inhibiting both smooth muscle contraction and cyclic nucleotide-gated ion channels (Yamazaki and Morita 2004). GO (fig. 2a) and KEGG (fig. 2b) enrichment analyses further support these hypotheses: most enriched categories are related to digestion and detoxification, such as the GO terms monooxygenase activity and lipid hydroxylation, and the KEGG pathways steroid hormone biosynthesis, metabolism of xenobiotics by cytochrome P450, vitamin digestion and absorption, ABC transporters, and ovarian steroidogenesis. Together, these gene family analyses provide evidence for genetic mechanisms underlying polyphagy and ecological adaptation in S. scimitus.
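The section does not name the enrichment software. A common choice for GO and KEGG over-representation analysis is a one-sided hypergeometric test per term followed by Benjamini-Hochberg correction, sketched below under that assumption. In this setting the study set would be the genes in expanded families and the background all annotated genes; the function names and input layout are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list) -> list:
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def enrich(term_genes: dict, study: set, background: set) -> dict:
    """One-sided over-representation test per term; returns term -> (p, q)."""
    N, n = len(background), len(study & background)
    terms = list(term_genes)
    pvals = []
    for term in terms:
        annotated = term_genes[term] & background
        k = len(annotated & study)
        pvals.append(hypergeom_sf(k, N, len(annotated), n))
    qvals = benjamini_hochberg(pvals)
    return {t: (p, q) for t, p, q in zip(terms, pvals, qvals)}
```

The running minimum in the correction keeps q-values monotone with respect to the ranked p-values, which is the standard step-up form of the procedure.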

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.