-
PDF
- Split View
-
Views
-
Cite
Cite
Paula Arribas, Carmelo Andújar, María Lourdes Moraza, Benjamin Linard, Brent C Emerson, Alfried P Vogler, Mitochondrial Metagenomics Reveals the Ancient Origin and Phylodiversity of Soil Mites and Provides a Phylogeny of the Acari, Molecular Biology and Evolution, Volume 37, Issue 3, March 2020, Pages 683–694, https://doi.org/10.1093/molbev/msz255
- Share Icon Share
Abstract
High-throughput DNA methods hold great promise for phylogenetic analysis of lineages that are difficult to study with conventional molecular and morphological approaches. The mites (Acari), and in particular the highly diverse soil-dwelling lineages, are among the least known branches of the metazoan Tree-of-Life. We extracted numerous minute mites from soils in an area of mixed forest and grassland in southern Iberia. Selected specimens representing the full morphological diversity were shotgun sequenced in bulk, followed by genome assembly of short reads from the mixture, which produced >100 mitochondrial genomes representing diverse acarine lineages. Phylogenetic analyses in combination with taxonomically limited mitogenomes available publicly resulted in plausible trees defining basal relationships of the Acari. Several critical nodes were supported by ancestral-state reconstructions of mitochondrial gene rearrangements. Molecular calibration placed the minimum age for the common ancestor of the superorder Acariformes, which includes most soil-dwelling mites, to the Cambrian–Ordovician (likely within 455–552 Ma), whereas the origin of the superorder Parasitiformes was placed later in the Carboniferous-Permian. Most family-level taxa within the Acariformes were dated to the Jurassic and Triassic. The ancient origin of Acariformes and the early diversification of major extant lineages linked to the soil are consistent with a pioneering role for mites in building the earliest terrestrial ecosystems.
Introduction
Mites (Acari) are present worldwide in virtually any locality capable of supporting life, from Antarctic nunataks to the follicles of human hair (Krantz and Walter 2009). They comprise ∼55,000 described species, but the actual number is estimated to be 500,000–1,000,000, and possibly a lot more, as new mite species are routinely discovered even in previously well-collected places (Walter and Proctor 1998). Mites have a long evolutionary history, dating back to at least 410 million years ago (Ma) based on their presence in early terrestrial fossil layers (Hirst 1923; Dubinin 1962). However, their evolutionary history is only partially understood, and mites remain one of the least studied major branches of the animal Tree-of-Life (Dunlop and Alberti 2008; Krantz and Walter 2009). The uncertainties in particular concern the times of origin of the earliest mite lineages. Recent studies place them either to the pre-Cambrian, implying an early diversification in an aquatic or semiaquatic environment (Schaefer et al. 2010), or to the Devonian well after the formation of terrestrial ecosystems (Dabert et al. 2010; Xue et al. 2017).
The greatest number of mite species is found in soils, where they constitute an essential part of the mesofauna. They predominantly exist in the soil matrix and are implicated in the soil trophic chain as predators, scavengers, or primary consumers (Burges and Raw 1967). Mites are essential for soil viability, aeration and water retention, decomposition of plant matter, microbial biomass control, and nutrient cycling (Bardgett et al. 2005; Bardgett and van der Putten 2014). These functional roles, together with their presumed antiquity, point to them as a likely key factor in the evolution of the earliest soils. Studies of the origin and deep-level diversification of the Acari, including the generation of time-calibrated phylogenetic trees, are thus an essential first step toward the reliable characterization of soil mite diversity and community evolution (e.g., Bardgett and van der Putten 2014; Veresoglou et al. 2015).
The study of soil Acari is hampered by extreme body miniaturization, which has likely stifled the interest of taxonomists in this group and delayed the application of new sequencing technologies. As a consequence, DNA-based studies of basal relationships of Acari are scarce and have largely neglected the soil-dwelling groups. Existing data instead are skewed toward lineages closely associated with humans, such as ticks, dust mites, or herbivorous agricultural pests. These groups are also represented by more extensive sequence data, including mitochondrial genomes and transcriptomes (see Sharma et al. 2014; Xue et al. 2016; Xue et al. 2017). However, in comparison with other large groups of arthropods, the study of deep level phylogenetics within Acari is lagging and mostly based on a limited set of mitochondrial and nuclear loci (cox1, 18S, and 28S rRNA; e.g., Klompen et al. 2007; Dabert et al. 2010; Pepato et al. 2010; Schaefer et al. 2010; Pepato and Klimov 2015) that are only slowly being extended to include a wider range of markers (e.g., Klimov et al. 2018).
Taxonomically, the Acari can readily be separated into two superorders, the Acariformes and Parasitiformes, which broadly correspond to the predominantly soil-dwelling and the parasitic forms, respectively, but most acarologists would agree that both lineages are not closely related and thus Acari are not monophyletic. To date, neither morphological nor molecular studies have convincingly resolved the sister group relationships of either superorder within the wider arachnids (for review see Dunlop and Alberti 2008). Some molecular studies have proposed the Acariformes as sister lineage to Solifugae (Pepato and Klimov 2015), whereas more recent transcriptome studies have found an affiliation with Pseudoscorpiones (Sharma et al. 2014). Within these two superorders, the cladistic hypotheses for Parasitiformes are in general agreement across studies and methodologies, whereas the systematics of the superorder Acariformes is quite convoluted. For example, the basal dichotomy of the two orders Sarcoptiformes and Trombidiformes has been recently questioned due to the dubious position of Eriophyoidea (gall mites and rust mites) and the uncertain relationships between the noneriophyoid Trombidiformes and Sarcoptiformes (Klimov et al. 2018).
Faced with the logistical challenges for a comprehensive phylogenetic analysis of the Acari, next-generation sequencing provides new opportunities for data acquisition from large numbers of very small and morphologically cryptic specimens. In particular, shotgun “mitochondrial metagenomics” (MMG) can be used for the simultaneous sequencing of mitochondrial genomes from numerous specimens, based on metagenome skimming from bulk or pooled samples, which greatly reduces the amount of DNA required for each individual (Crampton-Platt et al. 2015; Linard et al. 2015). Mitochondrial genomes are powerful phylogenetic markers and have been used successfully for resolving relationships at intraordinal and interfamiliar levels for arthropods, particularly under dense taxon sampling and with the use of adequate phylogenetic models (Timmermans et al. 2016; Basso et al. 2017; Wang et al. 2017).
Here we applied the MMG approach to a sample of mite specimens extracted from natural forest and grassland soils in southern Iberia, which generated >100 mitochondrial genomes representing major acarine lineages linked to the soil environment and nearly tripled the number of mite families with publicly available mitochondrial genomes. The increased taxon, lineage, and gene sampling allowed for a robust phylogenomic analyses to 1) test existing cladistic hypotheses for the relationships among lineages in the Acariformes and Parasitiformes, with special focus on the Trombidiformes—Sarcoptiformes divide; 2) estimate the temporal frame for the origin and early diversification of major mite lineages and evaluate its consistency with fossil evidence and previous molecular clock approaches; and 3) shed light on the origin and phylodiversity of soil mites and their potential role in the earliest terrestrial communities. Additionally, this study serves as paradigmatic example of the “soup to tree” approach to assemble the Tree-of-Life for the “hidden biodiversity” of the soil that still remains largely unknown.
Results
The Mitochondrial Genomes of Acari
Soil samples obtained from superficial and deep soil layers in a landscape of Quercus forest and wet grassland at Sierra de Grazalema (Southern Iberian Peninsula) (supplementary table S1, Supplementary Material online) produced large numbers of mites. Specimens were selected for recognizable morphological diversity under a dissection microscope, subjected to nondestructive individual DNA extractions, and morphologically identified to species or genus for adult forms, or to family for immatures (supplementary table S1, Supplementary Material online). The MMG procedure yielded 106 mitogenomes, which each were assigned to a particular morphological specimen by match to a cox1 (barcode) bait obtained from the same specimens with Sanger sequencing. All baits had an exact sequence match to the corresponding portion of the mitogenome, which together with the strict process of mitogenome assembly by multiple genome assemblers and subsequent steps of gene annotation, alignment and curation, allowed us to validate the data set and to demonstrate the minimal, if any, impact of mitochondrial nuclear copies or spurious assemblies of the mitogenomes.
A total of 71 mitogenomes were complete (i.e., included 13 protein coding and 2 ribosomal RNA genes) and the remaining 35 contigs ranged from 5,000–13,000 bp in length (see supplementary table S1, Supplementary Material online). The taxonomic profile of the newly assembled mitogenomes covered the main soil lineages across Acari: 22 mitogenomes from the superorder Parasitiformes, all of them from the order Mesostigmata; and 84 from the superorder Acariformes, including 15 from the order Trombidiformes and 69 from the order Sarcoptiformes. The new data set was taxonomically complementary to 78 mitogenomes of Acari obtainable from GenBank (Mitochondrial Genome database), which mostly represented “human associated” species: 46 Parasitiformes including 40 Ixodida (ticks); 23 Trombidiformes, of which almost all were agricultural pests (e.g., Tetranychus) or mammal parasites (e.g., Demodex or Leptotrombidium); and 9 Sarcoptiformes including 8 Astigmatina (dust mites) and a single species of Oribatida (Steganacarus magnus). The combined data set comprised 184 mitogenomes of Acari from 62 families, of which 44 were exclusively present in the newly assembled mitogenomes (supplementary table S1, Supplementary Material online). In addition, 20 outgroups of Arachnida (Amblypygi, Araneae, Opiliones, Pseudoscorpiones, Ricinulei, Scorpiones, Solifugae, and Uropygi) and Xiphosura were obtained from GenBank, for a total of 204 mitogenomes and a concatenated matrix of individual gene alignments of 15,047 bp.
The Deep Level Phylogeny of Acari
Under the most appropriate alignment settings and substitution models (supplementary table S2, Supplementary Material online), phylogenetic analyses using different algorithms and data treatments resulted in mostly congruent, well-supported trees (see fig. 1; supplementary figs. S1–S9, Supplementary Material online), whose characteristics were tabulated to indicate monophyly and node support for the main lineages of Acari across the different settings (supplementary table S3, Supplementary Material online). The two superorders Parasitiformes and Acariformes were well supported, but their positions relative to the arachnid outgroups differed across the analyses. All topologies resulted in a diphyletic Acari, placing the Parasitiformes as sister group to a clade comprising the Acariformes and various nonacarine outgroups. The Acariformes were grouped with Pseudoscorpiones in most analyses, but most other basal relationships among the nonacarine groups varied across the analyses and were generally poorly supported. Within the superorder Parasitiformes we obtained highly consistent results, with the three monophyletic orders arranged as (Mesostigmata [Ixodida + Holothyrida]). Within Mesostigmata two major supported clades include Uropodina together with Cercomegistina and the cohort (infraorder) Gamasina. In the latter, the subcohort (hyporder) Epicriina was sister group to, or included in a clade comprising different lineages of the subcohorts Dermanyssiae and Parasitiae.

Time-calibrated tree and evolution of mitochondrial gene rearrangements across the phylogeny of Acari using the 14min + 2max set of calibration points. Colored circles: posterior probabilities of the PhyloBayes analysis using the AA data set (black for >0.95). Grey bars: 95% confidence intervals for node ages (Ma). Colored triangles: ancestral mitotypes by BEAST (upper) and TreeREx (lower). Circles above branches: rearrangement events inferred by TreeREx (T, transposition; I, inversion; iT, inverse transposition; star, multiple steps rearrangement events). Boxes/vertical bars: solid line for monophyletic supercohort/cohort/subcohort and discontinuous for nonmonophyletic ones. *Eriophyoidea was recovered as sister group to the Trombidiformes plus Sarcoptiformes lineage thus breaking the monophyly of Trombidiformes in the traditional sense. Vertical discontinuous lines: oldest terrestrial fossils for some of the earliest land colonizers. Color of terminals: red for newly generated soil Acari mitogenomes, black and blue for Acari and outgroups from GenBank, respectively. See supplementary table S1, Supplementary Material online for details on the terminals.
Within the superorder Acariformes, the order Trombidiformes in the traditional sense (see below) was always paraphyletic in contrast to a monophyletic Sarcoptiformes. Most analyses recovered a basal split of a well-supported clade composed of the families Eriophyidae and Diptilomiopidae (=Eriophyoidea of the also paraphyletic supercohort Eupodides), which was the sister group to a clade of the remaining Trombidiformes plus Sarcoptiformes. The noneriophyoid Trombidiformes consisted of three well-supported clades, although of uncertain relative position, including 1) the supercohort Eupodides (excluding Eriophyoidea) and Labidostommatides, 2) a single representative from the cohort Anystina (supercohort Anystides), and 3) a clade containing the monophyletic supercohort Eleutherengonides as sister group of the monophyletic cohort Parasintengonina (as representative of the supercohort Anystides). Maximum likelihood analyses coding the data as amino acids (AA data set) also recovered the ancestral split of the traditional Trombidiformes, but in this case the families Eriophyidae and Diptilomiopidae (Eriophyoidea) were grouped with the supercohort Eleutherengonides.
Within the Sarcoptiformes (including multiple Oribatida and Astigmatina but lacking samples of the endeostigmatan lineages and Paleosomata that may be part of this group), analyses consistently recovered the monophyletic supercohort Enarthronotides as sister group to all other Sarcoptiformes. The latter was split into four well-supported clades, including two clades of the supercohort Mixonomatides and two clades of the supercohort Desmonomatides, including the monophyletic cohort Astigmatina as sister lineage to a well-supported clade comprised of the cohorts Nothrina and Brachypilina. The relationships among these four groups of Sarcoptiformes varied across the analyses, sometimes resulting in a paraphyletic supercohort Mixonomatides. However, the monophyly of the supercohort Desmonomatides was recovered consistently, with Astigmatina as sister clade to the rest of the Desmonomatides. At the family level, a high proportion of the families (>80%) was found as well-supported clades in all analyses (for details see supplementary table S3, Supplementary Material online).
The Evolution of Mitochondrial Gene Rearrangements across Acari
Mitochondrial gene order is known to be highly variable in Acari (Xue et al. 2016) and may be used as a phylogenetic character (e.g., Basso et al. 2017; Wang et al. 2017). Across the complete data set (204 taxa, including outgroups) we identified 37 different “mitotypes” defined by different gene orders of protein and rRNA coding genes, in addition to more frequent rearrangements of tRNA genes not specifically studied here. A total of 20 mitotypes were exclusive to the newly assembled data set. Rearrangements of the ribosomal (rrnL and rrnS) and adjacent nad1 and nad2 genes were the most frequent across Acari (see supplementary table S1 and fig. S10, Supplementary Material online). Mitotype1 represented the putative ancestral arthropod mitochondrial gene order and was also the most widespread in Acari, whereas 20 of the 37 identified mitotypes were restricted to a single terminal (fig. 1 and supplementary table S1, Supplementary Material online). Ancestral state reconstructions with BEAST and TreeREx were highly congruent and inferred mitotype 1 as the ancestral gene order for both Acariformes and Parasitiformes (also shared with the arachnid outgroups). Within Parasitiformes, mitotype 1 remained almost unchanged within the orders Holothryda and Ixodida, where a single translocation of the nad1-rrnL-rrnS gene block occurred within the family Ixodidae. Within the order Mesostigmata, rearrangements were inferred to result from single translocations (e.g., to mitotype 17 characterizing the cohort Uropodina) or required multiple translocation steps (e.g., those within the cohort Gamasina). In Acariformes, three gene orders were derived independently from mitotype 1 in the early diversification of the group. An inversion of the ribosomal RNA genes to mitotype 3 was exclusive to the families Eriophyidae and Diptilomiopidae (Eriophyoidea) at their basal split from the rest of Acariformes. Another translocation of the rrnS gene (mitotype 8) in Trombidiformes gave rise to a complex series of rearrangements across the cohort Parasintengonina and the family Tetranychidae. A further single translocation of the nad2 gene (mitotype 20) supported the Sarcoptiformes, including the supercohorts Enarthonotides and Desmonomatides (excluding Astigmatina). The remaining clades of Sarcoptiformes exhibited various gene orders derived from mitotype 20. The Astigmatina was characterized by a complex rearrangement; whereas two independent translocations derived from mitotype 20 marked each of the two clades within the supercohort Mixonomatides.
The Temporal Frame of Acari Evolution
The tree was dated using the oldest available fossils for lineages of Acari at various hierarchical levels (reviewed by Dunlop and Selden 2009) (see supplementary table S4, Supplementary Material online for details). These ages were used as hard minimum bounds for the stem node of the corresponding lineages by applying uniform distributions with uninformative maximum age constraints. This way of calibration avoids an important source of uncertainty that arises from implicitly constraining maximum ages at every calibration point when other parametric functions are used to calibrate (see Materials and Methods). We performed the analyses for 3 nested sets of calibration points using from 14 to 22 fossils, each of which representing the oldest records for the respective taxon (Dunlop and Selden 2009), but whose placement on our trees is less certain for some of them (see Materials and Methods and supplementary table S4, Supplementary Material online). The three analyses were conducted with and without constraining the maximum ages of the Acariformes and Parasitiformes crown groups based on the 99% confidence interval age for their first fossil appearance, as estimated by the statistical analysis of the available fossil series (see Materials and Methods and supplementary tables S4 and S5, Supplementary Material online for details). The results for the analyses using each of the three sets of minimum hard bounds (14–22 fossil minimum ages) were very similar, and comparisons of those with and without maximum age constraints resulted in similar median ages. The range of ages with maximum probabilities also coincided and the 95% highest posterior density (HPD) intervals were fully overlapping but, as expected, the calibration analyses without constrained maximum ages tended to have wider ranges due to longer tails of the marginal probability distributions to the past (supplementary fig. S16, Supplementary Material online). The results of all analyses supported minimum age for the origin of Acariformes in the Cambrian–Ordovician, at 455 Ma (95% HPD: 417–508 Ma), 458 (419–511) Ma, and 469 (424–512) Ma for the 3 analyses with maximum constraints, and at 496 (412–739) Ma, at 552 (415–810) Ma, and at 538 (415–805) Ma for the corresponding analyses without any upper limit to divergence times (fig. 1 and supplementary figs. S11–S16, Supplementary Material online).
The time line of Acari evolution, estimated under the most conservative set of calibration points and maximum ages constraints (14min + 2max calibration set; fig. 1), placed the minimum age for the origin of Acariformes (crown group age) at 455 (417–508) Ma, followed by the early diversification of Eriophyoidea and the Trombidiformes during the Silurian and early Devonian. The minimum age for the origin of the Sarcoptiformes (here Oribatida and Astigmatina) was estimated to be in the early Devonian (389 Ma; HDP: 355–440 Ma), with multiple basal lineages of Oribatida dated back at least to the Devonian and Carboniferous. In Parasitiformes, the age for the crown node was estimated at 310 (268–359) Ma, placing their minimum age to the Carboniferous. The subsequent diversification of Mesostigmata and Ixodida were estimated at 279 (238–329) Ma and 250 (216–289) Ma, respectively. All families represented in the tree (with two or more representatives) dated back at least to the Jurassic and Cretaceous while Oppiidae and Phthiracaridae dated to the Triassic and Trombiculidae to the Permian. The times of origin of the different supercohorts and cohorts varied greatly. The estimated rate of molecular evolution was 0.0146 substitutions per site per million years per lineage (subs/s/Ma/l) (95% HPD: 0.0082–0.022) for the protein coding genes, and an almost 10-fold lower rate for the rrnL and rrnS genes at 0.00175 and 0.00189 subs/s/Ma/l, respectively. These rates were similar to those estimated for other arthropods in recent studies (e.g., Papadopoulou et al. 2010; Andújar et al. 2012).
Discussion
The Deep Phylogeny of Acari
The mitogenome data are notable for their power to resolve the deep level systematics of the Acari, despite great divergences of lineages within both superorders. The mitogenomes also produce strong evidence for the monophyly of the two superorders Acariformes and Parasitiformes and support the findings of recent studies that both groups are only distantly related, whereas the relationships of either group with other Arachnida lineages remain ambiguous. In the ML analysis, the Acariformes was a well-supported sister group of Pseudoscorpiones, contrary to some molecular studies placing Acariformes as sister lineage to Solifugae, in the so-called Poecilophysidea (e.g., Pepato and Klimov 2015; Klimov et al. 2018), but in accordance with studies using mitochondrial genomes (Xue et al. 2016; Xue et al. 2017) and more than 3,600 orthologs from transcriptome sequencing by Sharma et al. (2014). The latter study attributed the Acariformes +Pseudoscorpiones to spurious long-branch attraction that was not obtained when only considering the 200 slowest-evolving genes, and given the limited taxon sampling, the authors emphasized the need to break long branches by including putatively slowly evolving exemplars. Our analysis showed that the lineages used by Sharma et al. (2014), specifically Astigmatina and Tetranychidae, also harbor the fastest evolving mitogenomes of all Acariformes (see branch lengths in supplementary figs. S1–S9, Supplementary Material online), but despite the much broader taxon representation that presumably overcomes these long-branch artifacts we still find the Pseudoscorpions +Acariformes clade. However, the PhyloBayes analysis (which uses a model that is less sensitive to rate heterogeneity) did not fully support this node, which leaves open the possibility that biases in evolutionary rates affect the deepest acariform relationships.
Within the Acariformes, the mitogenomes support the Sarcoptiformes but reject Trombidiformes in the traditional sense due to the absence of the Eriophyoidea. Recent molecular studies have already moved away from the Trombidiformes–Sarcoptiformes dichotomy of earlier cladistic analyses and are producing evidence for multiple basal splits (Klimov et al. 2018). In our analyses, the deepest branch was occupied by Eriophyoidea, whose phylogenetic position is still unclear but recently hypothesized to be near the family Nematalycidae, a member of the ancient Endeostigmata (Bolton et al. 2017; Klimov et al. 2018). The latter is composed of several families of vermiform mites inhabiting the deep soil, and maybe an early branch of, or even the sister group to all other Acariformes. The critical Endeostigmata unfortunately were not encountered in our sample although several families are expected in this habitat. If the placement of Eriophyoidea within Nematalycidae is correct, these mites are the only representative of Endeostigmata available for this study, although they constitute a very derived lineage not associated with the soil habitat. Yet, this placement is consistent with the initial separation of the Eriophyoidea from all other Acariformes in our trees, which is additionally characterized by an exclusive mitochondrial rearrangement (mitotype 3), derived from the ancestral order of Acariformes (mitotype 1) by a single inversion of the ribosomal genes, and which is now a possible clade marker for the wider Endeostigmata.
After excluding Eriophyoidea, the remaining lineages of Trombidiformes were strongly supported as sister group to Sarcoptiformes. Previous studies differed regarding this conclusion dependent on the chosen genes and partition strategies (Xue et al. 2016; Bolton et al. 2017; Xue et al. 2017; Klimov et al. 2018). Within the (noneriophyoid) Trombidiformes, two basal branches contributed by newly sampled soil mitogenomes corresponding to the suborders Eupodides, Labiostommatides and Anystides, whereas Eleutherongonides was represented by existing sequences mostly of spider mites (Tetranychidae). The results on the internal classification of the trombidiform supercohorts are only partially congruent with the two main hypotheses from morphology presented in Norton et al. (1993) and Lindquist (1996) and further support the need for a deep review of relationships across the entire eriophyoid–trombidiform complex, with focus on the potentially nonmonophyletic supercohorts Eupodides and Anystides.
On the contrary, for the Sarcoptiformes the mitochondrial genomes largely confirm existing cladistic hypotheses, including 1) the monophyly of the order Sarcoptiformes, 2) the sister relationship of the supercohort Enarthronotides and Mixonomatides +Desmonomatides, and 3) the monophyly of the supercohort Desmonomatides (Pepato and Klimov 2015; Klimov et al. 2018). However, our results raise the potential paraphyly of the supercohort Mixonomatides, which will require further review. The position of Astigmatina (a hyperdiverse group including many medically and economically important species) within Desmonomatides is consistent with recent molecular and some morphology-based studies (OConnor 1984; Norton 1998). This finding is evidence that mitochondrial genome data are in this case less susceptible to long-branch attraction than nuclear ribosomal genes (Dabert et al. 2010).
Finally, the second major branch of the Acari, the superorder Parasitifomes, constitutes a deep lineage within the class Arachnida that goes back even further than the Acariformes, but the crown group is much younger. Our study recovered a great diversity of hitherto unsampled lineages in the order Mesostigmata, which unlike many other Parasitiformes are free-living and include predatory soil mites. The tree is highly consistent with a previous molecular study based on rRNA markers (Klompen et al. 2007), with two well-supported clades, corresponding, respectively, to the orders Mesostigmata and Ixodida plus Holothyrida. Within the Mesostigmata, relationships among the present cohorts are in agreement with the rRNA study and further effort should be focused on adding missing lineages such as Sejida, the presumed most primitive suborder of Mesostigmata.
The Ancient Diversity of Acari in Soils and Their Role in Early Terrestrial Ecosystems
Our broad focus on hitherto neglected soil-dwelling lineages of mites and largely complete mitogenomes for the estimation of divergence times places the minimum age of Acariformes to the Cambrian–Ordovician, with median estimated ages of the Acariformes crown node for the different molecular calibration analyses ranging from 455 to 552 Ma. The corresponding marginal probability distributions (95% HPD intervals) extend these ranges to 417–512 Ma for calibrations schemes using maximum age constraints and 413–810 Ma without such constraints (supplementary fig. S16, Supplementary Material online). The favored Cambrian–Ordovician origin of the Acariformes is included in the age ranges obtained with all calibration schemes, whereas the dates extending far into the Precambriam are only obtained under the broad intervals obtained without maximum age constraints. The current study therefore suggests an earlier origin of Acariformes than previous molecular calibrations from rRNA genes and mitogenomes dated to the early Devonian (e.g., Dabert et al. 2010; Xue et al. 2017), but is in good agreement with recent molecular estimations for the origin of some of the main terrestrial arthropod lineages (Rota-Stabelli et al. 2013; Lozano-Fernandez et al. 2016). The estimated origin of Sarcoptifomes in the Silurian-early Devonian (Oribatida and Astigmatina in our study) matches the classic view based on the fossil record of Oribatida (Norton et al. 1988; Subías and Arillo 2002). However, these estimates are less ancient than those obtained by Schaefer et al. (2010) of a Precambrian origin, a hypothesis which is still plausible but unlikely with our results, as again the upper bound ages of the estimated 95% confidence intervals are overlapping with the Precambrian for our most relaxed analyses, but not overlapping with the more restricted ones (with maximum age constraints). Although the precise age for the origin of the Acariformes is still an open topic, the range of minimum ages here estimated at the Cambrian–Ordovician supports an aquatic or semiaquatic origin of the group, as similarly has been proposed for other terrestrial arthropod lineages (Lozano-Fernandez et al. 2016).
The current age estimates may still be compromised to some extent by taxon sampling. For example, the earliest branch of the Acariformes in our tree is represented by the Eriophyoidea, but earlier diverging lineages could be present with more complete sampling of the presumably nonmonophyletic Endeostigmata. However, these lineages are unlikely to change the estimated age of Acariformes dramatically because the branch length between the crown node of Acariformes and the stem node defining the split from the Pseudoscorpiones (but also Ricinulei, Opiliones) corresponds to only ∼25 Ma. Thus, any more basal branches that might be added by the inclusion of further Endeostigmata can only be placed within this limited age range. In a similar way, the choice of calibration constraints may cause uncertainties. For example, the time calibration of the Sarcoptiformes was constrained using the oldest fossil for the order dated to 392 Ma (Dunlop and Selden 2009), whereas an alternative calibration point could be the next inner Enarthronotides stem node. The fossil in question (Protochthonius gilboa, Devonacarus sellnicki) represents extinct lineages of monobasic genera and families (only established to accommodate these fossils), both considered early derivative members of Enarthronotides but with uncertain relationships (Norton et al. 1988). Therefore, we avoided the use of this fossil to constrain the Enarthronotides stem node, but favored a position deeper in the tree. Another reason supporting this deeper constraint is the uncertain monophyly of Enarthronotides, particularly regarding the early derivative lineages (see Schaefer et al. 2010; Klimov et al. 2018), and which in our trees is only represented by members of the family Lomanniidae. Overall, we consider that the use of the 392 Ma calibration for constraining the stem node of Sarcoptiformes (i.e., the node subtending the stem of Enarthronotides) is a more conservative and robust option to the goal of estimating minimum ages of origin for Sarcoptiformes, even if it may lead to a slight underestimate of the time of origin of the order.
Most lineages sampled for mitogenomes to date are of medical or agricultural importance, with very limited sampling from the soil (Xue et al. 2017). These lineages are generally younger in origin, particularly within the widely sampled Parasitiformes, whose crown group was placed in the late Carboniferous just over 300 Ma. The much increased inclusion of soil fauna phylodiversity (labelled in red letters in fig. 1) adds critical early branches in both the Trombidiformes and Sarcoptiformes, and reveals approximately a dozen lineages of Acariformes to have originated prior to the end of the Devonian and 19 lineages dated to before 250 Ma, that is, originating in the Paleozoic. Equally, the phylogenetic diversity within individual families in almost all cases dates back to the Jurassic or even the Triassic. Our time-calibrated tree places the diversification of major extant soil lineages to the earliest existence of terrestrial ecosystems, and so also supports the role of soil mites in early soil communities. Our conclusions do not strictly require that association with the soil habitat was the ancestral condition, but a simplified ancestral-state reconstruction of the habitat type performed across the tree on major taxonomic groups of Acariformes and Parasitiformes, also supports the antiquity of the association of mite lineages with soil-related habitats (see supplementary table S6 and fig. S17, Supplementary Material online for details). The preponderance of soil-inhabiting taxa representing early-branching lineages gives weight to the hypothesis that mites, possibly together with springtails, were among the few macroscopic animal groups inhabiting the soil matrix during the Ordovician-Silurian, and underwent the process of land colonization in synchrony with early terrestrial plants (Lenton et al. 2016). Other mesofauna of modern soil ecosystems such as the nematodes colonized the land much later (Rota-Stabelli et al. 2013), whereas some early land lineages such as myriapods and other arachnids were mostly living on the soil surface. Thus, the early diversification of major soil mite lineages with various functional roles, including particulate-feeding scavengers, fungivores and microbiovores, and eventually predators, is in agreement with a pioneering role of mites in the evolution of early terrestrial ecosystems.
MMG and Further Advance in the Tree-of-Life for the Mites
The 106 newly assembled mitochondrial genomes markedly extend the available phylogenetic coverage and now include representatives of seven of the nine supercohorts of Acariformes and three of the four orders within Parasitiformes. It is common practice in phylogenetic analysis of basal relationships to target specific exemplars of the focal group, which typically involves broad geographic and ecological sampling, or the use of museum collections, to achieve a comprehensive representation of major evolutionary lineages. This taxonomy-based approach may be fraught logistically and may lead to taxonomic bias, in particular in small-bodied and poorly characterized groups, while ignoring potentially valuable nontarget taxa encountered during field collecting. Here instead we maximized the taxon sampling by gathering the full diversity of lineages from a small study area and applying a flotation method (Arribas et al. 2016) for extraction of a maximum number of species from the soil matrix. This “site-based” approach makes full use of all specimens obtained, in this case from ∼100 kg of soil in a seminatural landscape (12 km maximum distance) of forest and grassland, which also produced many thousand further specimens that are being sequenced with PCR-based bulk approaches (community metabarcoding). If applied more widely, the site-based taxon selection in phylogenetic studies may overcome the current undersampling of soil-dwelling lineages of Acariformes and the bias toward free-living and parasitic forms in public sequence databases. If applied to further sites and biogeographic regions, MMG can be a synergistic and cost-effective way for assembling the Tree-of-Life in hyperdiverse but elusive taxa (Crampton-Platt et al. 2016; Giribet 2016).
Site-based sampling will inevitably miss major lineages distributed elsewhere or not sampled by chance. An obvious group missing from our analysis are the members of Endeostigmata, which are common in soils everywhere but extremely small in body size, and their absence here is most likely explained by failure at the sequencing stage. Very small individuals frequently do not produce sufficient DNA for the construction of shotgun sequencing libraries, and this problem is alleviated here by pooling of DNA extracts from multiple specimens. The approach is most efficient if all species in the mixture are present in roughly equimolar proportions, which is possible if DNA extractions are performed on individuals, as was the case here. However, very small specimens (producing <1 ng of total genomic DNA) may still be underrepresented in the metagenomic libraries.
Bulk sequencing of multiple soil mite lineages from landscape scale sampling improved the topology and support of the tree of Acari and greatly extends the available taxon and gene sampling. The study revealed the great diversity of ancient lineages at the local scale and illustrates the potential of metagenomics to overcome limitations for phylogenetic and evolutionary analysis of small-bodied metazoans. The generally well-supported phylogenomic tree demonstrates that mitochondrial genomes are suitable for inferring the evolutionary history of this ancient group. Further phylogenetic efforts can now build on the current MMG data set and methodology by including key lineages not represented, in particular: the presumed early-branching Endeostigmata; the Heterostigmata, the largest radiation of Trombidiformes with various lineages common in the soil; representatives of groups whose monophyly is questionable, such as Eupodina; or undersampled groups, for example, various deep lineages of Sarcoptiformes. This first application of MMG to analyze site-based soil phylodiversity provides a framework that if applied to additional sites and biogeographic regions, could be a cost-effective way toward a more complete Tree of the mites or other hyperdiverse but elusive taxa. This source of phylogenetic information also is crucial to be incorporated in community level studies (e.g., the phylogenetic placement of metabarcodes) if we are to understand patterns and processes driving soil biodiversity from a phylogenetic and functional perspective.
Materials and Methods
Specimen Sampling, DNA Extraction, and Sequencing
Specimens (see supplementary table S1, Supplementary Material online for details on collecting locations) for each sample were obtained from the superficial (1 m2 to 5 cm depth) and deep soil (40 cm diameter core to 35 cm depth) layers. Specimens were extracted from the soil sample following the flotation-Berlese-flotation protocol (Arribas et al. 2016) that allows for the extraction of “clean” minute arthropods from large volumes of soil. Selected specimens were subjected to nondestructive individual DNA extractions and morphologically identified at least to family level (supplementary table S1, Supplementary Material online). DNA extractions were performed with the BioSprint 96 DNA Kit and a BioSprint 96 Workstation (Qiagen). To generate “bait” sequences for associating specimens with the assembled mitochondrial genomes, the 5′ end of the cox1 gene (barcode fragment) was PCR amplified (see Arribas et al. 2016; supplementary table S7, Supplementary Material online for primer sequences and PCR conditions) and Sanger sequenced with ABI technology.
The MMG approach of Crampton-Platt et al. (2015) was followed for the assembly of mitochondrial genomes from shotgun sequences of equimolar DNA mixtures of specimens. Genome assembly from shotgun reads favors the contig formation from the high-copy mitochondrial fraction and produces complete or partial mitochondrial genome assemblies for individual species in the sample. The approach contains different steps for the assembly, identification, verification and curation of the mitogenomes (see below) that allow to validate the reliability and homology of the data (Gillett et al. 2014; Andújar et al. 2015; Gómez-Rodríguez et al. 2015). For metagenomic sequencing, the concentration of each specimen DNA extract was measured using the Qubit dsDNA HS Assay Kit (Invitrogen). Aliquots of ≈1.5 ng of DNA (for many of these tiny specimens this was almost all extracted DNA) were pooled to prepare a single TruSeq Nano library for shotgun sequencing on an MiSeq sequencer (2 ×300 bp paired-end reads) using 120% of a flow cell (Illumina).
Assembly, Annotation, and Alignment of Acari Mitochondrial Genomes
Raw reads were processed following a bioinformatic pipeline for assembly and further validation of obtained mitogenomes (Andújar et al. 2015; Crampton-Platt et al. 2015). Raw read quality was assessed with FastQC v0.10.1, low quality reads were discarded with Prinseq v0.19.2 (Schmieder and Edwards 2011) and Illumina adapter remnants were trimmed with Trimmomatic v0.30 (Bolger et al. 2014). Remaining reads were filtered for putative Arthropoda mitochondrial reads against a reference database of all complete Arthropoda mitochondrial genomes from NCBI (November 2016) using BLAST (Altschul et al. 1990) (e-value =10−5). Putative mitochondrial reads were assembled using three independent assemblers: Celera Assembler v7.0 (Myers 2000), IDBA-UD v1.1.1 (Peng et al. 2012) and Newbler (Miller, Koren, et al. 2010). Independently assembled contigs were further validated by comparing among assemblers and only identical or highly similar contigs produced by at least two of the three different assemblers were merged with the “De Novo Assembly” function in Geneious v7.1.9 (minimum overlap =1,000 bp; maximum mismatches per read = 1%; maximum gaps per read = 1%). The resulting mitochondrial genomes were identified by BLAST alignment to the Sanger “bait” cox1 sequences and annotated using MITOS (Bernt et al. 2013) followed by manual refinement. Protein-coding and ribosomal gene sequences were extracted gene by gene from these mitochondrial genomes and those downloaded from the NCBI (November 2016) (see supplementary table S1, Supplementary Material online for accession numbers). Each gene extraction was independently aligned with MAFTT v6.240 (Katoh and Toh 2008), and the reading frame was verified across all protein coding gene alignments. The concatenation of the alignments for the 15 mitochondrial genes constituted the matrix subsequently used for phylogenetic analyses.
Mitochondrial Phylogenomics of Acari
Three different data sets were generated from the 15-gene matrix: 1) the DNAall data set including the DNA sequences for all ribosomal and protein coding genes; 2) the DNArecoded data set including the DNA sequences for complete ribosomal genes and modified protein coding genes: third codon positions excluded and first position recoded as R (for purines A or G) and Y (for pyrimidines T or C); and 3) the AA data set including amino acid sequences for the protein coding genes, with ribosomal genes excluded. These three data sets were used for Bayesian inference with PhyloBayes (Lartillot and Philippe 2004) and ML in RAxML v8.2 (Stamatakis 2014) and IQ-TREE (Nguyen et al. 2015). For PhyloBayes analyses 2 independent chains were run for a minimum of 25,000 generations under a GTR-CAT model for each data set, and a consensus tree was obtained combining trees from both chains after discarding the first 10,000 generations as a burn-in fraction and retaining 1 tree at every 10 generations (bpcomp options -x 10000 10 –c 0).
RAxML analyses for each data set were computed on the CIPRES portal (Miller, Pfeiffer, et al. 2010) using 100 alternative runs to select the best tree and generating 1,000 bootstrap replicates for node support. Analyses were performed using the best partition scheme and corresponding best models of substitution for each data set (considering only those supported by RAxML for nucleotide [i.e., GTR] and amino acid sequences [GTR with different transition matrixes]) as determined with Partition Finder (Lanfear et al. 2017), using a partition scheme by gene and by codon position as input for the DNA data sets and a partition by gene for the AA data set (see supplementary table S2, Supplementary Material online for details). IQ-TREE analyses were performed on the three data sets using the best partition scheme and models of substitution selected by Partition Finder, as before, but using all available models (supplementary table S2, Supplementary Material online). Node support was calculated with 250 standard bootstrap replicates.
Dating Acari Origin and Diversification
Dating was performed using BEAST v.1.8.4 (Drummond et al. 2012). Analyses were done using the DNAall data set and applying the best-fitting substitution model and partition scheme as estimated in Partition Finder (Lanfear et al. 2017) for all available models (supplementary table S2, Supplementary Material online) and using the data set partitioned by gene and codon as input. For the molecular clock settings, the data set was partitioned in three partitions (protein coding genes, rrnL and rrnS) applying an uncorrelated lognormal clock to each partition. Due to the high complexity and time required for calibration analyses on our data set, together with the fact that preliminary calibration analyses showed long-branch attraction problems (that were not confounding the tree searches performed before with other methods), we opted for fixing the topology in the calibration analyses to the optimal topology obtained for the AA data set with PhyloBayes, a software particularly robust under high rate heterogeneity scenarios. Note that this is a well-established and common practice in phylogenetic inference using complex data sets (e.g., Brown and Yang 2010; Pons et al. 2010) for which the coestimation approach usually results in poor tree topologies. A birth-death speciation prior was applied and analyses were run twice in parallel with 100 million generations sampling 1 tree every 10,000 generations. Consensus trees were estimated with TREEANNOTATOR (Drummond et al. 2012) combining both runs and discarding the 50% initial trees as burn-in after checking the ESS of the tree likelihood and ensuring that values had reached a plateau in TRACER v.1.6 (http://beast.bio.ed.ac.uk/Tracer; last accessed July 2017). Posterior probabilities were considered as a measure of node support. Calibration analyses were computed on CIPRES (Miller, Pfeiffer, et al. 2010).
The oldest fossil age in each taxon was used as hard minimum bound (using uniform distributions with uninformative maximum ages) for the age of the stem node of the corresponding lineage (calibration point in the phylogeny). Note that the use of uniform distributions with uninformative maximum ages avoids giving an implicit maximum age constraint not supported by fossil data at every calibration point, which happens when log-normal or other functions are used to constrain node ages (Ho and Phillips 2009; Donoghue and Yang 2016). Among all the available oldest fossils from the comprehensive compilation of Dunlop and Selden (2009) we performed the analyses for three nested sets of calibration points including an increased number of fossils but also resulting in increased uncertainty of the phylogenetic placement of these fossils in our trees (see supplementary table S4, Supplementary Material online). A first analysis was performed using the most conservative set of calibration points for our trees (14min set) including constraints for Acariformes, Sarcoptiformes, and those families (with more than one representative) found to be monophyletic and well supported, that is, Damaeidae, Erythraeidae, Galumnidae, Gymnodamaeidae, Hermanniellidae, Ixodidae, Oppiidae, Parasitidae, Phenopelopidae, Phthiracaridae, Scheloribatidae, and Tetranychidae (see supplementary table S4, Supplementary Material online for details). A second analysis used the 14 previous plus 6 additional calibration points (20min set) on families with a single representative in our trees (i.e., Cepheidae, Camisiidae, Bdellidae, Anystidae, Ceratozetidae, and Scutoverticidae). Finally, a third analysis was performed including all the previous calibration points and two additional constraints (22min set) to integrate some of the oldest known fossils from families that are not represented in our trees (and that previously were not used to constrain the minimum age at the stem node of the complete Sarcoptiformes [i.e., 392 Ma]). These are Hypochthoniidae (326 Ma), Cosmochthoniidae (326 Ma), and Protoplophoridae (326 Ma) constraining the minimum age for stem node of Enarthronotides, and Hydrozetidae (197 Ma), Trhypochthoniidae (145 Ma), and Cymbaereidae (145 Ma) for which the calibration point in our phylogeny coincided with the node of the split between Nothrina and Brachypylina (stem node of both Nothrina and Brachypylina, respectively) that was constrained to the age of the oldest available fossil for Hydrozetidae (197 Ma).
The use of uniform distributions with uninformative maximum age constraints avoids including an important source of uncertainty that could strongly impact the results (Ho and Phillips 2009; Donoghue and Yang 2016). This way of calibration is, however, very sensitive to take arbitrarily high ages while the estimated substitution rate is driven toward very low values, due to the absence of any upper limit to divergence times (Hug and Roger 2007; Ho and Phillips 2009). For this reason, we also conducted the time calibration analysis adding two maximum age constraints applied to the base for the Acariformes and Parasitiformes lineages. Although to establish maximum age constraints on clade ages inevitably involves some arbitrariness, the statistical analysis of the fossil series can provide very useful information to define hypothetical maximum ages to be further used in calibration analyses (Holland 2016). Therefore, we performed calibration analyses for each set of minimum hard bounds also constraining the maximum ages of Acariformes and Parasitiformes crown nodes (2max set; supplementary table S4, Supplementary Material online) defined by the 99% confidence interval age for the first fossil appearance, as estimated using the oldest gap approach (Solow 2003) on the available fossil series at Fossilworks database (Behrensmeyer and Turner 2013) (July 2017) and Dunlop and Selden (2009) (see supplementary table S5, Supplementary Material online for used fossils) and the R function CladeAge (Claramunt and Cracraft 2015).
Ancestral Reconstruction of Mitochondrial Gene Rearrangements
To infer the evolutionary pathways of mitochondrial gene rearrangements, each of the 204 Acari and outgroups mitogenomes were coded as different mitotypes according to the gene order (excluding tRNA genes). The much higher observed number of translocations of tRNA genes was ignored as they may mask the phylogenetic signal of rare changes in protein coding and rRNA genes (Xue et al. 2016). Using this information associated to each terminal we reconstructed the ancestral mitotypes in BEAST 1.8.4, which simultaneously estimated branch lengths and assigned trait states to all its nodes (Drummond et al. 2012). Ancestral mitotype reconstruction was performed using an asymmetric trait evolution model (unordered states) with other parameters and inputs as in the calibration analyses above.
We also used the program TreeREx 1.85 (Bernt et al. 2008) for parsimony inference of gene rearrangement events (including reversals, transpositions, reverse transpositions, and tandem duplication with random loss of genes within the mitochondrial genome) on an existing tree. A matrix was generated containing the specific gene order for each mitogenome (tRNAs were again omitted). In this case only the 167 taxa with complete mitogenomes were used because the software requires that analyzed mitogenomes include an identical set of genes. Using the phylogenetic tree obtained in PhyloBayes for the AA data set, the most parsimonious transformational pathway was computed using common intervals mapped along the branches with TreeREx, which also inferred the putative gene order at the internal nodes and its reliability. The TreeREx analysis was performed with default settings, as suggested at the website (http://pacosy.informatik.uni-leipzig.de/185-0-TreeREx.html; last accessed November 2017).
Data sets, matrices, and phylogenetic trees available from Dryad Digital Repository: [https://doi.org/10.5061/dryad.ghx3ffbjb]. GenBank Accessions numbers are given in supplementary table S1, Supplementary Material online.
Acknowledgments
This research was funded by the NHM Biodiversity Initiative and the CGL2015-66192-R and CGL2015-74178-JIN projects. PA was supported by the Royal Society UK (Newton International Program) and the Spanish Ministry of Economy and Competitiveness (Juan de la Cierva Formación); BL by the NHM Biodiversity Initiative and French Labex Agro-CeMEB-NUMEV. Thanks are due to the regional government of Andalucía and the staff of the Sierra de Grazalema Natural Park; to P. Foster (NHM) and J. Arribas for their technical assistance and to four anonymous reviewers and the editor for their comments on the manuscript.