ddRAD Sequencing Sheds Light on Low Interspecific and High Intraspecific mtDNA Divergences in Two Groups of Caddisflies


 Large-scale global efforts on DNA barcoding have repeatedly revealed unexpected patterns of variability in mtDNA, including deep intraspecific divergences and haplotype sharing between species. Understanding the evolutionary causes behind these patterns calls for insights from the nuclear genome. While building a near-complete DNA barcode library of Finnish caddisflies, a case of barcode-sharing and some cases of deep intraspecific divergences were observed. In this study, the Apatania zonella (Zetterstedt, 1840) group and three Limnephilus Leach, 1815 species were studied using double digest RAD sequencing (ddRAD-seq), morphology, and DNA barcoding. The results support the present species boundaries in the A. zonella group species. A morphologically distinct but mitogenetically nondistinct taxon related to parthenogenetic Apatania hispida (Forsslund, 1930) got only weak support for its validity as a distinct species. The morphology and genomic-scale data do not indicate cryptic diversity in any of the three Limnephilus species despite the observed deep intraspecific divergences in DNA barcodes. This demonstrates that polymorphism in mtDNA may not reflect cryptic diversity, but mitonuclear discordance due to other evolutionary causes.

The systematic gathering of DNA barcodes, i.e., the mitochondrial cytochrome oxidase (mtCOI) sequences (Hebert et al. 2003) during the last decade demonstrated that DNA barcodes can successfully be used to identify species of caddisflies (Trichoptera) (Zhou et al. 2016). In connection with the international efforts to build a globalwide reference library for caddisflies (Zhou et al. 2016), we built the mtCOI library for 214 out of the 218 Finnish species (dx.doi. org/10.5883/DS-TRIFI200). This national-wide library has been used to clarify some known taxonomic issues (Salokannel et al. 2010(Salokannel et al. , 2012 and, especially, to identify unknown larvae and verify poorly characterized early larval instars to build a comprehensive larval key (Rinne and Wiberg-Larsen 2017). To assess the levels of intraspecific variation and to identify possible undetected cryptic species, we attempted to sequence multiple specimens per species, ideally from different locations.
Various unusual patterns observed in mtDNA, including deep intraspecific splits and barcode sharing between species, may result from different evolutionary phenomena such as introgression, retained ancestral polymorphism and incomplete lineage sorting, but may also reflect taxonomic inaccuracies, such as undetected cryptic diversity or oversplitting of species, or even sequencing or alignment errors or nuclear mitochondrial pseudogenes (Funk andOmland 2003, Mutanen et al. 2016). By providing multiple loci across the entire genome, genome-wide approaches have great potential to elucidate the underlying causes behind such mitochondrial anomalies. Reduced-representation sequencing (RRS) has become very popular in the last few years among scientists studying the genetic variation of nonmodel organisms (Andrews et al. 2016), with special emphasis on digestion-based techniques, commonly referred to as RAD (restriction-site associated DNA). In this study, we took of one of the most popular RRS methods, double digest RAD (ddRAD) sequencing, which enables rapid recovery of thousands of orthologous loci from samples with or without a reference genome (Peterson et al. 2012). While entire genomes have recently been made available for some caddisfly species (Luo et al. 2018, Heckenhauer et al. 2019, Olsen et al. 2021, to our knowledge, no published studies have focused on species-level taxonomy of caddisflies based on high-throughput sequencing techniques. We focused on two groups of caddisflies, both showing patterns in DNA barcodes incongruent with the current taxonomy. The genus Apatania Kolenati, 1848 (Apatanidae) consists of 13 valid species in Fennoscandia (Salokannel and Mattila 2018). Their larvae inhabit mostly cool running waters and feed primarily on diatoms, although some consume algae (Rinne and Wiberg-Larsen 2017). Parthenogenesis is common within the genus, including the Apatania zonella (Zetterstedt, 1840) group, in which only females are known for some species, whereas males are rare or unknown in some others (Salokannel and Mattila 2018). Although DNA barcodes and morphological characters mostly agree with the current delimitation of species in the Finnish material, unsolved taxonomic issues within the A. zonella group remain (Salokannel et al. 2010, Lecaudey 2013, Pálsson et al. 2016). The valid Finnish species within the group are A. zonella, A. dalecarlica (Forsslund, 1942), A. auricula (Forsslund, 1930), A. forsslundi Tobias, 1981. While identification of these species is usually straightforward, specimens with intermediate or otherwise unclear morphology are found in alpine habitats in the North Finland. A regularly collected, but unclear taxon 'A. nr. hispida' has a DNA barcode identical with A. hispida but differs by some characters of genital segments (Salokannel et al. 2010). Two more species of the A. zonella group, A. scandinavica Svensson &Tjeder, 1975, andA. majuscula McLachlan, 1872 are reported from Fennoscandia but were not included in the analysis because no material was available.
Limnephilus Leach, 1815 (Limnephilidae) is the most speciesrich genus of caddisflies in the northern Europe; it consists of 39 species in Finland. Their larvae are mainly herbivorous, feeding on algae, decayed fine (FPOM), and coarse (CPOM) plant material (Rinne and Wiberg-Larsen 2017). Limnephilus larvae inhabit various waters, including small and temporary waters with the help of their well-developed multifilament gills (Rinne and Wiberg-Larsen 2017). Despite the relatively high number of species, the taxonomy of Finnish Limnephilus has been stable during the recent decades. However, the DNA barcodes of Limnephilus flavicornis (Fabricius, 1787), L. sericeus (Say, 1824) and L. centralis Curtis, 1834 each appeared to split into two distinct clusters (Salokannel and Mattila 2018).
In this study, we examined both the Apatania zonella group and the three Limnephilus species supplemented with reference specimens of a fourth species, L. marmoratus Curtis, 1834 using ddRAD. We tested if genomic-scale data support the existing notion of the relationships of taxa within the A. zonella group and clarifies the status of A. nr. hispida as well as the identity of a selected morphologically unidentified specimen. We also tested if genome-wide data of nuclear loci recover the same pattern as the DNA barcode clustering of the three Limnephilus species, and particularly if the intraspecific deep divergences observed in mtCOI gene in three Limnephilus species are reflected in the nuclear genomes.

Materials and Methods
Along with the national DNA barcoding activities, altogether 1,133 specimens of 214 species were sequenced for the standard DNA barcoding fragment of the mitochondrial COI gene. Full collection and taxonomic data as well as photographs for most specimens are publicly available through the BOLD dataset DS-TRIFI200 accessible at dx.doi.org/10.5883/DS-TRIFI200. In short, for most cases all steps prior to DNA extraction, i.e., sample collection, preparation, tissue sampling, photography, and data entry to the BOLD database, were conducted in Finland by the authors and following guidelines of the BOLD Handbook (accessed at http://v4.boldsystems.org/ libhtml_v3/static/BOLD4_Documentation_Draft1.pdf). DNA extraction, PCR, sequencing, sequence alignment and sequence upload to the BOLD database were conducted at the Canadian Centre for DNA Barcoding following the protocols of Hebert et al. (2003) and DeWaard et al. (2008). Partly, DNA sequencing was conducted at the University of Turku as explained in Salokannel et al. (2010). These data had previously revealed cases of deep intraspecific divergences in three species of Limnephilus, which were chosen as target species for this study. Similarly, DNA barcoding did not support the status of morphologically identifiable Apatania nr. hispida as a distinct species, leaving its taxonomic status open.
The initial identifications of the samples were based on specimen morphology. The adult specimens were identified using their genital characters, as defined in the European atlas (Malicky 2004) and keyed by Salokannel et al. (2010). In the morphological examination of Limnephilus sericeus specimens, the characters of North American L. sericeus species group (Ruiter 1995) were also studied: primarily the relative length of the intermediate and superior appendages in the males and the connection of the appendages of 10th segment in females (Ruiter 1995). Identification of the larvae followed the characters given in Trichoptera Larvae of Finland (Rinne and Wiberg-Larsen 2017).

Molecular Analyses and Bioinformatics
DNA extractions were performed using the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's instruction. The quantity of genomic DNA extracts was checked using the Quant-iT PicoGreen dsDNA Assay Kit (Molecular Probes). The ddRAD library construction followed protocols described in Lee et al. (2018) with an exception: the size distribution was measured with Bioanalyzer (Agilent Technologies). Raw demultiplexed reads are archived in the NCBI SRA: PRJNA631141.
Raw paired-end reads were de-multiplexed with no mismatches tolerated using their unique barcode and adapter sequences using ipyrad v.0.7.23 (Eaton and Overcast 2020). All ipyrad defaults were used, with the following exceptions: the minimum depth at which majority rule base calls are made was set to 3, the cluster threshold was set to 0.85, and the minimum number of samples with data for a given locus to be retained in the final data set was set to 3 and 4. We also compiled a dataset of biallelic, unlinked SNPs by extracting a single SNP from each locus. The unlinked SNPs generated from the ipyrad datasets were analyzed using STRUCTURE analysis.

Data Analyses
Sequences were examined for fixed substitutions between species based on K2P distances in sequence alignments using MEGA v.7.0 (Kumar et al. 2016). The proportion of missing data was calculated using Mesquite (Maddison and Maddison 2017). Phylogenetic trees were constructed for both mtCOI and ddRAD data using the IQ-TREE web server (http://iqtree.cibiv. univie.ac.at; Trifinopoulos et al. 2016). We used the IQ-TREE integrated ModelFinder (Kalyaanamoorthy et al. 2017) with 286 DNA substitution models, which selected 'GTR+G+I' as the best-fit model according to the Bayesian information criterion for mtCOI data, 'TVM+F+I' for Apatania ddRAD data, and 'TN+F+R5' for Limnephilus. To reconstruct the phylogenetic tree, ML analysis with ultrafast bootstrap approximation model (1,000 replicates) was applied (Minh et al. 2013). In addition, a coalescent SVDquartets analysis (Chifman and Kubatko 2014) was conducted in PAUP 4.0a169 (https://paup.phylosolutions.com/; last accessed 2 April 2021) on the SNP data. We used default settings to yield a species tree and ran 1,000 bootstrap replicates. The trees were generated using FigTree v.1.4.2 (Rambaut 2015) and modified using Adobe Illustrator CS6.
We inferred population clustering with admixture from unlinked SNPs data to visualize genomic variation between individuals in STRUCTURE v.2.3.1 (Pritchard et al. 2000). Ten replicates were run at each value of K between 1 and 7 for Apatania group and K = 1 to 4 for Limnephilus group. Each run had a burn-in of 10K generations followed by 20K generations of sampling. We used StrAuto to automate Structure processing of samples (Chhatre and Emerson 2017). Replicates were permuted in CLUMPP (Jakobsson and Rosenberg 2007) according to the ad hoc ∆K statistics (Evanno et al. 2005), which is the second-order rate of change of the likelihood function. Structure results were visualized using DISTRUCT (Rosenberg 2004).

Apatania zonella Group
Species recognition using the morphological characters was in line with the key provided by Salokannel et al. (2010), except for the specimen JSlk-2016R003A. Apatania zonella group specimens included in the analysis represent the five valid Finnish species and A. nr. hispida. The specimen JSlk-2016R003A has a ventrally bluntrounded supragenital plate similar to A. zonella, but the upper part of the last segment is protruded, making it laterally similar to A. nr. hispida. Such combination of characters is not clearly associated with any valid taxon, so the sample was named as Apatania sp.
Sequence variation of mtCOI within species was close to zero (0-0.15%) except in A. zonella with 1.16% variation. A. hispida and A. nr. hispida samples have identical DNA barcode. Apatania sp.
(JSlk-2016R003A) has the closest match (0.15%) to the specimens of A. forsslundi. Otherwise, the inter-species variation between the valid species ranges from 1.05 to 2.31% (Table 2).
We generated a genome-wide SNP dataset from 18 individuals of the A. zonella group using ddRAD sequencing. We obtained 3.2 million reads per individual on average, of which 3 million reads per individual (93.3%) were retrieved after quality filtering step  (Supp Table 1 [online only]). After clustering at 85% sequence similarity, 26,790 putative orthologous loci shared across more than four samples were retained, for a total length of 2,158,568 bp (Table 3). These data included 59,486 SNPs, of which 38,366 are parsimony informative. The proportion of missing data was 71.4% across all loci.
In the ML tree based on mtCOI sequences, most species in the A. zonella group did not clearly separate as distinct clades, except for A. auricula ( Fig. 2A). Apatania nr. hispida completely intermixed with A. hispida, while A. zonella diverged into two lineages and was found closely related to A. dalecarlica. In the ddRAD ML tree, the lineages that correspond to A. hispida, A. auricula, and A. dalecarlica were strongly supported as monophyletic (BS 100%) (Fig. 3A). Apatania nr. hispida appeared paraphyletic with respect to its putative sister taxon A. hispida. The unidentified sample, Apatania sp., was closely related to A. zonella. The species tree estimation analyses recovered a rather similar topology to the ML tree, but A. auricula was found sister to A. zonella and Apatania sp. (Supp Fig. 1 [online only]).
The population clustering analyses revealed substantial heterogeneity in proportions of admixed ancestry between the species. The best supported model (K = 6) clustered populations into six major clades ( Fig. 3A; Supp Fig. 2 [online only]). The STRUCTURE analysis revealed that much of common ancestry of Apatania nr. hispida is shared through apparent admixture with the closely related A. hispida and A. auricula (Fig. 3A). However, D-statistics estimated insignificant levels of gene flow between A. hispida and A. auricula, but significant only between A. hispida and A. nr. hispida (Supp Table 3 [online only]). Apatania sp. also shared significant common ancestry with A. zonella and A. auricula, and did not form a distinct cluster even when the number of assumed genetic clusters was increased to 7.
The morphological characters were well in line with the literature and we found no clear indications of cryptic species. Only minor differences in the shapes and setae arrangement of genital segments and their appendages were detected, whereas species-level differences between North European Limnephilids are regularly clearly more significant (Malicky 2004). Also, when examining L. sericeus adults, we found no characters of closely related North American L. sericeus group of species (Ruiter 1995). The samples of each three Limnephilus species were split into two DNA barcode clusters: (i) Limnephilus sericeus with mean Kimura-2 parameter (K2P) divergence of 6.8% between the clusters, and mean intra-cluster K2P divergence of 0.3%; (ii) Limnephilus centralis with mean K2P divergence of 8.8% between the clusters and mean intra-cluster K2P divergence of 0.1%; and (iii) Limnephilus flavicornis with mean K2P divergence of 10.1% between the clusters, and mean intra-cluster K2P divergence of 0.1%. One of L. flavicornis clusters is closer (mean K2P divergence 2.2%) to the DNA barcodes of a related species L. marmoratus than the second cluster of L. flavicornis.   For ddRAD data of Limnephilus from 23 individuals, an average of 2.3 million raw reads was obtained (Supp Table 1 [online only]). We recovered a total of ca. 3.4 million base pairs with 20,003 loci and 81,701 SNPs, with average cluster depth of 207 (Table 3; Supp  Table 1 [online only]). The proportion of missing data was 72.7% across all loci.
In the mtCOI ML tree, three Limnephilus species (L. centralis, L. sericeus, and L. flavicornis) showed a deep intraspecific split. Two samples of L. marmoratus were sister and closely related to one of the L. flavicornis clades (Fig. 2B). In the ddRAD ML tree, all four species of Limnephilus were fully supported as monophyletic clades (Fig. 3B). The phylogenetic tree analyses supported L. flavicornis as a sister species to L. marmoratus.
The STRUCTURE identified four clusters (Fig. 3B), which are perfectly compatible with the ddRAD ML tree. Some signs of admixture were observed between a few heterospecific samples. Interestingly, two samples (ARin-2014F140 and ARin-2014F141) of L. sericeus showed some admixture with both L. flavicornis and L. marmoratus. Tests of admixture using D-statistics confirmed the significant levels of gene flow between these two species (Supp Table 3 [online only]).

Discussion
Our study is among the first to provide a nuclear genomic insight into the deviant patterns of intra-or interspecific variability observed in the standard DNA barcode fragment of the mitochondrial genome.
Patterns similar to what we observed in the focal caddisfly species are, however, not rare, as numerous DNA barcode data release papers report cases of deep intraspecific divergences and barcode sharing (e.g., Dincă et al. 2011;Hausmann et al. 2011;Huemer et al. 2014;Pentinsaari et al. 2014;Zahiri et al. 2014Zahiri et al. , 2017Saitoh et al. 2015;Astrin et al. 2016;Yang et al. 2016;Morinière et al. 2017). Janzen et al. (2017) demonstrated, based on nuclear genomic data, that presumably a single species of butterfly showing three distinct barcodes actually represents three distinct cryptic species. Some other studies have also focused on exploring evolutionary causes of high levels of mitochondrial DNA variability within a species (Garg et al. 2016, Weiss et al. 2018, Dincă et al. 2019b, Hinojosa et al. 2019, Marchán et al. 2020 or its reversal, namely barcode sharing across species, but vast majority of detected cases of high intra or low interspecific variability remain unstudied by genomic or other means. While morphology or few-loci approaches may be useful to reveal cryptic diversity, they may not be powerful enough to reveal all cryptic species. They also may not provide compelling evidence for the conspecificity when identical barcodes result from the taxonomic oversplitting of species. Genomics methods, ddRAD sequencing included, possess high potential to provide an accurate overview of evolutionary (e.g., introgression) and operational (e.g., taxonomic inaccuracy) causes behind the unexpected patterns of mtDNA variability.
In this study, the three flagged cases of deep intraspecific barcode splits are likely to be explained by biological rather than operational causes. In the one studied case of barcode sharing between two morphologically supported putative species, the evidence remained more ambiguous as we cannot exclude the possibility that they represent two, genetically very closely related but biologically distinct species. Next, we discuss the taxonomy of both species groups in more detail.

Apatania zonella Group
The present DNA barcoding results are similar to the study by Salokannel et al. (2010), even if these studies had only one common sample (ME071): (i) despite relatively small divergences, each valid species of A. zonella group has its own cluster and appear monophyletic; (ii) A. nr. hispida specimens have identical DNA barcode with A. hispida; and (iii) specimens identified as A. zonella show more diversity than the other taxa of the group ( Fig. 2A and Table 2). Also, the ddRAD-seq results are mainly in line with the mtCOI results, supporting the existing taxonomic notion of the presently valid species. The specimens of A. nr. hispida differentiate from A. hispida in the ddRAD-seq data, but not as significantly as the other analyzed species differentiate from each other. They also do not form reciprocally monophyletic entities. The combination of different morphology, identical mtCOI and slightly deviated ddRAD-seq suggests that A. nr. hispida and A. hispida are very closely related, but genetically distinct taxa. Because of parthenogenetic nature of these taxa (no males are known), their taxonomic status cannot be assessed under the biological species concept. A possibility remains that they represent relatively old, diverged asexual strains. Taxonomic delineation of asexual taxa is often difficult and inherently arbitrary. Considering that these two taxa appear as fully asexual, it is surprising that the D-statistics we conducted suggested introgression to have happened between them. Two possible scenarios might explain this situation. First, signs of introgression may originate time before they turned asexual. Our phylogeny suggests asexuality being a derived feature in this group. Second, A. hispida and A. nr. hispida might represent two independent shifts from sexual to asexual mode of reproduction, as has been demonstrated in a similar system in Dahlica moths (Elzinga et al. 2013). A question follows: what could this sexual 'stem' species then be? While not sampled here, an excellent candidate could be the sexual A. majuscula. Its DNA barcodes available in the BOLD systems (incl. our own data) reveals it being very closely related to the A. hispida complex, with barcodes being partially phylogenetically nested within the A. hispida-A. nr. hispida cluster. Its female genitalia also show close resemblance to those of A. hispida and A. nr. hispida. While we unfortunately did not include A. majuscula in our ddRAD sampling, we find the scenario of A. hispida and A. nr. hispida representing two asexual strains of A. majuscula appealing. Further studies are needed to address this question.
The unidentified specimen JSlk-2016R003A with exceptional morphology clusters with A. forsslundi in mtCOI, but ddRAD suggests admixture of between A. zonella and A. auricula. This is also biologically possible as both species, especially A. zonella, are known to produce males regularly in its Nordic populations. Altogether, the DNA results suggest hybridization between two or more species. Also, the hybridization may have occurred already in the past and the hybrid line might have survived, supported by parthenogenesis. Such hybrid lines could also explain other cases of morphologically intermediate or aberrant specimens that have caused issues for identifiers over the decades (Solem 1985, personal observations).

Limnephilus spp.
The three studied Limnephilus species seem to make an exception within the Finnish caddisfly fauna with distinct polymorphism in their DNA barcodes, with minimum Kimura-2 parameter divergences between the clusters varying from 6.8 to 10.1%. However, no clear morphological differences nor division of ddRAD sequences were detected in this study. A likely explanation for the genetic polymorphism in their mtDNA is historical admixture between closely related species, resulting in mitochondrial introgression and subsequent co-occurrence of two distinct lineages of COI within a species. Alternatively, this pattern could have resulted from retained historical genetic polymorphism. The former explanation would get indirect support if one of the clusters was observed being genetically closer to any other related species than to the other cluster. We explored this possibility in our own data, but also using all data accessible to us in the BOLD database.
One of the two mtCOI clusters of L. flavicornis is both genetically and phylogenetically closer to its morphological sister species marmoratus than the other intraspecific mtCOI cluster. This observation suggests introgression of mtCOI from L. marmoratus to L. flavicornis in the past, although the evidence for this scenario cannot be considered as compelling. The supposedly introgressed, yet then diverged COI haplotype is now widespread in L. flavicornis populations, as among our material (n = 18) both haplotypes seem common.
One of two mtCOI clusters of L. sericeus is almost identical (divergence <0.5%) with that of the public barcodes available (Zhou et al. 2016) for L. abbreviatus Banks, 1908 (Supp Table 2 [online only]), which is one of the Nearctic species of L. sericeus group (Ruiter 1995). This suggests a relatively recent introgression of mtCOI from L. abbreviatus to L. sericeus. However, it is surprising that the haplotype would have spread all the way from North America through Siberia to Finland. Such a spread could potentially have been promoted by endosymbiotic bacteria, Wolbachia in particular (Hurst andJiggins 2005, Smith et al. 2012), but we do not have evidence that being happened as we did not screen Wolbachia.
Neither of the mtCOI clusters of L. centralis is closely associated with other taxa with public DNA barcodes available (Zhou et al. 2016) or other data accessible to us in BOLD. The closest match (about 4%) of the second cluster is Central European L. italicus McLachlan, 1884 (Supp Table 2 [online only]), which is originally described as a variety of L. centralis (McLachlan 1884). The second mtCOI cluster may suggest introgression from L. italicus to L. centralis long in the past or an introgression from an unknown species to L. centralis. So far, the second mtCOI cluster specimens were found only from the continent of Finland, while the first cluster sequences appear widely in Europe (Supp Table 2

Conclusions
This study highlights the importance of DNA barcoding multiple specimens per species, preferably with good geographic coverage, to reveal the extent of intraspecific variability and possible genetic polymorphism in COI. We demonstrate that the genome-wide ddRAD sequencing provides a powerful method to uncover the cases with mitonuclear discordance and, although not detected here, undoubtedly also cryptic diversity. We did not obtain strong evidence to lift Apatania nr. hispida (=A. kaisilai nomen nudum) as a valid species, although this possibility remains. In case of three Limnephilus species with mitochondrial polymorphism, no evidence of cryptic diversity was obtained. Our observations suggest that the co-presence of two distinct mitochondrial lineages result from historical hybridization and introgression between species at least in two of the three studied cases.

Supplementary Data
Supplementary data are available at Insect Systematics and Diversity online.