Pervasive microRNA Duplication in Chelicerates: Insights from the Embryonic microRNA Repertoire of the Spider Parasteatoda tepidariorum

Abstract MicroRNAs are small (∼22 nt) noncoding RNAs that repress translation and therefore regulate the production of proteins from specific target mRNAs. microRNAs have been found to function in diverse aspects of gene regulation within animal development and many other processes. Among invertebrates, both conserved and novel, lineage specific, microRNAs have been extensively studied predominantly in holometabolous insects such as Drosophila melanogaster . However little is known about microRNA repertoires in other arthropod lineages such as the chelicerates. To understand the evolution of microRNAs in this poorly sampled subphylum, we characterized the microRNA repertoire expressed during embryogenesis of the common house spider Parasteatoda tepidariorum . We identified a total of 148 microRNAs in P. tepidariorum representing 66 families. Approximately half of these microRNA families are conserved in other metazoans, while the remainder are specific to this spider. Of the 35 conserved microRNAs families 15 had at least two copies in the P. tepidariorum genome. A BLAST-based approach revealed a similar pattern of duplication in other spiders and a scorpion, but not among other chelicerates and arthropods, with the exception of a horseshoe crab. Among the duplicated microRNAs we found examples of lineage-specific tandem duplications, and the duplication of entire microRNA clusters in three spiders, a scorpion, and in a horseshoe crab. Furthermore, we found that paralogs of many P. tepidariorum microRNA families exhibit arm switching, which suggests that duplication was often followed by sub- or neofunctionalization. Our work shows that understanding the evolution of microRNAs in the chelicerates has great potential to provide insights into the process of microRNA duplication and divergence and the evolution of animal development.


Introduction
MicroRNAs are a class of small noncoding RNAs that fine-tune gene expression by repressing translation of targeted mRNAs (Bartel 2004;Eichhorn et al. 2014). Perturbed microRNA function in metazoans has been shown to disrupt diverse developmental processes such as embryonic survival and viability (Chen et al. 2014), stem cell proliferation (Hatfield et al. 2005), metamorphosis in insects (Sokol et al. 2008; Ambros 2011), organ development (Chen et al. 2014), and vertebral number and identity (Wong et al. 2015). Moreover, variation in microRNA function has become increasingly associated with the evolution of cellular differentiation, morphological complexity, and even multicellularity (Heimberg et al. 2008;Peterson et al. 2009;Wheeler et al. 2009;Campo-Paysaa et al. 2011;Chen et al. 2012; Guerra-Assunçã o and Enright 2012; Arif et al. 2013;Fromm et al. 2013;Tarver et al. 2015).
The current understanding of canonical microRNA biogenesis in metazoans is that the microRNA primary transcript folds into a hairpin structure that is cleaved by the endonuclease Drosha forming the so-called precursor microRNA hairpin (pre-microRNA). Pre-microRNAs are then exported to the cytoplasm, where they are further cleaved by Dicer to produce a double stranded RNA molecule (Bartel 2004;Chendrimada et al. 2005). One of the two RNA strands is subsequently loaded into the RNA-Induced Silencing Complex (RISC) (Gong et al. 2014;Guennewig et al. 2014). The mature microRNA guides the RISC to specific mRNAs through complementary binding to regions in their 3 0 -UTRs. Once bound, the miRISC represses translation by destabilizing the mRNA, repressing binding of translation initiation factors, and can lead to the degradation of transcripts (Bartel 2004;Chendrimada et al. 2005;Kawamata et al. 2011;Juvvuna et al. 2012;Eichhorn et al. 2014: Fukaya et al. 2014. Normally, one of the two arms of a microRNA hairpin is preferentially incorporated into the RISC. However, some pre-microRNAs can generate two functional products, one from each arm, which normally target different mRNA repertoires (Marco, Macpherson, et al 2012;Gong et al. 2014;Guennewig et al. 2014). This mode of processing has been found to play a role in cell differentiation (Zhou et al. 2010) and disease progression (Pink et al. 2015). Moreover, for several microRNAs, it has been observed that differences have evolved between insects with respect to which arm of a conserved pre-microRNA is dominantly loaded into the RISC (Marco et al. 2010;Griffiths-Jones et al. 2011). This evolutionary change is likely to have altered the target repertoires of these microRNAs between species, which is predicted to lead to different functional consequences because the two mature products from the precursor microRNA have different sequences and are therefore expected bind to distinct recognition sites and different mRNA targets (Marco, Macpherson, et al. 2012).
The Chelicerata subphylum, which diverged from the Mandibulata approximately 500 MYA, and includes morphologically diverse animals such as spiders, scorpions, mites, ticks, and horseshoe crabs, could provide new insights into the evolution of microRNAs in arthropods (Dunlop 2010;Rota-Stabelli et al. 2011Sharma, Gupta, et al. 2014;Sharma, Kaluziak, et al. 2014;Schwager et al. 2015) (supplementary fig. S1, Supplementary Material online). However, to date, the characterization of microRNAs in chelicerates has been limited primarily to mites and ticks, representing a rather narrow sampling of this subphylum (Barrero et al. 2011;Rota-Stabelli et al. 2011;Zhou et al. 2013Liu et al. 2014Luo et al. 2015). Therefore, we chose to survey microRNAs in the common house spider Parasteatoda tepidariorum (formerly Achaearanea tepidariorum), which has emerged as the main model spider for evolutionary developmental biology (Hilbrant et al. 2012) and thus offers the potential for future functional investigations. Studies in this spider have already made important contributions to understanding the regulation and evolution of axis formation, segmentation, and nervous system development (McGregor et al. 2008, Hilbrant et al. 2012, Schwager et al. 2015. Moreover, sequencing of the transcriptome (Posnien et al. 2014) and genome of P. tepidariorum (i5k Consortium, unpublished data) now facilitates genomic approaches to understanding gene regulation in this spider in comparison to other chelicerates and other arthropods.
Surveying the evolution of microRNAs in spiders is of a particular interest because, intriguingly, there is evidence of extensive gene duplications in spider genomes (Schwager et al. 2007;Janssen et al. 2010Janssen et al. , 2015Clarke et al. 2015;Turetzek et al. 2015). Furthermore, duplication of coding genes has also been found in other chelicerate species such as scorpions and horseshoe crabs (Nossa et al. 2014;Kenny et al. 2016;Sharma et al. 2015). It is therefore important to estimate copy numbers of microRNA families within P. tepidariorum and other chelicerates to determine if the patterns of protein-coding and microRNA gene duplication are similar.
In this study, we have used deep sequencing and informatics approaches to characterize the embryonic microRNA repertoire of P. tepidariorum, and used this resource to identify novel microRNAs as well as to determine putative paralogy and orthology relationships of the microRNAs of this spider with those of other chelicerates. We found that conserved microRNA orthologues show a signature of duplication in P. tepidariorum. We also found this pattern of duplication in two other spiders and a scorpion but not in the other arachnids (mites and ticks) surveyed. Interestingly, the horseshoe crab Limulus polyphemus also contains many duplicated microRNAs, but based on previous studies of the chelicerate phylogeny, these expansions are possibly independent of those we find in arachnids (discussed in Schwager et al. 2016). Interestingly, the paralogous P. tepidariorum microRNAs exhibit divergence in arm usage suggesting that they may have diversified in target gene repertoire and function, perhaps reflecting specific roles in the regulation of spider development.

Materials and Methods
Parasteatoda tepidariorum Culture and RNA Sequencing Parasteatoda tepidariorum (from a strain collected in Gö ttingen, Germany) were raised at 25 C and fed on a diet of Drosophila and Gryllodes sigillatus. Total RNA was extracted from the embryos of ten healthy cocoons, corresponding to stages 1-10 of embryogenesis in this spider (Mittmann and Wolff 2012), using QIAzol (Qiagen) following the manufacturer's instructions. Small RNA (~15-30 nt) libraries were prepared with the Illumina TruSeq Õ Small RNA Sample Prep Kit according to the manufacturer's instructions, and library quality was assessed on the 2200 TapeStation instrument. Size selection between 18 and 30 bp was performed by excision of RNA from a 6% DNA PAGE gel, 1 mm (Invitrogen). Sequencing was performed on the Illumina HiSeq 2000 platform at the University of Manchester Genomic Technologies facility, and yielded a total of 177,789,607 reads (100 bp). The raw read data have been deposited in the Sequence Read Archive, with accession PRJEB13119.
Analysis of Small RNA Sequencing Data and Identification of P. tepidariorum microRNAs The raw read quality was confirmed using FastQC v0.11.2 (Andrews, 2010), after which adapter sequences were trimmed using the Cutadapt v1.4.1 tool (Martin 2011), and reads longer than 17 bp were retained. Reads were then mapped against the P. tepidariorum genome (GCA_000365465.1) using Bowtie v1.0.0 (Langmead et al. 2009) with the parameters -n 0 -m 5 -a -best -strata, to best discern among reads from paralogous microRNAs. All mapped reads were analyzed using miRDeep2 v0.0.5 (Friedlä nder et al. 2008) to predict microRNAs. To preliminarily identify orthologs, the predicted P. tepidariorum microRNAs hairpins were then compared against metazoan microRNA precursor sequences from miRBase v21 (Kozomara and Griffiths-Jones 2014) using BLAST v2.2.28+ (BLASTn; -word_size 4; -reward 2; -penalty -3; -evalue 0.01; -perc_identity 70) (Altschul et al. 1990). Predicted microRNAs that did not have significant sequence similarity to any other microRNA in miRBase were then manually inspected. Those that met the following criteria were discarded from the study: fewer than ten reads mapping to the locus; poorly defined Dicer processing sites, defined as less than 50% of reads for a given mature microRNA having the same five end; the predicted mature microRNA sequence had a BLAST hit to more than ten loci in the P. tepidariorum genome with two or fewer mismatches. Predicted paralogous microRNAs were aligned using Clustal-Omega (Sievers et al. 2011) and then manually inspected in RALEE v0.8 (Griffiths-Jones 2005).

Comparisons of Relative Arm Usage
To compare the relative arm usage of P. tepidariorum microRNAs to those of other species we used a similar approach to Marco et al. (2010). The average number of reads per experiment catalogued in miRBase for each microRNA arm for D. melanogaster and T. castaneum were obtained from miRBase v21 (Kozomara and Griffiths-Jones 2014). For the multicopy microRNA families in D. melanogaster and T. castaneum the average arm usage across the family was used. Arm usage of P. tepidariorum microRNAs with unique 5' and 3' mature sequences was also quantified. P. tepidariorum microRNAs with one unique mature sequence were also analyzed, with the caveat that the expression of the other arm was possibly over or under estimated. Note that arm usage of P. tepidariorum microRNAs that have nonunique sequences for both mature products could not be reliably assessed because reads could not be unambiguously mapped to one location.

Sequencing of Small RNAs and Prediction of microRNAs in P. tepidariorum
Small RNAs were sequenced to determine the repertoire of microRNAs expressed during P. tepidariorum embryogenesis. This yielded a total of 177,789,607 raw reads of which 159,258,789 (95.5%) processed reads were mapped to the genome of P. tepidariorum. The distribution of these mapped read sizes contain two peaks at 22 and 29 bp, indicating the presence of mature microRNAs and a presumptive pool of piRNAs, respectively ( fig. 1A) (Saito et al. 2006). From these mapped reads miRDeep2 analysis initially predicted 278 microRNAs, which constituted 14,942,552 (9.4%) of the total mapped reads. Of these predicted microRNAs, 130 had either less than 50% of reads that had the same 5' end, low read numbers, or were found repeatedly in the P. tepidariorum genome (see Materials and Methods). While these discarded candidates could potentially be functional microRNAs, they were removed to give a conservative final prediction of 148 microRNAs (supplementary files S1 and S2, Supplementary Material online) that are expressed during embryogenesis in P. tepidariorum.
We then used BLAST and metazoan pre-microRNAs from miRBase v21 (Kozomara and Griffiths-Jones 2014) to annotate orthologs of the 148 predicated microRNAs. This approach identified 101 conserved microRNAs, with similarity to at least one previously annotated metazoan microRNA in miRBase. The remaining 47 were therefore classified as "novel" P. tepidariorum microRNAs (fig. 1B) despite some having putative orthologs in other chelicerate species ( fig. 2). Note that we cannot rule out the possibility that our novel microRNA set includes some genes conserved in other species but diverged in sequence beyond our ability to recognize them as homologs, even using state-of-the-art structureaware RNA homology search tools.
We also grouped microRNAs based on whether they were found as a single copy or if the microRNA family had been duplicated and therefore contained multiple microRNAs (multicopy microRNAs). Of the conserved microRNAs, there were 22 single-copy microRNAs, and 79 multicopy microRNAs belonging to 15 families ( fig. 1B). For the novel microRNAs, there were 22 single-copy microRNAs, and 25 multicopy microRNAs belonging to nine families ( fig. 1B). Our prediction and annotation process therefore suggests that at least 36% of P. tepidariorum microRNA families have been duplicated at least once.
Identification of Conserved P. tepidariorum microRNA Families in Other Chelicerates We next searched for all of the 148 predicted P. tepidariorum microRNAs in the genomes of other chelicerates (supplementary file S3, Supplementary Material online) and also compared the results to the repertoire of microRNAs present in other selected ecdysozoans ( fig. 2). We identified 32 conserved microRNA families from P. tepidariorum in two other spiders and a scorpion. The conserved families of mir-14, mir-3477, and mir-3791 were present in S. mimosarum, but absent in A. genticulata and C. sculpturatus ( fig. 2). The mir-3791 family appears to be absent from the genomes of all the other chelicerates, except S. mimosarum and L. polyphemus, but is present in some insects (T. castaneum and A. mellifera) that we surveyed, and also shares sequence similarity with the mir-35-41 cluster in Ca. elegans (Massirer et al. 2012;Fromm et al. 2013;McJunkin and Ambros 2014).
The other arachnids showed variable patterns of retention and loss of the 35 conserved microRNA families found in P. tepidariorum. There were nine microRNA families that were present in all of these animals. The lowest number of families in any chelicerate surveyed was 17 in V. destructor ( fig. 2). Apart from the spiders and scorpion, mir-193 is likely to have been lost from the genomes of other chelicerates including L. polyphemus ( fig. 2). Note that this microRNA is commonly lost in metazoan lineages (Tarver et al. 2013). Other potential losses in chelicerates include mir-14 and mir-3477, which was only present in P. tepidariorum and S. mimosarum. Limulus polyphemus has also lost mir-981 and mir-279, but has retained the other 31 conserved P. tepidariorum microRNA families ( fig. 2).
Comparing the conserved microRNA families found in P. tepidariorum to other nonchelicerate arthropods reveals that 23 out of 35 are found in the genomes of all the insects, the myriapod and the crustacean in our survey ( fig. 2). Interestingly, mir-96 may have been lost in insects but retained in the crustacean, Da. pulex, as well as most chelicerates, while mir-3931 appears to have evolved in chelicerates as previously reported (Tarver et al. 2013;Wheeler et al. 2009), Novel P. tepidariorum microRNAs Are Largely Species-Specific Of the 22 novel single-copy microRNA families we identified in P. tepidariorum, two were identified in the genome of A. geniculata and three in S. mimosarum, whereas only one (mir-11960) was found in C. sculpturatus, and none was found in any other chelicerate (fig. 2). As described above, we also identified nine novel multicopy microRNA families in P. tepidariorum. However of these nine families, we were only able to identify mir-11951 in S. mimosarum and mir-11942 in A. geniculata. Though interestingly, both of these microRNA families contained duplicates ( fig. 2). Taken together our analysis suggests that most of the novel microRNAs identified in P. tepidariorum are likely to be young and to have recently evolved in the lineage leading to this spider.
Incidence of Multicopy P. tepidariorum microRNA Families in Other Chelicerates and Ecdysozoans As mentioned above, we found that 15 out of the 35 conserved microRNA families in P. tepidariorum were represented by two or more paralogs. There were also many multicopy families in the other spiders (S. mimosarum and A. geniculata) and the scorpion (C. sculpturatus) with 16, 17, and 16 out of 35 conserved families represented by two or more paralogs, respectively ( fig. 2). There was considerable overlap of multicopy families between these species (S. mimosarum, A. geniculate, and C. sculpturatus) and P. tepidariorum with 12, 11, and 10 families represented by at least two paralogs in the respective species and P. tepidariorum ( fig. 2). Moreover, eight families are multicopy among all four of these species ( fig. 2). In contrast, only up to five (in the parasitiforms M. occidentalis and I. scapularis) multicopy microRNA families were observed in any other chelicerate, or ecdysozoan, we surveyed with the notable exception of L. polyphemus ( fig. 2). In the horseshoe crab, 26 out of the 30 microRNAs found (with respect to the conserved P. tepidariorum microRNAs) were present as multiple copies, with 21 present as three or more copies. These included mir-8 and mir-10, which were not duplicated in any of the other ecdysozoans that we surveyed ( fig. 2).
In the manidibulates, parasitiformes and acariformes an-  -MicroRNA family copy number in P. tepidariorum and other ecdysozoans. Presence of P. tepidariorum single-copy and multicopy microRNA families in other spiders (purple), scorpions (magenta), horseshoe crab (blue) parasitiformes (green), acariformes (dark green), mandibulates (gray), and a nematode (black). The spiders, the scorpion, and the horseshoe crab all display large numbers of multi-copy microRNA families relative to other species.

Small-and Large-Scale Duplications of microRNAs in Chelicerates
These observations strongly suggest that both small-scale tandem duplications as well as larger scale duplications contributed to the pattern of duplicated microRNAs in spiders. In P. tepidariorum, we found that the mir-3791 family is represented by 34 copies, but in contrast only single copies of this family were identified in S. mimosarum and L. polyphemus ( fig. 2). Interestingly, 13 of the P. tepidariorum mir-3791 paralogs are located within a 50 kb locus on a single 671 kb scaffold (supplementary fig. S3, Supplementary Material online). Therefore it is clear that tandem duplications have at least in part contributed to the expansion of this microRNA family in the lineage leading to this spider.
We also found evidence that larger scale duplications have contributed to the expansion of microRNA repertoires in P. tepidariorum and other chelicerates. The mir-71/mir-2 cluster is an invertebrate-specific microRNA cluster that has expanded in arthropods probably due to tandem duplications of mir-2 (Marco et al. 2010;Marco, Hooks, et al. 2012). In the acariform and parasitiform lineages, the copy numbers of mir-71 and mir-2 are variable, though usually there is one copy of mir-71 and two copies of mir-2 ( fig. 2). In the spiders, the scorpion and the horseshoe crab, we consistently found at least two copies of mir-71 and more than two copies of mir-2 ( fig. 2). In the spiders and the scorpion, these microRNAs were generally found as two separate clusters on different scaffolds ( fig. 3). Parasteatoda tepidariorum contains one cluster containing a single copy of mir-71 and three copies of mir-2, and a second cluster, on a different scaffold, also with one copy of mir-71, but with six copies of mir-2 ( fig. 3).
In the S. mimosarum and C. sculpturatus genomes we found one scaffold containing one copy of mir-71 and four copies of mir-2 ( fig. 3), and a second scaffold with one copy of mir-71 and three copies of mir-2 ( fig. 3). In A. geniculata there is one complete cluster with one mir-71 and four mir-2 copies, all on one scaffold ( fig. 3). There is another scaffold with one copy of mir-71 and one copy of mir-2 located approximately 6 kb from the end of the scaffold. There are also two other copies of mir-2, each on approximately 1 kb scaffolds. The arrangement of L. polyphemus mir-71/mir-2 genes is much more fragmented (fig. 3). In this horseshoe crab we found eight copies of mir-71 located on seven scaffolds, of which six also contain at least one copy of mir-2 ( fig. 3). The seven copies of mir-2 that are located on six different scaffolds are all found close to the ends of the scaffolds, and may therefore possibly be fragments of larger mir-71/mir-2 clusters ( fig. 3).
A second microRNA cluster that we observed to be duplicated in some species is the mir-100/let-7/mir-125 cluster (supplementary fig. S4, Supplementary Material online). However, in all lineages there appeared to be possible loss or rearrangement of at least one cluster, although we cannot The mir-71/mir-2 cluster is duplicated in spiders (purple), the scorpion (magenta), and the horseshoe crab (blue). Each species displays lineagespecific retention, loss and further small segmental duplications. The position of the left most microRNA and the total scaffold length are indicated on the left and right hand side, respectively. exclude the possibility that this is an artifact of the genome assembly.
In P. tepidariorum we found one complete cluster (supplementary fig. S4, Supplementary Material online), and additional copies of mir-100 and mir-125 on separate scaffolds but no paralog of let-7 (supplementary fig. S4, Supplementary Material online). The arrangement of these microRNAs in S. mimosarum is similar to P. tepidariorum (supplementary fig. S4, Supplementary Material online) although the two mir-100 paralogs are on separate scaffolds (supplementary fig. S4, Supplementary Material online). Acanthoscurria geniculata has a complete mir-100/let-7/mir-125 cluster and two additional copies of mir-125 are found on separate scaffolds (supplementary fig. S4, Supplementary Material online). In C. sculpturatus, we found a single cluster, which appears to have been rearranged, similar to one of the three L. polyphemus clusters (supplementary fig. S4, Supplementary Material online).
Therefore the mir-71/mir-2 and mir-100/let-7/mir-125 clusters appear to have been duplicated in spiders and scorpions (Arachnopulmonata) and L. polyphemus but not other chelicerates or mandibulates that we surveyed. Taken together, our results show that both tandem duplications and larger scale duplications of entire clusters, followed by subsequent lineage-specific expansion or loss of microRNAs, have contributed to the evolution of chelicerate microRNA repertoires.

Divergence in microRNA Arm Usage in Multicopy microRNAs
Duplicate microRNAs have the potential to diversify their functions by a number of evolutionary mechanisms, including arm usage (Okamura et al. 2008;de Wit et al. 2009;Marco et al. 2010;Griffiths-Jones et al. 2011). Inspection of microRNA arm usage in P. tepidariorum suggests that there is a bias toward the mature sequence deriving from the 3' arm of the hairpin (supplementary fig. S5, Supplementary Material online). The distribution was statistically different from that in both T. castaneum (D = 0.2124; P = 0.002) and D. melanogaster (D = 0.2955; P = 5.4 Â 10 À 7 ), but was not different between T. castaneum and D. melanogaster (D = 0.1343; P = 0.08) (Kolmogorov-Smirnov tests).
We then compared relative arm usage for individual microRNAs between P. tepidariorum, T. castaneum, and D. melanogaster and observed multiple cases of arm switching among these species (fig. 4). Considering first the microRNAs that have unique mature sequences in P. tepidariorum, only mir-275 and mir-3477 exhibit differences in relative arm usage between P. tepidariorum and T. castaneum, (fig. 4A). Comparisons between P. tepidariorum and D. melanogaster shows that mir-10, mir-993a, mir-278b, mir-281, also display changes in arm use ( fig. 4B).
We also identified that one P. tepidariorum mir-3791 paralog had been subject to relative arm switching, and another paralog that had similar 5' and 3' expression, which is in contrast with the rest of this mostly 3' dominant family in P. tepidariorum (supplementary fig. S6, Supplementary Material online). Both of these microRNAs have unique mature sequences in P. tepidariorum (supplementary file S1, Supplementary Material online), though some of the other paralogs of this family do not have unique sequences for both mature products (supplementary file S1, Supplementary Material online).
It was also possible to compare the arm usage among the paralogs of the nine novel multicopy microRNA families identified in P. tepidariorum ( fig. 2). The mir-11942 family exhibited differential arm usage between paralogs (supplementary fig. S7, Supplementary Material online). Another family, mir-11961, also showed variation in arm usage between paralogs, however there was less striking differences between paralogs compared to those observed in the mir-11942 family (supplementary fig. S7, Supplementary Material online). Importantly, the paralogs of these two microRNA families all had unique mature sequences in P. tepidariorum (supplementary file S1, Supplementary Material online). The other novel multicopy families showed relatively similar arm usage between paralogs. However, these belonged to microRNA families with one nonunique mature sequence in P. tepidariorum in relation to their paralogs (supplementary file S1, Supplementary Material online).

Discussion
In this study, we surveyed the repertoire of microRNAs expressed during embryogenesis in P. tepidariorum and compared them with other chelicerates to examine patterns of duplication during the evolution of these genes in arthropods and other animals. From the initial 278 miRDeep2 predictions, we focused on a conservative total of 148 microRNAs representing 66 families expressed during P. tepidariorum embryogenesis ( fig. 1). This number is similar to the complement of 172 microRNAs expressed during D. melanogaster embryogenesis (Ninova et al. 2014). However, D. melanogaster has a total of 256 microRNAs identified across all life stages (Kozomara and Griffiths-Jones 2014). Therefore, it is likely that further microRNAs expressed later in development and in adults remain to be identified in P. tepidariorum. Despite this, we have characterized more microRNAs in this spider than previously identified in other arachnids (R. microplus [87], I. scapularis [49], and T. urticae [52]) (Wheeler et al. 2009;Barrero et al. 2011;Grbić et al. 2011) except the tick Hyalomma anatolicum anatolicum (Luo et al. 2015). Thus P. tepidariorum possesses one of the largest chelicerate microRNA repertoires sequenced to date, and is the only one where small RNAs have been mapped against the specific corresponding genome sequence.
Approximately half of the microRNA families identified in P. tepidariorum by small RNA seq were conserved in other metazoans ( fig. 2). These included 21 out of 31 families that are common to most bilateral animals, and 7 out of the 12 families present in protostomes (Tarver et al. 2013). The losses include mir-33, mir-219, mir-2001, and mir-1993, which have been commonly lost in metazoans (Tarver et al. 2013). P. tepidariorum also contains panarthropod (mir-276 and mir-305), arthropod (iab-4/8 and mir-275), and cheliceratespecific (mir-3931) microRNAs (Rota-Stabelli et al. 2011;Tarver et al. 2013). We did not find mir-242 and mir-216, which are also lost in other arthropods, nor mir-31, which is not found in the chelicerate family of Ixodidae, or the mandibulate-specific mir-282 or mir-965 (Rota-Stabelli et al. 2011;Tarver et al. 2013). It is possible that due to our sampling of embryonic small RNA we may have missed microRNA families (Tarver et al. 2013) that are only expressed during other life stages. Indeed, we were able to identify one copy of mir-133, two copies of mir-137, and three copies of mir-124 (supplementary file S4, Supplementary Material online) in the genome of P. tepidariorum, further increasing the number of microRNAs that we identified.

Evidence of Duplication of Both Coding and Noncoding Genes in P. tepidariorum
Over one-third of the conserved microRNA families and many of the novel microRNA families are present in multiple copies in P. tepidariorum ( fig. 1B). This expansion appears to have resulted from both local tandem duplications and larger scale segmental duplications ( fig. 3, supplementary figs. S3 and S4, Supplementary Material online). Previous reports have found that some protein-coding genes in P. tepidariorum have also been duplicated (Posnien et al. 2014). Of the 8,917 P. tepidariorum transcripts identified as being orthologous to a Drosophila gene, approximately 28% are likely to be expressed from duplicated genes (Posnien et al. 2014). This therefore suggests that there has possibly been greater retention of duplicated microRNAs compared to protein-coding genes in P. tepidariorum. This difference is similar to previous estimates of microRNAs and coding gene retention following largescale duplication (Berthelot et al. 2014).

Duplication and Divergence of microRNAs in Multiple Chelicerate Lineages
Evidence of gene duplication in chelicerates is not limited to P. tepidariorum. Analysis of the transcriptomes of representatives of the Mygalomorphae and Araneomorphae suggests that these spiders are likely to have shared a large-scale duplication event and that retained paralogs may have contributed to the origin and evolution of silk glands (Clarke et al. 2015). Hox genes in the spider Cupiennius salei and scorpions, including C. sculpturatus and Mesobuthus martensii, have also been found to be duplicated, though it is unclear whether these are shared or independent duplication events (Schwager et al. 2007;Di et al. 2015. In the horseshoe crab, there is also evidence of two rounds of whole-genome duplication, which are may be independent of the duplication events that have occurred in spiders or scorpions (Nossa et al. 2014;Sharma, Gupta, et al. 2014;Sharma, Kaluziak, et al. 2014;Kenny et al. 2016;Schwager et al. 2015). The presence of mir-193 in Arachnopulmonata and its absence in all the other chelicerates we surveyed further supports the hypothesis of independent duplication events.
We also identified many duplicate microRNA families in spiders, scorpions, and a horseshoe crab ( fig. 2 and supplementary fig. S2, Supplementary Material online). We found that L. polyphemus has the largest estimated microRNA repertoire among chelicerates. Interestingly, many microRNA families were found to be represented by multiple genes in all three spiders and the scorpion that we surveyed ( fig. 2). These findings are consistent with a possible large-scale duplication event in the common ancestor of spiders and scorpions (Arachnopulmonata), and the two possibly independent rounds of whole-genome duplication in horseshoe crabs (Nossa et al. 2014;Sharma, Gupta, et al. 2014;Sharma, Kaluziak, et al. 2014;Kenny et al. 2016;Schwager et al. 2015).
In contrast to the shared duplication of microRNAs among Arachnopulmonata species, there were differences in the retention of paralogs in some families ( fig. 2). These patterns of gain and loss may be due to genome assembly quality or a lack of small RNA sequencing from adult stages of P. tepidariorum. Alternatively, differential patterns of retention of microRNA paralogs potentially relates to differential retention of duplicate coding genes between these spider species (Clarke et al. 2015). It is also possible that the paralogous microRNAs found in some species could have been produced by lineages-specific duplication events. However, further analysis of the genomes and small RNA sequencing of all of these chelicerates is required to investigate these different evolutionary scenarios.

Evolution of microRNA Function
Despite the shared retention of many duplicated genes generated by putative tandem and large-scale duplication events, it is clear from our results that there are also common lineagespecific losses (figs. 2 and 3, supplementary figs. S3 and S4, Supplementary Material online). Changes in microRNA biogenesis may have also contributed to the evolution of their function among chelicerates. We identified that P. tepidariorum, like some other ecdysozoans (de Wit et al. 2009), has a general 3' arm bias with respect to preferential strand loading into the RISC. This bias is greater than that reported for 3' arm usage compared to T. castaneum and D. melanogaster (Marco et al. 2010) (supplementary fig. S5, Supplementary Material online). This could perhaps be caused by the many paralogs of the mir-3791 and mir-2 families, which were generally 3' dominant, though both of these families do show instances of arm switching ( fig. 4 and supplementary fig. S6, Supplementary Material online). However, the microRNA that switched in the mir-2 family did not contain completely unique mature sequences and relative arm switching may have been over or underestimated. Many of the microRNAs that show arm switching, relative to T. castaneum and D. melanogaster, also belong to multicopy microRNA families. These results suggest that duplication may facilitate functional change between paralogs and provides further evidence that microRNA duplication facilitates changes that can alter strand selection (de Wit et al. 2009).
There were also cases of single-copy microRNAs that have been subject to arm switching between species. miR-10-5p is more abundant than miR-10-3p in both P. tepidariorum and T. castaneum, while miR-10-3p dominates in D. melanogaster (Griffiths-Jones et al. 2011). In both P. tepidariorum and Drosophila, mature products from both arms are expressed at detectable levels (Stark et al. 2007). The expression of both mature arms may be a feature that contributes to a microRNAs ability to switch strand usage between species.
The pervasive duplication and subsequent divergence in retention and copy number, as well as arm switching, that we have identified among chelicerate microRNAs may have led to their subfunctionalization and neofunctionalization (Ohno 1970;Taylor and Raes 2004;Li et al. 2008;Amoutzias and Van de Peer 2010;Innan and Kondrashov 2010;Abrouk et al. 2012;Huminiecki and Conant 2012;Wang and Adams 2015). These evolutionary differences, therefore, may have contributed to the divergence in the developmental programs of chelicerates.

Conclusions
Our characterization of microRNAs expressed during P. tepidariorum embryogenesis and the identification of their orthologs in other arthropods show that there has been pervasive duplication and subsequent divergence in the sequences of these paralogous genes in spiders, scorpions, and the horseshoe crab. It is now essential to apply the tools for analysis of gene expression and function available in P. tepidariorum to test the developmental implications of these changes to provide a perspective on the evolution of microRNAs in chelicerates, arthropods, and metazoans.