First annotated draft genomes of nonmarine ostracods (Ostracoda, Crustacea) with different reproductive modes

Abstract Ostracods are one of the oldest crustacean groups with an excellent fossil record and high importance for phylogenetic analyses but genome resources for this class are still lacking. We have successfully assembled and annotated the first reference genomes for three species of nonmarine ostracods; two with obligate sexual reproduction (Cyprideis torosa and Notodromas monacha) and the putative ancient asexual Darwinula stevensoni. This kind of genomic research has so far been impeded by the small size of most ostracods and the absence of genetic resources such as linkage maps or BAC libraries that were available for other crustaceans. For genome assembly, we used an Illumina-based sequencing technology, resulting in assemblies of similar sizes for the three species (335–382 Mb) and with scaffold numbers and their N50 (19–56 kb) in the same orders of magnitude. Gene annotations were guided by transcriptome data from each species. The three assemblies are relatively complete with BUSCO scores of 92–96. The number of predicted genes (13,771–17,776) is in the same range as Branchiopoda genomes but lower than in most malacostracan genomes. These three reference genomes from nonmarine ostracods provide the urgently needed basis to further develop ostracods as models for evolutionary and ecological research.


Relevance of ostracods
Ostracoda are small, bivalved crustaceans, widely occurring in almost all aquatic habitats as part of the meiobenthos and periphyton. There are 2330 formally described species of extant nonmarine ostracods (Meisch et al. 2019) and at least another 7000 described species of extant marine ostracod species (see Schö n and Martens 2016 for an estimate by S. Brandao). Their calcified valves are preserved as microfossils, making them the extant arthropod group with the most extensive fossil record. The group has an estimated (Cambrian) age of c. 500 myr (millions of years) according to a molecular clock (Oakley et al. 2013), and c. 450 myr (Ordovician;Maddocks 1982) to 509 myr (Wolfe et al. 2016) according to the fossil record. This makes them one of the oldest extant pancrustacean groups ( Figure 1). Because of their excellent fossil data, evolutionary events can be dated with real-time estimates making ostracods ideal models for evolutionary research (Butlin and Menozzi 2000;Oakley and Cunningham 2002;Oakley et al. 2013;Schö n and Martens 2016).
Contrary to the extensive focus on this group for palaeontological research, there is a total lack of published ostracod genomes, and even isolated genomic data from ostracods in open access databases are still rare. Thus, the only resources available beyond individual gene sequences are four mitogenomes [the marine ostracods Vargula hilgendorfii (Ogoh and Ohmiya 2004; GenBank accession number NC_005306) and Cypridina dentata NC_042792); and two unpublished mitogenomes from V. tsujii (NC_039175) and Cyprideis torosa (PRJNA302529)]. Also, raw Illumina DNA sequencing reads of the podocopid ostracod Eucypris virens have been generated as part of a study testing DNA extraction methods for high-throughput sequencing in zooplankton (SRX8021019; Beninde et al. 2020) but these have neither been assembled nor annotated. In studies on crustacean phylogenies and gene expression (see Supplementary  Table S1 for details), raw RNA-sequencing reads have been generated for a total of 12 species coming from the three major ostracod lineages (Mydocopida, Halocyprida, and Podocopida), but the number of assembled and annotated ostracod genes in these studies remains very limited, ranging between 4 and 822 genes.

Choice of model species
Extant nonmarine ostracods show a high prevalence of asexual reproduction (Chaplin et al. 1994;Butlin et al. 1998;Martens et al. 1998), which has evolved several times independently in different ostracod lineages and is most frequent in the Cyprididae and the Darwinulidae. Ostracods are thus an ideal group to further study the paradox of sex, which remains one of the most puzzling questions in evolutionary biology (Bell 1982 Jaron et al. 2020) and to develop insights into mechanisms such as gene conversion (Omilian et al. 2006), DNA repair (Schö n and Hecox-Lea and Mark Welch 2018), or horizontal gene transfer (Gladyshev et al. 2008;Danchin et al. 2010;Boschetti et al. 2012;Paganini et al. 2012;Flot et al. 2013). Such data are also needed to further test for general consequences of asexuality beyond lineage-specific effects . For many animal groups in which asexuality is frequent, genomic data are limited to a few representatives only (Tvedte et al. 2019) or are totally absent like in the Ostracoda.

Figure 1
The phylogenetic position of the Ostracoda among the pancrustaceans and their age estimated from fossil and molecular data. Modified from Oakley et al. (2013). Different pancrustaceans are indicated by branches in different colors. The Ostracoda include the Podocopida, Platycopida, and Myodocopida. Here, three representatives of the Platycopida (indicated in purple) have been sequenced. The phylogenetic clade to which D. stevensoni belongs, is indicated by D, the clade to which C. torosa and N. monacha belong, is indicated by *. Black horizontal bars represent the range of age estimates in myr from Bayesian analyses by Oakley et al. (2013). The letters A-C in the black boxes indicated fossils that were used for calibrations of age estimates.
Of all extant nonmarine ostracods, the Cyprididae (cyprids) are most speciose, comprising 42% of all known species (Meisch et al. 2019). They would thus be an obvious choice for genomic studies, also because in this ostracod family, mixed reproduction with sexual and asexual females and geographic parthenogenesis is very common . Asexual cyprids, however, are often polyploid (Adolfsson et al. 2010;Symonová et al. 2018), probably because of hybridization between males and asexual females through accidental mating (Schmit et al. 2013). Consequently, genome sizes are relatively large (Jeffery et al. 2017;Gregory 2020) up to 3.13 pg which equals more than 3 Gb. These features are likely to seriously complicate genomic assemblies and annotations in the absence of any genomic resources for ostracods, which is why we did not choose any asexual cypridid ostracods for this genome project. Instead, we have selected three other species of nonmarine ostracods, one putative ancient asexual darwinulid ostracod and two species with obligate sexual reproduction.
The ostracod family Darwinulidae is one of the two last remaining animal groups which are still supported as being genuine ancient asexuals (Heethoff et al. 2009;Schö n et al. 2009b;Schwander 2016) and comprises about 35 morphospecies (Meisch et al. 2019). All darwinulids are brooders with valve dimorphisms between males and females that are detectable in the fossil record. Martens et al. (2003) showed that males have been absent in this family for at least 200 myr. One study reported a few males in a single darwinulid species (Smith et al. 2006) but proof of the functionality of these males for successful mating and meaningful genetic exchange could not been provided. Such (potential) atavistic males have also been reported in other putative ancient asexuals (Heethoff et al. 2009). The type species of the Darwinulidae, Darwinula stevensoni, has been asexual since c. 20 myr (Straub 1952), occurs on all continents except Antarctica (Schö n et al. 2012) and in a wide range of habitats (Schö n et al. 2009b). Darwinula stevensoni is the best investigated darwinulid ostracod so far and has been the subject of ecological (Van Doninck et al. 2002, 2003a, 2003bVan den Broecke et al. 2013) and molecular research using DNA sequence data from single genes Martens et al. 2005;Schö n et al. 2012). These studies revealed that D. stevensoni is most likely apomictic or functionally mitotic (following the definition of apomixis in animals as in Schö n et al. 2009a). The species also has low mutation rates as there appears to be no (Schö n et al. 1998) or low (Schö n andSchö n et al. 2009b) allelic divergence within individuals, and genetic differences between populations from different continents can be attributed to ancient vicariant processes (Schö n et al. 2012). It has also been suggested that gene conversion is common in this species , which could be an explanation for the low observed mutation rates. These results, however, were based on a limited number of genes and require further confirmation with genomewide data. Darwinula stevensoni has a life cycle of 1 year in Belgium (Van Doninck et al. 2003b) and up to 4 years in more northern regions (McGregor 1969 in Northern America;Ranta 1979 in Finland), which is exceptionally long for a nonmarine ostracod. It can survive a wide range of temperatures, salinities (Van Doninck et al. 2002), and oxygen concentrations (Rossi et al. 2002). The total genome size of D. stevensoni has been estimated as 0.86-0.93 pg with flow cytometry (Paczesniak, unpublished), approximating 900 Mb. There is no information on the ploidy level of D. stevensoni, except for the study by Té tart (1979) showing 22 dot-like chromosomes.
Because of its putative ancient asexuality, no close sexual relatives of D. stevensoni are available for comparative, genomic analyses. We have chosen two fully sexual nonmarine ostracod species from the Cytherideidae and the Notodromadidae with high population densities in Belgium as comparisons to the putative ancient asexual: C. torosa and Notodromas monacha, respectively. Cyprideis torosa inhabits brackish waters and is the only extant species of this genus in Europe (Meisch 2000). It has been the subject of various biological and especially palaeontological and geochemical studies (see for example, Heip, 1976aHeip, , 1976bDe Deckker et al. 1999;Keyser 2005). Frogley and Whittaker (2017) suggested that C. torosa is at least of Pleistocene origin (c. 2.5 myr) but might be older. There are only two molecular studies of this species based on single genes (Schö n Schö n et al. 2017). No information on the genome size or the karyotype of C. torosa is currently available.
The second sexual ostracod species analyzed here, N. monacha, occurs throughout the Northern hemisphere and is a nonmarine ostracod with a most peculiar behavior: it is partially hyponeustonic, hanging upside down attached to the water surface (Meisch 2000). The fossil record of N. monacha goes back to the Miocene (max 23 myr-Janz 1997), and its genome size is at 0.87 pg (Jeffery et al. 2017;Gregory 2020) very similar to that of D. stevensoni. This species has not yet been the subject of any molecular studies.
Our aim here is to provide the first reference genome data of nonmarine ostracods from three different species with varying reproductive modes: the putative ancient asexual D. stevensoni and the two obligate sexuals, C. torosa and N. monacha. We also generate transcriptomes of these species to facilitate genome annotations.

Sample collection for genome and transcriptome sequencing
All three nonmarine ostracod species were sampled in Belgian lakes where previous research had shown that these species occurred Merckx et al. 2018). Living ostracods were sampled using a hand net with a mesh size of 150 mm. The hand net was swept in between the vegetation and forcefully right above the surface of the sediment for collecting D. stevensoni and C. torosa. N. monacha was sampled by moving the net on the water surface. Nonmarine ostracods were kept in habitat water. Their taxonomic identity was confirmed, and they were sorted alive under a binocular microscope as described by Martens and Horne (2016). Individual ostracods were picked with a pipette and transferred into sterilized EPA water in which they were maintained until DNA and RNA were extracted. More details on the origin of biological samples are provided in Supplementary Table S2. For generating reference genomes, DNA was extracted from a single female of each species using the QIAamp DNA Micro kit according to the manufacturer's instructions. The extracted DNA from single females was amplified in two independent reactions using the SYNGIS TruePrime WGA kit and then pooled, to generate sufficient DNA for preparing different libraries. To generate transcriptomes for annotation of reference genomes, RNA was extracted from 40 pooled individuals per species from the same collection batch. For this, individuals were frozen in liquid nitrogen and, after addition of Trizol (Life Technologies), mechanically crushed with beads (Sigmund Lindner). Next, chloroform and ethanol-extraction methods were applied to the homogenized tissue and the aqueous layer transferred to RNeasy MinElute Columns (Qiagen). Subsequent steps of RNA extraction were done following the RNeasy Mini Kit protocol, including DNase digestion. Finally, RNA was eluted into water and stored at À80 C. RNA quantity and quality were estimated with the NanoDrop (Thermo Scientific) and Bioanalyzer (Agilent).

Genome assembly
We prepared five genomic DNA libraries for each reference genome (three 2 Â 125 bp paired-end libraries with average insert sizes of 250-300, 550 and 700 bp, and two mate-pair libraries with average insert sizes of 3000 and 5000 bp; see Supplementary  Table S3 for more details) with the Illumina TruSeq DNA Library Prep Kit. Reads were generated with the Illumina HiSeq 3000 system for a total coverage between 351Â and 386Â (Supplementary Table S3).

Protein coding gene annotation
Libraries were prepared using the Illumina TruSeq Stranded RNA kit, following the manufacturer's instructions. RNA reads were generated with the Illumina HiSeq 2500 system (Supplementary  Table S4). Reads were filtered with Trimmomatic v0.36. All trimmed reads were mapped against the genomes with STAR v2.5.3a (Dobin et al. 2013) and further assembled with Trinity v2.5.1 (Haas et al. 2013) under the "genome guided" mode to produce transcriptome assemblies.
More details of the annotation pipelines and the applied parameters can be found in Supplementary Material SM2.

GenomeScope analyses
The whole genome amplification approach, which we used in the present study because of the small body size of individual ostracods, generated unequal read coverage of ostracod genomes and prevented us from directly estimating genome sizes and levels of heterozygosity from the assemblies. To overcome this problem, we re-sequenced two individual ostracods each of D. stevensoni and N. monacha without whole genome amplification, preparing libraries with the NEBNext V R Ultra TM II DNA Library Prep Kit for Illumina. Reads were filtered with Trimmomatic v0.36 and analyzed using GenomeScope v2.0 (Ranallo-Benavidez et al. 2020) to correctly estimate genome size and heterozygosity. More details on the analyses are provided in the Supplementary Material SM3.  Gutekunst et al. (2018) Assembly size is provided in million base pairs, scaffold N50 in kilo base pairs, and BUSCO scores in %. Letters behind BUSCO scores indicate the % of complete single copy genes (C) or % of single and fragmented single copy genes (C þ F), respectively. Where BUSCO scores lack brackets, no further information on completeness of single copy genes was provided. n.i. ¼ no information available.

First ostracod reference genomes and their attributes
We produced the first de novo reference genomes of nonmarine ostracods, namely of the three species D. stevensoni, C. torosa, and N. monacha with different reproductive modes (see Supplementary Material SM1 and Tables S3 and S4 for more details on the assemblies). We used a whole genome amplification approach (WGA), because the TruSeq DNA Nano library prep kit for Illumina sequencing or low input protocols for PacBio (Duncan et al. 2019) were not available when these assemblies were generated. We would not recommend WGA for future studies because this PCR-based method generated uneven coverage, and consequently, problems for applying routine genome assembly methods and estimates of genome size and heterozygosity. Despite these limitations, our approach produced genome assemblies that are useful for future research as will be outlined below.
When assessing the quality of the obtained ostracod de novo genome assemblies, the assembly of the putative ancient asexual, D. stevensoni, had the best contiguity, with the largest N50 although the total number of scaffolds was similar to N. monacha (Table 1, Supplementary Table S6). The genome of the putative ancient asexual is furthermore the most complete as shown by its total BUSCO score of 96% and of 94% for complete single copy genes ( Table 1). The quality of the genome from the obligate sexual ostracod C. torosa is the lowest of the three ostracod species as it has the highest number of scaffolds, and the lowest N50; it is also less complete with a total BUSCO score of 92% (Supplementary Table S7) and of 87% for complete single copy genes (Table 1). All three species have similar numbers of predicted genes and transcripts (Supplementary Table S7).
Ostracod genome sizes estimated with flow cytometry are somewhat larger than the estimates that we obtained here from GenomeScope analyses of re-sequenced individual ostracods. The haploid genome size of D. stevensoni was estimated at 420-455 Mb with flow cytometry (Paczesniak, unpublished) while we estimated 362 Mb from sequence reads (Supplementary Figure S1, A and B). Similarly, the size of the haploid genome of N. monacha is estimated at 425 Mb with flow cytometry (Jeffery et al. 2017;Gregory 2020), which is larger than the 385 Mb (Supplementary Figure S1, C and D) that we obtained from sequence reads. It thus seems that either the genome size estimates by flow cytometry are incorrect or that some parts of each genome are missing from our sequencing reads. Transposons and repeat-rich genomic regions can contribute to gaps in genomic assemblies (Peona et al. 2020). Some of these missing regions could also be GC rich, a feature which is known to cause a sequencing bias with Illumina technology (see for example, Chen et al. 2013, Botero-Castro et al. 2017. Acquiring more complete genome assemblies will require the additional application of long-read technologies to ostracods. Genome-wide estimates of heterozygosity are especially interesting for asexual taxa because the absence of recombination is expected to cause accumulation of mutations, resulting in increasing allelic divergences within individuals (Birky 1996). Jaron et al. (2020) identified three factors driving intragenomic heterozygosity in asexuals: how the transition to parthenogenesis occurred, which cytological mechanism underlies parthenogenesis and how long asexual reproduction has been ongoing. Based on sequencing reads from individual ostracods, we estimate heterozygosity of the putative ancient asexual ostracod D. stevensoni to be 0.92%-0.99% (Supplementary Figure S1, A and B) and 1.32%-1.43% for the sexual N. monacha (Supplementary Figure S1, C and D). The genome-wide heterozygosity of D. stevensoni matches to some extent an earlier study on intra-individual divergence in three nuclear genes of D. stevensoni (Schö n and . The finding of almost 1% heterozygosity in D. stevensoni is remarkable, given that all previous genome-wide estimates for asexual arthropods that did not evolve via hybridization revealed extremely low levels of heterozygosity ). Yet heterozygosity is clearly less than the estimates for parthenogenetic species with known hybrid origin (1.73%-8.5%) or polyploidy (1.84%-33.21%) , supporting the view that D. stevensoni is neither a hybrid nor a polyploid. Asexual reproduction in ostracods is thought to be apomictic (Chaplin et al. 1994), implying that observed heterozygosity levels are largely dependent on the relative impact of heterozygosity losses from gene conversion and heterozygosity gains from new mutations. Given the apparent absence of sex and recombination for millions of years (Straub 1952), it is perhaps surprising that heterozygosity in this putative ancient asexual ostracod is not larger. This may suggest that genome-wide rates of gene conversion and mutation are comparable in this species.

Genome contiguity of ostracod assemblies as compared to other crustaceans
We here compare the qualities of our ostracod genome assemblies to those of 19 other crustacean species (Table 1) published in the last 4 years. We only include studies with complete assemblies and sufficient information to assess assembly qualities. We assessed the contiguity of the three de novo ostracod genome assemblies by the number of scaffolds and their N50. Both features are comparable to those of the copepod Apocylops royi (Jørgensen et al. 2019) (Table 1). Long-read technologies such as PacBio used to require a relatively large amount of high-molecular weight DNA (Solares et al. 2018), which could not be obtained for ostracods with their very low yields of highmolecular weight DNA from individual specimens and their small body sizes as compared to many other crustaceans (Schö n and Martens 2016). We hope that low input protocols for PacBio (Duncan et al. 2019) and other long-read technologies can be successfully applied to ostracods in the future, in which case the genome assemblies obtained here could form the basis for subsequent hybrid assemblies. Optimizing Oxford Nanopore Technology for nonmarine ostracods has already commenced (Schö n et al. in prep.).

Genome annotations of ostracods and other crustaceans
Because our de novo ostracod genome assemblies are relatively complete (see BUSCO scores in Table 1), we will here also briefly compare some features of predicted protein coding genes with those of other crustaceans (Supplementary Table S8). We have predicted 13,771-17,776 protein coding genes in the three nonmarine ostracod genomes (Supplementary Tables S7 and S8), with the highest number for the sexual C. torosa and an intermediate estimate for the putative ancient asexual D. stevensoni. The number of annotated protein coding genes in nonmarine ostracods is similar to estimates for various branchiopods and the copepods Oithona nana, Tigriopus californicus, and T. kingsejongensis but lower than in most malacostracans (Supplementary Table S8). Not all genome studies of crustaceans cited here contain information on other features of coding genes, such as the average size of genes, introns, and exons (Supplementary Table S7). Comparisons of these features are therefore limited and will not be further discussed here but we provide available data of these features for ostracods and other crustacean genomes for reference.
Gene annotation in general but especially in the crustaceans is challenging; this is for example illustrated by the much lower numbers of protein coding genes (18,440) which are predicted in the novel reference genome of the cladoceran Daphnia pulex by Ye et al. (2017) as compared to the first assembly of D. pulex with more than 30,000 predicted genes (Colbourne et al. 2011). Even more difficult is assigning gene functions to annotated crustacean genomes (Rotllant et al. 2018). The novel data on predicted genes and transcripts from nonmarine ostracods in the current study will significantly contribute to future genome annotations in crustaceans and other arthropods. The genes and transcripts predicted here can also provide the baseline for future gene expression studies of nonmarine and marine ostracods.

Conclusions
We have successfully obtained de novo genome assemblies for three species of nonmarine ostracods with different reproductive modes. These represent the first quality reference genomes for ostracods. Given the paucity of genome assemblies from crustaceans as compared to insects or other arthropods, these assemblies are important tools to further develop ostracods as models for evolutionary and ecological research, also including marine species. Even if the de novo genome assemblies are somewhat fragmented and not yet at the chromosome level, they have a high level of completeness and will thus facilitate future studies of ostracods. The genomes presented here can also provide the first step toward a genomic assessment of the putative ancient asexual status of nonmarine darwinulid ostracod species.