Box, stalked, and upside-down? Draft genomes from diverse jellyfish (Cnidaria, Acraspeda) lineages: Alatina alata (Cubozoa), Calvadosia cruxmelitensis (Staurozoa), and Cassiopea xamachana (Scyphozoa)

Abstract Background Anthozoa, Endocnidozoa, and Medusozoa are the 3 major clades of Cnidaria. Medusozoa is further divided into 4 clades, Hydrozoa, Staurozoa, Cubozoa, and Scyphozoa; the latter 3 lineages make up the clade Acraspeda. Acraspeda encompasses extraordinary diversity in terms of life history, numerous nuisance species, taxa with complex eyes rivaling other animals, and some of the most venomous organisms on the planet. Genomes have recently become available within Scyphozoa and Cubozoa, but there are currently no published genomes within Staurozoa. Findings Here we present 3 new draft genomes of Calvadosia cruxmelitensis (Staurozoa), Alatina alata (Cubozoa), and Cassiopea xamachana (Scyphozoa), for which we provide a preliminary orthology analysis that includes an inventory of their respective venom-related genes. Additionally, we identify synteny between POU and Hox genes that had previously been reported in a hydrozoan, suggesting this linkage is highly conserved, possibly dating back to at least the last common ancestor of Medusozoa, yet likely independent of vertebrate POU-Hox linkages. Conclusions These draft genomes provide a valuable resource for studying the evolutionary history and biology of these extraordinary animals, and for identifying genomic features underlying venom, vision, and life history traits in Acraspeda.

[...] and other venomous organisms. Given the great phylogenetic distance between cnidarians and these well-studied venomous organisms, a better understanding of the cnidarian venom repertoire can provide insight into the evolution of venom and venom-encoding genes. Here we present three new genomes for species of the three major Acraspeda [...]

These three acraspedan genomes were assembled at different times throughout a five-year period as part of several independent projects overseen by the coauthors, using separate methods for collection, extraction, sequencing, and assembly (see below). This valuable resource to the scientific community is the culmination of an extensive collaborative effort to respond to the need for model medusozoan systems in a plethora of research fields.

Cassiopea xamachana Sample Collection and DNA extraction
We propagated C. xamachana polyps from a single polyp via asexual budding (Line T1-A). Polyps were maintained symbiont-free at 26 °C and fed three times weekly with Artemia nauplii. To keep food-source contaminants from interfering with downstream bioinformatic analysis, we starved the polyps for seven days in antibiotic-treated seawater prior to preservation in 95% ethanol; any Artemia cysts retained within the gut were manually removed before preservation. We extracted genomic DNA from the apo-symbiotic (lacking endosymbionts) polyps using a CTAB (cetyl trimethylammonium bromide) phenol-chloroform extraction, first performing an overnight digestion of whole polyp tissue with proteinase K (20 mg/ml) in CTAB buffer before proceeding with the standard protocol. The DNA extract was stored at -20 °C until further processing.

[...] Hiseq2000. Approximately 634 million reads totaling 117.6 Gb of high-quality paired-end sequence data were generated. We performed adapter trimming and quality filtering using [...]

With the mitochondrial sequences removed, we generated two "sub-optimal" assemblies using Platanus v1.2.1 with kmer sizes of 32 bp and 45 bp and default settings. Subsequently, we used these "sub-optimal" assemblies to construct artificial mate-pair libraries for 9 insert [...]

[...] PacBio RS II platform, resulting in 486,000 long-reads totaling 990.2 Mb of data. We conducted hybrid assembly of Illumina short-reads and PacBio long-reads using MaSuRCA [83] (which includes an error correction step for paired-end reads), which resulted in an assembly of 291,445 contigs and an N50 of 7,049 bp (NCBI Accession PUGI00000000). We did not perform adapter trimming prior to assembly because the MaSuRCA manual advises against preprocessing of reads, including adapter removal. Nevertheless, we identified considerable adapter contamination in our final assembly. We therefore used a custom script (remove_adapters_and_200.pl; https://github.com/josephryan/Ohdera_et_al_2018) to remove adapters and sequences shorter than 200 nucleotides. The total length of the assembly was 851.1 Mb. We recovered 29.84% (8.06% complete and 21.78% partial) of the core eukaryotic genes and 32.11% (18.30% complete and 13.81% partial) of the core metazoan genes with CEGMA and BUSCO, respectively. The low recovery rates for conserved genes in the A. alata genome are likely due to the considerably larger size of the genome, which tends to be coupled with long introns and therefore higher rates of gene fragmentation in a draft assembly [84][85][86].
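To make the length filter and the N50 statistic mentioned above concrete, the following is a minimal Python sketch. It is not the authors' Perl script (remove_adapters_and_200.pl) and it does not attempt adapter removal; the function names `filter_short` and `n50`, and the toy contigs, are hypothetical and chosen only for illustration.

```python
def filter_short(contigs, min_len=200):
    """Keep only sequences with at least min_len nucleotides.

    contigs: dict mapping a contig name to its sequence string.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy contigs (hypothetical data, not taken from the A. alata assembly).
toy = {
    "c1": "A" * 1000,
    "c2": "C" * 600,
    "c3": "G" * 400,
    "c4": "T" * 150,  # shorter than 200 nt, so it is dropped
}
kept = filter_short(toy)
kept_n50 = n50([len(seq) for seq in kept.values()])
```

In this toy case the three retained contigs total 2,000 nt, and the largest contig alone reaches half of that total, so it sets the N50. Because N50 depends on the total length, removing many short sequences from a real assembly can shift the value.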

Comparison of Assemblies
A comparison of the draft genomes assembled in this study reveals that the genome of [...] (Table 1).

Gene Model Prediction
We predicted genes for all three genomes using Augustus v3. [...]

Our OrthoFinder analysis generated a total of 80,482 orthogroups for the combined genomic datasets. Using a custom script, we identified 756 Cnidaria-specific orthogroups, another 562 Medusozoa-specific orthogroups, and yet another 1091 Acraspeda-specific orthogroups (Figure 2); genes in each taxon-specific orthogroup were non-overlapping. Of these unique orthogroups, we were able to retrieve Swiss-Prot annotations for 57% of Cnidaria-specific orthogroups, 32% of Medusozoa-specific orthogroups, and 55% of Acraspeda-specific orthogroups (Tables S1-S3). Unannotated orthogroups may represent taxonomically restricted genes, or genes that are no longer recognizable as homologs of annotated genes because of extensive sequence divergence over their evolutionary history. Using this framework, we identified enriched GO (gene ontology) terms, uncovering 123 biological process terms that appear to be enriched within Cnidaria, 107 within Medusozoa, and 14 within Acraspeda (adjusted p-value < 0.01). We used ReViGO to remove redundant GO terms from these initial lists and grouped them further through k-means clustering by Euclidean distance. The optimal number of clusters was predicted using the R package NbClust v3.0.

Our ReViGO analysis reduced the 123 Cnidaria-specific GO terms for biological processes to 59 non-redundant ReViGO terms comprising 5 clusters (Figure 4, Table S1, Figure S1). Within the five clusters, many genes putatively encoding proteins for the cnidarian nerve net were represented (e.g., development and sensory perception), indicative of a system exhibiting a complex response to physical and chemical stimuli. Terms related to transport (ions, amines, carbon compounds) and the extracellular matrix were also represented. Genes associated with the extracellular matrix are possibly linked to the cnidarian novelty, the mesoglea (the proteinaceous layer between the endoderm and ectoderm in these diploblastic animals).

The 107 Medusozoa-specific GO terms were reduced to 41 non-redundant ReViGO terms, which formed 3 clusters when grouped by k-means (Figure 7, Table S2, Figure S2). Similar to the terms represented within Cnidaria, Medusozoa terms were also associated with response to stimuli and neural processes. Terms that appear distinctive of medusozoan biology were those related to wound healing and tissue migration, as well as apoptotic signaling regulation. These terms are possibly associated with the unique asexual reproductive traits (budding, fission, strobilation, etc.) seen within the medusozoan lineage.

Despite initially identifying 1091 orthogroups unique to Acraspeda, only 14 GO terms were enriched (highly abundant); this number was further reduced to 8 non-redundant ReViGO terms (Figure 5, Table S3). The apparently low enrichment may reflect an under-representation of Acraspeda genes within the reference database. Interestingly, half of the terms were associated with DNA recombination.
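The taxon-specific orthogroup step described above can be illustrated with a short Python sketch. The authors' custom script is not reproduced in this text, so the rule used here is an assumption: an orthogroup is assigned to the smallest of the nested clades (Acraspeda, Medusozoa, Cnidaria) that contains every species in it, provided it includes at least two species and, for the larger clades, at least one member of the clade nested inside. The function name `classify_orthogroup`, the species sampling, and the toy orthogroups are hypothetical.

```python
from collections import Counter

# Nested clades, smallest first. Species sampling is illustrative only
# (assumption); it is not the taxon set used in the study.
ACRASPEDA = {"Alatina alata", "Calvadosia cruxmelitensis", "Cassiopea xamachana"}
MEDUSOZOA = ACRASPEDA | {"Hydra magnipapillata"}
CNIDARIA = MEDUSOZOA | {"Nematostella vectensis", "Acropora digitifera"}
NESTED_CLADES = [
    ("Acraspeda", ACRASPEDA),
    ("Medusozoa", MEDUSOZOA),
    ("Cnidaria", CNIDARIA),
]


def classify_orthogroup(species, nested_clades=NESTED_CLADES, min_species=2):
    """Return the name of the clade an orthogroup is specific to, or None.

    species: iterable of species names that have a gene in the orthogroup.
    Assumed rule: pick the smallest clade containing all species; require
    min_species species and, for larger clades, at least one species from
    the clade nested inside, so every orthogroup gets at most one label.
    """
    species = set(species)
    if len(species) < min_species:
        return None
    inner = set()
    for name, members in nested_clades:
        if species <= members:
            if inner and not (species & inner):
                return None  # e.g. an anthozoan-only orthogroup
            return name
        inner = members
    return None  # contains a species outside the largest clade


# Toy orthogroups (hypothetical data).
orthogroups = {
    "OG1": ["Alatina alata", "Cassiopea xamachana"],
    "OG2": ["Calvadosia cruxmelitensis", "Hydra magnipapillata"],
    "OG3": ["Nematostella vectensis", "Cassiopea xamachana"],
    "OG4": ["Nematostella vectensis", "Homo sapiens"],
    "OG5": ["Hydra magnipapillata"],
    "OG6": ["Nematostella vectensis", "Acropora digitifera"],
}
labels = {og: classify_orthogroup(sp) for og, sp in orthogroups.items()}
counts = Counter(label for label in labels.values() if label is not None)
```

Assigning each orthogroup to a single clade is one way to obtain non-overlapping taxon-specific sets; stricter variants, such as requiring every sampled member of the clade to be present, would yield smaller counts.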

Venom Analysis
We identified potential venom-encoding genes within the cnidarian transcriptomes [...] indicate that it is not an Anthozoa-specific protein, but rather variably distributed in Cnidaria.

Hox-POU synteny analysis
In the hydrozoan Eleutheria dichotoma, a POU6 class homeobox gene is fused with a [...]

These medusozoan genomes will be useful resources in developing functional constructs (e.g., CRISPR/Cas9 guide RNAs) that can be employed to understand the genomic basis for some of the captivating biological innovations of these animals, and eventually for the design of probes for target-capture DNA sequencing. Lastly, the availability of these genome-scale sequence data is an important step forward in the effort to elucidate evolutionary events that may have shaped Medusozoa, and in reconstructing the last common ancestor of Cnidaria and Bilateria. We are therefore confident that these new genomes will prove valuable for understanding the biology of these fascinating creatures, and for exploring key genomic events that were formative in the early evolution of animals.

Availability of supporting Data
Accession numbers for raw sequencing reads and assemblies are available in Table 1 [...]

Table S3. Orthogroups specific to Acraspeda identified using OrthoFinder and annotated by a representative gene from the Hydra magnipapillata genome. Protein annotations were retrieved from Swiss-Prot.

Table S4. Venom-encoding gene repertoire of five cnidarian genomes identified via the venomix database and pipeline. Venom genes are categorized by family (column 1). Both genomic and transcriptomic data were used, with transcriptomic isoforms counted as a [...]