A draft genome sequence of the elusive giant squid, Architeuthis dux

ABSTRACT Background The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusc with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, having a genome assembled for this deep-sea–dwelling species will allow several pending evolutionary questions to be unlocked. Findings We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads, and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from 3 different tissue types from 3 other species of squid (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome. Conclusions This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.

Background
The giant squid (Architeuthis dux; Steenstrup, 1857) is an enigmatic giant mollusk with a circumglobal distribution in the deep ocean, except in the high Arctic and Antarctic waters. The elusiveness of the species makes it difficult to study. Thus, having a genome assembled for this deep-sea-dwelling species will allow several pending evolutionary questions to be addressed.

Findings
We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long reads and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb of raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from three different tissue types from three other species of squid to assist genome annotation. We annotated 51,225 unique protein-coding genes, of which 30,472 have transcript evidence. Genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.

Conclusions
This annotated draft genome of A. dux provides a critical resource to investigate the unique traits of this species, including its gigantism and key adaptations to deep-sea environments.
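The scaffold N50 and read totals quoted in the Findings can be made concrete with a short calculation. The following is a minimal sketch, not the authors' pipeline: the function names `n50` and `fold_coverage` and the toy scaffold lengths are hypothetical, and only the 200 Gb and 2.7 Gb figures come from the text. N50 is taken here in its usual sense: the length of the scaffold at which the running total, counted from the longest scaffold downwards, first reaches half of the assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_coverage(total_bases, genome_size):
    """Return nominal sequencing depth: bases sequenced per genome base."""
    return total_bases / genome_size


# Toy example (hypothetical lengths): total = 24, half = 12.
# Running sum 8, then 14 >= 12, so the N50 is 6.
toy_scaffolds = [8, 6, 4, 3, 2, 1]
print(n50(toy_scaffolds))

# Figures from the Findings: 200 Gb of Illumina reads, 2.7 Gb genome.
print(round(fold_coverage(200e9, 2.7e9), 1))
```

Under these assumptions, 200 Gb of Illumina reads over a 2.7 Gb genome corresponds to roughly 74-fold nominal coverage, before any quality filtering or duplicate removal.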
[...] food source for marine mammals, birds and many fish species. They are also increasingly important as a high-protein food source for humans and are a growing target for commercial fisheries and farming [6]. Cephalopods show a wide variety of morphologies, lifestyles and behaviours [7], but with the exception of the nautiluses they are characterised by rapid growth and short lifespans, despite a considerable investment in costly sensory adaptations [2]. They range in size from the tiny pygmy squids (~2 cm) to animals that are nearly 3 orders of magnitude larger, such as the giant squid, Architeuthis dux (at least 10-12 m and reported up to 20 m total length) [6,8,9], and the colossal squid, Mesonychoteuthis hamiltoni (maximum length remains unclear, but a recorded weight of 500 kg makes it the largest known invertebrate [10]). A sophisticated adaptive body patterning system that can rapidly alter the texture, pattern, colour and brightness of the skin facilitates a complex communication system, while also providing exceptional camouflage and mimicry [11]. Together these allow cephalopods both to avoid predators and to hunt prey highly efficiently, making them some of the top predators in the ocean. The remarkable adaptations of cephalopods also extend to their genome, with recent work demonstrating increased levels of RNA editing to diversify proteins involved in neural functions [12].

Over recent years, oceanic warming and acidification, pollution, expanding hypoxia and fishing [13-15] have been shown to affect cephalopod populations. Depletion due to high rates of cephalopod by-catch in commercial fisheries can also result in regional extinction [16].
Mercury has been found in high [...]

High-molecular-weight genomic DNA was extracted from an Architeuthis dux (NCBI taxon id: 256136) sample using a CTAB-based buffer followed by organic solvent purification, following Winkelmann et al. [19] (details in the Supplementary Information). We generated 116 Gb of raw reads from Illumina short-insert libraries, 76 Gb of paired-end reads from libraries ranging from 500 bp to 800 bp in insert size, and 5.4 Gb of mate-pair reads with a 5 kb insert (Table S1). Furthermore, we generated 3. [...] Table S2).

Transcriptome sequencing and de novo assembly
Given the extreme rarity of live giant squid sightings, we were unable to collect fresh organ samples (following the recommendations in [24]) containing intact RNA from the species to assist with the genome annotation. As an alternative, we extracted total RNA from gonad, liver and brain tissue from [...]

Protein extraction, separation by 1D SDS-PAGE, MALDI-TOF/TOF and protein identification
Given the practical impossibility of obtaining RNA from a giant squid specimen, we produced a library of giant squid peptide sequences to guide the gene annotation process. Proteins were solubilised from a giant squid mantle tissue sample according to the procedure described [...]

[...] initially also used as evidence, but these were later omitted due to low coverage. Evidence from predicted repeat locations was used to discourage the model from predicting genes that overlap repeats.

Repetitive elements were identified using a bespoke pipeline. Firstly, elements were identified using [...] ignored (-nolow) and a sensitive (-s) search was performed. Following this, a de novo repeat library was constructed using RepeatModeler v. 1 [...] Again, many of these had relatively large copy numbers (summarised in Table 1 [...]

We herewith submit our manuscript 'A draft genome sequence of the elusive giant squid, Architeuthis dux' as a Data Note for your formal consideration for publication in GigaScience. We present a draft genome assembly with a scaffold N50 of 4.8 Mb (estimated genome size of 2.7 Gb) produced using Illumina, Moleculo and Chicago libraries. We also provide the corresponding gene, RNA and transposable element annotations, as well as the results of a comparative genomics analysis with other available cephalopod genomes.
Besides providing the community with an important resource for further studying this enigmatic animal, given the paucity of available cephalopod genomes, this is a valuable contribution to the genomic description of cephalopods, and therefore we believe it has the potential to be published in GigaScience.
The sequence data and annotations have been submitted to the NCBI database as Bioproject PRJNA534469, which will be made available upon request from your journal.