Genome sequencing of deep-sea hydrothermal vent snails reveals adaptions to extreme environments

Abstract Background The scaly-foot snail (Chrysomallon squamiferum) is highly adapted to deep-sea hydrothermal vents and has drawn much interest since its discovery. However, the limited information on its genome has impeded further related research and understanding of its adaptation to deep-sea hydrothermal vents. Findings Here, we report the whole-genome sequencing and assembly of the scaly-foot snail and another snail (Gigantopelta aegis), which inhabits similar environments. Using Oxford Nanopore Technology, 10X Genomics, and Hi-C technologies, we obtained a chromosome-level genome of C. squamiferum with an N50 size of 20.71 Mb. By constructing a phylogenetic tree, we found that these 2 deep-sea snails evolved independently of other snails. Their divergence from each other occurred ∼66.3 million years ago. Comparative genomic analysis showed that different snails have diverse genome sizes and repeat contents. Deep-sea snails have more DNA transposons and long terminal repeats but fewer long interspersed nuclear elements than other snails. Gene family analysis revealed that deep-sea snails experienced stronger selective pressures than freshwater snails, and gene families related to the nervous system, immune system, metabolism, DNA stability, antioxidation, and biomineralization were significantly expanded in scaly-foot snails. We also found 251 H-2 Class II histocompatibility antigen, A-U α chain-like (H2-Aal) genes, which exist uniquely in the Gigantopelta aegis genome. This finding is important for investigating the evolution of major histocompatibility complex (MHC) genes. Conclusion Our study provides new insights into deep-sea snail genomes and valuable resources for further studies.

other snails. Gene family analysis revealed that deep-sea snails experienced stronger 49 selective pressures than shallow-water snails, and the nervous system, immune system, 50 metabolism, DNA stability, antioxidation and biomineralization-related gene families 51 were significantly expanded in scaly-foot snails. We also found 251 class II 52 histocompatibility antigen H2-Aal, which uniquely exist in the Gigantopelta aegi genome, 53 which is important for investigating the evolution of MHC genes. 54

Conclusion 55
Our study provides new insights into deep-sea snail genomes and valuable resources for 56 further studies. The discovery of deep-sea hydrothermal vents in the late 1970s expanded our 65 knowledge of the extent of life on Earth [1]. Deep-sea macrobenthos, which are animals 66 that inhabit deep-sea hydrothermal vents, face high hydrostatic pressure, variable 67 temperatures and pH, and high levels of hydrogen sulphide, methane, and heavy metals 68 [2]. To date, the literature contains a limited number of studies on the genetics of 69 macrobenthos. A recent report on the genome of deep-sea hydrothermal vent/cold seep 70 mussels (Bathymodiolus platifrons) showed that, while most of the genes present in a 71 related shallow-water mussel (Modiolus philippinarum) have been retained, many gene 72 families have expanded in the B. platifrons genome. These include families that are 73 associated with stabilising protein structures, removing toxic substances from cells, and 74 the immune response to symbionts [3]. 75 Gastropods represent the largest class of the phylum Mollusca, with different 76 estimates of diversity varying from 80,000 to 150,000 species [4]. More than 218 77 gastropod (i.e. snail and slugs) species have been described from chemosynthetic 78 ecosystems (i.e. solely rely on endosymbiotic bacterias for sustenance), of which more 79 than 138 are believed to be endemic to these ecosystems [5]. Gastropods are an important 80 component of the fauna in hydrothermal vents in terms of abundance and biomass [6]. 81 Due to the lack of samples and fossil evidence, studies on the evolution and adaptation of 82 deep sea chemosynthetic gastropods are very limited. The scaly-foot snail (Chrysomallon 83 squamiferum) is only found in hydrothermal vents at a depth of ~3,000-metres in the 84 Indian Ocean. There are two types of varieties without genetic differences: black (due to 85 have not been studied in-depth, repeats have been thought to have a regulatory function 157 in related genes that play an important role in the life cycle. Thus, we might infer that the 158 expansion of DNA transposons and LTRs, as well as the abandonment of some LINEs, 159 may be closely associated with adaptation to extreme environments for deep-sea snails. 160 161

Construction of phylogenetic relationships for deep-sea snails 162
To determine the phylogenetic relationships between deep-sea snails and other 163 molluscs, we compared two mussels, two freshwater snails, and two shallow-water snails. 164 The California two-spot octopus and the freshwater leech Helobdella robusta were used 165 as the outgroup. We identified 26,668 gene families in the ten species examined (Table  166 S13). Phylogenetic trees were constructed from 406 shared single-copy orthologs. Both 167 ML and Bayesian methods revealed the same topology (Figure 2a and Figure S2), which 168 is consistent with a recent study [14]. In the tree, mussels and snails are clearly separated 169 and the two deep-sea snails are located on the same branch and are independent of other 170 snails (although their genome sizes are quite different). We estimated that C. As the speciation of the two deep-sea snails may be related to geological events (see 178 above), we estimated their historical effective population size (Ne) using whole-genome 179 genetic variation. We identified ~3.51 and ~3.19 million heterozygous SNPs with 180 nucleotide diversities of 0.0077 and 0.0025 for C. squamiferum and G. aegis, respectively. 181 We estimated Ne changes using the pairwise sequential Markovian coalescent (PSMC) 182 method, which can infer demography from approximately 20,000 to 1 million years ago 183 (MYA) [18]. The effective population sizes of C. squamiferum and G. aegisspecies 184 derived from different geographical locations in the Indian Oceanare distinct ( Figure  185 2b). The demographic history of G. aegis decreased until ~250 KYA (thousand years 186 ago), followed by an Ne increase, from ~50,000 to 450,000 individuals, 20,000 years ago. The evolution and expression of single-copy orthologous genes are unique features of 196 organisms. To explore the evolutionary rate of single-copy orthologous genes, we 197 calculated the synonymous substitution rate (Ka) and nonsynonymous substitution rate 198 (Ks) values of 1,324 single-copy orthologous genes shared by the two deep-sea snails, 199 one shallow-water snail (L. gigantea), and two freshwater snails (B. glabrata and P. were approximately ~20% and ~40% higher than those of the shallow-water snails (0.11); 208 from this we could infer that deep-sea snails have experienced stronger selective 209 pressures, possibly to allow adaptation to life in hydrothermal vents. 210 211 Expanded gene families in deep-sea snail genomes 212

Nervous system 213
Using CAFÉ [19] (see details in methods), we identified two significantly (p-value < 214 0.01) expanded gene families in the two deep-sea snail genomes compared to the 215 freshwater and shallow-water snails. BTB/POZ domain-containing protein 6 (BTBD6) 216 had 56 copies in C. squamiferum and 35 copies in G. aegis, while fewer than 5 copies 217 were found in the four other snail species examined (Figure 3a). We found 17 BTBD6 genes on chromosome 16 of C. squamiferum, and these genes showed traces of tandem 219 duplications (Figure 3b). In G. aegis, we also found several tandem gene clusters 220   As expected, many of these have roles in the immune system. However, unlike the 252 freshwater snail B. glabrata [13] and deep-sea mussels [3], we did not detect an 253 expansion of the Toll-like receptor 13 (TLR13) gene family, but identified other 254 expanded gene families (Figure 4a). For example, increased expression of thioredoxin 1 255 (Txn1; 22 copies in C. squamiferum) plays a pivotal role in T-cell activation in mice [24]. 256 Although T-cell related adaptive immunity only appears in vertebrates, the existence and 257 expansion of this gene may assist the innate immune system of C. squamiferum. 258 Glutamine-fructose-6-phosphate transaminase (GAFT; 21 copies in C. squamiferum) 259 promotes the biosynthesis of chitin [25,26], which is one of the stable components of the 260 crustacean shell, and provides protection against predation and infection. 261 We identified expanded gene families that maintain the stability of nucleic acids and 262 proteins, such as heat shock protein 90 (Hsp90; 13 copies in C. squamiferum, Figure 4a By constructing a phylogenetic tree, we found that snails diverged from other molluscs 305 approximately 555.2 MYA (Figure 2a). These two deep-sea snails were found to be 306 independent of other shallow-water snails around 536.6 MYA and diverged from each 307 other approximately 66.3 MYA. This evolutionary time frame implies that the last 308 common ancestor of all molluscs (LCAM) already lived before the infamous Cambrian 309 Explosion (530-540 MYA), which was speculated by the palaeobiological hypothesis [36]. It also elucidated that deep-sea gastropod lineages originated at least around 540 311 MYA and diverged from other gastropods in the same age of the oldest molluscs taxons, 312 Aculifera and Conchifera [37,38]. The deep sea gastropod lineages were also confirmed 313 by the phylogenetic analysis of mitogenomes [39]. Further conceived by the evolutionary 314 rate of single-copy orthologous genes, deep sea gastropod lineages have experienced 315 stronger selective pressures than shallow-water snails (Figure 2c). At the end of the 316 Cretaceous geological period, ~66 MYA, C. squamiferum and G. aegis diverged from 317 each other and had different historical effective population sizes (Ne) later (Figure 2b). It also indicated that the evolution of deep sea snail linages depends more on adaptive 328 needs than on a region-specific feature shared between lineages. 329 Specifically, we analysed expanded gene families in deep-sea snail genomes 330 (Figure 4a). They both significantly expanded the nervous system, especially BTB/POZ 331 domain-containing protein 6 (BTBD6) and 5-hydroxytryptamine receptor 4 (HTR4), 332 which are involved in the neuroregulation of activities, such as movement, predation, and 333 resistance to environmental change. As for the chemosynthetic snails, they both expanded 334 immune system-related genes. In the C. squamiferum genome, the expansions of 335 thioredoxin 1 (Txn1) and Glutamine-fructose-6-phosphate transaminase (GAFT) were 336 found. In the G. aegi genome, different immune and disease response genes were 337 expanded; for example, the major histocompatibility complex (MHC) genes, H-2 class II 338 histocompatibility antigen-like (H2-Aal). These expanded gene families were different 339 from fresh water snails and deep sea mussels. To prepare the Chromium library, 1 ng of high quality DNA was denatured, spiked into 377 reaction mix, and mixed with gel beads and emulsification oil to generate droplets within 378 a Chromium Genome chip. Then, the rest of the steps were completed following the 379 standard protocols for performing PCR. After PCR, the standard circularisation step for 380 BGISEQ-500 was carried out, and DNA nanoballs (DNBs) were prepared [44]. Paired-381 end reads with a length of 150 bp were generated on the BGISEQ-500 platform. 382 383 Oxford Nanopore Technologies 384 DNA for long-read sequencing was isolated from the muscle tissues of our samples. 385 Using 5 flow cells of the ONT chemistry for the GridION X5 sequencer following manufacturer's protocols, we generated 39.61 Gbp of raw genome sequencing 387 data. 388 389

Hi-C library and sequencing 390
The Hi-C library was prepared following the standard in situ Hi-C [45] protocol for 391 muscle samples, using DpnII (NEB, Ipswich, America) as the restriction enzyme. After 392 that, a standard circularisation step was carried out, followed by DNA nanoballs (DNB) 393 preparation following the standard protocols of the BGISEQ-500 sequencing platform as 394 previously described [44]. Paired-end reads with a length of 100 bp were generated on 395 the BGISEQ-500 platform. 396 397

Genome assembly 398
For the genome assembly of Chrysomallon squamiferum, Canu (v1.7) was first used to 399 perform corrections of ONT reads with the parameters "correctedErrorRate=0.105 400 corMinCoverage=0 minReadLength=1000 minOverlapLength=800". Then, wtdbg 401 (v1.2.8) was used to assemble the genome with the parameters "--tidy-reads 3000 -k 0 -p 402 21 -S 4 --rescue-low-cov-edges" using corrected reads generated by Canu. Next, we 403 made use of the sequencing reads from the 10X genomic library to carry out genome   Gigantopelta aegis,and 466 Helobdella robusta) were compared using blastp with the E-value threshold set as 1e-7.  The genome assemblies of these two genomes have been deposited in GenBank under the 537 accession number CNP0000854. The raw sequencing reads were also uploaded to the 538 SRA database under accession number CNP0000854.      Click here to download Figure Figure 4.pdf