Chromosome-level genome of the three-spot damselfish, Dascyllus trimaculatus

Abstract
Damselfishes (Family: Pomacentridae) are a group of ecologically important, primarily coral reef fishes that include over 400 species. Damselfishes have been used as model organisms to study recruitment (anemonefishes), the effects of ocean acidification (spiny damselfish), population structure, and speciation (Dascyllus). The genus Dascyllus includes a group of small-bodied species and a complex of relatively larger-bodied species, the Dascyllus trimaculatus species complex, which comprises several species including D. trimaculatus itself. The three-spot damselfish, D. trimaculatus, is a widespread and common coral reef fish species found across the tropical Indo-Pacific. Here, we present the first genome assembly of this species. This assembly contains 910 Mb, with 90% of the bases in 24 chromosome-scale scaffolds, and has a Benchmarking Universal Single-Copy Orthologs score of 97.9%. Our findings confirm previous reports of a karyotype of 2n = 47 in D. trimaculatus in which one parent contributes 24 chromosomes and the other 23. We find evidence that this karyotype is the result of a heterozygous Robertsonian fusion. We also find that the D. trimaculatus chromosomes are each homologous with single chromosomes of the closely related clownfish species, Amphiprion percula. This assembly will be a valuable resource for the population genomics and conservation of damselfishes and for continued studies of the karyotypic diversity in this clade.


Introduction
Damselfishes (Pomacentridae) are a group of small-bodied species found across all coral reef regions and most temperate marine systems, where they are often the most visibly abundant fishes on the reef (Hiatt and Strasburg 1960; Allen 1991; Allen and Werner 2002; Bellwood and Wainwright 2002). This family includes more than 400 species that, despite their small size (max 30 cm), play important ecological roles (Allen 1991; Tang et al. 2021). Within this large family, the genus Dascyllus comprises 11 species, four of which make up the Dascyllus trimaculatus species complex. This species complex includes three described species with restricted geographic ranges: D. albisella in the Hawaiian Islands, D. strasburgi in the Marquesas Islands, and D. auripinnis in the Line Islands. In contrast, D. trimaculatus has the broadest range, extending from the Red Sea, where it was first described (Rüppell 1828), across the tropical and subtropical Indo-Pacific (Fig. 1).
The three-spot damselfish is an abundant and common species that exhibits a typical bipartite life history: a site-attached adult phase, in which mate pairs lay, fertilize, and care for demersal eggs, followed by a pelagic larval phase. Larvae hatch after ∼6 days and feed on zooplankton in the water column during a pelagic larval phase of 23-30 days, after which they recruit back to the reef (Wellington and Victor 1989; Robitzch et al. 2016). Larvae settle primarily into anemones for protection, often sharing this shelter with different species of the popular anemonefish (in Hawai'i, where anemonefishes and anemones are absent, D. albisella recruits to branching coral). As subadults, they leave the anemone and live nearby in small to large groups.
There has also been considerable effort in understanding the chromosome architecture and variation of Dascyllus and other damselfishes. Chromosome number varies between species of Dascyllus as well as within species (Ojima and Kashiwagi 1981; Kashiwagi et al. 2005; Getlekha et al. 2017), giving insight into chromosomal drivers of evolution (Galetti et al. 2000; Hardie and Hebert 2004; Molina and Galetti 2004) and how this variation is manifested ecologically (Molina and Galetti 2004; Martinez et al. 2015). As we shift into the age of genomic natural history, where genomic tools offer vastly more detail and statistical power, a reference genome will aid in further refining our understanding of wildlife biology (Hotaling et al. 2021). There are currently 14 Pomacentrid reference genomes, five of which are publicly available through the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/): A. ocellaris (Tan et al. 2018); Acanthochromis polyacanthus (Schunter et al. 2016); Amphiprion percula (Lehmann et al. 2019); Amphiprion ocellaris (Ryu et al. 2022); and Acanthochromis polyacanthus (Lehmann in review). Another nine come from a single study (Marcionetti et al. 2019). Of these, only one is of a species outside the genus Amphiprion, and only three of those listed above (A. ocellaris, A. percula, and A. polyacanthus) are chromosome-scale genomes. Of the Pomacentrid chromosome-scale genomes, all had 2n = 48, with genome sizes ranging between 863 and 956 Mb. The two published genomes, A. ocellaris (Ryu et al. 2022) and A. percula, were highly complete, with published Benchmarking Universal Single-Copy Orthologs (BUSCO) values of 97.01 and 97.2%, respectively. Chromosome-scale genomes provide a more complete sequence and locations of genes and allow for research into how chromosome architecture influences ecology, population dynamics, and adaptive evolution.
Here, we present the first genome assembly within the genus Dascyllus and add to the short but growing list of Pomacentrid chromosome-scale genomes.

Biological materials
The D. trimaculatus individual used for this genome assembly was ordered from an online pet fish supplier (liveaquaria.com) and was sourced from the West Pacific Rim population (Limon et al. 2023). It was euthanized following an approved IACUC animal use protocol. Liver, muscle, gill, and brain tissue were harvested from the right side of the individual, and each was placed in a separate, preweighed Covaris cryogenic vial, flash frozen in liquid nitrogen, and stored at −80°C until further processing. The remaining intact left side of the specimen is stored at −80°C at University of California Santa Cruz. Dascyllus trimaculatus exhibit nonfunctional protogyny (Asoh and Kasuya 2002), and this individual was determined to be male based on the presence of testis.

Nucleic acid library preparation and sequencing
Whole-genome shotgun library preparation
DNA was extracted from 13 mg of muscle tissue using a DNeasy Blood and Tissue kit (Qiagen), quantified using a Qubit dsDNA HS Assay kit (Thermo Fisher Scientific) and a Qubit 4.0 Fluorometer, then assayed with 1.0% agarose gel electrophoresis to determine molecular weight. DNA was sheared for 26 cycles (15 seconds on, 30 seconds chilling) using a Bioruptor sonicator (Diagenode), then size selected using SPRI beads (Beckman) to select for fragments between 200 and 500 bp.
The NEBNext UltraII DNA Library Prep Kit for Illumina (New England Bio Labs) was used according to the manufacturer's protocol, except that KAPA Hot Mix Ready Start Master Mix (Roche Diagnostics) was used for library amplification instead of NEB Q5 Master Mix. Paired-end sequencing was done at the University of California Davis Genome Center on a HiSeq4000 sequencer on a 2 × 150PE cycle.

Chicago library preparation
High molecular weight (HMW) DNA was isolated from the Dascyllus trimaculatus individual by lysing gill tissue in low-EDTA TE buffer (Dawson et al. 1998), then purifying with a chloroform, phenol:chloroform, chloroform and ethanol precipitation protocol (Sambrook and Russell 2006). The quality of the HMW DNA was assayed with 1.0% agarose gel electrophoresis. This DNA was used in the preparation of the Chicago, Hi-C, and Oxford Nanopore Technologies sequencing libraries.
From this DNA, three Chicago libraries were prepared using a published method (Putnam et al. 2016), each using a different restriction enzyme: one with DpnII cutting at GATC sites, one with MluCI cutting at AATT sites, and one with FatI cutting at CATG sites. These libraries were sequenced on a 2 × 150PE cycle at Fulgent Genetics on a HiSeq4000 sequencer.
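As an illustration of how the choice of restriction enzyme determines where ligation junctions can occur, the following minimal sketch counts the recognition sites of the three enzymes in a DNA string. The function name `count_sites` and the toy sequence are hypothetical and are not part of the published protocol. All three sites are palindromic, so scanning a single strand is sufficient.

```python
# Recognition sites as given in the text for the three Chicago libraries.
RECOGNITION_SITES = {"DpnII": "GATC", "MluCI": "AATT", "FatI": "CATG"}


def count_sites(sequence, site):
    """Count occurrences of `site` in `sequence`, allowing overlapping matches."""
    sequence = sequence.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        # Step forward by one base so overlapping sites are also found.
        start = sequence.find(site, start + 1)
    return count


# Made-up sequence for illustration only; not real D. trimaculatus data.
toy_sequence = "GATCAATTCATGGATCCATG"

site_counts = {
    enzyme: count_sites(toy_sequence, site)
    for enzyme, site in RECOGNITION_SITES.items()
}
print(site_counts)
```

Using several enzymes with different four-base sites increases the density of potential cut sites along the genome, which is one plausible motivation for preparing one library per enzyme.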

Hi-C library preparation
Two Hi-C libraries were generated from approximately 100 ng of LN2-flash-frozen muscle. The libraries were constructed using a published protocol (Adams et al. 2020). One library was constructed using the enzyme DpnII, and the other library was constructed with the enzyme MluCI.

Oxford nanopore library
Next, 1500 ng of the HMW DNA prepared for Chicago libraries was also used to prepare two Oxford Nanopore Technologies (ONT) WGS libraries with the SQK-LSK109 modified protocol "version GDE_9063_v109_revT_14Aug2019". The DNA repair steps at 20°C and 65°C were carried out for 20 minutes each, instead of 5 minutes each. We ran each of the resulting libraries on two separate MinION flow cells (FLO-MIN106), each for 72 hours. Raw fast5 files from the two MinION runs were basecalled using Guppy ("Guppy Basecalling Software" 2019) v3.3.
A summary of sequencing information for the various libraries can be found in Supplementary Table 1.

Genome assembly
All programs and versions used for the assembly are listed in Table 1.
Sequencing adapters were removed from the Illumina whole-genome shotgun (WGS) reads. MaSuRCA (Zimin et al. 2013; Jiang et al. 2019; Wang et al. 2020) was used to assemble the first version of the genome using both the ONT and WGS reads.
We followed the Arima-HiC mapping pipeline (https://github.com/ArimaGenomics/mapping_pipeline/blob/master/Arima_Mapping_UserGuide_A160156_v02.pdf) to prepare the data for scaffolding. The pipeline aligns the sequencing data from each of the Hi-C and Chicago datasets against the assembly from MaSuRCA, then filters ligation adapters and removes PCR duplicates from the resulting alignments. These alignments were then processed with samtools (Danecek et al. 2021) v1.13 and converted into BED files with bedtools (Quinlan et al. 2010) v2.30.
The MaSuRCA assembly was scaffolded with SALSA (Ghurye et al. 2019) v2.3 with the ligation junction parameter set to the three enzyme sites (-e AATT,GATC,CATG). The iteration number was set to 10 (-i 10), and we allowed the Hi-C/Chicago data to also correct assembly errors (-m yes).
We aligned the trimmed Illumina WGS reads to the scaffolded output of SALSA with bwa mem (Li and Durbin 2009) v0.7.17-r1188 and used that alignment to polish the assembly with Pilon (Walker et al. 2014) v1.23. We repeated the alignment and polishing steps once. The error-corrected assembly was then screened for possible contaminants using Blobtools2 (Laetsch and Blaxter 2017) v3.1.0. Any contigs assigned to phyla other than Chordata were removed; however, any sequences categorized as "No hits" were kept.
The assembly was then manually curated. We mapped the DpnII and MluCI Hi-C reads to the genome assembly with chromap (Zhang et al. 2021) v0.2.2 with a quality filter of 0 and converted the result to a .hic file with Juicebox Assembly Tools (JBAT) (Durand et al. 2016) v2.14.00. Artisanal tools commit 9a79889 (https://bitbucket.org/bredeson/artisanal) was used to generate a JBAT assembly file. We used the Juicebox GUI (Dudchenko et al. 2018) v1.11.08 to manually curate the assembly with the .hic and .assembly files. Modifications made to the assembly included ordering and orienting scaffolds into chromosome-scale scaffolds, removing duplicated regions, and making manual assembly breaks to place misassembled contig pieces onto the correct scaffold. Artisanal was used to generate an updated genome assembly FASTA file. Scaffolds not placed on chromosomes were sorted by the strongest Hi-C connection to chromosome-scale scaffolds with genome assembly tools commit b0cda60 (https://github.com/conchoecia/genome_assembly_pipelines).
D-Genies (Cabanettes and Klopp 2018), accessed 2022 April 30, was used to align the manually curated assembly to the chromosome-scale assembly of the closely related Amphiprion percula (Lehmann et al. 2019). The evidence from this analysis was used to assign chromosome numbers to the D. trimaculatus scaffolds based on homology with A. percula chromosomes. BUSCO (Simão et al. 2015; Waterhouse et al. 2018) v5.2.2 was used to evaluate genome completeness by comparing the number of orthologous genes found in the assembly to the 3,640 genes in the actinopterygii_odb10 database. Assembly statistics (assembly-stats; https://github.com/sanger-pathogens/assembly-stats) were generated to track N50, L50, contigs, gaps, and lengths at each step. We used merqury (Rhie et al. 2020) v1.3 to calculate the genome completeness and error rates.
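For readers unfamiliar with the contiguity statistics tracked at each step, the sketch below shows one common way to compute N50 and L50 from a list of sequence lengths. It is a minimal illustration, not the assembly-stats implementation; the function name `n50_l50` and the toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, rank


# Made-up lengths in arbitrary units, for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

A larger N50 together with a smaller L50 indicates that more of the assembly sits in fewer, longer sequences, which is why both values are useful for comparing successive assembly steps.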

Sequencing
We sequenced four library types: a WGS library which resulted in 314.6 M paired-end 150 bp reads, representing 103x coverage, and 3.52 M (4.84 Gb) and 8.57 M (19.77 Gb) ONT reads from the [...] of 154x. In total, across all data types, we had a final coverage of 280x (see Supplementary Table 1 for sequencing details).
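Sequencing depth can be approximated as total sequenced bases divided by genome size. The sketch below applies this to the yields quoted in this paper, under two loudly stated assumptions: that the genome size used is the final assembly length of 910.7 Mb, and that the WGS figure refers to 314.6 million read pairs of 2 × 150 bp. Neither assumption is confirmed by the text, and the function name `coverage` is hypothetical.

```python
def coverage(total_bases, genome_size_bp):
    """Approximate fold coverage as sequenced bases per genome base."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


# ASSUMPTION: genome size taken as the final assembly length (910.7 Mb).
ASSEMBLY_LENGTH_BP = 910.7e6

# ONT yields quoted in the text: 4.84 Gb and 19.77 Gb.
ont_bases = 4.84e9 + 19.77e9
ont_coverage = coverage(ont_bases, ASSEMBLY_LENGTH_BP)

# ASSUMPTION: 314.6 million read pairs, each contributing 2 x 150 bp.
wgs_bases = 314.6e6 * 2 * 150
wgs_coverage = coverage(wgs_bases, ASSEMBLY_LENGTH_BP)

print(f"ONT: {ont_coverage:.1f}x, WGS: {wgs_coverage:.1f}x")
```

Under these assumptions the WGS estimate comes out close to the 103x quoted in the text, and the ONT data correspond to roughly 27x; different genome-size estimates would shift both values.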

Heterozygosity and repetitive sequence estimation
GenomeScope estimated the genome size to be 809 Mb, with 84% unique and 16% repetitive sequences, and 1.02% heterozygosity (Supplementary Fig. 1).

Genome assembly
Genome quality metrics for each step of the assembly are listed in Table 2. The initial de novo assembly by MaSuRCA with ONT and Illumina shotgun data had a total length of 919,275,268 bp in 3,501 contigs with an N50 of 1,108 Kb. Scaffolding with the Hi-C and Chicago libraries dropped the number of contigs to 2,467 and increased N50 to 16,013 Kb. After two rounds of polishing with trimmed Illumina shotgun reads, gaps decreased from 1,097 to 1,088. Blobtools2 showed that of the 2,467 contigs, none matched other taxa in NCBI databases of bacteria, invertebrates, mammals, phages, plants, and fungi, or environmental samples. Four hundred seventy-eight contigs did not match any database ("no hits") and were left in the genome.
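The contaminant-screening rule described in the methods (remove contigs assigned to any phylum other than Chordata, keep contigs with no database hit) can be sketched as a simple filter. This is an illustrative reconstruction rather than the Blobtools2 workflow itself; the contig names, the `"no-hit"` label, and the function `keep_contig` are all hypothetical.

```python
# ASSUMPTION: the label used for unassigned contigs is illustrative only.
KEPT_ASSIGNMENTS = {"Chordata", "no-hit"}


def keep_contig(phylum):
    """Keep contigs assigned to Chordata or lacking any database hit."""
    return phylum in KEPT_ASSIGNMENTS


# Made-up contig-to-phylum assignments, for illustration only.
assignments = {
    "contig_1": "Chordata",
    "contig_2": "no-hit",
    "contig_3": "Proteobacteria",
    "contig_4": "Chordata",
}

retained = [name for name, phylum in assignments.items() if keep_contig(phylum)]
removed = [name for name, phylum in assignments.items() if not keep_contig(phylum)]
print(retained, removed)
```

Keeping unassigned contigs is a conservative choice: sequences without a database match may be genuine but poorly characterized parts of the target genome.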
The manual curation of the genome assembly yielded 24 scaffolds consistent with chromosome-scale scaffolds (Fig. 2). A dotplot comparison (Fig. 3) with the Amphiprion percula (Lehmann et al. 2019) genome revealed that each of the D. trimaculatus chromosome-scale scaffolds had a one-to-one corresponding homologous, albeit rearranged, chromosome in the Amphiprion percula genome.
The final assembly (GenBank accession: JAMOIN000000000) has a length of 910.7 Mb, 90% of which is on chromosome-scale scaffolds, and a BUSCO score of 97.9%. Merqury calculated 86.19% completeness, a QV of 44.6, and an estimated error rate of 0.0000346, or a single nucleotide error every 28.9 Kb.
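The relationship between the consensus quality value (QV) and the per-base error rate follows the standard Phred scale, QV = -10 log10(error rate). The short sketch below converts the reported QV into an error rate and an average spacing between errors; the function names are hypothetical, and small differences from the reported figures are expected because the QV is rounded.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


rate = qv_to_error_rate(44.6)   # QV reported in the text
bases_per_error = 1 / rate      # average number of bases between errors

print(f"error rate: {rate:.3e}, one error every {bases_per_error:,.0f} bp")
```

A QV of 44.6 therefore corresponds to an error rate of about 3.5 in 100,000 bases, in line with the roughly one error per 29 Kb reported above.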

Discussion
The biology, evolution, and biogeography of the three-spot damselfish are relatively well studied using genetic (Bernardi and Crane 1999; Bernardi et al. 2001; McCafferty et al. 2002; Leray et al. 2009, 2010; Liggins et al. 2016; Getlekha et al. 2017; Crandall et al. 2019) and genomic tools (Salas et al. 2019, 2020) and, as we shift further into the age of WGS data and tools, a reference genome is an invaluable resource. Here, we present the chromosome-scale genome assembly of a three-spot damselfish, Dascyllus trimaculatus, collected from the Indonesian/Philippine population (Limon et al. in review). It is the first within the genus Dascyllus of the widely studied and large Pomacentridae family. This high-quality de novo assembly of a nonmodel coral reef fish is a valuable reference for furthering evolutionary, ecological, and conservation studies for the species and for coral reef fish in general.
We report sequences for 24 chromosomes of the D. trimaculatus genome with total length and repetitive content (Fig. 2a, Supplementary Fig. 1) that are expected for this species (Arai 1976; Getlekha et al. 2017; Yuan et al. 2018). Interestingly, our Hi-C data also show that chromosomes three and four have strong connections at half the depth of other intra-chromosomal connections (Fig. 2b). This pattern can be explained by a hemizygous state wherein one parental gamete contributed a Robertsonian fusion of chromosomes three and four, and the other parental gamete contributed chromosomes three and four as separate chromosomes, making the individual sequenced here a 2n = 47 individual. This finding is consistent with previous studies that report both 2n = 47 and 2n = 48 for Dascyllus trimaculatus (Arai 1976; Ojima and Kashiwagi 1981; Kashiwagi et al. 2005). Chromosome numbers vary both within and among species of Dascyllus. One report on several Dascyllus species collected in the Philippines and the Ryukyu Archipelago of southern Japan demonstrated polymorphic karyotypes in all but one of the species (Ojima and Kashiwagi 1981). Dascyllus aruanus had the most karyotypic variation, with 2n = 27-33 chromosomes, followed by D. reticulatus with 2n = 34-37, D. trimaculatus with 2n = 47-48, and D. melanurus with 2n = 48.
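The reasoning behind the half-depth Hi-C signal can be made concrete with a deliberately simplified toy model. It assumes that the contact signal between two chromosomes scales with the fraction of haplotypes in which they are physically joined, and it ignores background inter-chromosomal contacts; this is an illustration of the argument, not the analysis performed in this study, and all names are hypothetical.

```python
def diploid_number(gamete_a, gamete_b):
    """Diploid chromosome count from the two parental gamete counts."""
    return gamete_a + gamete_b


def expected_relative_contact(fused_haplotypes, ploidy=2):
    """Expected contact signal between two chromosomes, relative to a
    true intra-chromosomal signal, under the toy assumption that signal
    scales with the fraction of haplotypes carrying the fusion."""
    if not 0 <= fused_haplotypes <= ploidy:
        raise ValueError("fused_haplotypes must be between 0 and ploidy")
    return fused_haplotypes / ploidy


# One gamete carries 24 separate chromosomes; the other carries 23
# because chromosomes three and four are fused into one.
karyotype = diploid_number(24, 23)

# Only one of the two haplotypes carries the fusion.
signal = expected_relative_contact(fused_haplotypes=1)

print(f"2n = {karyotype}, relative contact signal = {signal}")
```

In this model, an individual homozygous for the fusion would show full-strength contacts between the two chromosomes, and an individual without the fusion would show none beyond background, so an intermediate signal points to a single fused copy.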
In addition to confirming variation in chromosome number, the dot-plot comparison between this genome and that of the closest relative with an available chromosome-scale assembly, Amphiprion percula (Lehmann et al. 2019), revealed several rearrangements between every pair of corresponding chromosomes (Fig. 3). The Pomacentrid subfamilies Chrominae and Amphiprionini are estimated to have diverged over 50 million years ago (mya) (McCord et al. 2021). The estimated number of rearrangements within chromosomes ranged from 2+ in chromosome 7 of D. trimaculatus, which was the most similar to its counterpart in A. percula, to over 35 in chromosome 24 (Fig. 3). This pattern of rearrangements has not previously been characterized between chromosome-scale genome assemblies of Pomacentridae. The role of variation in chromosome number has been the subject of several cytogenetic studies, which have found that chromosome diversity is inversely related to the mobility of the fish and that chromosome rearrangements can serve to either promote or prevent recombination events (Galetti et al. 2000; Molina and Galetti 2004; Kirkpatrick and Barton 2006; Martinez et al. 2015). Interestingly, chromosome 3 in the genome presented in this paper is one of the most rearranged while also being one of the chromosomes involved in the Robertsonian fusion mentioned above. This assembly will be a useful starting point to study how this type of genome structure varies at a meta-population scale, and how this influences recombination and adaptation.
This assembly represents the first chromosome-level genome of the genus Dascyllus as well as the first non-Amphiprion chromosome-scale genome published in the Pomacentridae family. Damselfishes are excellent model species because of their relatively small size and ease of handling in the wild and in the lab, and those interested in this group will benefit from this addition to the available genomic resources. Dascyllus trimaculatus itself has had a dynamic evolutionary trajectory across the Indo-Pacific, evident in a species complex that is continuing to reveal its complexity and provide insight into evolutionary mechanisms. In addition to providing a high-quality reference genome to further our understanding of genomic architecture, this assembly will serve to leverage information stored across the genome to better understand the population dynamics, phylogeny, biogeography, and demographics of Dascyllus trimaculatus, as well as to gain insight into historical, current, and future responses to changes in climate.

Data availability
The assembly and genomic sequencing reads generated for this study have all been deposited in the NCBI GenBank database under BioProject ID PRJNA828170. The accessions are JAMOIN000000000 for the genome, SRX17663068 for the WGS data, SRX17663069-SRX17663073 for the proximity ligation data, and SRX17742644 and SRX177426445 for the ONT data.
Supplemental material available at G3 online.