Chromosome-level reference genome of the Siamese fighting fish Betta splendens, a model species for the study of aggression

Abstract

Background: Siamese fighting fish Betta splendens are notorious for their aggressiveness and accordingly have been widely used to study aggression. However, the lack of a reference genome has, to date, limited the understanding of the genetic basis of aggression in this species. Here, we present the first reference genome assembly of the Siamese fighting fish.

Findings: First, we sequenced and de novo assembled a 465.24-Mb genome for the B. splendens variety Giant, with a weighted average (N50) scaffold size of 949.03 Kb and an N50 contig size of 19.01 Kb, covering 99.93% of the estimated genome size. To obtain a chromosome-level genome assembly, we constructed one Hi-C library and sequenced 75.24 Gb of reads using the BGISEQ-500 platform. We anchored approximately 93% of the scaffold sequences into 21 chromosomes and evaluated the quality of our assembly using the high-contact frequency heat map and Benchmarking Universal Single-Copy Orthologs. We also performed comparative chromosome analyses between Oryzias latipes and B. splendens, revealing conserved chromosome evolution in B. splendens. We predicted 23,981 genes, assisted by RNA-sequencing data generated from brain, liver, muscle, and heart tissues of Giant, and annotated 15% of the genome as repetitive sequences. Additionally, we resequenced five other B. splendens varieties and detected ∼3.4 M single-nucleotide variations and 27,305 insertions and deletions.

Conclusions: We provide the first chromosome-level genome for the Siamese fighting fish. The genome will lay a valuable foundation for future research on aggression in B. splendens.


Data Description
Males of the Siamese fighting fish Betta splendens are notorious for their aggressiveness. In nature, males establish and vigorously defend territories where they construct a bubble nest to hold fertilized eggs. In laboratory settings, males will readily attack an opponent, their mirror image, physical models of conspecifics, or video images of other males, and accordingly the species has been widely used to study the neurobiological mechanisms of aggression. However, the lack of a reference genome has so far limited studies on the genetic basis of aggression in B. splendens. The species is also one of the most relevant for the ornamental fish trade, as it is easy to keep and reproduce in captivity, and throughout its long domestication period many varieties have been selected for their exuberant fins and colors, size, or aggressive behavior. Here, we sequenced the genome of B. splendens to provide the genomic foundation for future research on aggression and the development of genomic tools.

Sampling and sequencing
We purchased five different varieties of adult male Siamese fighting fish, including Giant, Halfmoon (HM), Half-moon plakat (HMPK), Fighter, and Elephant Ear (EE), from the Hong Kong supplier TC Northern Betta for DNA and RNA extraction [1,2] (Supplementary Fig. 1). We constructed and sequenced six DNA libraries for the B. splendens variety Giant, including three short insert size libraries and three mate-pair libraries (Supplementary Table 1), and five RNA-seq libraries (Supplementary Table 2) using the HiSeq 2000 sequencing platform. One Hi-C library for Giant was also constructed and sequenced using the BGISEQ-500 sequencing platform, yielding 75.24 Gb of reads. Additionally, we sequenced four short insert size DNA libraries for the other four B. splendens varieties.

Genome assembly
We obtained 52.34 Gb of clean reads using SOAPnuke, version 1.5.3 (SOAPnuke, RRID:SCR_015025) [3], with strict parameters, including removal of low-quality reads, adapter contamination, and PCR duplicates. Then, we performed the de novo assembly of the Giant reads using the SOAPdenovo2, version 2.04 (SOAPdenovo2, RRID:SCR_014986) [4], assembler. For the genome assembly, the short insert size libraries were used to construct the contig sequences and the mate-pair libraries were used to link the contigs into scaffolds. We filled the gaps within the scaffolds using GapCloser, version 1.12 (GapCloser, RRID:SCR_015026). We obtained a genome assembly with a size of 465.24 Mb, a scaffold N50 of 949.03 Kb, and a contig N50 of 19.01 Kb (Supplementary Table 3 and Supplementary Fig. 2).

To construct the reference genome at the chromosome level, we used the MboI endonuclease to cut the DNA and constructed a Hi-C library based on a previous protocol [5]. We sequenced 75.24 Gb of data using the BGISEQ-500 sequencing platform and, after quality control using the HiC-Pro, version 2.8.0, pipeline [6,7], obtained 34.5 Gb of valid reads (~45.8%) that could be used to anchor the scaffolds into chromosomes (Supplementary Fig. 3-7). Lastly, we constructed 21 chromosomes that occupied 95.3% of the genome (Fig. 1, Table 1 and Supplementary Table 4) using Juicer [8], version 1.5, and the 3D-dna, version 170123, pipeline [9], based on the draft genome assembly.

To evaluate the quality of the assembly, we used BUSCO, version 3.0.1 (BUSCO, RRID:SCR_015008), and found that 95.4% of the BUSCO genes were completely covered by our genome (Table 2). In addition, approximately 98% of the transcripts assembled from RNA-seq data could be aligned against the genome with more than 90% coverage (Supplementary Table 5).
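The N50 scaffold and contig sizes quoted in the abstract follow the usual definition: the length of the sequence at which the cumulative total, counted from the longest sequence downward, first reaches half of the assembly size. The following is a minimal Python sketch of that calculation. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches at least half of
    the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical values, not from the Betta assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # total is 300, so the half-way point is reached at 70
```

In practice the lengths would be read from the scaffold or contig FASTA file; the same function applies to either set.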

Genome annotation
We annotated the repetitive sequences by combining de novo and homology-based approaches [10].

Comparative genomic analysis
We compared the fighting fish genome with other species using LASTZ, version 1.02.00, at both the whole-genome and gene levels. All of the 21 chromosomes assembled for the fighting fish could be matched to chromosomes of Oryzias latipes, with a mean coverage ratio of 75.3%. Of these, 18 chromosomes had a single hit to one chromosome of O. latipes, and 3 chromosomes (1, 19 and 21) had hits in two chromosomes of O. latipes (Fig. 2 and Supplementary Table 8), indicating conserved evolution for most chromosomes, as well as several chromosome reshuffling events between these two species. Furthermore, at the gene-set level, KO (KEGG Orthology) terms of animals from 109 different species were counted and compared with the fighting fish gene set using the KEGG database [21], version 79. Five KO terms were notably expanded in fighting fish compared with all other animals: NACHT, LRR and PYD domains-containing protein 3 (NLRP3, K12800; 147 genes), tripartite motif-containing protein 47 (TRIM47, K12023; 86 genes), chloride channel 7 (CLCN7, K05016; 43 genes), arginine vasopressin receptor 2 (AVPR2, K04228; 29 genes), and maltase-glucoamylase (MGAM, K12047; 17 genes) (Fig. 3). NLRP3 has two prominent expansions, corresponding to clade 1, containing 56 genes, and clade 2, containing 79 genes, whereas other fish species in these two clades have fewer than three gene copies (Fig. 4).
NLRP3 encodes a pyrin-like protein containing a pyrin domain, a nucleotide-binding site (NBS) domain, and a leucine-rich repeat (LRR) motif, and plays a role in the regulation of inflammation, the immune response, and apoptosis [22].
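The gene-set comparison described in this section, in which genes are counted per KO term and the fighting fish counts are compared with those of other species, can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the function names, the `min_ratio` threshold, and the toy gene-to-KO mappings are hypothetical, and the criterion actually used to call a term "notably expanded" is not specified in the text.

```python
from collections import Counter


def ko_counts(gene_to_ko):
    """Count genes per KO term from a {gene_id: ko_id} mapping."""
    return Counter(gene_to_ko.values())


def expanded_terms(focal, others, min_ratio=2.0):
    """Return KO terms whose gene count in the focal species is at least
    min_ratio times the largest count seen in any other species.

    The ratio threshold is an assumed, illustrative criterion. A term that is
    absent from all other species is compared against a count of 1.
    """
    result = {}
    for ko, n in focal.items():
        max_other = max((counts.get(ko, 0) for counts in others), default=0)
        if n >= min_ratio * max(max_other, 1):
            result[ko] = (n, max_other)
    return result


# Toy annotations (hypothetical gene IDs and counts).
focal = ko_counts({
    "g1": "K12800", "g2": "K12800", "g3": "K12800", "g4": "K12800",
    "g5": "K04228", "g6": "K00001",
})
species_a = ko_counts({"a1": "K12800", "a2": "K04228", "a3": "K00001", "a4": "K00001"})

print(expanded_terms(focal, [species_a]))
```

With the toy data, only K12800 passes the threshold, because it has four copies in the focal set against one in the comparison species.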

Resequencing
Using the Mirror-Image Stimulation (MIS) test, we found that the different varieties of the Siamese fighting fish differ in aggressiveness. Males of B. splendens were tested under a standardized mirror-elicited aggression paradigm, as this elicits aggression levels similar to those elicited by a real conspecific. One fighting fish was placed into the testing tank (30 x 19 x 23 cm) and left undisturbed for 30 min for acclimation. Then, the swimming behavior was recorded in a 5-min video taken with a side-mounted digital camera, and the swimming track was recorded with the Viewpoint ZebraLab Tracking System for 5 min. This represented the control state. After that, a mirror similar in size to the side wall was placed into the tank to induce aggression of the fish toward its own mirror image. Aggression was scored through the following behaviors: opercular flare, fin spreading, 90° turn, and mirror hit. As expected, the mirror image elicited a high frequency of aggressive displays. Fish spent most of the time close to the mirror side and increased their overall swimming distance compared with controls. Among all tested varieties, Giant had the highest overall frequency of aggressive displays and HM the lowest (Supplementary Fig. 8).
To evaluate the genetic diversity among the four varieties of Betta splendens, we called SNVs (single-nucleotide variations) and indels (insertions and deletions) from the read alignments, using the Giant assembly as a reference. We obtained 70.25 Gb of clean reads filtered from 79.18 Gb of raw reads (Supplementary Table 9). We used BWA, version 0.6.2 (BWA, RRID:SCR_010910) [23], to align all the resequencing data to the reference genome and the UnifiedGenotyper in the Genome Analysis Toolkit, version 2.8.1 (GATK, RRID:SCR_001876) [24], to call variations. In total, we detected approximately 3.4 M SNVs and 27,305 indels, which will provide abundant genetic polymorphisms for use in future research and applications.
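Totals such as the SNV and indel counts above are typically tallied from the VCF output of the variant caller. The sketch below shows one simple way to do this in Python by comparing the lengths of the REF and ALT alleles. It is an illustration only: the function names and the toy records are hypothetical, it counts alternate alleles rather than sites, and it is not the procedure used by the authors.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT allele pair as 'snv', 'indel', or 'other'."""
    # Symbolic (<DEL>), spanning-deletion (*), and missing (.) alleles are not
    # simple sequence changes, so they are set aside.
    if alt.startswith("<") or alt in ("*", "."):
        return "other"
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "snv"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions


def count_variants(vcf_lines):
    """Tally variant classes over VCF data lines, one count per ALT allele."""
    counts = {"snv": 0, "indel": 0, "other": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns 4 (REF) and 5 (ALT)
        for alt in alts.split(","):
            counts[classify_variant(ref, alt)] += 1
    return counts


# Toy VCF records (hypothetical).
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t300\t.\tC\tCGG,T\t50\tPASS\t.",
]
print(count_variants(toy_vcf))
```

The third toy record is multi-allelic, so it contributes one indel and one SNV to the tally.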

Fig. 2. Collinear relationship between B. splendens and Oryzias latipes. Green represents the chromosomes of B. splendens and the other colors represent the chromosomes of O. latipes.

Fig. 4. Phylogenetic tree of the NLRP3 gene family (KO: K12800), built from the genes of B. splendens and other species. Clade 1 and clade 2 mark the two prominently expanded subfamilies in B. splendens.

Table 2. Evaluation results of the genome and gene set using BUSCO.