Genome sequence of the barred knifejaw Oplegnathus fasciatus (Temminck & Schlegel, 1844): the first chromosome-level draft genome in the family Oplegnathidae

Abstract

Background
The barred knifejaw (Oplegnathus fasciatus), a member of the Oplegnathidae family of the Centrarchiformes, is a commercially important rocky reef fish native to East Asia. Oplegnathus fasciatus has become an important fishery resource for offshore cage aquaculture and fish stocking for marine ranching in China, Japan, and Korea. Recently, sexual dimorphism in growth with a neo-sex chromosome and widespread biotic diseases in O. fasciatus have become increasing concerns in the industry. However, adequate genome resources for gaining insight into sex-determining mechanisms and establishing genetically resistant breeding systems for O. fasciatus are lacking. Here, we analyzed the entire genome of a female O. fasciatus fish using long-read sequencing and Hi-C data to generate chromosome-length scaffolds and a highly contiguous genome assembly.

Findings
We assembled the O. fasciatus genome with a total of 245.0 Gb of raw reads that were generated using both Pacific Biosciences (PacBio) Sequel and Illumina HiSeq 2000 platforms. The final draft genome assembly was approximately 778.7 Mb, which reached a high level of continuity with a contig N50 of 2.1 Mb. The genome size was consistent with the estimated genome size (777.5 Mb) based on k-mer analysis. We combined Hi-C data with a draft genome assembly to generate chromosome-length scaffolds. Twenty-four scaffolds corresponding to the 24 chromosomes were assembled to a final size of 768.8 Mb with a contig N50 of 2.1 Mb and a scaffold N50 of 33.5 Mb using 1,372 contigs. The identified repeat sequences accounted for 33.9% of the entire genome, and 24,003 protein-coding genes with an average of 10.1 exons per gene were annotated using de novo methods, with RNA sequencing data and homologies to other teleosts. According to phylogenetic analysis using protein-coding genes, O. fasciatus is closely related to Larimichthys crocea, with O. fasciatus diverging from their common ancestor approximately 70.5–88.5 million years ago.

Conclusions
We generated a high-quality draft genome for O. fasciatus using long-read PacBio sequencing technology, which represents the first chromosome-level reference genome for Oplegnathidae species. Assembly of this genome assists research into fish sex-determining mechanisms and can serve as a resource for accelerating genome-assisted improvements in resistant breeding systems.

My previous comments 3/4, on the k-mer distribution, now at line 112: this is still not very clear. I understand that the repeat content is based on fitting a model to the distribution. I do not fully agree that the peak labeled as repeated k-mers should be identified with generic repeat content; I think these are very clearly duplications (which are, of course, technically repeat content). I would suggest clarifying the genome size calculation itself, which is now incorrect (line 112): 8.09 x 10^10 / 100 = 777.5 Mb.

Reply: We agree with the reviewer's comment that the peak labeled as repeated k-mers should be identified as generic repeat content. Strictly speaking, the majority of k-mers with a depth more than 1.8 times the main depth (100 in our case) were most likely from repeated regions, including the duplications mentioned in the comment. That is also how we estimated the repeat ratio of the genome. We are sorry that the method for genome size estimation was not clear enough. To clarify the method, the following formula was used: $G = (N_{\text{k-mer}} - N_{\text{error\_k-mer}}) / D$, where $G$ is the genome size, $N_{\text{k-mer}}$ is the number of k-mers, $N_{\text{error\_k-mer}}$ is the number of k-mers with a depth of 1, and $D$ is the k-mer depth. K-mers with a depth of 1 were eliminated because k-mers with low depth were likely to derive from sequencing errors. As a result, the genome size was estimated as 777.5 Mb. We have revised the description of the genome size estimation method in the manuscript.
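The formula in the reply above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function name `estimate_genome_size`, the histogram layout (k-mer depth mapped to the number of distinct k-mers at that depth), and the toy numbers are all assumptions made for the example. It also assumes that the number of k-mers in the formula means total k-mer occurrences, and that the main depth D can be taken as the most populated depth above 1.

```python
def estimate_genome_size(histogram, peak_depth=None):
    """Estimate genome size as (N_kmer - N_error_kmer) / D.

    histogram: dict mapping k-mer depth -> number of distinct k-mers
               observed at that depth (hypothetical input layout).
    peak_depth: main k-mer depth D; if None, the most populated depth
                above 1 is used (an assumption of this sketch).
    """
    # Total k-mer occurrences: each distinct k-mer counts `depth` times.
    total_kmers = sum(depth * count for depth, count in histogram.items())
    # K-mers seen exactly once are treated as sequencing errors.
    error_kmers = histogram.get(1, 0)
    if peak_depth is None:
        non_error = {d: c for d, c in histogram.items() if d > 1}
        peak_depth = max(non_error, key=non_error.get)
    return (total_kmers - error_kmers) / peak_depth


# Toy histogram, invented for illustration only.
toy_histogram = {1: 5000, 50: 100, 100: 1000, 200: 50}
toy_size = estimate_genome_size(toy_histogram)

# The reviewer's arithmetic check: dividing 8.09 x 10^10 by a depth of 100
# without any correction gives 809 Mb, not 777.5 Mb.
uncorrected_mb = 8.09e10 / 100 / 1e6
```

With the toy histogram, the depth-1 k-mers are dropped before dividing by the main depth of 100. The last line reproduces the reviewer's point that the calculation as originally written does not yield 777.5 Mb unless a correction such as the error k-mer term is applied.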
Line 132, 'complexity ... such as heterozygosity': This does not fit the very low heterozygosity levels just identified from the k-mer profile. Possibly structural variants instead of SNPs? I don't think the high duplication levels can explain this?

Reply: We agree with the reviewer's comment that genome complexity derived from structural variants might also increase the size of the genome assembly. We therefore revised the sentence to "The genome complexity, such as structural variants and heterozygosity might be possible reasons to explain the relative large genome size in the assembly."

Line 162: 'filter all base sequences than 500 bp': more than 500 bp? Less than 500 bp?

Reply: We sincerely thank the reviewer for this suggestion. We revised "filter all base sequences than 500 bp" to "filter all base sequences more than 500 bp".

There is a lot of redundancy between Tables 1 and 3. I would suggest either merging these or moving the finer details of the assembly to Table 3 (and keeping Table 1 as an overview of the final results, just N50/genome size/coverage).

Reply: Thank you for the suggestion. We have merged Table 3 into Table 1 to eliminate the redundant information.