Chromosome-level assembly of Dictyophora rubrovolvata genome using third-generation DNA sequencing and Hi-C analysis

Abstract Dictyophora rubrovolvata, a rare edible mushroom with both nutritional and medicinal values, was regarded as the “queen of the mushroom” for its attractive appearance. Dictyophora rubrovolvata has been widely cultivated in China in recent years, and many researchers were focusing on its nutrition, culture condition, and artificial cultivation. Due to a lack of genomic information, research on bioactive substances, cross breeding, lignocellulose degradation, and molecular biology is limited. In this study, we report a chromosome-level reference genome of D. rubrovolvata using the PacBio single-molecule real-time-sequencing technique and high-throughput chromosome conformation capture (Hi-C) technologies. A total of 1.83 Gb circular consensus sequencing reads representing ∼983.34 coverage of the D. rubrovolvata genome were generated. The final genome was assembled into 136 contigs with a total length of 32.89 Mb. The scaffold and contig N50 length were 2.71 and 2.48 Mb, respectively. After chromosome-level scaffolding, 11 chromosomes with a total length of 28.24 Mb were constructed. Genome annotation further revealed that 9.86% of the genome was composed of repetitive sequences, and a total of 508 noncoding RNA (rRNA: 329, tRNA: 150, ncRNA: 29) were annotated. In addition, 9,725 protein-coding genes were predicted, among which 8,830 (90.79%) genes were predicted using homology or RNA-seq. Benchmarking Universal Single-Copy Orthologs results further revealed that there were 80.34% complete single-copy fungal orthologs. In this study, a total of 360 genes were annotated as belonging to the carbohydrate-active enzymes family. Further analysis also predicted 425 cytochromes P450 genes, which can be classified into 41 families. This highly accurate, chromosome-level reference genome of D. rubrovolvata will provide essential genomic information for understanding the molecular mechanism in its fruiting body formation during morphological development and facilitate the exploitation of medicinal compounds produced by this mushroom.


Introduction
Dictyophora rubrovolvata, a saprophytic fungus belonging to the family Phallaceae , is a widely artificially cultivated edible mushroom in Southwest China. Dictyophora rubrovolvata is also called "hong tuo zhu sun" (red-volva basket stinkhorn) in Chinese and regarded as the "queen of the mushroom" for its attractive appearance (Liao et al. 2015). Dictyophora rubrovolvata grows on the wet roots of bamboo groves or in the humus of bitter bamboo forests in Guizhou, Yunnan, and Sichuan provinces of China (Hang et al. 2012). It was initially discovered in Yunnan province of China in 1976 and then successfully artificially cultivated in 1983 (Zang and Ji 1985). In the cultivation cycle of D. rubrovolvata, the development process can be divided into 5 major stages: undifferentiated mycelia, primordia, ball-shaped stage, peach-shaped stage, and the mature stage (Wang et al. 2020). The mature fruiting body of D. rubrovolvata possesses a unique appearance, including a red-volva, white stipe, and white net-like veil. Dictyophora rubrovolvata has been widely used as a functional food in daily life in China and Japan for its variety of nutrients, including proteins, amino acids, minerals, vitamins, thiamine, riboflavin, nicotinic acid, and polysaccharides (Deng et al. 2016). Dictyophora rubrovolvata has also been reported to have many biological and pharmacologic activities, such as antiaging and hypoglycemic (Ye, Wen, Peng, et al. 2016), antifatigue and hypoxia endurance (Ye, Wen, Huan, et al. 2016), etc. Up to now, D. rubrovolvata was still considered a rare edible mushroom in China. However, compared with other mushrooms, such as Flammulina velutipes, Pleurotus eryngii, and Hypsizygus marmoreus, its biological and genetic information remains limited, which impedes the breeding of high-quality cultivars.
In recent years, the genomes of many basidiomycetes have been obtained, including Stropharia rugosoannulata , Naematelia aurantialba (Sun et al. 2021), Sparassis latifolia (Xiao et al. 2018), F. velutipes (Park et al. 2014), P. eryngii (Dai et al. 2019), H. marmoreus (Jin Jing et al. 2015), Lentinula edodes (Shim et al. 2016), and Agaricus bisporus (Morin et al. 2012). The availability of these increased genome sequences promotes research on the development and utilization of medical and pharmaceutical products. Hi-C technology provides a unique and powerful tool to study nuclear organization chromosome architecture (Belton et al. 2012). In recent years, the Hi-C technique has been used to assemble high-quality genomes for many mushrooms. An updated draft genome sequence of S. latifolia was generated by Oxford Nanopore sequencing and the Hi-C technique, a 41.41-Mb chromosome-level reference genome of S. latifolia was assembled, and 13,103 protein-coding genes were annotated . Based on the genome assembly obtained from second-and third-generation sequencing and Hi-C data, Ophiocordyceps sinensis strain 1,229 was found to possess 6 chromosomes with a strong telomere interaction between chromosomes (Meng et al. 2021). The Hi-C technique was also used to construct the chromosome-level genome in L. edodes Yu et al. 2022) and Ganoderma lucidum (Wu et al. 2022).
In the present study, we used the PacBio Sequel II platform to sequence the D. rubrovolvata genome, and assembled a highquality chromosome-scale reference genome using the Illumina platform combined with Hi-C scaffolding. The D. rubrovolvata genome sequence will be helpful for understanding the molecular mechanisms and advancing our understanding of its genetics and evolution.

Fungal strains and strain culture
The D. rubrovolvata fruiting body was collected from Fuzhou, Fujian Province, China. The D. rubrovolvata strain Di001 was obtained from the fruiting body by tissue isolation. At present, this strain has been preserved at the Institute of Edible Mushroom, Fujian Academy of Agricultural Sciences. The strain was maintained on potato dextrose peptone agar slants and subcultured every 3 months.

Extraction of genome DNA
To obtain sufficient cell amounts for genetic DNA extraction, D. rubrovolvata Di001 was inoculated into potato dextrose broth medium in a Petri dish for 10-15 days, and then the aerial mycelium was scratched out of the medium by sterilized cover glass. The sodium dodecyl sulfate method was used to extract the genomic DNA . The genomic DNA concentration was determined using the Nanodrop spectrophotometer (Thermo Fisher Scientific, NANODROP2000) and Qubit Fluorometer (Invitrogen, Qubit 3 Fluorometer), and agarose gel electrophoresis was performed to check its integrity.

De novo sequencing
The 15-kb SMRTbell library was constructed using the SMRTbell Express Template Prep Kit (version 2.0). The 350-bp small, fragmented library was constructed using the NEBNext ultra DNA library prep kit. After the library was qualified, the whole genome of D. rubrovolvata Di001 was sequenced using the PacBio Sequel II platform and Illumina NovaSeq PE150 at the Biomarker Technologies Corporation (Beijing, China), and the sequencing results were used for gene annotation.

Genome assembly and assessment
To obtain chromosome-level whole-genome assembly for D. rubrovolvata, we utilized a combined approach of Illumina, PacBio and Hi-C technology for the genome assembly, and chromosomelevel scaffolding.
Regarding the PacBio Sequel II platform, on the basis of removing the low-quality reads (<500 bp) from the raw data, the automatic error correction function of the single-molecule real-time (SMRT) portal software was used to further improve the accuracy of the seed sequences, and finally, the variant caller module of the SMRT link v5.0.1 software was used to correct and count the variant sites in the initial assembly results using the arrow algorithm (Berlin et al. 2015). The ccs (circular consensus sequencing) reads were assembled using Hifiasm v0.12 (https://github.com/ chhylp123/hifiasm; Cheng et al. 2021), Pilon software (Walker et al. 2014) was used to further correct the assembled genome using the second-generation data, and finally, the genomes with higher accuracy was obtained. Regarding the Illumina NavaSeq PE150 platform, the clean reads were mapped to the D. rubrovolvata Di001 genome using Burrows-Wheeler Aligner software under its default parameters.
Benchmarking Universal Single-Copy Orthologs (BUSCO) v 2.0 software was used to assess the completeness of the genome assembly. The lineage data set of BUSCO was fungi_odb9 (number of species: 85; number of BUSCOs: 290).

Hi-C library construction and assembly of the chromosome
Hi-C libraries were prepared as previously reported . Briefly, the sample was fixed with formaldehyde to maintain the 3D structure of DNA in cells and the restriction enzyme Hind III was applied to DNA digestion. Then, biotin-labeled bases were introduced using the DNA terminal repair mechanism. DNA was fragmented by a Covaris S220 focused ultrasonicator, and 300-700 bp fragments were recovered. The DNA fragments containing interaction relationships were captured by streptavidin immunomagnetic beads for library construction. Library concentration and insert size were determined using the Qubit 2.0 and Agilent 2100, respectively, and Q-PCR was used to estimate the effective concentration of the library. High-quality Hi-C libraries were sequenced on the Illumina NovaSeq PE150 sequencing platform, and the sequencing data were used for chromosome-level assembly (He et al. 2022). Hi-C data were filtered and evaluated using HiC-Pro software (Servant et al. 2015), it could identify the valid interaction pairs and invalid interaction pairs in the Hi-C sequencing results by analyzing the comparison results, and realize the quality assessment of the Hi-C libraries. The order and direction of scaffolds/contigs were clustered into super scaffolds using LACHESIS (Burton et al. 2013), based on the relationships among valid reads.

Sequencing and assembly data
The final D. rubrovolvata genome was composed of 136 contigs after genome assembly. The total length of all assembled contigs was 32,887,457 bp with a GC content of 45.16% and an N50 value of 2,480,000 bp. There were 233 complete BUSCOs in the assembled genes of D. rubrovolvata, the complete single-copy BUSCO was 232 (80%), and the complete duplicated BUSCO was 1 (0.34%). The complete BUSCO in D. rubrovolvata is lower than those in D. indusiata, S. latifolia, Tremella fuciformis, Naematelia encephala, and G. lucidum (Supplementary Table 1). The assembly encodes 9,725 protein-coding genes, which is less than the other 16 fungi except C. militaris (9,651 protein-coding genes) and O. sinensis (6,972 protein-coding genes). The GC content of D. rubrovolvata (45.2%) was also lower than the average value of 19 fungi in this study (Supplementary Table 2). The general features of the D. rubrovolvata genome, including assembly and gene model statistics, are presented in Table 1.

Hi-C
Hi-C has been widely used to map chromatin interactions within regions of interest and across the genome. In total, 20.1 million read pairs (6.03 Gb clean data) were generated from the Hi-C library, and the GC content and Q30 ratio (the percentage of clean reads more than 30 bp) were 43.25 and 94.03%, respectively (Supplementary Table 3).
The Hi-C library quality was assessed based on the ratio of mapped reads and the proportions of valid interaction pairs and invalid interaction pairs. Only valid interaction pairs can provide effective information for genome assembly. Invalid interaction pairs mainly consist of self-circle ligation, dangling ends, religation, and dumped pairs. The mapped reads ratio was 95.19% (Supplementary Table 4). Of the unique mapped read pairs, 74.34% were the valid interaction pairs (12.12 million), which were used for the next Hi-C assembly (Supplementary Table 5).
Overall, we constructed a chromosomal-level assembly of D. rubrovolvata with 11 pseudo-chromosomes with lengths ranging from 1.77 to 3.37 Mb (Table 2). Hi-C assembly incorporated  Table 2. For the Hi-C assembled chromosomes, the genome was cut into 20 kb bins of equal length. The number of Hi-C read pairs covered between any 2 bins was then used as the intensity signal of the interaction between the bins to construct a heat map . The heat map demonstrated that the 11 chromosome groups can be clearly distinguished (Fig. 1). Within each group, the intensity of interaction in the diagonal position was higher than that in the off-diagonal position, indicating that the intensity of adjacent sequences (diagonal position) interaction in the Hi-C assembly was high, while the intensity of nonadjacent sequences (off-diagonal position) interaction was weak. The heat map of the Hi-C assembly interaction bins was consistent with a genome assembly of excellent quality.

Repeat sequence
The total length of the repeat sequence was 3,243,445 bp, which accounted for 9.86% of the D. rubrovolvata genome length. It was subdivided into 5 major types: retrotransposon, transposon, potential host gene, simple sequence repeat (SSR), and unknown duplications. A total of 2,428 retrotransposon, 2,141,399 bp in length, accounted for 6.51% of the genome length. In retrotransposon, the long terminal repeat-retrotransposons Copia (LTR/Copia) and long terminal repeat-retrotransposons Gypsy (LTR/Gypsy) accounted for 0.49 and 2.68% of the assembled genome, respectively. Transposon represented 0.71% of the assembled genomes. The Helitron transposable element, miniature inverted repeat transposable element, and terminal inverted repeat transposable element accounted for 0.16, 0.09, and 0.38% of the assembled genome, respectively (Table 3).

Gene prediction and genome comparisons
A total of 9,725 genes were predicted in the D. rubrovolvata genome (Supplementary Table 6), among which there were 8,830 homology-predicted genes or RNA-seq-predicted genes (90.79%; Fig. 1. Hi-C assembly of a chromosome interactive heat map. Lachesis Group (LG) means chromosome. LG01-LG10 are the abbreviations of 11 chromosomes. The abscissa and ordinate represent the order of each bin on the corresponding chromosome group. Fig. 2), indicating high reliability of the prediction. The total length of the encoded genes was 20.76 Mb, accounting for 63.1% of the whole genome, and the average length of each gene was 2,135.07 bp. The average exon and intron numbers were 7.08 and 6.08, respectively (Table 1).

Gene function annotation
To predict the protein sequences, a similarity analysis of 9,725 nonredundant genes in multiple public databases (GO, KEGG, KOG, NR, Pfam, CAZy, Swiss-Prot, and TrEMBL) identified 8,727 genes that were functionally annotated, which accounted for 89.74% of the assembled genome. Most genes were matched using the Nr (8,671 genes) database, followed by TrEMBL (8,298 genes) and Pfam (6,492 genes) database (Supplementary Table 7).

GO annotations
In GO database, 3 independent ontologies including biological process, cellular component, and molecular function were used to describe gene products according to their functional annotations. A total of 4,001 genes were assigned to 3 major categories: biological processes (18 branches), cellular components (15 branches), and molecular functions (14 branches). These were mainly distributed in 5 functional entries, "catalytic activity," "metabolic process," "cellular process," "cell part," and "cell," of which the number of annotated genes was 2,058, 1,873, 1,777, 1,722, and 1,702, respectively (Fig. 4). Dictyophora rubrovolvata had more genes in the common subcategories of "metabolic process" and "cellular process" within the biological process and "catalytic activity" within the molecular function categories (Supplementary Table 9).

KEGG annotations
To further systematically analyze the metabolic pathways of gene products in cells and the functions of these gene products, the KEGG database was used to annotate the gene functions of D. rubrovolvata. A statistical map of the number of annotated genes in the KEGG database is shown in Fig. 5. The 3,046 genes were assigned into 4 categories in KEGG: metabolism (90 branches), genetic information processing (15 branches), cellular processes (5 branches), and environmental information processing (1 branches). Of these, 1,863 genes were assigned to the "metabolism" category. Within metabolism, the biosynthesis of unsaturated fatty acids possesses 111 genes, followed by carbon metabolism (96), amino sugar and nucleotide sugar metabolism (47), and glutathione metabolism (46). A total of 817 genes were assigned to the "genetic information processing" functional category, including nucleocytoplasmic transport (100), ribosome (99), protein processing in the endoplasmic reticulum (85), and spliceosome (79). For cellular processes (265 genes), the cell cycle   was the most involved (74). In addition to the above 3 major categories, only 28 genes were assigned to the "environmental information processing" category (Supplementary Table 10).

Conclusion
In this study, we report a highly accurate chromosome-level genome assembly of D. rubrovolvata based on the PacBio SMRT and Hi-C technologies. The final genome size was 32.89 Mb. A total of 9,725 protein-coding genes were predicted using the strategy of multievidence combination, and 8,727 genes were functionally annotated. To the best of our knowledge, this genome-wide assembly and annotation data represent the first genome scale assembly of D. rubrovolvata. The genome data created in this study will serve as valuable resources for fungal diversity research and breeding of D. rubrovolvata and will further provide essential genomic information for understanding the molecular mechanism in its fruiting body formation during morphological development and facilitate the exploitation of medicinal compounds produced by this mushroom.

Data availability
Genome sequencing of D. rubrovolvata Di001 generated for this study has been submitted to the NCBI (BioProject: PRJNA908074 and BioSample: SAMN32024313).
Supplemental material available at G3 online.

Funding
This work was supported by the Natural Science Foundation of Fujian province of China (2021J01504), the Special Fund for Scientific Research in the Public Interest of Fujian Province (2022R1035005), the Science and Technology Innovations Program of Fujian Academy of Agricultural Science (CXTD2021016-2), and the guiding scientific and technological innovation projects of Fujian Academy of Agricultural Science (YDXM202209).

Conflicts of interest
The author(s) declare no conflict of interest.