Telomere-to-telomere genome assembly of bitter melon (Momordica charantia L. var. abbreviata Ser.) reveals fruit development, composition and ripening genetic characteristics

Abstract Momordica charantia L. var. abbreviata Ser. (Mca), known as bitter gourd or bitter melon, is a Momordica variety with medicinal value and belongs to the Cucurbitaceae family. In view of the lack of genomic information on bitter gourd and other Momordica species and to promote Mca genomic research, we assembled a 295.6-Mb telomere-to-telomere (T2T) high-quality Mca genome with six gap-free chromosomes after Hi-C correction. This genome is anchored to 11 chromosomes, which is consistent with the karyotype information, and comprises 98 contigs (N50 of 25.4 Mb) and 95 scaffolds (N50 of 25.4 Mb). The Mca genome harbors 19 895 protein-coding genes, of which 45.59% constitute predicted repeat sequences. Synteny analysis revealed variations involved in fruit quality during the divergence of bitter gourd. In addition, assay for transposase-accessible chromatin by high-throughput sequencing and metabolic analysis showed that momordicosides and other substances are characteristic of Mca fruit pulp. A combined transcriptomic and metabolomic analysis revealed the mechanisms of pigment accumulation and cucurbitacin biosynthesis in Mca fruit peels, providing fundamental molecular information for further research on Mca fruit ripening. This report provides a new genetic resource for Momordica genomic studies and contributes additional insights into Cucurbitaceae phylogeny.


Introduction
Momordica charantia L. var. abbreviata Ser. (Mca) is a variety of bitter gourd in the Cucurbitaceae family. Bitter gourd is native to Africa but has been present in Asia for a long period [1,2]. This Mca variety is mainly distributed in Southern Asia, Southeast Asia and China [3]. It is referred to as 'Jin ling zi' or 'Lai pu tao' in Chinese and was predominantly grown during the Ming and Qing dynasties [4]. Mca fruits are smaller than those of M. charantia L. (MC). During the process of fruit ripening, the fruit peel turns from green to yellow/orange at maturity, the pulp turns from bitter white to sweet red (Fig. 1A), and the fruit is both edible and applicable to ornamental use [5]. Terpenoids play essential roles in the bitterness of Momordica fruit. New norcucurbitane triterpenoids have been isolated from the fruit of Momordica in recent years [6][7][8]. The antioxidant and antibacterial effects of these fruits have also attracted attention [9,10] and suggest that these fruits have potential applications in the health food industry and for controlling food-borne pathogens.
The fruits of many members of the Cucurbitaceae are edible, and the family includes economically important crop species such as cucumber, watermelon, and bitter gourd [11]. Genome assembly is a great tool for plant breeding and crop improvement. With the development of deep sequencing, the genomes of 12 species of Cucurbitaceae have been assembled, namely those of cucumber [12], watermelon [13,14], melon [15][16][17], pumpkin [18], sponge gourd [19,20], bitter gourd [3,21], wax gourd [22], snake gourd [23], bottle gourd [24,25], and chayote [26]. All of the species with assembled genomes produce cultivated fruit and, and, with respect to members of the genus Momordica, there are wild relatives that potentially contain valuable germplasm. The first version of the bitter gourd OHB3-1 genome was drafted in 2016 [21], but the genes were not anchored to chromosomes. Cui et al. [2] reported the first chromosome-level bitter gourd genome, and Matsumura et al. [3] published a long-read genome and provided new insights into bitter gourd genetic diversity and domestication. However, to date no Mca genomic information is available.
To promote the progress of Cucurbitaceae and Momordica genomic research, we assembled an Mca genome through thirdgeneration circular consensus sequencing (CCS) as well as highthroughput chromosome conformation capture (Hi-C) sequencing and chromosome assignments. The resulting draft genome was analyzed to identify repeat sequences and coding genes. The evolutionary relationship and gene variations between Mca and other Cucurbitaceae species were indicated by comparative genomic and variation analysis. On the basis of an assay for transposase-accessible chromatin (ATAC) with high-throughput sequencing, transcriptome sequencing and metabolome analysis, Mca genes associated with fruit texture, pigmentation, bitterness, phytohormone synthesis, and signal transduction were identified. Metabolites such as saponins and momordicins and other characteristic substances were analyzed. This assembly of the Mca genome helps improve the understanding of Cucurbitaceae and Mca phylogeny and provides insights into the fruit ripening process.

Genome assembly
A 350-bp library was constructed using Mca genomic DNA and was sequenced and filtered on the Illumina sequencing platform, generating 21.28 Gb of high-quality data. The total sequencing depth was ∼74.31×. The k-mer depth corresponding to the first peak was 30.7 (Supplementary Data Fig. S1), and the calculated length of a single genome was ∼286.40 Mb (Table 1). According to the k-mer distribution, the estimated repeat content constituted ∼25.60%, and the estimated heterozygosity was ∼0.12%. These data indicate that Mca has a simple genome, which is conducive to the construction of subsequent fine genomic maps.  Table S1).
A heat map of Hi-C-assembled chromosomes indicated that the genome assembly was complete (Fig. 1B). A total length of 295.6 Mb anchored to 11 chromosomes; this accounted for 99.87% of the assembled genome and can determine direction. The final genome assembly statistics were as follows: 25.4 Mb for contig N50 and scaffold N50, with 98 contigs and 95 scaffolds ( Table 1). The gene density, transposable element (TE) content, and GC content are shown in Fig. 2A. Using Tandem Repeats Finder (4.07b) [27], we identified centromeric sequences from the assembled genomes, and the most abundant tandem repeat is the centromere DNA [28]. Using the seven-base telomeric repeat (CCCTAAA at the 5 end or TTTAGGG at the 3 end) as a sequence query, we identified 18 telomeres. The candidate centromere tandem repeats, predicted centromeric regions, and telomere regions are shown in Supplementary Data Tables S2, S3, and S4, respectively. With respect to the anchored chromosomes, we found that LG01, LG02, LG05, LG06, LG07, LG09, LG10, and LG11 were assembled as gapless chromosomes, and six LG05, LG07, LG09, and LG11 (Fig. 2B). Only three chromosome assemblies had gaps, indicating that the assembled genome can be considered a high-quality telomere-to-telomere (T2T) genome. Chromosomes with telomeres and centromere regions are shown in Supplementary Data Fig. S2. Karyotype analysis indicated that Mca has 22 chromosomes with length 1.5-2.0 μm, and they are nearly central or proximal centromeric chromosomes. The telomeres and rDNA were subjected to fluorescence in situ hybridization, which showed green signals at the telomeres of each chromosome (Fig. 2C). In Fig. 2D two chromosomes show strong 5SrDNA hybridization signals (red arrows) and another two chromosomes show strong 18SrDNA hybridization signals (green stars). To evaluate genome assembly quality, Merqury [29] results showed that the integrity of the genome assembly was 99.6%, QV = 45.2, and the error rate was only 0.003%, indicating that a genome with high integrity and accuracy was constructed.
Detailed assessment results and k-mer multiplicity are shown in Table 2 and Supplementary Data Fig. S3, respectively. This highquality gapless genome can be used to elucidate the mystery of these 'dark matter' regions.

Genome annotation analysis
Repeated sequences, pseudogenes, non-coding RNAs, and homologous coding genes and functions were annotated using the non-redundant (NR) protein sequence, evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt, Pfam, and TrEMBL (Table 3)

Comparative genomic analysis
Thirty-one Mca gene families and 3680 families common to all species were investigated. A total of 1518 specific gene families were detected in bitter gourd; the others are described in Fig. 3B. GO and KEGG pathway enrichment analysis was performed for the Mca-specific gene family members. KEGG pathway analysis indicated that Mca-specific gene family members were associated with RNA polymerase, sesquiterpenoid, and triterpenoid biosynthesis, purine metabolism, and photosynthesis-antenna proteins (Supplementary Data Fig. S4). A total of 1019 single-copy genes were detected, and the gene copy number distribution for all gene families for 15 analyzed species is shown in Fig. 3A. A rooted tree was constructed based on the 1019 single-copy genes, with the outgroup represented by Amborella trichopoda (Fig. 3E). This showed that Mca diverged from bitter gourd 0.53-52.49 million years ago (Mya) and that bitter gourd evolved gradually from other Cucurbitaceae crop species ∼115.08-120.42 Mya. The contraction and expansion of Mca gene families relative to those of their ancestors were subsequently predicted (Fig. 3E). Sixtytwo expanded genes and 106 contracted genes compared with those in bitter gourd were predicted. KEGG pathway enrichment revealed that the expanded genes were associated with photosynthesis, phenylalanine metabolism, and diterpenoid biosynthesis (Supplementary Data Fig. S5A) and that the contracted genes were associated with linoleic acid metabolism and diterpenoid biosynthesis (Supplementary Data Fig. S5B). Collinearity analysis showed that the percentage of collinear genes between Mca and melon was 55.87%, followed by 54.07% between Mca and sponge gourd (Fig. 3C). Several collinear genes that played roles in fruit ripening and flavor formation during the domestication process, including genes related to phytohormones, the carotenoid pathway and cucurbitacin synthesis, were detected in these three species ( Fig. 3C; Supplementary Data Table S8). To further determine the evolutionary relationships of Mca and other Cucurbitaceae gourds, we used the synonymous mutation rate (K s ) and 4-fold synonymous third-codon transversion (4DTv) curves to identify the occurrence of whole-genome duplication (WGD) events. The K s peaks for Mca, bitter gourd, and sponge gourd were ∼1.32 (Supplementary Data Fig. S6A), and have been verified as corresponding to an ancient genomewide triploidization (WGT; gamma) event [30]; additionally, these findings indicate that no recent WGD events have occurred. This conclusion is consistent with the 4DTv curves (Supplementary Data Fig. S6B). On the other hand, as shown in Fig. 3D

Genome-wide identification of variation
The Mca genome was used as a reference to assess the genome variations that occurred in bitter gourd throughout evolution. This revealed 211 329 single-nucleotide polymorphism (SNP) differences and 100 187 insertions-deletions (InDels), namely 43 085 insertions and 57 102 deletions. Figure 2A shows the SNP and  Table 4, Supplementary Data Table S6), which associated with phytohormone, texture and terpene metabolize.
In this analysis, the presence of a gene was defined as a sequence of >50 bp with no variation from the reference genome sequence. The absence of a sequence of >50 bp identical to one in the reference genome was defined as deletion variation (absence). We found 572 and 706 sequences that were present and absent, respectively, from the bitter gourd genome. KEGG pathway annotation of the genes present in bitter gourd indicated an association with thiamine metabolism, inositol phosphate metabolism, and nitrogen metabolism. Absent genes were related to sesquiterpenoid and triterpenoid biosynthesis, glycerolipid

ATAC-sequencing analysis
ATAC-sequencing (ATAC-seq) can be carried out to determine open chromatin regions. Open regions of chromatin contain transcription factor-binding sites (TFBSs), which play important roles in the regulation of gene expression mediated by transcription factors (TFs). Therefore, identifying the representative motif (overrepresented) sequences enriched in the peak region helps us speculate about the potential TFs binding to the sequences corresponding to the peak region. According to our results, pulp tissue was used for ATACseq. Sequences with different peaks are also known as differentially accessible regions (DARs). Comparison of samples of immature green (IM) fruit versus breaker (BR)-stage fruit, which show the first signs of color change, revealed 7713 DARs, namely, 6248 increased DARs (corresponding to peaks with increased accessibility in BR-stage fruit compared with IM fruit) and 1465 decreased DARs (corresponding to peaks with decreased accessibility in BR-stage fruit compared with IM fruit). In the BR-stage fruit versus ripening stage (RS) fruit comparison group, 3858 DARs were found, namely, 3222 increased DARs and 636 decreased DARs. Relationships (in terms of distance) between DARs and functional elements of genes in the genome were predicted, such as promoters, 5 untranslated regions (UTRs), 3 UTRs, distal intergenic regions, and others, as shown in Fig. 4A and B, the results of which showed that most DARs were distributed in distal intergenic regions and that the second regions were 1 kb from the promoters of genes in both groups. The known TF information obtained from motif annotations can be combined with genes associated with certain peaks, providing a new direction for the study of transcriptional regulation.  Table S10). In the BRstage fruit versus RS fruit comparison group, essential coding genes, such as ethylene-responsive transcription factor WRI1 (WRI1), ethylene-responsive transcription factor ERF118 (ERF118), auxin response factor 1 (ARF1; Fig. 4D, Supplementary Data Fig. S5, Supplementary Data Table S10), were anchored to chromosome 3. The chromosome position coding WRI1 contained RAP2- Fig. 4D and E, Supplementary Data Table S10).
In the fruit peel tissue, 200 DEMs were detected in the preripening stage (IM peel versus BR-stage peel), of which 150 DEMs were increased and 50 DEMs were decreased. These DEMs were classified as carboxylic acids and derivatives, fatty acids, benzene derivatives, organooxygen compounds and others (Supplementary Data Table S13). Among them, the content of kaempferol-6,8-di-C-glucoside increased by 59 369fold and arachidic acid presented the greatest FC decrease.  shown are E motif logos of several TFs, generated through MEME and Tomtom.
In total, 4593 DEGs were detected, namely 1863 upregulated genes and 2730 downregulated genes, from the BR stage to the RS (Supplementary Data Table S16). GO enrichment analysis revealed that these genes were associated with integral components of membranes among the cellular component and calcium binding among the molecular functions ( Supplementary Data Fig. S10A). KEGG annotations revealed associations with processes involving plant hormone signal transduction, carbon metabolism, starch and sucrose metabolism, glycerophospholipid metabolism, and biosynthesis of amino acids (C) Partial pathway of terpenoid backbone biosynthesis (ko00900) and sesquiterpenoid and triterpenoid biosynthesis (ko00909) for the production of cucurbitadienol and gleditsioside. Relative gene expression in the IM stage, BR stage, and RS is described in the boxes beside each gene; red color represents high relative expression and green color represents low relative expression.
( Supplementary Data Fig. S10B). A phylogenetic tree based on the Mca, Trichosanthes anguina, C. moschata, and Arabidopsis thaliana carotene cleavage dioxygenase (CCD) families is shown in Fig. 6B. The expression of 9-cis-epoxycarotenoid dioxygenase NCED5 (NCED5) and carotenoid cleavage dioxygenase 4 (CCD4) was downregulated and that of NCED2 and CCD1 was upregulated. Transcripts associated with cucurbitacin biosynthesis, such as acetyl-CoA C-acetyltransferase (ACAT), hydroxymethylglutaryl-CoA reductase (HMGCR), and squalene monooxygenase (ERG1), were upregulated in the BR-stage versus RS comparison group (Fig. 6C). Changes in the expression of other DEGs involved in phytohormone synthesis, perception and signaling, cell wall structure, and polysaccharide biosynthesis and metabolism are described in Fig. 6D and Supplementary Data Table S16.

Combined transcriptome and metabolome analysis
Coanalysis of DEMs and DEGs indicated the relationships between DEMs and DEGs, the results of which are presented as a heat map in Fig. 7A, together with network diagrams indicating the relationships between them. The results suggested that in the IM-stage versus BR-stage comparison group abscisic acid was positively regulated by auxin-responsive protein (IAA4/14), auxin-induced protein 15A (AUX15A), and ethylene response sensor 1 (ERS2) and the TF PIF7, while auxin-responsive protein SAUR50 played a negative role in abscisic acid biosynthesis (Supplementary Data Fig. S11A). According to the ko00270 KEGG map, aminotransferase TAT2 and homocysteine S-methyltransferase 3 played positive roles in l-methionine and pyruvic acid accumulation and negative roles in S-adenosylmethionine accumulation (Supplementary Data Fig. S11B), whereas the pyruvic acid content was increased by 1-aminocyclopropane-1carboxylate synthase CMW33 (ACS). The content of l-glutamic acid was increased by glutamine synthetase nodule isozyme and phosphoglycolate phosphatase 1B, both of which were negatively correlated with pyruvic acid content in contrast to l-glutamic acid content. Phosphoglycolate phosphatase 1B and oxalate-CoA ligase were found to play different roles involving l-malic acid content, while oxalate-CoA ligase promoted the accumulation of oxalic acid (Supplementary Data Fig. S11C). In addition, βcarotene 3-hydroxylase (crtZ) positively affected β-carotene and canthaxanthin degradation, as indicated in Fig. 7B.

Discussion
Mca fruits are known for their taste and nutrient content, which are driven by diversified saponins, polypeptide polysaccharides, and other nutritionally functional components [31,32], but molecular investigations on Mca fruits are scarce. Other studies involving bitter gourd genome assembly and domestication have revealed genetic maps of Momordica species; however, a highquality genome has been less common in Momordica varieties and subspecies [2,3,21]. To provide more molecular information to aid research progress in Mca and other Momordica species, using a variety of sequencing platforms, we performed a genome assembly and comparative genome and genome-wide variation analysis and high-depth sequencing to construct the first Mca genome. This is a T2T gapless or nearly high-quality genome of a for 6 of the 11 chromosomes in Momordica. Other T2T genome assemblies have been completed for the human X chromosome and chromosome 8 [33,34]. Naish et al. [35] studied the genetic and epigenetic landscape of Arabidopsis centromeres, and two gap-free reference genomes of rice are available [36]. In Cucurbitaceae species, Deng et al. [28] have assembled a gap-free watermelon genome of 11 chromosomes. These are fundamental for genome structure exploration and plant breeding as well as for trait selection. This 295.6-Mb genome was anchored to 11 chromosomes after Hi-C adjustment, which was slightly larger than genomes reported for bitter gourd [3,21], silver-seed gourd [37], cucumber [12], zucchini [38], and pumpkin [18] but much smaller than those of other Cucurbitaceae, including snake gourd [23], wax gourd [22], sponge gourd [19,20], chayote [26], watermelon [13,14], melon [15][16][17], and bottle gourd [24,25]. Our study provides additional genetic resources for bitter gourd genome assembly and for further evolutionary analysis of the Cucurbitaceae.
To better understand the evolutionary relationships between Mca and other members of the Cucurbitaceae, comparative genomic and genome-wide variation analyses were carried out. Figure 3B presents the divergence time of Mca and other species and shows that bitter gourd diverged from other Cucurbitaceae ∼115-120 Mya. Cui et al. [2] reported that bitter gourd had diverged between 33.3 and 40.4 Mya, and Fu et al. [26] suggested that bitter gourd and other gourd species diverged 39-96 Mya; thus, bitter gourd diverged 33. 3-40.4 Mya. According to our results (Fig. 3B), Mca diverged from bitter gourd 0. 53-52.49 Mya; this is a large time span for divergence prediction owing to the quality of the bitter referenced gourd genome assembly. There is no fossil calibration information of varieties and original species, so we inferred that Mca diverged from bitter gourd 0.53-33.3 Mya, which is a relatively recent divergence event.
Through genome-wide variation analysis, we identified genes, such as gibberellin 2-beta-dioxygenase 8 (GA2OX8) and polygalacturonase (PG) (Table 4), that have common InDels and SNPs and may participate in Mca fruit ripening. In A. thaliana, gibberellin 2-betadioxygenase 7 (GA2OX7) participates in gibberellin (GA) synthesis and indole-3-acetic acid conjugation with amino acids [39,40]. Polygalacturonase (PG) is involved in cell wall metabolism, acting in concert with pectinesterase in the ripening process [41,42]. These genes may be related to the phytohormone synthesis and fruit texture changes that occur during ripening of Mca fruit, all of which may contribute to the unique Mca characteristics. Other genes, such as terpene synthase 10 (TPS10) and flavonol synthase (FLS), involving SVs were also predicted; these genes may also play a role in Mca fruit quality. Terpene synthase 10 (TPS10) was demonstrated to be involved in monoterpene biosynthesis [43], and Cucurbitaceae family members are rich in terpenes, which are attractants with antioxidant, anti-inflammatory, and other nutritional values [44]. Flavonol synthase (FLS) is involved in the biosynthesis of flavonol and kaempferol, which are associated with fruit pigment and have antioxidant characteristics [45,46]. The absence of these SVs and presence-absence variations (PAVs) in genes may have been essential for the formation of the unique Mca features during its divergence from bitter gourd.
Throughout the Mca fruit pulp ripening process, the DEMs in pulp samples mainly include amino acids, lipids, organic acids and other substances, such as saponins and cucurbitacin. Saponins were verified as being the main bioactive substances with antihyperglycemic activities in M. charantia [47]. The content of isovitexin-7-O-glucoside (saponarin) decreased in both comparison groups. Substances unique to Cucurbitaceae, such as cucurbitacin E, candicine, cucurbitacin D-O-glucoside, cucurbitacin E-O-glucoside, momordicoside F2, momordicine IV, and momordicoside L, differentially accumulated in the postripening stage (BR-stage versus RS). Cui et al. [2] reported that the main cucurbitane triterpenoid synthesized in bitter gourd is cucurbitacin E, which is the main bitter substance in watermelon [48] and is converted from cucurbitacin I through the action of ClACT and ClBi [49][50][51]. Momordicine and vitamins with antioxidant activity are abundant in bitter gourd [52][53][54]. These substances play important roles in the medicinal value of Mca fruit. The following DEMs were also detected: betaine; l-methionine; l-cystine; l-tyrosine methyl ester; γ -linolenic acid * ; α-linolenic acid * ; malonic acid; citric acid * ; and plant hormones such as jasmonic acid, salicylic acid, and abscisic acid. However, the changes in the expression of DEGs were consistent with the increased accumulation of these metabolites (Supplementary Data Tables S10 and S11). As a previous study described, a number of amino acids were associated with fruit flavor and aroma; these included leucine, tyrosine, and phenylalanine, which are related to branched-chain esters and aromatic esters [55,56]. These metabolites and genes allow a better understanding of the regulatory and biochemical variations in Mca fruit composition during ripening (Fig. 8). The DEMs in the Mca peel were different from those in the fruit pulp, which suggested that studies on the differential accumulation of these and the uses of peel and pulp tissues are worthy of further research.
During the Mca fruit peel ripening process, the peel color changes from green to yellow/orange at maturity, so DEGs involved in fruit chlorophyll degradation and carotenoid pigment accumulation are expected to play crucial roles in the Mca fruit peel color change. In the comparison of the BR stage with the RS, DEGs such as 9-cis-epoxycarotenoid dioxygenase NCED (NCED2/5), carotenoid 9,10(9 ,10 )-cleavage dioxygenase 1 (CCD1), and carotenoid cleavage dioxygenase 4 (CCD4) were detected (Supplementary Data Table S14) and were classified as belonging to the CCD family, whose members are related to fruit pigment accumulation [57]. According to a recent Cucurbitaceae study, NCED2 showed much higher expression during the snake gourd peel RS, during which time the color changed from green/white to red at maturity [23]. Changes in CCD4 gene expression are also involved in squash, pumpkin, and watermelon pulp ripening [58][59][60]. Furthermore, the accumulation of high amounts of β-carotene also occurs during bitter gourd fruit ripening [61]. The biosynthesis and metabolism of cucurbitacin, a metabolite unique to Cucurbitaceae, are important for the study of Mca fruit features. According to our results (Figs 6C  and 8, Supplementary Data Table S14), genes promoting the synthesis of cucurbitacin, including acetyl-CoA C-acetyltransferase (ACAT), hydroxymethylglutaryl-CoA reductase (HMGCR) and squalene monooxygenase (ERG1), were upregulated in the BR-stage versus RS comparison group (Figs 6C and 8). Previous studies of the cucurbitacin biosynthesis pathway showed that in melon fruit Cm890 and Cm180 were annotated in the synthesis of cucurbitacin D and B [62]. Shang et al. [63] described that cucurbitacin C was derived from Cs540 and Cs160, and cucurbitacin I and E are reportedly synthesized through Cl890A/Cl890B and Cl180 [49]. However, specific names for the genes involved in the synthesis of cucurbitacin have not been defined. Further study of the roles of these genes in Mca is needed. DEGs associated with Mca fruit enlargement, including auxin-, ethylene-, GA-, abscisic acid-and pectin biosynthesis-related genes, were also identified (Supplementary Data Tables S10 and S11). Genes involved in cell wall modification, such as the members of the expansin family, pectate lyase (PL), pectinesterase (PME), GAL, PG, and cellulose synthase-like protein [64], during the fruit ripening process have also been found in melon, snake gourd, and chayote [17,23,26]. Further analysis of the transcriptome changes should allow a more complete understanding of the regulatory and biochemical changes that occur during Mca fruit ripening and that determine fruit quality and nutritional value.

Genome sequencing and assembly
Fresh and healthy Mca plants for genome sequencing were collected in Wuzhong district, Suzhou, Jiangsu Province; these plants are generally named 'Jin ling zi' or 'Lai pu tao' as a variety of bitter melon. After harvesting, samples were immediately frozen in liquid nitrogen and preserved at −80 • C for DNA extraction. By the use of the modified cetyl-trimethylammonium bromide (CTAB) method [65], high-quality genomic DNA was extracted from Mca leaves. Redundant RNA was removed using RNase A, and agarose gel electrophoresis was used to check the quality of the DNA. For Illumina sequencing, a short-read (350 bp) library was constructed and sequenced on an Illumina NovaSeq platform to obtain clean reads, which were then used to assess the genome size, GC content, and heterozygosity. For PacBio sequencing, a long-read library was constructed from genomic DNA, which was fragmented to 15 kb according to the manufacturer's instructions. Then, this library was sequenced on a PacBio Sequel II platform. Low-quality reads and sequence adapters were removed to obtain clean subreads. Hifiasm software (v0.14) [66] was used to splice high-precision CCS data to obtain genomic sequences. The assembly results were evaluated using BWA-MEM [67], CEGMA v2.5 (default settings) [68], and BUSCO v5 [69].
To advance the quality of genome assembly, we used Hi-C technology to help anchor contigs. Based on sequencing-bysynthesis (SBS) technology, an Illumina high-flow sequencing platform was employed to sequence the Hi-C database to obtain raw data. Then, high-quality clean data were obtained by filtering the raw Hi-C sequencing data and low-quality reads. Clean read pairs were obtained from the Hi-C library and were subsequently mapped to the originally assembled Mca genome by BWA (bwa-0.7.17), with the default parameters. Contigs mapping to the matching paired reads were used for the performance of Hi-C-associated scaffolding. Null reads, including those near start sites, artifacts from PCR amplification, those resulting from random breaks, those corresponding to large The upper section represents the pulp ripening process. ATAC-seq genes are shown in yellow ellipses, and metabolites are shown in boxes (metabolites whose abundance increased are shown in yellow and metabolites whose abundance decreased in shown in green). The lower section represents the peel ripening process, with transcripts shown in ellipses (blue arrows indicate decreased levels and red arrows indicate increased levels) and metabolites shown in boxes (metabolites whose abundance increased are shown in red and metabolites whose abundance decreased are shown in blue).
fragments, those corresponding to small fragments, and those corresponding to extreme fragments, were filtered and removed. Then, we used Lachesis [70] with the agglomerative hierarchical clustering method to successfully cluster contigs into chromosomes. Lachesis was further applied to order and orient the clustered contigs. The specific parameters used were as follows: cluster_MIN_RE_SITES = 24; CLUSTER_MAX_LINK_DENSITY = 2; ORDER_MIN_N_RES_IN_TRUNK = 6; and ORDER_MIN_N_RES_ IN_Shreds = 6. The contigs were then manually mapped and evaluated to obtain genome sequences at the chromosome level.
Pseudogenes usually have sequences similar to those of functional genes but may have lost their biological function because of genetic mutations, such as InDels. The GenBlastA (v1.0.4) program was used to browse the genomes after the hiding of functional gene prediction. Supposed candidates were then identified by searching for premature stop codon and frameshift mutations using GeneWise (v2.4.1). Non-coding RNAs are usually divided into several groups, including miRNAs, rRNAs, tRNAs, snoRNAs, and snRNAs. tRNAscan-SE (v1.3.1) was used to predict tRNAs, with criteria specific for eukaryotes. Barrnap (v0.9) was applied to identify rRNA genes. miRNAs were picked out by searching the miRBase (release 21) database, and snoRNA and snRNA genes were analyzed using INFERNAL against the Rfam (release 12.0) database (BMK Corporation).

Telomere detection and karyotype analysis
We detected the telomeres of the Mca chromosomes according to the methods of Song et al. [36]. Then, genomic karyotype identification and distribution of important repeats by fluorescence in situ hybridization were carried out. For this, the root tips of Mca samples with a length of 2-3 cm were placed in a 0.5-ml centrifuge tube with holes at the top for dye preparation. First, the centrifuge tube containing the root tips was placed in an inflatable tank filled with 0.9-1.0 MPa nitrous oxide for 2 hours. In an ice bath, 90% precooled glacial acetic acid was used to fix the split phase, which was then washed twice with ddH 2 O for subsequent experiments. After enzymolysis, fragmentation, and dropping, the chromosome samples were examined under an ordinary optical microscope or phase-contrast microscope to find target cells in the middle of mitosis, which were saved for future use. The probe was labeled by the nick translation method, and the fully reactive labeled probe was purified with a Qiagen kit. The purified probe was stored at −20 • C until use. The telomeres were labeled with pTa794 and pTa71 plasmid DNA via synthetic terminal (TTTAGGG) 6 probes and using 5S rDNA and 18S rDNA probes, respectively. Then, 8 μl of TE2 × SSC (pH 7.0) solution, 0.25 μl of probe solution, and two different fluorescent color-labeled probes were added to each slide. After adding the labeled probe solution to each slide, a coverslip was added. The slides were subsequently heated at 80 • C for denaturation for 5 minutes, after which they were placed in a hybridization box. The materials in the hybridization box were subsequently allowed to hybridized overnight at 37-42 • C. Then, each slide was washed with 2× SSC solution two or three times at room temperature and then dried under a stream of air. Then, 8 μl of 4 ,6-diamidino-2-phenylindole (DAPI) dissolved in antifluorescence quencher solution was applied, and a cover-glass was then added. The slides whose materials were subjected to in situ hybridization were imaged with a charge-coupled device a high-resolution fluorescence microscope for accurate karyotype analysis (OMIX Technologies Corporation).

Genome-wide variation analysis
To investigate the genomic differences between Mca and bitter gourd, we performed a genome-wide variation analysis. Taking the assembled Mca as the reference genome, we used the complete bitter gourd assembly genome published by Matsumura et al. [3] for alignment analysis. MUMmer v4.0 [75] with the parameters -c 500 -b 500 -l 100 -maxmatch was used for genome-wide alignment, and the raw alignment results were further filtered using a delta filter, with the parameters −1 -i 90 -l 500. Then, SyRI [76] with the default parameters was used to detect the SVs among the filtered delta files. For the genes where mutations were detected, clusterProfiler v3.14.0 was used, and GO and KEGG enrichment analyses and annotations were performed. ANNOVAR was performed as part of a software program [77] used to determine the functional effects (BMK Corporation).

Comparative genomic analysis methods
OrthoFinder v2.4 software [78] was applied to organize the protein sequences of 15 species into families (the diamond alignment method was adopted, and the alignment e value was 0.001), and the obtained gene families were annotated using the Panther v15 database [79]. Finally, the unique gene families of these species were determined by GO and KEGG enrichment analyses. The gene copy number of each gene family in each species was analyzed, and the unique gene families of these species were analyzed using clusterProfiler v3.14.0 [80] for GO and KEGG enrichment analysis. Using single-copy gene sequences, IQ-TREE v1.6.11 [81] was used to build an evolutionary tree. Specifically, MAFFT v7.205 [82] (parameter -localpair -maximal 1000) was applied to contrast the sequence of every single-copy gene family, and then the regions with poor sequence alignment or large divergences were filtered out using gBlocks v0.91b [83] (parameter -B5 = h). Finally, all the well-aligned gene family sequences of each species were attached end to end to gain supergenes, after which ModelFinder [84], a model detection tool in IQ-TREE, was used. The most excellent model was the JTT + F + I + G4 one. Then, using this most excellent model, we constructed an evolutionary tree by the maximum likelihood method, setting the bootstraps as 1000. The divergence time was calculated by the PAML v4.9i software package MCMCTree 9i [85]. Computational Analysis of gene Family Evolution (CAFÉ; v4.2) software was applied to predict the gene families that contracted and expanded in Mca species relative to its ancestors. Forward selection analysis was carried out with the CodeML module in PAML. The specific method of collinearity analysis involved the use of DIAMOND v0.9.29.130 [86] to identify the similar gene pairs between two species (e < 1E − 5 , C score >0.5, where the C score value is screened by JCVI software). Then, according to the gff3 file, we determined whether similar gene pairs were adjacent on a chromosome. This process was mainly carried out through MCScanX [87] (parameter -m5). Finally, the genes in all collinear blocks were obtained. The K s and 4DTv combination method is currently widely used to identify WGD events. WGD v1.1.1 software [88] and a custom script (https://github.com/ JinfengChen/Scripts) were used to identify WGD events within Mca. For this, LTR transposons were the focus of attention, as they typically have been. We used LTR_ FINDER v1.07 software [89] to search for LTR sequences with scores ≥6 in the genome (i.e. via parameter -S6) and filter LTRs corresponding to duplicate results at the same time. After MAFFT comparisons and Kimura model-based distances were calculated in EMBOSS v6.6.0, the LTR insertion times were analyzed [90] (BMK Corporation).

ATAC-sequencing analysis
Fresh fruit samples at three different maturity periods, namely IM fruit (14 days after flowering), BR-stage fruit (28 days after flowering), and RS fruit (35 days after flowering), were collected in Wuzhong district, Suzhou, Jiangsu Province. Three biological replicates in each group and Mca fruit pulp samples were used for ATAC-seq. The sample preparation and sequencing procedures were processed using the common steps described by Illumina Corporation. The raw reads contained adapters (with length <35 bp) and low-quality reads (reads with >10% N, and reads with a base quality value Q ≤ 10 proportion of >50% of the whole read), which were both cleared by Cutadapt software. Then, the cleared high-quality reads given in FASTQ arrangement were used for further analysis. Bowtie2 software was used to contrast the clean reads gained from the sequencing of every sample with the reference genome to obtain the alignment efficiency of the sample reads and the positional information of the reads on the genome. DeepTools v2.07 was applied to outline the density distribution of the sequences. The 3-kb interval upstream and downstream of the transcription start site (TSS) of every gene was read, and represented as heat maps. The read abundance throughout the whole genome was statistically analyzed by the sliding window method, with 10 kb taken as the unit length of interval and the chromosomes divided into multiple small windows. The mapped reads that landed in each window were counted as read abundance, and the Pearson correlation coefficient of the normalized read abundance was calculated across all the samples. A sample correlation clustering heat map was made. MACS2 v2.1.1 software was used to perform peak extraction. The DiffBind package was used to analyze differences in peaks in different comparison groups. Specifically, the number of read counts supported by peaks in each sample was calculated, and the affinity score was calculated based on the number of counts (i.e. the standardized read); the affinity score was used as the input of DESeq2 [91] software for differential screening of samples in each group. The screening criteria were as follows: FC >1.5 and P value <.05. Here, FC is defined as the read count multiple of the treatment group relative to the control group, i.e. the treatment group/control group. Motif analysis was performed on the basis of the difference peak analysis results, and MEME-ChIP software was also used to identify and annotate the motifs. Tomtom was used to align the detected motif sequences with known motifs (BMK Corporation).

Metabolome analysis
Mca fruit pulp and peel samples at the three maturity periods were used for metabolome analysis, and three biological replicates were included in each group. The samples were subjected to vacuum freeze-drying and HPLC-MS analysis. Based on the self-built database, the materials were characterized according to their second-order spectral information and then quantified by a multireaction monitoring (MRM) model. Standardized processing data were used for further analysis. The overall metabolic differences and variability of each group of samples were preliminarily determined with a principal component analysis of the samples. The square of the Spearman rank correlation coefficient R (Spearman rank correlation) was used as an evaluation index of biological repetition correlation. The detected metabolites were contrasted for classification and pathway information in the KEGG, HMDB, and LipidMaps databases. According to the grouping information, the difference multiples were calculated and compared, and a t-test was applied to determine the difference significance (P value) of every metabolite. The R language package ropls was applied to represent orthogonal projections to latent structures discriminant analysis (OPLS)-DA modeling, and 200 permutation tests were carried out to verify the reliability of the model. The variable importance in projection (VIP) value of the model was calculated using multiple cross-validation. The method of composing the difference multiple, the P value and the VIP value of the OPLS-DA model was adopted to check the DEMs. The checking criteria were FC >2, P value <.05 and VIP >1. The significance of the DEMs associated with KEGG pathway enrichment was determined using a hypergeometric distribution test (BMK Corporation).

Transcriptome analysis
Mca fruit peel samples from three maturity periods were used for transcriptome analysis, and three biological replicates were included for each group. The original data were obtained by sequencing on the Illumina platform, and clean data were obtained after filtering. Mapping data were obtained by sequence alignment with the specified reference genome, and then the quality of the database was evaluated. According to the gene expression of different sample groups, the DEGs were annotated and enriched. DESeq2 (v1.6.3, default value: test = 'Wald', fittype = 'parameter') [91] was used to analyze the differences in expression between sample groups. When detecting DEGs, we used a screening criterion of FC ≥2 and false discovery rate <.01. The GO and KEGG databases were used to annotate and determine the enriched DEGs.

Combined transcriptome and metabolome analysis
According to the DEM analysis results of this experiment, combined with the transcriptome analysis results of the DEGs, the differentially expressed genes and differential metabolites in the same group were annotated to the KEGG pathway map at the same time, the results of which can help to perform communications between genes and metabolites. Selecting genes and metabolic pathways with a P value <.05 for priority analysis can save time for screening data and quickly find the pathways related to the research purpose for follow-up analysis. For this, we grouped the genes according to each difference, calculated the correlation coefficients between all genes and metabolites based on the Pearson correlation method, preprocessed the data with the Z-value transformation method before calculating the correlations, and then screened them according to the correlation coefficient (CC) and corresponding P value. The screening threshold criteria were as follows: | CC | > .80 and CCP < .05 (BMK Corporation).