Haplotype-resolved genome assembly for tetraploid Chinese cherry (Prunus pseudocerasus) offers insights into fruit firmness

Abstract
Chinese cherry (Prunus pseudocerasus) holds considerable importance as one of the primary stone fruit crops in China. However, artificially improving its traits and genetic analysis are challenging due to the lack of high-quality genomic resources, which mainly results from difficulties associated with resolving its tetraploid and highly heterozygous genome. Herein, we assembled a chromosome-level, haplotype-resolved genome of the cultivar ‘Zhuji Duanbing’, comprising 993.69 Mb assembled into 32 pseudochromosomes using PacBio HiFi, Oxford Nanopore, and Hi-C. Intra-haplotype comparative analyses revealed extensive intra-genomic sequence and expression consistency. Phylogenetic and comparative genomic analyses demonstrated that P. pseudocerasus was a stable autotetraploid species, closely related to wild P. pusilliflora, with the two diverging ~18.34 million years ago. Similar to other Prunus species, P. pseudocerasus underwent a common whole-genome duplication event that occurred ~139.96 million years ago. Because of its low fruit firmness, P. pseudocerasus is unsuitable for long-distance transportation, thereby restricting its rapid development throughout China. At the ripe fruit stage, P. pseudocerasus cv. ‘Zhuji Duanbing’ was significantly less firm than P. avium cv. ‘Heizhenzhu’. The difference in firmness is attributed to the degree of alteration in pectin, cellulose, and hemicellulose contents. In addition, comparative transcriptomic analyses identified GalAK-like and Stv1, two genes involved in pectin biosynthesis, which potentially caused the difference in firmness between ‘Zhuji Duanbing’ and ‘Heizhenzhu’. Transient transformations of PpsGalAK-like and PpsStv1 increase protopectin content and thereby enhance fruit firmness. Our study lays a solid foundation for functional genomic studies and the enhancement of important horticultural traits in Chinese cherries.


Introduction
Chinese cherry (Prunus pseudocerasus) is one of the most popular stone fruit species in China, with high economic and ornamental value [1]. Its cultivation history in China dates back 3000-4000 years [2]. It originated in southwest China and is now usually cultivated in the temperate zones of the Northern Hemisphere [3,4]. The plant is characterized by ovate leaves with serrated teeth, corymbose-racemose or umbel inflorescences with two to six flowers, white single suborbicular petals, 30-49 stamens nearly as long as its styles, soft hairs attached to the peduncle and pedicel, and red fruit (Fig. 1A and B; Supplementary Data Table S1). Notably, the fruits are rich in nutrients and trace elements, offering potential preventive benefits against cancer, Alzheimer's disease, and inflammation-related disorders [5,6]. The species' economic importance, potential advantages for human health, and advancing position in Chinese agriculture all contribute to the need for corresponding research that will maintain the global competitiveness of local growers. The basic chromosome number of P. pseudocerasus and other Cerasus species is x = 8, and the karyotype has been characterized [7-9]. Both wild and cultivated populations of P. pseudocerasus are mainly tetraploid [8], indicating that the species has a stable ploidy level.
Prunus pseudocerasus cultivars vary widely in phenology, firmness, fruit size and shape, flavor, aroma, texture, antioxidant activity, phenolic composition, and pulp and skin color. Compared with sweet cherry (P. avium L.), P. pseudocerasus is characterized by earlier blooming and maturation (20-30 days earlier) [10]. Fruit firmness constitutes a critical quality metric for consumers; however, P. pseudocerasus performs poorly in this attribute (Supplementary Data Fig. S1). As such, the fruit's inadequate firmness renders it unsuitable for long-distance transportation, confining its distribution to U-pick operations and local markets, a key constraint on its rapid commercial expansion throughout China. Conversely, due to its superior fruit firmness, sweet cherry is amenable to global distribution and possesses an extended shelf life, facilitating its worldwide sale. Accordingly, elucidating the reason behind the difference in fruit firmness between P. pseudocerasus and P. avium holds considerable importance. The processes underlying changes in fruit firmness mainly involve the disassembly of cell wall components, including celluloses, hemicelluloses, and pectins [11,12]. Pectin is the most complex cell wall component, forming a matrix in which cellulose and hemicellulose are embedded [13]. Degradation occurs via cell wall degrading enzymes, such as pectinases (e.g. polygalacturonases [PGs], pectate lyases [PLs]), xyloglucan endotransglucosylase/hydrolases (XTHs), and β-galactosidase (β-Gal) [14,15]. Pectinases act on different pectins and alter their solubilization, whereas XTHs promote hemicellulose depolymerization [15]. Thus, fruit softening is linked strongly to the expression of genes encoding these enzymes [16,17]. Pectin biosynthesis requires various nucleotide sugars as activated sugar donor substrates [18]. Galacturonic acid (GalA) is phosphorylated by galacturonic acid-1-phosphate kinase (GalAK) to GalA-1-phosphate, which is subsequently converted to UDP-GalA, the precursor for the synthesis of pectins [19]. Glycosyltransferases (GTs) are involved in the biosynthesis of type II arabinogalactan (AG), which is one of the most complex polysaccharides in the cell wall [20]. Stv1 is homologous to AtGALT29A, a GT29 family gene that functions in transferring galactose from UDP-galactose to a mixture of numerous oligosaccharides derived from arabinogalactan proteins [21].
The availability of polyploid plant genomes, as exemplified by the tetraploid Prunus species, has long been limited by a high degree of heterozygosity, thus impeding research endeavors related to heterosis, desirable traits, and genomic structure. However, technological advancements like third-generation sequencing now allow the assembly of high-quality genomes from an extremely heterozygous genetic background. For instance, chromosome-scale haplotype-resolved genomes of autotetraploid potato [22], autopolyploid sugarcane [23], autotetraploid alfalfa [24], and octoploid strawberry [25] are now available. For Prunus, the Chinese plum (P. mume) was the first genome to be fully sequenced [26]. Genomes from the subgenus Cerasus have also been sequenced, including P. avium [27,28], P. yedoensis [29], a Cerasus × yedoensis hybrid ('Somei-Yoshino') [30], C. serrulata [31], P. fruticosa [32], P. campanulata [33], and P. pusilliflora [34]. These genomic resources have greatly facilitated our understanding of Prunus/Cerasus evolution, origin, and genomic selection. However, P. pseudocerasus lacks a chromosome-level genome assembly, hampering efforts to address the fruit's undesirable lack of firmness. Furthermore, controversy exists over whether it is an autotetraploid or allotetraploid species [8]. Herein, we assembled a chromosome-level haplotype-resolved P. pseudocerasus genome using Pacific Biosciences high-fidelity (PacBio HiFi) sequencing, Oxford Nanopore Technology (ONT) sequencing, Illumina sequencing, and high-throughput chromosome conformation capture (Hi-C) technology. We integrated transcriptomics, comparative genomics, and biochemical and functional assays to elucidate the evolution of the P. pseudocerasus genome and the diversification of fruit firmness between P. pseudocerasus and P. avium. The findings of our study offer a valuable resource to facilitate the identification of functional genes and molecular breeding in P. pseudocerasus.

Sequencing, assembly, and annotation of the P. pseudocerasus genome
We obtained 98.19 Gb of Illumina short-read data and 83.70 Gb of ONT long-read data (Supplementary Data Table S2). The genome size (1.09 Gb) of P. pseudocerasus (PruPse) was estimated using flow cytometry (Supplementary Data Fig. S2; Supplementary Data Table S3). The P. pseudocerasus genome was twice the size of the diploid P. avium genome (2n = 2x), suggesting that P. pseudocerasus is a tetraploid species, which is consistent with previous reports [35,36]. After obtaining the draft genome (824.19 Mb) (Supplementary Data Table S4), we conducted a chromosome-level assembly using 176.35 Gb of Hi-C reads. After correcting chromosomal order and orientation, the initial chromosome-level genome assembly (version 1.0, hereafter PruPse V1.0) comprised 154 scaffolds, covering 359.26 Mb, with a contig N50 of 1.11 Mb and a scaffold N50 of 33.00 Mb (Fig. 1C; Supplementary Data Table S5). Within the genome, 305.72 Mb (∼85.10%) was anchored to eight pseudochromosomes (Supplementary Data Table S6). The Hi-C heat map did not reveal any notable assembly errors among these eight pseudochromosomes, which were well linked along the diagonal line (Supplementary Data Fig. S3). BUSCO analysis of PruPse V1.0 indicated 97.60% completeness, with only 1.40% missing BUSCOs (Supplementary Data Table S7). We then identified ∼158.31 Mb of repetitive sequences, comprising ∼51.78% of this genome, which includes simple repeats and transposable elements (Supplementary Data Table S8). The repeat-masked genome served as input data for gene predictors. A total of 41 811 protein-coding genes were identified in PruPse V1.0. The BUSCO completeness of the genome (97.6%) and that of the annotated gene set (94.1%) were relatively close, implying that most genes in PruPse V1.0 were annotated successfully (Supplementary Data Table S7). Specifically, 41 105 (98.31%) genes were annotated functionally via searches of public protein databases, such as non-redundant (NR), Swiss-Prot, and eggNOG (Supplementary Data Table S9). However, we were unable to assemble the four haplotype genomes of PruPse based on the Illumina and ONT sequencing data.
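The contig and scaffold N50 values quoted above follow the usual definition: the length of the sequence at which the running total, counted from the longest sequence downwards, first reaches half of the assembly. The sketch below illustrates that definition only; the function name and the toy lengths are hypothetical and are not taken from the PruPse assembly. It also re-derives the anchored fraction from the two figures reported in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to half
        # of the assembly defines the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
toy_scaffolds = [40, 30, 20, 5, 5]
toy_n50 = n50(toy_scaffolds)

# Anchored fraction from the figures reported in the text:
# 305.72 Mb anchored out of 359.26 Mb assembled.
anchored_percent = 305.72 / 359.26 * 100
print(toy_n50, f"{anchored_percent:.2f}%")
```
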
Single-molecule real-time sequencing is more accurate than ONT sequencing. Thus, we conducted PacBio HiFi sequencing using the Sequel II platform, assembling an allele-aware chromosome-level PruPse genome. After filtering low-quality sequences, 42.48 Gb of clean HiFi reads remained, representing 43-fold genome coverage (Supplementary Data Table S10). The PacBio HiFi clean reads were assembled using HiCanu [37], resulting in a 993.69-Mb draft genome with a contig N50 length of 7.05 Mb, and a high value (98.60%) of complete BUSCOs (Supplementary Data Tables S11-S13). Next, we used HiC-Pro and 3D-DNA to scaffold the tetraploid genome (Fig. 2). This algorithm uses Hi-C paired-end reads to build an allele-aware chromosome-scale assembly for polyploid genomes [23]. For each set, we clustered and ordered contigs using Hi-C data to generate eight chromosomes. We investigated the Hi-C contact matrix to determine assembly quality and found that chromosome groups were clearly delineated (Supplementary Data Fig. S4). We then obtained a haplotype-resolved PruPse assembly with four haplotypes of 246.32, 237.05, 225.55, and 192.94 Mb (hereafter Hap1, Hap2, Hap3, and Hap4; Supplementary Data Table S14), containing 97.50, 93.90, 88.50, and 75.10% complete BUSCOs, respectively (Table 1). This effort culminated in the haplotype-resolved genome assembly (PruPse V2.0), encompassing 994.21 Mb across 32 pseudochromosomes and unplaced unitigs, achieving a scaffold N50 of 27.26 Mb and 98.60% BUSCO completeness. The 32 pseudochromosomes included eight homologous groups, each with four allelic chromosomes.
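The coverage and haplotype figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers reported in the text; the variable names are illustrative. It assumes that the 43-fold coverage is expressed relative to the 993.69-Mb draft assembly, which is the reading that reproduces the reported value.

```python
# Figures reported in the text.
hifi_bases_gb = 42.48
draft_assembly_mb = 993.69
final_assembly_mb = 994.21
haplotypes_mb = {"Hap1": 246.32, "Hap2": 237.05, "Hap3": 225.55, "Hap4": 192.94}

# Fold coverage = sequenced bases / assembly size (assumed denominator).
coverage = hifi_bases_gb * 1000 / draft_assembly_mb

# Total length placed in the four haplotypes, and its share of PruPse V2.0;
# the remainder corresponds to the unplaced unitigs mentioned in the text.
phased_total_mb = sum(haplotypes_mb.values())
phased_fraction = phased_total_mb / final_assembly_mb

print(f"coverage: {coverage:.1f}x")
print(f"phased: {phased_total_mb:.2f} Mb ({phased_fraction:.1%} of V2.0)")
```
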
We found that the sequence divergence (∼0.02) among allelic chromosome pairs was low (Fig. 2A). To clarify whether the P. pseudocerasus genome is auto- or allotetraploid, we systematically compared the four allelic chromosomes and found that they were similar in terms of gene structure (Fig. 2B-F). Moreover, gene expression patterns for each allelic chromosome group did not exhibit notable signs of overall allelic dominance (Fig. 2G). Synteny plots and the Ka/Ks ratios of syntenic gene pairs demonstrated a high degree of conserved synteny, with no notable overall difference in Ka/Ks ratio observed between any two allelomorphic chromosomes (Fig. 3; Supplementary Data Fig. S5). Thus, P. pseudocerasus appears to be a stable random-pairing autotetraploid species, a characteristic that hinders genomic deciphering.

Synteny and gene family analyses
We performed collinearity analyses and constructed three synteny maps after comparing the PruPse genome with the P. avium (PruAvi), P. persica (PruPer), and P. serrulata (PruSer) genomes (Fig. 4A-C). The synteny maps demonstrated that PruPse and the three species have strong collinear relationships with 2757, 6439, and 4147 syntenic blocks, respectively (Supplementary Data Tables S21-S24). A large number of syntenic gene blocks identified by comparing the four Prunus genomes were distributed across eight chromosomes, indicating strong cross-species synteny (Fig. 4D). We observed that all syntenic blocks were located on the same chromosome, indicating that P. pseudocerasus is closely related to P. avium and P. serrulata (Supplementary Data Tables S25-S27). Orthologous clustering was performed on the PruPse, PruAvi, PruSer, P. yedoensis (PruYed), and PruPer genomes (Supplementary Data Tables S28-S32). We identified 18 075 gene families in PruPse, more than in the other four species (Fig. 5A). The five Prunus species shared 11 819 gene families, with PruPse harboring a higher number of unique gene families (1001) than PruAvi (682), PruSer (368), and PruPer (96) (Fig. 5A). We subsequently compared the counts of multiple- and single-copy orthologs, unique paralogs, other orthologs, and unclustered genes between PruPse and 20 selected species (Fig. 5B; Supplementary Data Table S33). Comprehensive statistics on expanded, unique, and contracted gene families of the PruPse genome are presented in Supplementary Data Tables S34-S36. A total of 2229 and 4967 gene families contracted and expanded, respectively, in P. pseudocerasus after speciation from P. avium (Fig. 5C). The number of expanded gene families was greater in PruPse than in PruAvi, PruSer, or PruPus. The unique, contracted, and expanded family genes exhibited significant enrichment (P < 0.05) across 44, 76, and 218 GO terms, respectively (Supplementary Data Tables S37-S39). Notably, expanded genes showed the most significant enrichment in the 'cell wall' term of cellular component (CC) (Supplementary Data Fig. S6), whereas contracted genes exhibited enrichment in the 'Casparian strip' term of CC (Supplementary Data Fig. S7).

Phylogenetic and whole-genome duplication event analyses
To investigate genome evolution, we compared PruPse with 20 other plant species, using Arabidopsis thaliana and Vitis vinifera as outgroups. We used 107 high-quality single-copy gene families from 21 plant species to construct a maximum-likelihood phylogenetic tree and found that PruPse was a sister species to PruPus (Fig. 5C). The subgenus Amygdalus (PruPer and PruDul) clustered with the subgenus Prunus (PruMum, PruArm, and PruSal) and was most closely related to the subgenus Cerasus (PruYed, PruSer, PruPus, PruPse, and PruAvi) (Fig. 5C). Pyrus was most closely related to Malus, followed by Eriobotrya (Fig. 5C). The three genera were located on two branches from Prunus (Fig. 5C). Based on TimeTree data (http://www.timetree.org/), we assessed when PruPse and other plant species diverged. S40-S42). We identified 63, 80, and 38 positively selected genes encoding transcription factors (TFs) with matched Pfam domains in PruPse vs PruAvi, PruPse vs PruSer, and PruPse vs PruPer, respectively (Supplementary Data Tables S43-S45). Functional analyses of common TFs, such as NAC, MYB, bHLH, ERF, and bZIP, suggest their involvement in stress response, growth and development, and metabolism in P. pseudocerasus. We compared the distribution of synonymous substitution rates (Ks, Fig. 5D) to investigate whole-genome duplication (WGD) events in the PruPse genome. The Ks distribution of PruPse showed a clear peak at ∼1.384, similar to other selected Rosaceae species, indicating that PruPse experienced one common WGD event in the Rosaceae family (Fig. 5D; Supplementary Data Table S46). Referring to the WGD event time in V. vinifera (117 Mya) [38], we estimated that the PruPse WGD event arose ∼139.96 Mya (Supplementary Data Table S46). We then used Ks distributions of orthologous genes to deduce the divergence of PruPse from other angiosperms (Fig. 5E). PruPse exhibited a single peak with PruPus, PruSer, PruAvi, PruPer, PruSal, PruDul, and PruMum at a Ks value of 0.0139, 0.0227, 0.0266, 0.0416, 0.0458, 0.0475, and 0.0485, respectively (Fig. 5E; Supplementary Data Table S47). These values signify the sequential divergence within the subgenera Cerasus and Amygdalus/Prunus, indicating that the diversification among these seven Prunus species occurred relatively recently. Additionally, P. avium is posited to have diverged earlier than P. pusilliflora or P. serrulata (Fig. 5E).
Figure caption (partial): (f-i) Ka/Ks ratios of syntenic gene pairs identified between the first (f), second (g), third (h), and fourth (i) haplotype against the other three, respectively. Lines within the core connect synteny blocks, with blue ribbons indicating synteny blocks between the first haplotype and the other three, green ribbons indicating synteny blocks between the second haplotype and the third and fourth, and red ribbons indicating blocks between the third and fourth haplotypes.
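The WGD dating above links a Ks peak to an age. A common way to express that link is the molecular clock relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The sketch below is not the authors' calibration procedure, which the text describes only as referring to the V. vinifera WGD time; it simply back-calculates the rate implied by the reported PruPse figures (Ks ≈ 1.384, ∼139.96 Mya) under that standard relation. The function names are hypothetical.

```python
def implied_rate(ks_peak, age_mya):
    """Substitution rate (per site per year) implied by T = Ks / (2r)."""
    return ks_peak / (2 * age_mya * 1e6)


def age_from_ks(ks_peak, rate):
    """Age in Mya for a Ks peak under a given rate, using T = Ks / (2r)."""
    return ks_peak / (2 * rate) / 1e6


# Figures reported in the text for the PruPse WGD peak.
wgd_ks_peak = 1.384
wgd_age_mya = 139.96

rate = implied_rate(wgd_ks_peak, wgd_age_mya)
print(f"implied rate: {rate:.3e} substitutions per site per year")
```

The implied rate applies only to this WGD peak as reported; it is not meant to be transferred to the much smaller orthologous Ks peaks, whose divergence times in the text come from a separate analysis.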

Cell wall degradation is the principal determinant of the soft texture in PruPse compared with PruAvi
We selected our sequenced variety 'Zhuji Duanbing' and the sweet cherry cv. 'Heizhenzhu' to measure texture-related variables at the ripe fruit (RF) stage (Fig. 6A). 'Heizhenzhu' scored significantly higher in texture parameters (hardness and chewiness) than 'Zhuji Duanbing' (Fig. 6B and C; Supplementary Data Table S48). 'Heizhenzhu' also displayed superior fruit firmness and pericarp strength compared with 'Zhuji Duanbing' at this stage (Fig. 6D and E). We also inspected the cytological characteristics of ripe fruit using paraffin sections and cryo-scanning electron microscopy (cryo-SEM). The exocarp cell shape differed between 'Zhuji Duanbing' and 'Heizhenzhu', the former being square and closely arranged, while the latter was long, oval, and sparse (Fig. 6F and G). 'Zhuji Duanbing' displayed greater disruption of pulp cell walls than 'Heizhenzhu', likely attributable to more pronounced cell wall degradation (Fig. 6F). Subsequently, we analyzed the cell wall components of fruits at the small green fruit (SGF), veraison fruit (VF), and RF stages in both varieties. We observed a decline in total pectin, covalently soluble pectin (CSP), cellulose, and hemicellulose levels, whereas water-soluble pectin (WSP) and ionic-soluble pectin (ISP) levels increased in both varieties during fruit maturation (Fig. 6H-M). Furthermore, the soft-fleshed 'Zhuji Duanbing' contained higher pectin levels than the hard-fleshed 'Heizhenzhu' at the SGF and VF stages, but lower pectin levels at the RF stage, indicating a sharper decrease in total pectin content for 'Zhuji Duanbing' than for 'Heizhenzhu' during fruit maturation (Fig. 6H). In the SGF and VF stages, 'Zhuji Duanbing' had lower WSP and ISP levels than 'Heizhenzhu', but both forms of pectin increased more rapidly in 'Zhuji Duanbing' as fruits matured (Supplementary Data Table S49), resulting in WSP and ISP levels at the RF stage being 1.41 and 1.31 times higher, respectively, in 'Zhuji Duanbing' than in 'Heizhenzhu' (Supplementary Data Table S49). Concurrently, 'Zhuji Duanbing' fruits contained less CSP and hemicellulose than 'Heizhenzhu' fruits (Fig. 6J and M), with both variables decreasing more sharply during fruit maturation in the former cultivar than in the latter (Supplementary Data Table S49). Cellulose content exhibited a comparable decline in both varieties from the SGF to the VF stage, but dropped to a greater extent in 'Zhuji Duanbing' from the VF to the RF stage (Fig. 6L). Overall, these results indicated that fruit softening in the two varieties depends on cell wall degradation, with differences in fruit firmness between the two varieties attributed to varying degrees of alteration in pectin, cellulose, and hemicellulose contents.

GalAK-like and Stv1 potentially regulate cherry fruit firmness
To better clarify the difference in fruit firmness between 'Heizhenzhu' and 'Zhuji Duanbing', we performed a comparative transcriptomic analysis using SGF, VF, and RF fruits from the two varieties, which detected 41 811 and 25 720 genes, respectively (Supplementary Data Table S52). We then used the STEM program to calculate differential expression (log2 fold change, FC) for all developmental stages relative to the control (SGF). We observed eight expression patterns (EPs) (3^(n − 1) − 1, with n = 3) for the three developmental stages: EP0-EP7 (Fig. 8A; Supplementary Data Table S53). Subsequent analyses focused on EP3 and EP4 since these genes exhibited differential expression only in RF fruits (Fig. 8B; Supplementary Data Table S53). Next, we used OrthoFinder to obtain orthogroups, orthologs, an orthogroup root gene tree, and all occurrences of gene duplications in the two varieties. Ortholog networks varied in complexity, comprising 2099 different subnetworks with many-to-many, one-to-many, many-to-one, and one-to-one orthogroups (Fig. 8A; Supplementary Data Table S54). GO analysis of non-orthologous EP3 and EP4 genes revealed that they were enriched in several processes, including cell wall modification, cuticle hydrocarbon biosynthesis, and primary metabolic processes (Fig. 8C; Supplementary Data Table S55). Specific functions of different EPs in 'Zhuji Duanbing' and 'Heizhenzhu' (Fig. 8D) included organic substance metabolic process and primary metabolic process (Fig. 8E; Supplementary Data Table S55). To identify core EP3 and EP4 members, we performed a correlation network analysis. The results showed that GalAK-like and Stv1 had the best connectivity among the genes involved in primary and organic substance metabolic processes (Fig. 8F; Supplementary Data Table S56). The two genes were differentially expressed in the SGF, VF, and RF stages of both 'Zhuji Duanbing' and 'Heizhenzhu' (Fig. 8G). Subcellular localization using fluorescent protein-tagging then revealed that PpsGalAK-like and PpsStv1 were expressed mainly in the nucleus and cytoplasm (Fig. 8H), consistent with the predictions obtained by CELLO v.2.5.
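The count of eight expression patterns for three stages can be reproduced by enumeration. The sketch below assumes that each transition between consecutive stages is classed as down, flat, or up, and that the completely flat profile is excluded, which gives 3^(n − 1) − 1 profiles for n stages. The function name is hypothetical, and the order in which profiles are generated is not claimed to match the EP0-EP7 numbering used by STEM.

```python
from itertools import product


def expression_patterns(n_stages):
    """Enumerate profiles built from down/flat/up steps between stages."""
    steps = (-1, 0, 1)
    patterns = []
    for moves in product(steps, repeat=n_stages - 1):
        if not any(moves):
            continue  # skip the all-flat profile (no change at any stage)
        profile = [0]  # expression relative to the first stage (SGF)
        for move in moves:
            profile.append(profile[-1] + move)
        patterns.append(tuple(profile))
    return patterns


patterns = expression_patterns(3)

# Profiles that stay flat from SGF to VF and change only at RF.
rf_only = [p for p in patterns if p[1] == p[0] and p[2] != p[1]]
print(len(patterns), rf_only)
```

Under these assumptions exactly two of the eight profiles change only at the RF stage, which is consistent with the text singling out two patterns for follow-up.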

PpsGalAK-like and PpsStv1 affect pectin content and thereby change fruit firmness
We transiently overexpressed PpsGalAK-like and PpsStv1 in cherry fruits to ascertain their function (Fig. 9A). Initial verification of transient transformation in cherry fruits was conducted through PCR amplification and microscopic examination (Fig. 9B). PpsGalAK-like expression levels were considerably increased by 24.0-, 43.7-, and 41.9-fold in transgenic fruits
Phylogenetic analysis revealed that five Cerasus species formed a distinct branch with the shortest divergence time, separate from the subgenus Prunus species. Consistent with previous findings, PruAvi diverged earlier than PruPus, PruSer, and PruYed [34,35]. Prunus pseudocerasus is more closely related to P. pusilliflora than to P. avium; this may be due to the fact that P. pseudocerasus and P. pusilliflora originated from southwest China [34]. Additionally, we estimated that the genus Prunus diverged ∼35.56 Mya, and the estimated divergence time between PruSer/PruYed and PruAvi was ∼26.78 Mya. In contrast, Chin et al. [42] and Baek et al. [29] estimated that Prunus diverged ∼56.0 and 66.2 Mya, respectively. Similarly, the split of the subgenera Prunus and Amygdalus was estimated at ∼44.0 Mya and the divergence time between PruYed and PruAvi was estimated to be ∼35.9 Mya [29]. Recently, Yi et al. [31] estimated that Prunus diverged ∼28.14 Mya, while PruSer/PruYed and PruAvi separated ∼21.44 Mya. These discrepancies in estimates may be attributable to differences in the number and species of the plant materials, leading to differences in the selected single-copy homologous genes. Alternatively, differences in the timeframes of fossil records used for calibration could also contribute to these discrepancies. Meanwhile, our investigation revealed that P. pusilliflora, like other species within the Rosaceae family, underwent a common WGD event. As P. pusilliflora is a tetraploid species, we posit that, in addition to the WGD event shared with the Rosaceae family, it has experienced a lineage-specific polyploidy event. However, predicting this event solely based on homologous genes is challenging. We observed minor peaks in regions where the Ks distribution closely approached zero across several species (PruAvi, PruSer, PruSal, PruPus, and PruPer). These peaks could potentially be attributed to fragmentation or repetition within the genomes of these species (Fig. 5D).
Our research sheds light on the molecular basis of fruit firmness and provides potential targets for improving this desired trait in P. pseudocerasus. We observed higher hemicellulose and cellulose contents in hard-fleshed 'Heizhenzhu' compared with soft-fleshed 'Zhuji Duanbing' at the RF stage (Fig. 6), consistent with the results from other fruits (blueberries and apples) [11,43]. In addition, our findings revealed that degradation rates of cellulose, hemicellulose, and pectin contributed to firmness differences between Chinese cherry cv. 'Zhuji Duanbing' and sweet cherry cv. 'Heizhenzhu' (Fig. 6). Notably, however, our results were at odds with a previous report by Zhai et al. [44], indicating that only hemicellulose and pectin degradation rates contribute to differences in firmness between soft- and hard-fleshed sweet cherry varieties. This apparent contradiction may be the result of different genetic backgrounds in experimental materials. Regardless, transcriptomic analysis showed that the pectin synthesis genes GalAK-like and Stv1 were differentially expressed between 'Zhuji Duanbing' and 'Heizhenzhu', strongly suggesting the involvement of pectin content in fruit firmness.
The available technology (Hi-C, HiFi, and allele-aware assembly algorithms) allowed us to assemble all allelomorphic chromosomes of P. pseudocerasus, although the features of tetrasomic inheritance may have caused slight errors in phasing all allelomorphic chromosomes. Nevertheless, this haplotype-resolved genome is well assembled and of adequate quality, and it can be used in genetic dissection and molecular breeding of Chinese cherries. In the future, we plan to draw a pan-genome map of Cerasus species to further explore the structural variations and copy number variations related to fruit firmness and flowering periods, which will clarify the role of structural variations in the domestication of Cerasus species and help to elucidate their important economic traits. In conclusion, we present a haplotype-resolved genome assembly of Chinese cherry. This is the first step to fully grasping the molecular underpinnings of various desired traits in an economically important Cerasus species, although more research on chromosomal structural diversity and the mechanisms of complex allelic expression is required. Nevertheless, our work paves the way for related research in molecular biology, comparative genomics, genetics, and breeding of Chinese cherries.

Plant materials and sequencing
Prunus pseudocerasus (Lindl.) G. Don cv. 'Zhuji Duanbing' was selected for genome sequencing and assembly. Fruits and leaves were sampled from a 6-year-old 'Zhuji Duanbing' tree at Shanghai Jinyuan Fruit and Vegetable Cultivation Professional Cooperative in Shanghai, China (121°24′13″E, 30°52′18″N). Stamen number, flower number of each inflorescence, flower pedicel length, fruit weight, total soluble solids, titratable acid (TA), length and diameter of fruit pedicel, and longitudinal and transverse diameters of fruits were determined in this study. The TA value was expressed as percentage of anhydrous malic acid. Genomic DNA was extracted from fresh, young leaves via a DNeasy Plant Mini Kit (Tiangen Biotech Co. Ltd, Beijing, China). The concentration and purity of DNA were assessed using a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA) and Qubit 3.0 (Thermo Fisher Scientific Inc.). After determining DNA integrity via pulsed-field 0.8% agarose gel electrophoresis, samples were separately packaged for Illumina, HiFi, ONT ultralong, and Hi-C sequencing. For Illumina short-read sequencing, several paired-end libraries were constructed using GenElute Plant Genomic DNA Miniprep kits (Sigma-Aldrich Corp., St Louis, MO, USA) as described by the manufacturer and sequenced on an Illumina HiSeq X Ten platform (Illumina Inc., San Diego, CA, USA) (Supplementary Data Table S57). For long-read sequencing, a Nanopore library was prepared and sequenced using a PromethION 48 device (Oxford Nanopore Technologies, Oxford, UK) at Novogene Co. Ltd (Beijing, China). For PacBio HiFi sequencing, a standard SMRTbell library was prepared from 50 μg of DNA using the SMRTbell Express Template Prep Kit 2.0 and sequenced on a PacBio Sequel II (Pacific Biosciences, CA, USA). Three Hi-C libraries were constructed via chromatin extraction and digestion, followed by DNA ligation, purification, and fragmentation [45], before sequencing on the Illumina HiSeq X Ten platform. Total RNA was extracted and evaluated as described by Jiu et al. [46]. A total of 18 cDNA libraries representing the three fruit samples at the SGF, VF, and RF stages, as well as blooming flower, young leaf, and mature leaf of 'Zhuji Duanbing', were constructed as described by Quan et al. [47]. Transcriptome sequencing was conducted on an Illumina NovaSeq platform. Fruit samples of SGF (10 days after full bloom [DAFB]), VF (25 DAFB), and RF (40 DAFB) stages from 'Heizhenzhu' were collected at the cherry orchard of Shanghai Yingzhiyuan Agricultural Technology Co., Ltd (121°32′56″E, 30°56′45″N). The detailed analysis procedures followed published descriptions [9]. In addition, six Chinese cherry varieties ('Zhuji Duanbing', 'Xiangxing', 'Manaohong', 'Zhushahong', 'Lixianxiaoyingtao', and 'Wenchuanxiaoheiyingtao') and three sweet cherry varieties ('Lita', 'Daxing', and 'Geleisixing') were collected at the orchard of the Sichuan Academy of Agricultural Sciences (30°46′47″N, 104°12′30″E).

Genome assembly and evaluation
Prior to de novo genome assembly, the genome size of PruPse was estimated via flow cytometry (using BD FACScalibur) with tomato as an internal standard. Fastp v.0.20.2 [48] was used to perform the quality control of the next-generation sequencing (NGS) data, including Hi-C reads and whole-genome sequencing paired-end reads, with default parameters to produce clean reads. For the Nanopore data, passed reads were assembled into a de novo genome using NECAT v.0.0.1 with default parameters [49]. Subsequently, the assembly underwent three rounds of polishing using Racon with default parameters [50]. The assembly was then polished for two further iterations with all clean NGS paired-end reads using Pilon v.1.21 with default parameters [51]. Subsequently, the redundant sequences were removed using purge_dups v.1.2.5 with default parameters, and the final contig genome was produced. The total clean Hi-C data were used to perform chromosome-scale genome assembly using HiC-Pro [52] and 3D-DNA v.180922 [53] with default parameters. Based on the karyotype results [8] and Hi-C heat maps, the chromosome-level genome was manually checked for misorientation using Juicer v.1.6.2 [54]. NGS data were aligned to the assembly using the Burrows-Wheeler Aligner with default settings, yielding an estimate of coverage ratio [55]. For the PacBio Sequel II HiFi reads, the genome was assembled using HiCanu with default parameters [37]. We used the same methods as above for chromosome scaffolding and correction of the resulting genome. Genome integrity was assessed with the LTR Assembly Index (LAI), which was calculated using LTR_FINDER v.1.0.7 [56] and LTR_retriever [57]. The completeness and accuracy of the PruPse genome were assessed with BUSCO v.5.3.1 [58].

Sequence divergence, gene structure, and expression level for genes with four alleles
We aligned each allelic chromosome to the others using MUMmer v.4.0.0beta2 (parameters: nucmer -g 1000 -c 90 -l 40 -t 6) and calculated the sequence divergence for each alignment block. We also calculated exon length and number, as well as intron, mRNA, and protein lengths, for the four haplotypes. Gene expression levels were quantified as FPKM values using StringTie v.1.3.4 with default parameters. Subsequently, we summed the expression levels of all genes on each chromosome of the four haplotypes. Finally, the results were visualized using R (https://www.r-project.org/).
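The two summaries described above can be sketched as follows. This is a minimal illustration, assuming that divergence for a block is taken as 1 minus its fractional identity (the text does not state the exact formula) and that expression records are available as (haplotype, chromosome, FPKM) tuples. All function names and example values are hypothetical.

```python
from collections import Counter, defaultdict


def block_divergence(identity_percent):
    """Divergence of one alignment block, assumed here to be 1 - identity."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    return 1.0 - identity_percent / 100.0


def peak_divergence_bin(identities, bin_width=0.01):
    """Return (bin_start, block_count) for the most populated divergence bin."""
    if not identities:
        raise ValueError("no alignment blocks supplied")
    counts = Counter(int(block_divergence(i) / bin_width) for i in identities)
    index, count = max(counts.items(), key=lambda item: item[1])
    return index * bin_width, count


def sum_expression(records):
    """Sum FPKM per (chromosome, haplotype) from (haplotype, chromosome, fpkm)."""
    totals = defaultdict(float)
    for haplotype, chromosome, fpkm in records:
        totals[(chromosome, haplotype)] += fpkm
    return dict(totals)


# Hypothetical inputs for illustration only.
identities = [97.5, 97.6, 97.4, 99.5, 95.5]
records = [("hapA", "chr1", 10.0), ("hapA", "chr1", 5.5), ("hapB", "chr1", 7.0)]
print(peak_divergence_bin(identities))
print(sum_expression(records))
```

Per-block identities could be taken from the nucmer alignment output; weighting blocks by aligned length is a reasonable alternative to the unweighted counts used here.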

qRT-PCR
Extracted RNA was reverse-transcribed using a HiScript® III RT SuperMix Kit (Vazyme, Nanjing, China). Then, qRT-PCR was conducted on a LineGene 9600 Plus Fluorescent Quantitative Detection System (FQD-96A, Bioer, Hangzhou, China) using a TB Green™ Premix Ex Taq™ II kit (TaKaRa, Tokyo, Japan). The operating parameters were determined according to the method described by Jiu et al. [46]. Relative expression levels of genes were determined using the 2^(−ΔΔCT) method as previously described by Livak and Schmittgen [106], with PavActin1 serving as the reference gene for normalization. The qRT-PCR primer sequences are presented in Supplementary Data Table S58.
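For readers unfamiliar with the 2^(−ΔΔCT) calculation, a minimal sketch is given below. The function name and the CT values are hypothetical. The calculation assumes approximately 100% amplification efficiency for both the target and the reference gene, as the Livak and Schmittgen method does.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCT) method.

    dCT  = CT(target) - CT(reference), computed per condition
    ddCT = dCT(sample) - dCT(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical CT values for illustration only.
fold_change = relative_expression(ct_target_sample=22.0, ct_ref_sample=18.0,
                                  ct_target_control=24.0, ct_ref_control=18.0)
print(f"Fold change relative to control: {fold_change:.2f}")
```

A value above 1 indicates higher expression of the target gene in the sample than in the control after normalization to the reference gene.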

Subcellular localization
The coding sequences of PavGalAK-like and PavStv1 without the stop codon were amplified and inserted into the pHB vector, which harbors two cauliflower mosaic virus (CaMV) 35S promoters, a GFP tag, and a translation enhancer. This process generated two fusion constructs, namely p35S-PavGalAKlike-GFP and p35S-PavStv1-GFP. Transformation was performed as previously described [107]. Fluorescent protein localization was detected 3-5 days post-infiltration, when GFP fluorescence reached optimal levels, using a confocal laser scanning microscope (Zeiss LSM 780, Germany).

Transient overexpression in cherry fruits
Transient overexpression was conducted following previously established methods [44], with slight modifications. The full-length coding sequences of PavGalAK-like and PavStv1 were amplified, inserted into the pHB vector, and transformed into Agrobacterium tumefaciens GV3101. Approximately 200 μl of A. tumefaciens suspension carrying PavGalAK-like, PavStv1, or the empty control vector was then injected separately into cherry fruits until the entire fruit was infiltrated. Approximately 20 fruits were infiltrated for each gene. Inoculated fruits were harvested 1 week after infiltration; after malformed fruits were discarded, 15 fruits were retained for subsequent analyses. Genomic DNA was extracted using the CTAB method [108]. The primer sequences for the transient overexpression assay are presented in Supplementary Data Table S58.

Statistical analysis
All data were analyzed using SAS software v.9.3 (SAS Institute Inc., Cary, NC, USA). Statistical differences were determined using a two-tailed Student's t-test. The results of fruit texture parameters are presented as the means of at least eight replicates (± standard deviation). For qRT-PCR, cell wall components, and enzyme activity assays, data are presented as the means of three replicates (± standard deviation), with each replicate consisting of five fruits. Significance was set at P < 0.05.
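The analyses in this study were run in SAS. Purely as an illustration of the test itself, the sketch below implements a two-tailed Student's t-test in Python using only the standard library. It assumes equal variances in the two groups (the pooled-variance form), obtains the P-value by numerically integrating the t density, and uses hypothetical replicate values.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed P-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def students_t_test(a, b):
    """Return (t, df, p) for a pooled-variance, two-tailed Student's t-test."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two replicates")
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("zero variance in both groups")
    t_stat = (mean(a) - mean(b)) / se
    return t_stat, df, two_tailed_p(t_stat, df)


# Hypothetical replicate means (three replicates per group), for illustration only.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t_stat, df, p = students_t_test(control, treated)
print(f"t = {t_stat:.3f}, df = {df}, P = {p:.4f}, significant = {p < 0.05}")
```

With three replicates per group the test has only four degrees of freedom, so a large standardized difference is needed to reach P < 0.05.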

Figure 1. De novo genome assembly of Prunus pseudocerasus. A Phenotypic characteristics of P. pseudocerasus trunks, flowers, leaf buds, flower buds, leaves, berries, seeds, and peduncles collected between February and May of 2022 and 2023. B Fruit and flower parameters, such as stamen number, flower number of each inflorescence, flower pedicel length, single fruit weight, total soluble solids, titratable acid, fruit pedicel length and diameter, and fruit longitudinal and transverse diameters. C Summary of de novo genome assembly and sequencing analysis of P. pseudocerasus. Moving from outside to inside, the tracks indicate: (a) chromosome size (Mb), (b) GC content, (c) protein-coding gene density, (d) density of DNA transposons, (e) density of LTR/Copia transposons, (f) density of LTR/Gypsy transposons, and (g) synteny blocks among P. pseudocerasus chromosomes.

Figure 2. Sequence divergence, gene structure, and expression levels of genes with four alleles from cultivated PruPse. A Sequence divergence between allelic chromosomes. Each allelic chromosome was aligned to others using BLAST with default parameters. Sequence divergence was calculated for each alignment block. The distribution of sequence divergence peaks at ∼0.02. A comparison of gene structures revealed that exon length (B), exon number (C), intron length (D), mRNA length (E), and protein length (F) were highly similar across the four haplotype genomes of PruPse. G Expression levels of genes with four alleles, presented as fragments per kilobase of exon model per million reads mapped (FPKM) values.

Figure 3. Overview of the haplotype-resolved genome assembly of the 'Zhuji Duanbing' (Prunus pseudocerasus) genome. From the outermost to the innermost tracks, the following features are indicated: the density of (a) LTR transposons, (b) LINE transposons, (c) DNA transposons, (d) genes; (e) gene expression levels; (f-i) Ka/Ks ratios of syntenic gene pairs identified between the first (f), second (g), third (h), and fourth (i) haplotype against the other three, respectively. Lines within the core connect synteny blocks, with blue ribbons indicating synteny blocks between the first haplotype and the other three, green ribbons indicating synteny blocks between the second haplotype and the third and fourth, and red ribbons indicating blocks between the third and fourth haplotypes.

Figure 7. 3D visualization of voids in Chinese cherry ('Zhuji Duanbing') and sweet cherry ('Heizhenzhu') fruits using X-ray micro-CT. A Image of fruit extrusion treatment using texture analyzer. B-D Grayscale images in XY (B), XZ (C), and YZ (D) directions of mature 'Zhuji Duanbing' fruits depicting voids (dark areas represent voids and light areas represent organic matter). E 3D model of mature 'Zhuji Duanbing' fruit. F Longitudinal 3D image of rupture at the peduncle base of 'Zhuji Duanbing' fruits after extrusion treatment. G 3D model of rupture at the top of 'Zhuji Duanbing' fruits after extrusion treatment. H-J Grayscale images in XY (H), XZ (I), and YZ (J) directions of mature 'Heizhenzhu' fruits depicting voids. K 3D model of mature 'Heizhenzhu' fruit. L Volume distribution of voids in 'Zhuji Duanbing' and 'Heizhenzhu' fruits after extrusion. M Comparison of total void volume and void proportion between 'Zhuji Duanbing' and 'Heizhenzhu' fruits. Blue arrow indicates the location of the damaged crack at the peduncle base; white arrow indicates the damaged crack at the top of the fruit; red arrow indicates void location.

Figure 8. GalAK-like and Stv1 potentially regulate cherry fruit firmness. A Expression patterns (EPs) and evolutionary relationship data were combined to identify non-orthologous genes with the same expression pattern. HF, hard-fleshed; SF, soft-fleshed. B EP3 and EP4 showed significant differential expression only at the RF stage. C GO enrichment of non-orthologs in EP3 and EP4. Color reflects the P-values of the GO terms. D Venn diagram of specific functions of different EPs in soft-fleshed 'Zhuji Duanbing' and hard-fleshed 'Heizhenzhu'. E Specific functions of EP3 and EP4. Color reflects the P-values of GO terms. F Identification of core members among EP3 and EP4. Color depth indicates the degree of correlation. The size of the symbols denotes network connectivity, where larger symbols signify superior connectivity (P < 0.05). Red edges indicate positive correlation, and blue edges indicate negative correlation. G Orthologs of GalAK-like and Stv1 were significantly expressed in soft-fleshed 'Zhuji Duanbing' and hard-fleshed 'Heizhenzhu'. Colors reflect gene expression level, with redder shades corresponding to higher expression. H Subcellular localization of PpsGalAK-like and PpsStv1 proteins in tobacco leaf epidermal cells. 35S::GFP was used as the positive control. Green fluorescence denotes the GFP fusion protein signal; red fluorescence denotes nucleus marker p2300-mCherry. Saffron indicates merged signals. Scale bars = 25 μm.

Figure 9. Transient overexpression of PpsGalAK-like and PpsStv1 in cherry fruits. A Phenotype of PpsGalAK-like and PpsStv1 transgenic cherry fruits. Green arrows point to injection sites. Scale bar = 5 mm. B Confirmation of transformation by PCR amplification of the plasmid sequence from extracted genomic DNA. C, D Relative expression levels of PpsGalAK-like (C) and PpsStv1 (D) in PpsGalAK-like-OE and PpsStv1-OE transgenic fruits. E, F Firmness of PpsGalAK-like-OE (E) and PpsStv1-OE (F) transgenic fruits. G-N Relative expression levels of PL5 (G), QRT3 (H), β-Gal (I), and XTH26 (J) in PpsGalAK-like-OE transgenic fruits, and of PL5 (K), QRT3 (L), β-Gal (M), and XTH26 (N) in PpsStv1-OE transgenic fruits. O-T Enzymatic activity of PL (O), PG (P), and XTH (Q) in PpsGalAK-like-OE transgenic fruits, and of PL (R), PG, β-Gal (S), and XTH (T) in PpsStv1-OE transgenic fruits. U, V Protopectin content in PpsGalAK-like-OE (U) and PpsStv1-OE (V) transgenic fruits. Data are shown as means ± standard deviation of three biological replicates, each with five fruits. * P < 0.05 (Student's t-test) relative to the empty control (GFP).

Table 1. Metrics of the PruPse genome assembly.