Haplotype-resolved chromosomal-level genome assembly reveals regulatory variations in mulberry fruit anthocyanin content

Abstract Understanding the intricate regulatory mechanisms underlying the anthocyanin content (AC) in fruits and vegetables is crucial for advanced biotechnological customization. In this study, we generated high-quality haplotype-resolved genome assemblies for two mulberry cultivars: the high-AC ‘Zhongsang5801’ (ZS5801) and the low-AC ‘Zhenzhubai’ (ZZB). Additionally, we conducted a comprehensive analysis of genes associated with AC production. Through genome-wide association studies (GWAS) on 112 mulberry fruits, we identified MaVHAG3, which encodes a vacuolar-type H+-ATPase G3 subunit, as a key gene linked to purple pigmentation. To gain deeper insights into the genetic and molecular processes underlying high AC, we compared the genomes of ZS5801 and ZZB, along with fruit transcriptome data across five developmental stages, and quantified the accumulation of metabolic substances. Compared to ZZB, ZS5801 exhibited significantly more differentially expressed genes (DEGs) related to anthocyanin metabolism and higher levels of anthocyanins and flavonoids. Comparative analyses revealed expansions and contractions in the flavonol synthase (FLS) and dihydroflavonol 4-reductase (DFR) genes, resulting in altered carbon flow. Co-expression analysis demonstrated that ZS5801 displayed more significant alterations in genes involved in late-stage AC regulation compared to ZZB, particularly during the phase stage. In summary, our findings provide valuable insights into the regulation of mulberry fruit AC, offering genetic resources to enhance cultivars with higher AC traits.


Introduction
The Morus genus, belonging to the Moraceae family, is found all over the world, but it is most common in the tropical and subtropical areas of the Northern Hemisphere.Because of their delicious f lavor, eye-catching color, high nutritious content, and low-calorie content, mulberry fruits are widely grown throughout Asia.The cultivation area in China spans over 100 000 hectares, resulting in substantial seasonal harvests.Beyond its status as a consumable fruit, mulberry holds a historical legacy as both a dietary staple and a traditional medicinal herb [1].Mulberries have various chemical constituents across their diverse types and maturation phases, including amino acids, fatty acids, minerals, polyphenols, and polysaccharides.Extracts obtained from mulberry fruits and their key bioactive elements, such as anthocyanins, rutin, and polysaccharides, have demonstrated a spectrum of biological activities, as evidenced by both in vitro and in vivo investigations [2].Anthocyanins are the primary f lavonoid compounds found in purple mulberries, with cyanidin-3-O-rutinoside (C3R) and cyanidin-3-O-glucoside (C3G) being predominant [3,4].Notably, compared to their yellow/white counterparts, red/purple fruits accumulate more anthocyanin [5].The total AC significantly increases from the green/light red stage to the black background transition, accompanied by visible color changes [6,7].As a result, the color composition of mulberry fruit reveals valuable compounds and facilitates germplasm innovation.
Anthocyanins, renowned for their robust antioxidant attributes, have undergone a comprehensive exploration of their plausible human health benefits and anti-aging implications.The biosynthesis pathway and transcriptional regulatory intricacies governing anthocyanin pigmentation have been extensively studied in the context of crops, horticultural species, and model organisms [8][9][10].Enzymatic reactions involving structural genes synthesize various modified forms of anthocyanins, such as glycosylation and acylation [11,12].These synthesized anthocyanins are transported to vacuoles for stable accumulation and storage in the cytoplasm [13,14].Regulatory factors affect the regulation of structural genes involved in anthocyanin biosynthesis [15], including the MBW complex [16,17], ERF [18,19], NAC [20], and WRKY [21,22].Furthermore, the pH of the vacuole affects the degree of color change [23], potentially inf luencing variations in AC among plant species and cultivars [24,25].Unlike f leshy fruits derived from ovary development, mulberries form through the thickening of the corolla and ovary walls of pistillate f lowers, suggesting a different coloring mechanism [26].Previous studies have identified genes and molecular mechanisms associated with mulberry fruit coloration, including pigment synthesis, transport, and degradation [27][28][29].Notably, recent research has demonstrated that the MYBA-bHLH3-TTG1 transcription factor complex regulates anthocyanin biosynthesis with varying levels and proportions of anthocyanins, f lavones, and f lavonols found in mulberry fruits of different colors, such as purple, yellow, and white [30].Despite notable advancements, substantial lacunae remain in comprehending the regulatory framework and pivotal genes dictating mulberry fruit color characteristics.This underscores the imperative of continued inquiry and extensive exploration in this domain.
Multi-omics integrative analysis has emerged as a powerful technique for the in-depth exploration of plant genomes, allowing breakthroughs in the mulberry industry by combining highquality genomes with traditional breeding programs.Following the publication of the first mulberry genome of Morus notabilis [31], several high-quality chromosome-level genomes were made available [32][33][34][35].However, the high heterozygosity, polyploidy, and extensive diversity of most mulberry varieties present challenges in unlocking the genome.Consequently, molecular-assisted breeding of elite cultivars and metabolic engineering of anthocyanins is hindered.Recent advances in PacBio HiFi read sequencing technology have enabled the generation of haplotype-resolved assemblies in tea plants [36], Artemisia annua [37], and kiwifruit [38].These studies have greatly facilitated researchers in studying variety and species variations, obtaining more accurate chromosomal maps [39].Integrative transcriptomic and metabolomic analyses are highly available for exploring new genes and critical biological pathways, providing a better understanding of the anthocyanin biosynthesis mechanism in mulberries.Whole-genome resequencing (WGS) of extensive germplasm collections has become feasible with the advent of low-cost and high-throughput nextgeneration sequencing technologies [40].GWAS methods are commonly employed to identify genetic loci associated with traits, offering increased resolution.A combination of WGS and GWAS applied to complete genetic populations or diverse germplasm collections has been successfully employed in some crops [41][42][43].
We conducted sequencing of two diploid mulberries (Morus) cultivars, namely ZS5801 and ZZB.ZS5801, a valuable hybridderived variant, is notable for its prolific purple fruits.Conversely, ZZB, characterized by its ivory hue, exhibits exceptional sweetness and translucent white f lesh resembling pearls, rendering it a valuable asset for varietal development.Our objective was to decipher the molecular underpinnings governing the accumulation of fruit color, a pivotal domestication trait in cultivated Morus trees, by employing a multi-faceted omics approach.Furthermore, we extended our scope by resequencing multiple color-representative mulberry fruits to scrutinize the genetic diversity of fruit color across distinct geographic populations and identify potential loci via association analysis.Our discoveries provide insightful perspectives on the mechanisms underpinning heterosis and fruit color buildup in mulberries, presenting a novel and comprehensive regulatory network.

Sequencing, assembly, and annotation of haplotype-resolved genomes for ZS5801 and ZZB
We selected two representative mulberry cultivars in this study, ZS5801 and ZZB, showing notable AC variations for comprehensive genome sequencing.A high-depth sequencing approach was employed for ZS5801, incorporating 80× Illumina paired-end reads (PE150), 98× PacBio Hi-Fi reads, and 118× Hi-C data.Similarly, ZZB was subjected to sequencing involving 82× Illumina paired-end reads (PE150), 73× Hi-Fi reads, and 79× Hi-C data (Table S1, see online supplementary material).Initial insights into the overall genomic attributes were gleaned using high-depth Illumina paired-end reads (150 bp).Genome size estimation ranged between approximately 237 Mb and 256 Mb for both cultivars, accompanied by heterozygosity rates ranging from 1.27% to 1.45% (Fig. S1, Table S2, see online supplementary material).We conducted preliminary assembly by leveraging the hifiasm (v0.16.1-r375) [44] algorithm in 'hic' mode, yielding four haploid genomes for ZS5801 and ZZB (Fig. 1A and B, Table 1): ZS5801_Hap1, ZS5801_Hap2, ZZB_Hap1, and ZZB_Hap2, while retaining the primary assembly.The lengths of these haploid assemblies encompassed approximately 300 Mb (293.6-301.3Mb) and 319 Mb (309-314.9Mb), respectively.Notably, the contig N50 values ranged from 2.6 Mb to 8.9 Mb, indicating a high level of contiguity within the separated contig assemblies.Chromosome interaction maps aligned with the preliminary assembly were constructed using the HiC-Pro (v3.1.0)[45] program (Fig. S2, see online supplementary material).The interaction signals anchored each of the 14 pseudochromosomes to a distinct newly assembled contig, suggesting the absence of chromosome loss for all chromosomes.
The accuracy and reliability of the assemblies underwent thorough assessment through various methodologies.Initially, the precision of contig separation was verified by aligning short Illumina reads individually to single haplotypes and merged haplotypes of both genomes, followed by single nucleotide polymorphism (SNP) calling.Utilizing a solitary haplotype as a reference yielded a notably higher SNP density than employing both haplotypes (Fig. 1C and D; Fig. S3, see online supplementary material), attesting to the precise regional separation achieved within the assemblies.Moreover, our analysis unveiled an impressive 99.01% and 94.10% of Illumina paired-end short sequences displaying optimal pairing with the ZS5801 and ZZB assemblies, effectively accounting for 99.58% and 99.68% of assembly positions, respectively (Table S1, see online supplementary material).These results decisively confirm the comprehensive representation of sequence content within the assemblies.
Further evaluation of assembly integrity employed benchmarking universal single-copy orthologs (BUSCO) (v5.3.0)[46] analysis, which revealed a high level of completeness in genic regions, with 97% (1565/1614) in ZS5801 and 97.3% (1570/1614) in ZZB (Fig. S2, see online supplementary material).The assemblies successfully tackled complexities arising from intricate repetitive sequences.Assembly challenges often arise from highly complex repetitive sequences.The long terminal repeat (LTR) assembly index (LAI) [47] values for all four haplotypes exceeded 11.9 (Table 1), signifying their reference-grade quality and robust completeness, even in the presence of challenging repetitive elements.Telomeres were identified in the genomes of ZS5801 and ZZB using the conserved telomeric repeat sequence (5 -CCCTAAA/TTTAGGG-3 ) as a query (Table S3, see online supplementary material).In ZS5801, eight telomeres were Haplotype 1 as ref.
ZS5801 Whole Genome

SNP count per 1Kb
Figure 1.Genome assembly and annotation of ZS5801 and ZZB Haplotypes.A and B Circular diagrams illustrating the characteristics of haplotype 1 for ZS5801 and ZZB, respectively.From outer to inner tracks: (a) chromosome ideograms, (b) GC content (500 kb window size), (c) repetitive sequence content (500 kb window size), (d) gene density (500 kb window size), (e) SNP density (500 kb window size), (f) indel density (500 kb window size), and (g) interchromosomal synteny blocks.C From left to right: The first boxplot depicts SNP density in 1 Kb windows, using the fusion of two haplotypes as a reference.The second and third boxplots display the SNP density of haplotypes 1 and 2, respectively, with single haplotypes as references.When using a single haplotype as a reference, reads align to allelic regions, revealing heterozygous sites as SNPs and resulting in a notably higher SNP density than a single-haplotype reference.D Detailed representation of SNP density distribution for chromosome 1 in ZS5801 for the two haplotypes, employing distinct reference genomes.
identified in haplotype 1, six telomeres in haplotype 2, and 11 telomeres in the monoploid genome.In ZZB, seven telomeres were identified in haplotype 1, eight telomeres in haplotype 2, and 10 telomeres in the monoploid genome.The approximate locations of the centromeres were also determined and recorded in Table S3 (see online supplementary material).To assess the switch errors between the two haplotypes of ZS5801 and ZZB, the calc_switchErr pipeline (https://github.com/tangerzhang/calc_switchErr/) was used.The switch error rate for ZS5801 was calculated to be 6.42% (25 083 out of 390 462), while the switch error rate for ZZB was 7.93% (39 420 out of 497 258) (Table 1).Previous studies have demonstrated that a high switch error rate (15%) can be observed in genomes with low sequence diversity (0.5%) [48].In the case of ZS5801 and ZZB, their switch error rates were relatively low, which may be attributed to their higher heterozygosity rates (>1%).It is worth noting that the switch error rates can also be inf luenced by factors such as the contig assembly method and the process of Hi-C data scaffolding.
Repetitive sequences within the genomes spanned regions from 151 Mb to 159 Mb and 167 Mb to 172 Mb, constituting 51.61-52.61% of ZS5801 and 54.12-54.65% of ZZB (Table S4, see online supplementary material).Genome annotation employed an integrated approach, combining de novo, homology-based, and transcript-based predictions via the EvidenceModeler (v1.1.1)[49] (EVM) pipeline.This process yielded 23 950-24 283 highconfidence protein-coding gene models across the four haplotypes (Table 1).These models featured average gene and coding sequence (CDS) lengths of 5089.S6, see online supplementary material).These results collectively establish the high integrity, continuity, and precision of the haplotype-resolved assemblies for the two mulberry varieties.These assemblies confidently facilitated subsequent downstream analyses.

Genomic evolution and comparative analysis of ZS5801 and ZZB
ZZB, a colorless mulberry variety native to northern China, is esteemed for its sweet f lavor, while ZS5801, a pigmented mulberry cultivar, boasts enhanced yield and nutrient composition.Decoding the genetic makeup of ZS5801 holds promise for elevating AC, fruit production, and expanding cultivation areas.We compiled protein-coding genes from various sources, including Arabidopsis thaliana, banyan, and published mulberry genomes, to assess the phylogenetic relationship between ZS5801 and ZZB (Table S7, see online supplementary material).We elucidated their evolutionary dynamics by analysing 3584 single-copy gene families.The constructed maximum likelihood phylogenetic tree indicated a relatively recent divergence between ZS5801 and ZZB approximately 1.84 million years ago (Fig. 2A and B).Seventy gene families expanded, and 183 contracted, linked to processes such as protein ubiquitination, ATP-dependent enzymes, and defense responses during the speciation of ZS5801 (Fig. S4A and Table S7, see online supplementary material).In the ancestral development of ZZB, 76 gene families expanded, while 142 contracted, potentially connected to pollen recognition and salt stress responses (Fig. S4B and Table S7, see online supplementary material), ref lecting their distinctive adaptability traits.
We identified 24 254, 26 246, and 24 266 pairs of orthologous genes between ZS5801, grape, and ZZB, respectively, using JCVI (https://github.com/tanghaibao/jcvi).Parameter distribution analysis suggested the absence of whole-genome duplication events in both genomes (Fig. 2C).Like many plants, mulberry genomes contain abundant repetitive sequences, predominantly transposable elements (TEs), with Copia and Gypsy LTR elements prevalent.Notably, ZZB possessed more intact LTR elements (80 077 in ZS5801 and 136 057 in ZZB) (Fig. 2D; Table S4, see online supplementary material).Both genomes exhibited recent insertion events (Fig. 2E) consistent with their recent divergence.Furthermore, ZZB displayed a denser pattern of LTR insertions, which was potentially linked to the higher count of intact LTR elements.This high-quality assembly provides insights into these evolutionary events.

Assessing genetic diversity in the mulberry fruit population
Population genetic analysis provides a comprehensive understanding of the genetic factors associated with various phenotypes.Here we conducted an extensive investigation of genetic variations across the mulberry fruit population.A total of 112 individuals from diverse geographic regions in China, ranging from the south to the north, were included in the analysis (Table S8, see online supplementary material).Whole-genome sequencing (WGS) was performed on these individuals to capture a wide range of genetic information.The ZS5801_haplotype1 genome was used as the reference for variant calling.As a result, we identified a total of 3 205 768 SNPs with a minor allele frequency (MAF) greater than 0.05.The average sequencing depth achieved for the samples was 25.58× (Table S8, see online supplementary material).
These SNPs facilitated phylogenetic and principal component analyses (PCA) to discern the relationships among the 112 samples (Fig. 3A and B).The mulberry varieties delineated distinct branches (Fig. 3C), forming nine populations: (I) wild/near-wild populations, (II) primarily from Japan, (III) Guangdong province varieties, (IV) varieties concentrated in the 20-40 latitude band, and (V) to (IX) varieties mainly distributed within the 40-60 Vitis vinifera Evolutionary and comparative analysis of the Mulberry genome.A A phylogenetic tree was reconstructed using 3584 single-copy genes from mulberry species and three outgroup species.The OrthoFinder [50] identified orthologous gene families, and CAFE5 [51] calculated the count of expanded or contracted gene families, denoted by symbols (+ for expanded, − for contracted).Nodes on the phylogenetic tree include transparent circular plots indicating divergence times.B Clustering of orthologous and copy number gene families.Genes not clustered are from orthologous families with one gene each.Singlecopy genes are from families with one gene across eight genomes, while multi-copy genes are found in families with at least two genes in eight genomes.Unique genes are from families with genes exclusive to a single genome.C Genome collinearity analysis among ZS5801, grape, and ZZB.D Statistics on the size of repetitive sequences for each genome.E Comparison of long terminal repeat (LTR) insertion times across five mulberry genomes.latitude band.These encompassed major cultivated mulberry species, including both purple and white varieties.Notably, the (VI) group originated from northeastern China, while the (VIII) and (IX) groups originated from northwestern and northern China, respectively.These groupings encapsulate the principal mulberry domestication and cultivation zones.Despite distinct phenotypic traits, including a pink phenotype in group (IV) and a white phenotype in groups (VII) and (IX), genetic clustering did not consistently align with color differentiation.ADMIXTURE [52] analysis corroborated these observations, identifying an optimal genetic cluster number (k) of 7 (Table S9, see online supplementary material).
Pairwise F st values quantified genetic divergence across the nine subgroups, with the highest values observed between Pop1 and Pop9, suggesting limited gene f low between these groups (Fig. 3D).Notably, genetic differentiation persisted even among closely related varieties, such as purple Pop6 and white Pop9, highlighting the inf luence of color-based selection.Linkage disequilibrium (LD) decay values were also computed for the samples, revealing faster LD decay in the purple variety compared to the white variety (Fig. 3E).These findings support the chronological divergence of the purple variety, which serves as the precursor for the white variety.Nucleotide diversity in the purple variety exceeded that in the white variety, potentially contributing to accelerated LD decay in the former.

Selection of MaVHAG3 genes governs mulberry color evolution
Fruit color, a vital agricultural trait in crops and fruit trees, remains incompletely understood in mulberry fruits.We conducted a GWAS using a linear mixed model (LMM) implemented in GEMMA [53], employing genotype and phenotype datasets from 112 mulberry individuals to elucidate the molecular underpinnings of mulberry fruit color evolution.We aimed to pinpoint the genomic loci responsible for fruit color modulation during domestication and breeding.A stringent significance threshold of 1.11 × 10 −8 (Bonferroni correction) was set for Pvalues, revealing a prominent association signal on chromosome 5, encompassing 180 trait-associated markers (MTAs) (Fig. 4A).In search of candidate genes linked to mulberry fruit color, we scrutinized genes within a 10 kb window of the most significant SNP, covering at least 10% of the gene length and identifying 41 candidates.Integration with transcriptomic data further highlighted ZS5801_Hap1.007690.1 as a key candidate (Fig. S5, Table S10, see online supplementary material).Phylogenetic analysis positioned ZS5801_Hap1.007690.1 within the VHA subunit G gene family, with the closest relation to A. thaliana's VHAG3 protein, thus termed MaVHAG3.Notably, MaVHAG3 demonstrated significant upregulation in ZS5801 during fruit ripening.Remarkably, a significant SNP in the last exon of MaVHAG3 contained a Met110Leu missense mutation, highly linked to fruit color variation (Chr5: 8762676, A > T, P-value = 1.72 × 10 −24 ) (Fig. 4C).These findings identify MaVHAG3 as a credible candidate for improving mulberry fruit color, a pivotal determinant of fruit f lavor quality.
Subsequently, we assessed the nucleotide sequence diversity of MaVHAG3, F st , and Tajima's D (Fig. 4B and D).Notably, the purple population exhibited heightened diversity values at the mutation sites, and both color populations displayed pronounced signs of selection.We comprehensively examined this variation across mulberry germplasms to discern whether this variation arose during domestication.Notably, MaVHAG3 harbored a significant single SNP (A > T) in the mulberry population.Varieties with the A/A genotype were mainly found in groups (I), (II), and (III), while the T/T genotype predominated in groups (VII) and (IX) (Fig. 4E; Table S11, see online supplementary material).Intriguingly, T/T genotype materials exhibited lower AC and higher latitude values (Fig. 4F).The AC and latitude displayed a negative correlation (Pearson correlation coefficient = −0.9769) in a selected subset of six varieties (Fig. 4G and H).The allele frequency of the single SNP and nucleotide sequence diversity within MaVHAG3 suggest that the natural variation of MaVHAG3 alleles might have been subject to selection during mulberry population domestication, potentially correlated with latitude.We uncovered the significant inf luence of a single SNP variation within MaVHAG3 on anthocyanin accumulation in mulberry fruits through RNA-seq and qRT-PCR analyses (Fig. 4I; Fig. S5B, see online supplementary material).MaVHAG3 expression levels exhibited pronounced disparities between the colorless and purple varieties, indicating mutation-induced changes in the regulatory region of the gene.Subsequent examination of MaVHAG3's upstream 2 kb promoter sequence unveiled a 5bp Indel (90.90%) in the promoter region of colorless mulberry fruits (Chr5: 8760137), whereas the majority of purple mulberry fruits lacked this deletion variation (89.74%) (Fig. 4J; Table S12, see online supplementary material).qRT-PCR results further confirmed that the expression level of the MaVHAG3 gene with the deletion variation in the promoter was significantly lower in white mulberry varieties compared to nonvariant purple fruit varieties (Fig. 4K).These findings underscore the functional impact of the 5-bp indel on the promoter, leading to divergent expression patterns of MaVHAG3.
Our study highlights MaVHAG3 as a pivotal candidate gene that orchestrates anthocyanin accumulation in mulberry fruit.The naturally occurring genetic variations encompassing a single SNP within the coding region and a 5-bp indel within the promoter region are essential causal loci underpinning this trait.

Dynamic regulation of flux synthesis enzymes governing fruit AC.
To deepen our understanding of the regulation of elevated AC, the metabolic pathways driving the formation of fruit colors have been reconstructed.We successfully identified the enzyme genes involved in this process (Fig. 5A; Fig. S6, Table S13, see online supplementary material).We predicted anthocyanin and f lavonol biosynthetic enzymes by comparing the anthocyanin synthesis pathways between ZS5801 and ZZB.Flavonol synthesis forms anthocyanins and colorless f lavonols through a branching pathway (Table S14, see online supplementary material).Notably, we identified a genome-wide absence of the F3'5'H gene.
The comparative analysis demonstrated differential expansion and contraction (six expansions and three contractions) within nine out of 13 anthocyanin biosynthesis pathway families between the two cultivars (Fig. 5B).This contrasted with the relatively few families (8%) within the chlorophyll metabolism pathway.Flavonol shares seven biosynthetic steps with anthocyanins, known as early anthocyanin synthesis (EBGs), and exhibits active homologous gene expression.However, the cultivars exhibited substantial differences in the number of pathway enzyme genes (76 and 71, respectively) during late-stage anthocyanin synthesis (LBGs), leading to a significant divergence in the overall expression patterns (Fig. 5C).
Crucially, two enzymes, DFR and FLS, showed notable expansion and contraction (7:5 and 8:12, respectively) (Fig. 5D).The contracted DFR gene's total expression decreased significantly, while the expression of the expanded FLS gene increased notably (Fig. S7, see online supplementary material).These enzymes act as rate-limiting factors during anthocyanin and f lavonoid synthesis.Changes in enzyme numbers could alter carbon f lux within the f lavonol pathway, consequently affecting color accumulation.

Regulation of temporal gene expression during fruit development
Considering the temporal-scale linear or nonlinear dynamics inherent in AC, we have implemented a multi-omics investigation that spans across multiple developmental stages.Here, we combined transcriptome and metabolome to conduct an indepth analysis of the mulberry fruit color regulation.Distinct color variations were observed in the ZS5801 and ZZB throughout fruit development (Fig. 6A).ZS5801 accumulated anthocyanins, leading to a purple hue, while ZZB exhibited gradual chlorophyll degradation and turned jade white.Metabolite profiling indicated significant differences between high anthocyanin-yielding (HG2) and low anthocyanin-yielding (BYW) lines [27], particularly in f lavonol content.Untargeted metabolic analysis of ripe fruits using liquid chromatography-mass spectrometry (LC-MS/MS) revealed the identification of 543 metabolites, including carbohydrates, organic acids, and amino acids, contributing to fruity f lavors (Fig. 6B; Table S15, see online supplementary material).The detected metabolites associated with anthocyanin accumulation, such as cyanidin 3-O-glucoside, quercetin 3-O-rutinoside, and others, exhibited significant differential accumulation between ZS5801 and ZZB (Fig. 6C), consistent with previous findings [27].Furthermore, we aimed to gain new insights into gene expression and transcriptional regulation during fruit development by analysing 30 RNA-sequencing datasets from five fruit developmental stages of ZS5801 and ZZB.A total of 18 025 genes were found to exhibit differential expression (|log 2 [fold change] | ≥ 2 and corrected P-value <0.05) across the five tissues and two cultivars.Among these genes, 12 260 and 14 443 were upregulated, while 22 466 and 17 437 were downregulated (Fig. 6D).We classified the expression patterns of differentially expressed genes (DEGs) into eight clusters according to their expression patterns, demonstrating either similar or opposite expression trends between the two cultivars (Fig. S8, see online supplementary material).
Specifically, 2049 homologous genes exhibited similar expression patterns, whereas 3822 genes displayed divergent expression patterns.Enrichment analysis revealed that both cultivars showed consistent differential expression trends in genes related to cell wall metabolism (GO:0005618, P-value = 3.09E-08) and photosynthesis (GO:0019864, P-value = 4.50E-10), while showing opposite trends in genes associated with anthocyanin, f lavonol, and other secondary metabolic pathways (GO:0019748, P-value = 1.09E-11) (Fig. 6E).Moreover, utilizing an additional set of 32 transcriptome datasets, we constructed a gene co-expression network using the weighted gene co-expression network analysis (WGCNA) software package [54].This resulted in the formation of 25 modules comprising 22 807 genes.Notably, the orangered module, which included the anthocyanin synthase gene as a central hub, contained 110 additional genes (Table S16, see online supplementary material).Functional enrichment analysis of these genes revealed significant associations with secondary metabolism and f lavonoid synthesis, indicating their potential involvement in fruit color formation (Fig. S9, see online supplementary material).
Using identified anthocyanin pathway enzymes and transcription factors, we constructed time-ordered gene co-expression networks (TO-GCNs) for two mulberry cultivars (Table S17, see online supplementary material).These networks were compared to unravel the gene regulatory mechanisms governing the fruit color transition between the cultivars.Among the 1624 genes exhibiting significant expression differences across the five developmental time stages (S1-S5) and with an average TPM value   greater than 0.5, a basic helix-loop-helix (bHLH) transcription factor (ZS5801_Monoploid.003909.1)with a characteristic expression pattern was selected as the initial node.ZS5801-specific, ZZBspecific, and Consensus TO-GCNs were reconstructed (Fig. 6F), each comprising five temporal subnetworks (L1-L5) corresponding to the sampling time points.Genes associated with anthocyanin/f lavonol biosynthesis pathways were identified within these networks: 657 genes (562 transcription factors and 95 pathway enzymes) in ZS5801-specific, 540 genes (480 transcription factors and 60 pathway enzymes) in ZZB-specific, and 230 genes (212 transcription factors and 18 pathway enzymes) in Consensus TO-GCNs (Fig. 6G).Species-specific networks revealed crucial regulatory genes and their hierarchical interactions.In the late coloring stages (S4-S5), the ZS5801-specific subnetwork predicted critical anthocyanin pathway genes (F3H and ANS), while the ZZBspecific subnetwork highlighted CHS, FLS, and F3oGT genes (Fig. 6H and I; Fig. S10, see online supplementary material).These genes were validated through expression profiles correlated with color accumulation patterns during the critical transition and ripening stages.Interaction patterns between pathway enzymes and regulatory transcription factors were outlined, where the expression profiles of transcription factors (e.g., MYB, bHLH, WD40, ERF, and WRKY) closely aligned with color accumulation patterns, affirming the network's accuracy.These network topologies lay the groundwork for further exploration of anthocyanin regulation.

Discussion
A key agronomic trait sought by breeders in mulberry plants is high AC.Mulberry fruits exhibit significant variations in AC across different genotypes [55,56], resulting in diverse mature fruit colors such as black, pink, red, and white [57].This natural diversity among mulberry genotypes is essential for studying the metabolic pathways involved in anthocyanin biosynthesis and understanding the fundamental compounds that inf luence variations in AC among different mulberry types.These insights have the potential to streamline the screening of various germplasms and inform agricultural practices focused on anthocyanin-related applications.
Through the identification of these differences, specific loci and candidate genes associated with fruit color determination have been pinpointed.One of the identified candidate genes is MaVHAG3, which encodes a VHA transporter protein.This protein plays a crucial role in cellular functionality and coloration in f leshy fruits and ornamental plants.Processes such as anthocyanin and malic acid transportation rely on the activity of VHAs and vacuolar H + -pyrophosphatases, which are responsible for H + pumping.In apples, the MdMYB1 gene has been found to activate VHA B-subunit genes (MdVHA-B1 and MdVHA-B2) through promoter binding, enhancing VHA activity to regulate cellular pH and anthocyanin accumulation [58].These findings highlight the applicability of our dataset and help precise insights that support genetic enhancements in coloration.
An intriguing discovery materialized in a 5 bp variation within the MaVHAG3 promoter, closely associated with mulberry population AC.Additionally, this 5-bp sequence is absent in early domesticated Japanese and Guangdong mulberry varieties, while a 5-bp insertion characterizes several recently domestic white mulberry materials.These observations posit the potential role of the identified genetic variation in domestication.Moreover, a salient inverse correlation materialized between MaVHAG3 variation and latitudinal position, thereby augmenting our comprehension of the geographic impetus shaping genetic heterogeneity.Our dissection of the MaVHAG3 promoter sequence (Fig. S11, see online supplementary material) uncovered many light-responsive essential elements.This prompts speculation regarding the potential inf luence of varying light intensities at different latitudes on anthocyanin accumulation, conceivably mediated by light-responsive interactions with the MaVHAG3 promoter region.
Furthermore, high AC in mulberry is the result of multi-level regulation.Varieties such as ZS5801 and ZZB exhibit significant differences in fruit color.By utilizing PacBio HiFi long reads and Hi-C data, we have successfully achieved haplotype phasing, providing valuable insights into the heterozygous genome landscape of mulberry.The dynamic expression of candidate gene is speculated to play a pivotal role in plant development and adaptation, but further investigation is needed to determine their precise molecular functions.Comprehensive investigations are warranted to elucidate their involvement in fruit anthocyanin accumulation.

Plant materials
Genomic sequencing involved the analysis of young leaves from individual plants of two diploids (2n = 2x = 28) mulberry cultivars, ZS5801 and ZZB, characterized by distinct fruit coloration.To investigate transcriptomic and chemical dynamics related to fruit color development, we harvested fruit tissue samples at five developmental stages from three mulberry cultivars (ZS5801, ZZB, and CQ109) based on discernible external traits.To facilitate transcriptome sequencing, 45 tissue samples (comprising three replicates per time) were obtained across these five-time points.Furthermore, mature mulberry fruits (S5 stage) from ZS5801 and ZZB, encompassing three replicates each, were gathered for metabolome comparative analyses.To explore populationbased fruit color variations, comprehensive WGS was conducted using 112 distinct materials from diverse geographic locations, predominantly in China.This set comprised 54 re-sequenced specimens and 58 samples procured from GenBank (Accession Number PRJNA597170) [59].All sequencing subjects were cultivated and maintained within mulberry germplasm nurseries at Southwest University and Chongqing Sericulture Research Institute.Following collection, freshly harvested tissues were promptly cryopreserved in liquid nitrogen.

Genome sequencing
Illumina Short-Read Sequencing: Genomic DNA was extracted using a CTAB-based protocol as previously described [60].Sequencing on the Illumina HiSeq platform utilized a 350 bp insert size library, generating high-quality 150 bp paired-end reads.Following sequencing, raw data underwent trimming and filtering procedures.
PacBio Sequel II Sequencing: DNA samples were outsourced to AnnoRoad, a reputable genomics service provider.CCS libraries were created using the Pacific Biosciences method, producing two distinct Hi-Fi libraries with 20 kb insert sizes for the cultivars ZS5801 and ZZB.Sequencing was performed on the Pacific Biosciences Sequel II platform.

Identification and annotation of variants
SNPs and Indel (length ≤ 500 bp) were compared between two distinct cultivar genomes using MUMmer (v3.23) [89] software.The Nucmer tool within MUMmer was employed initially with parameters '-mum -g 1000 -c 90 -l 40' to generate alignment outcomes.Resultant alignments underwent one-to-one mapping using the delta-filter tool in MUMmer, applying options '-r -q'.The Show-snp module in MUMmer, with parameters '-ClrTH', facilitated SNP and Indel identification within the one-to-one alignments.For advanced analysis, the SyRI [90] pipeline was utilized for variant detection.Variant annotation was conducted through the snpEff [91] tool.

Weighted correlation network analysis
A gene co-expression network was constructed using the R package WGCNA (v1.61) [54] based on the normalized (z-score) TPM matrix.RNA-seq data from 21 samples and genes from 12 SRA datasets were included in the network construction.A soft threshold (β) was calculated to achieve a scale-free topology, with a selected scale-free fitting index (R 2 ) of 0.9.Gene clustering based on dissimilarity and module definition using the dynamic cut method was performed, with each module containing a minimum of 30 genes sharing similar expression profiles.
Analysis of TO-GCNs TO-GCN [99] were constructed for purple fruit (ZS5801specific TO-GCN) and white fruit (ZZB-specific TO-GCN), along with a consensus TO-GCN between the two networks.Potential orthologous genes (34953) were inferred using GeneTribe (v1.2.1) [100].Orthologous gene pairs were merged as nodes in the consensus TO-GCN.Expressed differentially expressed genes (DEGs) were defined if they displayed significant differential expression between any two of the five fruit ripening time points (S1-S5) and had an average TPM greater than 0.5.Pearson's correlation coefficient of 0.8 was used for TF-gene pairs.Two bHLH transcription factors (ZS5801_Monoploid.003909.1 and ZZB_Monoploid.003862.1)served as initial nodes for TO-GCNs generated by MFSelector [101] (https://github.com/yushuen/MFSelector_STAR-Protocol). Three TO-GCNs under the 'C1 + C2+' mode were combined for an extensive large network and then visualized using Cytoscape [102].The degrees of the nodes (genes) were calculated as the number of connections or edges the nodes have to other nodes.

Phylogeny and population analysis
A comprehensive investigation was conducted into the phylogenetic relationships among mulberry trees, leveraging 4 914 314 high-quality single nucleotide polymorphisms (SNPs).Multiple methods encompassed NJ (neighbor-joining) tree construction, population structure assessment, and genomic SNP-based PCA.The evolutionary tree was generated using Plink (v1.90b6.26)[106] with the '-distance-matrix' parameter for calculating genetic distances.Conversion of results to .megfiles was achieved through Perl scripts, and MEGA(v6.0)[107] software was utilized for further analysis (bootstrap = 1000).The resulting tree was visualized using the online tool iTOL [108], with color labels corresponding to sample population information.PCA analysis utilized the EIGENSOFT (v6.1) [109] smartpca program, considering linkage-imbalanced attenuated chromosomal SNPs (plink: -indep-pairwise 50 5 0.4), and significance assessed via the Tracy-Widom test for eigenvalues.PCA plots were generated using ggplot2 in R [110].Population structure analysis employed ADMIXTURE (v1.3.0)[52] software to scrutinize fruit mulberry's genetic structure and confounding.ADMIXTURE was executed for k = 2-15 clusters with default parameters.The optimal choice, k = 7, was determined based on minimal CV error, serving as a representative model for the population genetic characteristics of fruit mulberry.

Genomic diversity and selection scanning
To assess the attenuation patterns of LD within each population, we calculated the mean r 2 values between pairs of SNPs within a 500 kb range.These computations were performed using PopLDdecay (v3.27) [111] software, where higher r 2 values indicate stronger LD.To evaluate genomic differences, we conducted F st calculations between two sample groups, employing a sliding window of 25 kb with a step size of 10 kb across the mulberry genome.For this analysis, we utilized VCFtools (v0.1.13)[112], which enabled the quantification of the intensity of genomic differentiation between the two compared groups.The resulting values range from 0 (indicating no differentiation) to 1 (signifying complete or fixed differentiation).

GWAS
The GEMMA [53] package, employing mixed linear models, was utilized for the GWAS analysis to enhance statistical power and mitigate false positives.Involving 112 fruit mulberry accessions, GEMMA computed the correlation matrix, and the first three principal component values (feature vectors) derived from highquality genome-wide SNPs (MAF ≥ 0.05 and deletion rate ≤ 50%) were integrated into the LMM as fixed effects to account for stratification effects.GEMMA generated P-values for all SNPs and target traits.The Bonferroni correction established the genomewide significance threshold of 0.01 divided by the total number of SNPs.Haploblocks were inferred using Plink (v 1.90b6.26)[106] with default parameter (−hap).

VHAG3 selection analysis
For the comparison of VHAG3 and nucleotide diversity patterns between the purple and non-purple populations, whose population structure is established, we computed F st , nucleotide diversity (θπ), and Tajima's D using VCFtools (v0.1.13)[112].These calculations were performed on the SNP dataset using the unit point method, employing a 2 kb window and 1 bp step size.

Figure 3 .
Figure 3. Population structure of mulberry fruit varieties.A Phylogenetic tree illustrating the relationships among 112 fruit mulberry accessions.B The principal component analysis (PCA) of the 112 fruit mulberry accessions revealed the population structure.Various subspecies are differentiated by unique symbols (circles, squares, triangles) which correspond to fruit color categories identified in (A).A line distinguishes the groups with purple fruit from those without.C ADMIXTURE analysis of nine fruit mulberry taxa, with representative mulberry types displayed beneath each group.D Genetic differentiation (F st ) between each pair of subspecies, as indicated in (A).E Linkage disequilibrium (LD) attenuation for groups differentiated by fruit color characteristics discussed in (B).

Figure 4 .
Figure 4. Identification of key loci for fruit color traits via GWAS.A Manhattan plot illustrating the results of GWAS for fruit color in 112 accessions, with data analyzed using a linear mixed model (LMM).A dashed line marks the significance threshold, set by Bonferroni correction (P = 1.11 × 10 −8 ).The quantile-quantile plot displays -log 10 transformed observed P-values against expected P-values.B Genome-wide F st analysis depicted through a Manhattan map, showing genetic differentiation across the genome.C Gene structure of MaVHAG3 within the haploblock containing the lead SNP.Exons are represented by boxes, introns by connecting lines, and gene orientation by arrows.D Comparative analysis of Tamjim's D, F st , and π values for the MaVHAG3 sketch region (−1 kb).Areas indicating potential selection signals are highlighted.E Allelic distribution in MaVHAG3 genes among cultivars producing fruit with differing color traits, visually depicted alongside corresponding fruit images.Scale bars: 1 cm.F Geographical distribution of 112 accessions based on MaVHAG3 genotypes, with the percentage ratio between different latitudes shown in the histogram.G Anthocyanin and f lavonol content in the fruits of 19 accessions presented through box plots.H Correlation investigation between anthocyanin content (AC) and latitude in six mulberry cultivars, indicated by correlation coefficients.I qRT-PCR analysis of MaVHAG3 expression during fruit development, quantifying dynamic changes in expression across diverse cultivars.J Percentage distribution statistics of promoter variation types in the three groups of cultivars.K Relative expression of different promoter variant types in 10 germplasms during mulberry fruit development, indicated by statistical significance using a one-way ANOVA test.Each accession underwent three replicates for robustness.