Chromosome restructuring and number change during the evolution of Morus notabilis and Morus alba

Abstract Mulberry (Morus spp.) is an economically important plant as the main food plant used for rearing domesticated silkworm and it has multiple uses in traditional Chinese medicine. Two basic chromosome numbers (Morus notabilis, n = 7, and Morus alba, n = 14) have been reported in the genus Morus, but the evolutionary history and relationship between them remain unclear. In the present study, a 335-Mb high-quality chromosome-scale genome was assembled for the wild mulberry species M. notabilis. Comparative genomic analyses indicated high chromosomal synteny between the 14 chromosomes of cultivated M. alba and the six chromosomes of wild M. notabilis. These results were successfully verified by fluorescence in situ hybridization. Chromosomal fission/fusion events played crucial roles in the chromosome restructuring process between M. notabilis and M. alba. The activity of the centromere was another key factor that ensured the stable inheritance of chromosomes. Our results also revealed that long terminal repeat retrotransposons were a major driver of the genome divergence and evolution of the mulberry genomes after they diverged from each other. This study provides important insights and a solid foundation for studying the evolution of mulberry, allowing the accelerated genetic improvement of cultivated mulberry species.


Introduction
Mulberry (Morus spp.), which is recognized as the main food source for silkworm (Bombyx mori), is a deciduous tree of great economic importance. Mulberry originated in the Himalayan foothills and is distributed worldwide, including on the Eurasian, American, and African continents [1]. A draft mulberry genome of the wild species Morus notabilis was published in 2013 [2]. This genome sequence of M. notabilis has provided fundamental resources promoting studies of gene identification, genome evolution, trait development, and genetic improvement in mulberry. Mulberry leaves and fruits have been used as medicines, in beverages, and as functional foods in many countries because they contain abundant biologically active compounds [3], and they have attracted much attention from researchers. Based on published genomic [2], metabolomic [4,5], and transcriptomic data [2,6], most of the genes related to the biosynthesis of biologically active compounds in mulberry have been identified in M. notabilis, and the mechanisms regulating secondary metabolite synthesis in mulberry leaves and fruits have been elucidated [5][6][7]. These findings have contributed to increases in the genetic improvement and utilization of mulberry.
Polyploidy has played an important role in the speciation and evolution of plants. Karyotyping and ploidy-level identification have been employed as tools for parental selection in polyploid plant breeding [8]. Ten different chromosome numbers (14,28,35,42,49,56,84,112,126, and 308) have been identified in mulberry, which are related to altered ploidy levels [9,10], and this information has been invaluable for the polyploid improvement of mulberry. Mulberry is believed to have a basic chromosome number of 14, and this assertion has been widely cited in the literature [11]. The wild mulberry species M. notabilis was first recorded by the German preacher Schneider in 1916 [12]. It has the minimal chromosome number [14] found in Morus and was the first species to be used for mulberry genome sequencing [2]. Karyotyping analyses definitively illustrated that the 14 chromosomes of M. notabilis formed seven pairs, thus indicating a basic chromosome number of seven [2]. Moreover, the mitotic and diakinesis karyotypes of M. notabilis were constructed using fluorescence in situ hybridization (FISH), and the results supported the finding that the basic chromosome number of M. notabilis was seven [9]. Recently, the genome of cultivated mulberry cultivar M. alba 'Heyebai' (2n = 28) was published, and the basic chromosome number of M. alba was proposed to be 14 [13]. Phylogenetic analyses of mulberry using nuclear ribosomal internal transcribed spacer (ITS) sequences, chloroplast DNA sequences, and genomic sequences indicated that M. notabilis was a more ancient species than M. alba [1,10,14] and that M. alba diverged from M. notabilis ∼10 million years ago (MYA) [13,14]. The karyotypes of M. notabilis and M. alba also showed significant differences [9], even though they were reported to have similar genome sizes [13]. M. notabilis contained 14 chromosomes with clear length differences. In contrast, all of the chromosomes of M. alba were dot or large dot chromosomes, lacking morphological characteristics. To date, the chromosomal evolutionary process in mulberry species remains unclear. A study of mulberry chromosomal evolution should thus contribute to the understanding of the genetic consequences of the evolution and domestication of mulberry [5,6].
Previous studies have indicated the genome-wide relationships between cultivated plant species and their wild relatives, including citrus [15], soybean [16], tomato [17], and pistachio [18]. These studies have revealed that some genes or regions have experienced artificial selection in cultivated relatives. For example, energyand reproduction-associated genes have been shown to be under selection in cultivated citrus [15]. Some new quantitative trait loci have also been well characterized in cultivated soybean [16] and tomato [17]. Because long terminal repeat (LTR) retrotransposon elements always account for the largest proportion of a plant genome [19], many studies have demonstrated that LTRs and their associated processes not only shape differences in the architecture of the plant genome within a genus but also contribute to the diversification and speciation of plants [20]. For example, a major burst of different LTRs has contributed to the diversification of apple [20]. LTR insertions lead to changes in the expression of genes related to anthocyanin biosynthesis, which ultimately leads to changes in the fruit phenotypes of grapes and oranges [19,21,22]. Similarly, understanding mulberry evolutionary divergence at the genome-wide level is essential for the further improvement of mulberry breeding.
In this paper, we describe the generation of a highquality M. notabilis genome assembled using Oxford Nanopore Technologies (ONT) ultrahigh-throughput sequencing and high-throughput chromosome conformation capture (Hi-C) technologies. Moreover, we carried out detailed comparative chromosomal analyses between the genomes of M. notabilis and M. alba to examine the genomic evolution process. The mechanisms of genome reorganization and evolution reported in this study will be of value for further genetic research on mulberry species.

Genome sequencing, assembly, and genome annotation
The predicted size of the M. notabilis genome was approximately 401.62 Mb (Supplementary Table 1). To enhance the quality of our genome assembly, approximately 57.29  Table 3). The final 335.39 Mb assembly genome was generated from 650 contigs, with a contig N50 size of 5.72 Mb (Table 1). Then, 40.50 Gb of Hi-C data were used to further improve the integrity of the assembled genome (Supplementary Table 4). All highquality Hi-C data were mapped to the aforementioned 650 assembled contigs. A total of 327.01 Mb of the 335.39 Mb assembled genome could be anchored to six chromosomes ( Fig. 1 and Supplementary Fig. 1). The chromosome-specific probes reported in a previous study [9] could be clearly anchored to the corresponding chromosomes (Supplementary Table 5). Compared with the assembled genome previously generated [2], both the quality and integrity of the genome assembled in the present study were greatly improved (  Table 6). For gene prediction, a total of 26 010 protein-coding genes, with an average gene size of 1454 bp, were annotated in the newly assembled M. notabilis genome (Supplementary Table 7

Genome and gene collinearity
Phylogenic trees were constructed using 132 single-copy genes, which suggested that M. alba diverged from M. notabilis ∼10.99 MYA ( Supplementary Fig. 2). We then performed a large-scale genome alignment of the M. notabilis genome sequenced in this study and the genome of M. alba (Fig. 2a), which demonstrated a high level of chromosome synteny between these two genomes (Fig. 2a). We also analyzed gene collinearity between the two genomes. A total of 36 436 collinear genes (68.72% of a total of 53 022 genes) were identified. If the chromosome location of collinear genes in the two genomes was inconsistent with our genome alignment results, the gene pairs were defined as rearranged genes. A total of 386 rearranged genes were identified in the two genomes based on their positions on the corresponding chromosomes (Fig. 2b, Supplementary Table 8). KEGG analysis suggested that these genes mainly participated in metabolic pathways and the biosynthesis of secondary metabolites (Supplementary Tables 9, 10). Among these genes, 16.06% (62/386) of the genes from M. notabilis could be mapped to 67 KEGG pathways, including metabolic pathways (32/62) and the biosynthesis of secondary metabolites (18/62). A total of 18.13% (70/386) of the genes from the M. alba genome could be assigned to 70 KEGG pathways, including metabolic pathways (38/70) and the biosynthesis of secondary metabolites (22/70).
To investigate chromosomal rearrangements in mulberry without an available genome of the common ancestor of Morus, the distribution of orthologous genes between grapevine (Vitis vinifera) and mulberry was analyzed. In the comparison of the gene order in M. notabilis and M. alba with that in V. vinifera, each region with a succession of dots in mulberry corresponded to three regions in grapevine and vice versa ( Supplementary Fig. 3). The results suggested that both M. notabilis and M. alba share the eudicot-common triplication with V. vinifera and that there have been no whole-genome duplication (WGD) events in mulberry since its separation from the Eurosid I clade. The chromosomal structures of M. notabilis and M. alba were also reconstructed according to the composition of the grapevine chromosomes, and the observed correspondence between the reconstructed chromosomes of M. notabilis and M. alba was fairly good ( Supplementary Fig. 3).
As reported previously [9], mitotic chromosomes 5 and 7 were fused to form diakinesis chromosome 5 of M. notabilis. To illustrate the relationships between the 14 mitotic chromosomes, six diakinesis chromosomes, and six assembled pseudochromosomes, two singlecopy sequence probes (Mn-chr5A and Mn-chr5B) from pseudochromosome 5 of M. notabilis were designed, and FISH was performed. As shown in Fig. 3j-m, Mn-chr5A mapped to the smallest dot mitotic chromosome 7 (Fig. 3j) and the distal part of the short arm of diakinesis chromosome 5, which colocalized with the 25S rDNA probe (Fig. 3k). Mn-chr5B mapped to the largest dot mitotic chromosome 5 (Fig. 3l) and another distal portion of diakinesis chromosome 5 (Fig. 3m). The chromosomes of 'Lunjiao109' showed similar signal patterns of probes Mn-chr5A, Mn-chr5B, and 25S rDNA. Mn-chr5A mapped to one of three pairs of mitotic chromosomes, which were deeply stained with 4 ,6diamidino-2-phenylindole (DAPI) (blue) (Fig. 3n). Mn-chr5A and 25S rDNA colocalized with the largest diakinesis chromosome (chromosome 1) of 'Lunjiao109' (Fig. 3o). In detail, Mn-chr5A hybridized to the distal part of diakinesis chromosome 1, and the centrally located signal pattern was identified as 25S rDNA. Mn-chr5B mapped to another pair of mitotic chromosomes stained deeply by DAPI (Fig. 3p). Mn-chr5B was located in the distal part of the short arm of diakinesis chromosome 1, which was distinguished by the 25S rDNA signal (Fig. 3q). The results of the FISH-based comparative chromosomal analysis indicated that two pairs of mitotic chromosomes from M. notabilis and 'Lunjiao109' were fused into one diakinesis chromosome 5 and one diakinesis chromosome 1, respectively (Fig. 3r).

Identification and localization of centromere tandem repeats in M. notabilis and M. alba
The centromere is one of the key elements ensuring the stability of chromosomal inheritance, especially when chromosome fusion and fission occur. The centromere was selected to illustrate the chromosomal evolution process in M. notabilis and M. alba in the present study. Tandem repeats with a repeat monomer length of 175 bp showed high abundance in M. notabilis and M. alba and were treated as prime candidate centromere tandem repeats ( Supplementary Figs. 6, 7). The physical location of these tandem repeats indicated that Mn-175 nt was distributed on all chromosomes of M. notabilis in at least two clusters except for chromosome 6, which lacked Mn-175 nt altogether (Fig. 4a). Tandem repeats of Ma-175 nt indicated its physical location on all chromosomes in M. alba in at least one cluster (Fig. 4b). The Mn-175 nt and Ma-175 nt sequences were shown to exhibit 89.14% similarity ( Supplementary Fig. 8). Then, Mn-175 nt was labeled with biotin and hybridized to the chromosomes of M. notabilis and 'Lunjiao109'. The Mn-175 nt probe mapped to all the diakinesis chromosomes of M. notabilis. For example, at least six pairs of signals were observed on diakinesis chromosome 1, whereas diakinesis chromosome 6 showed only one pair of obvious signals (Fig. 4c). Probe Mn-175 nt mapped to all of the diakinesis chromosomes of 'Lunjiao109' with seven major signals, and its signal was especially prominent on the two largest diakinesis chromosomes (chromosomes 1 and 2) (Fig. 4d). When homologous chromosomes were pulled to the opposite poles in metaphase I, the Mn-175 nt probe localized to centromere regions (Fig. 4e). The Mn-175 nt probe clearly localized to the centromere regions of nine cruciform bivalents in metaphase II in 'Lunjiao109' (Fig. 4f).

Comparative genomic analyses of gene families
Comparative analysis of these two genomes indicated that 2138 and 2219 genes were specific to the M. notabilis and M. alba genomes, respectively. When we compared the positional relationships between specific genes and their neighboring collinear genes in the two genomes, a total of 20 and 21 species-specific genes of M. notabilis and M. alba were found to be located in the rearranged regions (Supplementary Table 11). A total of four genes of M. alba were located at the end of the corresponding chromosome (Supplementary Table 11). When we mapped these species-specific genes to KEGG pathways, we found that only 24 specific genes in M. notabilis could be assigned to KEGG pathways and that they were mainly involved in two types of pathways: metabolic pathways (12/24) and the biosynthesis of secondary metabolites (6/24) (Supplementary Table 12). A total of 50 specific genes in the M. alba genome could be mapped to 55 pathways, and these were mainly related to metabolic pathways (19/50) and the biosynthesis of secondary metabolites (12/50) (Supplementary Table 13). To gain insight into gene family evolution, 18 845 groups of orthologous gene families were generated between the two genomes. The results of gene family evolution analysis revealed that a total of 293 and 97 gene families from M. notabilis had undergone expansion or contraction, respectively (Supplementary Tables 14, 15 and Supplementary Figs. 9a, 9c). The numbers of expanded and contracted gene families in M. alba were 372 and 66, respectively (Supplementary Tables 16, 17 and Supplementary Figs. 9b, 9d). KEGG enrichment analysis of the expanded gene families from the two genomes revealed that these genes were predominantly involved in the biosynthesis of secondary metabolites. In contrast, the contracted gene family pathways were mostly related to cyanoamino acid metabolism in M. notabilis and the flavonoid biosynthesis pathway in M. alba.
Furthermore, a total of 33 and 50 genes were shown to have evolved under positive Darwinian selection using codeml from the PAML package with a branch-site model in M. notabilis and M. alba, respectively. Although only six and eight positively selected genes from the M. notabilis and M. alba genomes, respectively, could be mapped to KEGG pathways, the results suggested that these genes were mainly enriched in pathways related to the biosynthesis of secondary metabolites (Supplementary Tables 18, 19).

A significant difference in transposable element content between the two mulberries
A total of 57.07% of the M. notabilis genome could be classified as repetitive sequences, whereas the repetitive sequence proportion in the M. alba genome was 52.91% (Supplementary Table 6). Among these repetitive sequences, the proportions of both total LTRs and the related superfamilies showed a significant difference between these two genomes ( Fig. 5a and Supplementary Fig. 10). LTRs showed lower copy numbers and proportions in the M. notabilis genome than in the M. alba genome ( Fig. 5b and Supplementary Fig. 11). Insertion time analysis of all full-length LTRs in the two genomes indicated that most of these elements were inserted into the corresponding genomes within the past 4 MYA (Fig. 5c). Peak frequencies of LTR insertions were found at approximately 1 MYA and 0.2 MYA in M. notabilis and M. alba, respectively. Compared with M. notabilis, the distribution pattern of full-length LTRs in M. alba resembled a negative exponential distribution.
To further understand the relationship of LTRs between these two genomes, all LTRs belonging to the same family or species-specific family were identified between the two genomes. As shown in Fig. 5d and Supplementary Fig. 12a, Table 20). In regard to the speciesspecific families in the two genomes, the proportion of specific families in the M. notabilis genome was significantly higher than that in the M. alba genome (Supplementary Fig. 12b). The largest specific family, MnotRLC0001, which accounted for 30.09% of the total length of the specific families of Copia, contributed most to the burst of special Copia families in the M. notabilis genome (Supplementary Table 20). If one LTR belonging to the same family was located on other noncorresponding chromosomes in the two genomes, it was regarded as a rearranged LTR. Further analysis suggested that rearranged LTRs occupied a large proportion of the total number of LTRs in M. alba, including 37.43% and 46.07% of Copia and Gypsy elements, respectively (Fig. 5e).
In light of the observations that there were significantly different total LTR contents in the two genomes and that rearranged LTRs accounted for a large fraction of the total LTRs in M. alba, the distribution patterns of the LTRs in the two genomes were analyzed to determine whether any of these superfamilies tended to be distributed within or around genes in each respective genome. LTRs with similar distribution patterns in the two genomes were therefore studied (Fig. 5f). Both Copia and unknown LTRs showed a greater tendency to be distributed within gene bodies and adjacent genes (i.e. within a 2 kb adjacent region upstream or downstream of a gene), whereas Gypsy elements were found within the intergenic region more often than expected (Fig. 5f). In regard to the distribution patterns of rearranged LTRs in M. alba, however, both Copia and Gypsy elements were significantly overrepresented in the gene body regions (Fig. 5g). All LTR-related genes were mapped to KEGG pathways, and our KEGG enrichment analysis suggested that such genes in M. alba were enriched with respect to the biosynthesis of secondary metabolites (Supplementary Fig. 13 and Supplementary Table 21). Although these genes were not enriched in the same pathway in M. notabilis, the largest proportion of KEGG mapped genes (26.33%, 933/3542) participated in the same pathway (Supplementary Table 22). Detailed distribution analysis further suggested that LTRs showed different distribution patterns in many f lavonoid biosynthesis pathway-related genes, such as ANS, CHS, and UGT (Fig. 5h). It should be mentioned that more LTRs were found within and around such genes in M. notabilis than in M. alba. The results of metabolomic analysis also revealed that many f lavonoid metabolites showed significant differences in concentration between these two mulberry species (Fig. 5i). For example, the fold difference of the quercetin O-rhamnosyl-O-rhamnosyl-O-hexoside-I concentrations between the two species reached 12.07 (Fig. 5i).

Chromosome number evolution from M. notabilis to M. alba
In this study, the genome of M. notabilis was sequenced using the techniques of ONT and Hi-C and assembled into six high-quality chromosome sequences. The haplotype chromosome number of M. notabilis was seven [2,17], which was different from the assembled chromosome number. Mitotic chromosomes 5 and 7 fused to form meiotic chromosome 5, as proposed in our previous study [17]. In the present study, two singlecopy sequence probes from assembled chromosome 5 were designed to test this chromosomal fusion event. Our results indicated that Mn-chr5A located on chromosome 7 and Mn-chr5B located on chromosome 5 were fused into diakinesis chromosome 5 in M. notabilis (Fig. 3j-m), demonstrating why the genome sequences of M. notabilis were assembled into six chromosomes. The physical lengths of our assembled chromosomes were consistent with the relative lengths of diakinesis chromosomes in M. notabilis, suggesting that a corrected chromosome order should be proposed [17]. A chromosomal fusion event was also identified in M. alba, in which two mitotic chromosomes were fused into diakinesis chromosome 1 (Fig. 3n-r). Thus, chromosomal fusion events from mitotic chromosomes to meiotic chromosomes were shared by ancient wild M. notabilis and cultivated M. alba.
Previously, a consensus was reached that the basic chromosome number of M. notabilis was seven [2]. Recently, Jiao et al. proposed that the basic chromosome number of cultivated mulberry, M. alba, was 14 at the sequence level [13]. Hence, two basic chromosome numbers (n = 7 and n = 14) have been assumed to coexist in the genus Morus. Multiple basic chromosome numbers have been reported in many genera, such as Melampodium (n = 11, n = 9, and n = 14) [23], Gossypium (n = 12 and n = 13) [24], and Cucumis (n = 12 and n = 7) [25]. Highquality genome sequences provide valuable resources for studying the genome-wide evolution of plant species. Chromosomal fission/fusion and polyploidization are the predominant causes of chromosome number evolution [26][27][28][29]. Genomic synteny analyses and FISH-based comparative chromosome analyses of M. notabilis and M. alba were also performed in the current study. Our results indicated strong chromosome-level synteny between the six chromosomes of M. notabilis and the 14 chromosomes of M. alba (Fig. 2 and Fig. 3a-i). In addition, no intermediate chromosome number between 14 and 28 has been identified in mulberry. We propose that large-scale chromosomal fission/fusion events resulted in changes in the basic chromosome number of mulberry based on several lines of evidence. Triplication predated mulberry/grapevine separation, and no WGD events occurred in the mulberry lineage after separation ( Supplementary Fig. 3). Correspondence was observed between the chromosomal structures of the two mulberries reconstructed from the three ancestral genomes in grapevine ( Supplementary Fig. 3), and only 386 rearranged genes were identified in the two mulberry genomes (Fig. 2b, and Supplementary Table 8). Similar findings were observed in monkeyflower (Mimulus), in which at least eight fission events and two fusion events led the basic chromosome number to evolve from 8 to 14 [30]. These results suggest that large-scale chromosomal fission/fusion events drove the speciation of M. notabilis and M. alba.

Centromere evolution in mulberry
As the key element required for faithful chromosome segregation during mitotic and meiotic cell division, centromeres have played important roles in the evolutionof mulberry, especially during the occurrence of large-scalechromosome fission/fusion events. The multiple occurrences of Mn-175 nt on the chromosomes of M. notabilis suggested that inactivated centromeres existed in this genome. This phenomenon has also been observed in maize and wheat [31,32]. The inactivated centromere sites in M. notabilis provided candidate centromere sites to ensure the accurate inheritance of the broken chromosomes of M. alba, thus avoiding the loss of genetic material. The reactivation of inactivated centromeres has been reported in maize [33]. In addition, chromosome fusion from mitotic chromosomes to meiotic chromosomes was identified in M. notabilis and M. alba in the present study. The fused chromosomes both divided and segregated together normally in the meiotic phases, as previously reported [9], suggesting that a core centromere had formed. The fusion of multiple centromeres into a single centromere has also been reported previously [32], a finding that was consistent with our results. Thus, the inactivation and reactivation of centromeres ensured the stable inheritance of the broken and fused chromosomes of mulberry, making it a good model for further studies of centromere evolution.

Long terminal repeats and associated processes play major roles in shaping mulberry genomes.
It is worth paying attention to the dynamics of LTRs in these two genomes. Our results indicated that the two mulberry species diverged from each other approximately 10 MYA (Supplementary Fig. 2). Significant differences were found both within a given gene family and in species-specific gene family proportions between the two genomes ( Fig. 5d and Supplementary Fig. 12). Combined with the results described earlier, we suggest that after being derived from M. notabilis, the amplification and proliferation of species-specific LTRs were not as high in the M. alba genome as in the M. notabilis genome. In other words, the LTRs of M. notabilis showed more potent activity than those of M. alba. For instance, MnotRLC0001, the largest species-specific family, belonged to the Copia superfamily, contributing the most to the burst of specific Copia in the M. notabilis genome. A possible explanation for this phenomenon may be the different environments in which the wild and cultivated mulberry species grew and evolved. As reported previously, retrotransposons become activated and are quickly amplified under strong natural selection forces, such as unexpected climate change, abiotic stresses, and biotic stresses [21,34].
In regard to the rearrangement of LTRs in the two genomes following their divergence from one another, an interesting phenomenon was observed. The distribution patterns of the rearranged Copia and Gypsy elements were not completely random, showing a greater tendency to be distributed in gene body regions (Fig. 5g) than to exhibit an overall random distribution pattern (Fig. 5f). Previous studies carried out in different plant species have suggested that numerous unique structural chromosomal rearrangements are largely attributable to transposition and proliferation events of LTRs [35,36]. Our results were not only consistent with those previously reported in other plants but also indicated that rearranged LTRs exhibited dramatic dynamic changes relative to other species-specific LTRs in mulberry genomes and may play important roles in the diversity of the two mulberry genomes.
Due to the significant difference in the abundance of LTRs in the genomes of the wild and cultivated mulberry species, it was not surprising to find that LTRs may affect gene functions when they are located in the direct vicinity of such genes. In fact, the important roles of these elements as regulators of gene function in many plant genomes have been well demonstrated [21,22,37,38]. Some examples of LTRs regulating gene expression should be mentioned here. When the Gret LTR was integrated upstream of the VvmbyA1 gene, which regulates anthocyanin biosynthesis in the black-skinned grape cultivar Kyoho, grape skin color changed in grape cultivars such as 'Ruby Okuyama' and 'Chardonnay' containing the integrated Gret LTR [19,22]. Another LTR, Rider, provides a novel promoter that can drive the expression of the Ruby gene in the flesh of orange fruit, resulting in the development of blood oranges [19,21]. When we focused on the metabolite differences between the wild and cultivated mulberry species, we found significant differences in flavonoid metabolite contents in these two mulberry species (Fig. 5i). Such a difference was also revealed at the genomic level. As illustrated in the present study, we assigned the members of five different gene sets (rearranged, species-specific, contracted, expanded, and positively selected genes) to KEGG pathways and found that a total of 18, 6, 7, 52, and 2 genes, respectively, were involved in the biosynthesis of secondary metabolites, compared with 22, 12, 6, 55, and 2 genes, respectively, mapped to the biosynthesis of secondary metabolites in M. alba, but none of these genes exhibited a degree of altered gene function comparable to those associated with LTRs. The numbers of LTR-associated genes involved in the biosynthesis of secondary metabolites were as high as 933 and 793 in M. notabilis and M. alba, respectively. For example, a total of 73 genes associated with flavonoid biosynthesis were flanked by LTRs or contained LTRs in the two genomes (Fig. 5h). Considering these results in conjunction with the roles LTRs have been shown to play in previous reports, we assert that LTRs are a major driver in genome divergence and the evolution of mulberry species.
In conclusion, we assembled a high-quality chromosomelevel genome sequence of M. notabilis. Our results illustrated that chromosome fissions/fusions were the key mechanism underlying the evolution of differences in the basic chromosome number between M. notabilis and M. alba. Subsequently, LTRs played critical roles in determining differences in the architecture of these two mulberry genomes. Our high-quality genome sequence of M. notabilis provides a valuable resource for the study of the origin, genome evolution, domestication, and subsequent genetic improvement of mulberry.

Plant materials, genome sequencing
Fresh young leaves were collected from the wild mulberry species M. notabilis growing wild in Ya'an, Sichuan Province, China. The leaves were immediately frozen in liquid nitrogen and stored. Total DNA was prepared using a standard cetyltrimethylammonium bromide method. After extraction, the concentration, integrity, and purity of the DNA were determined using a Nan-oDrop spectrophotometer (A 260 /A 280 = 1.8, A 260 /A 230 = 2.0-2.2) and Qubit. Then, genomic DNA was repaired using the NEBNext FFPE DNA Repair Mix Kit (M6630, USA) and was subsequently processed using the Ligation Sequencing Kit (SQK-LSK109, UK) according to the manufacturer's instructions. The high-quality large segment library was premixed with loading beads and sequenced on the ONT PromethION platform with the ONT sequencing reagents kit (EXP-FLP001.PRO.6, UK) and a corresponding R9 cell, according to the manufacturer's instructions.
For Hi-C sequencing, the chromosomal structure was crosslinked with formaldehyde, and the genomic DNA was digested using HindIII, as previously described [39].
A Hi-C library with a 300 bp-700 bp insert size was constructed and subsequently sequenced on the Illumina NovaSeq 6000 platform.

De novo genome assembly
The genome size of M. notabilis was estimated with GenomeScope2 (v2.0) [40] using high-quality Illumina reads. All raw reads generated on the ONT platform were first corrected with Canu (v1.8) software [41]. Thereafter, the error correction and de novo assembly tool NECAT (v0.01, https://github.com/xiaochuanle/NE CAT) was used to assemble the original contigs. After initial assembly, the contigs generated were immediately corrected by aligning ONT reads from three rounds using Racon (v1.3.3) [42]. Then, Pilon (v1.23) [43] was used to polish the contigs for three rounds with the Illumina paired-end (PE) reads to generate the final contigs, which were assessed by evaluating the mapping ratio to Illumina reads and completeness and integrity by using Benchmarking Universal Single-Copy Orthologs (BUSCO) (v3) [44].
For chromosome-level assembly, all raw Hi-C data reads were cleaned to remove adapter sequences and low-quality PE reads to generate high-quality clean data. Then, the high-quality Hi-C data were truncated at putative Hi-C junctions and mapped to contigs using the Burrows-Wheeler Aligner (v0.7.7) software package. HiC-Pro (v2.8.1) [45] was used to filter valid reads. Only uniquely mapped interaction paired reads were selected and used to cluster and order the genome sequences onto chromosomes using LACHESIS [46] and Juicebox (v1.11.08) [47].
Gene prediction in the genome was carried out by two strategies: de novo and homolog-based prediction. The MAKER (v2.31.10) genome annotation pipeline was utilized for the whole process. Gene evidence was produced by assembling RNA-Seq reads from male flowers, winter buds, bark, root, fruit, and leaves using Trinity (v2.9.0) [55]. The custom repeat library generated as described earlier was used as the input repeat library. Protein homology evidence was obtained from M. notabilis. Two ab initio gene predictors, Augustus (v3.3.3) [56] and GeneMark-ES (v3.60) [57], were used. The keep_preds parameter was set to 1 to ensure that no gene models were lost from the legacy annotations. Thereafter, amino acid sequences were searched against the InterProScan database [58]. The quality_filter.pl script, distributed with MAKER, was used to filter gene models based on domain content and evidence-based support (Annotation Edit Distance, AED < 1 and/or Pfam domain present). The retained amino acid sequences were then searched against other databases, including the Swiss-Prot, Pfam, KEGG, and GO databases. To compare the two mulberry species, we also performed genome annotation using the M. alba 'Heyebai' genome using the same methods described earlier.

Genome and gene collinearity
MUMmer (v3.1) [59] was used to carry out genome alignment between M. notabilis and M. alba. The Dot-Prep.py (https://github.com/dnanexus/dot) script was used to visualize the genome-genome alignment results. TBtools (v1.055) [60] was used to generate and visualize gene synteny between the two genomes. OrthoFinder (v2.4.0) [61] was used to identify orthologous gene pairs between grapevine and mulberry. The V. vinifera genome, which is by far the closest to the ancestral genome, was used to reconstruct mulberry chromosomal structures.

Gene family annotation and comparative genomic analyses
Homologous genes and species-specific genes from different plant species (M. notabilis, M. alba, Pyunus persica, and Cannabis sativa) were identified with OrthoFinder (v2.4.0) [61]. To construct a phylogenetic tree between these species, the integrated platform PhyloSuite (v1.2.2) [62] was used. All single-copy genes were combined and aligned using MAFFT (v7.471) [63] in codon alignment mode. Then, the alignments were refined using the codon-aware program MACSE (v2.03) [64]. This was followed by the application of Gblocks (v0.91b) [65] to remove ambiguously aligned fragments. All sequences from one species were concatenated into one sequence, and ModelFinder [66] was utilized to select the bestfit partition model using the Bayesian information criterion. Finally, IQ-TREE (v1.6.12) [67] was used to construct a phylogenetic tree according to the bestfit model generated earlier. After this, MCMCtree, implemented in PAML (v4.9) [53], was used to calculate the divergence time between these species, and the time was calibrated with the reported time [2,13]. CAFÉ (v4.2.1) [68] was used to analyze the expansion and contraction of gene families in the two genomes. Enrichment analyses were performed using clusterProfiler (v3.16.1).

Analyses of positively selected genes
A species tree was built as mentioned earlier. All oneto-one orthologs identified by OrthoFinder (v2.4.0) [61] were combined and aligned using MAFFT (v7.471) [63]. PAL2NAL (v14) [69] was used to convert the protein alignment to a codon-based nucleotide alignment. After removing ambiguously aligned regions using Gblocks (v0.91b) [65], codeml from the PAML (v4.9) package [53] was used to estimate ω values (Ka/Ks) with a branch-site model to identify genes that had undergone Darwinian positive selection [53]. Genes with a false discovery rate value of ≤0.05 were considered to be positively selected genes.

Chromosome preparation
The mitotic and diakinetic meiotic chromosomes of M. notabilis and M. alba 'Lunjiao109' were prepared as described previously [9]. For mitotic chromosome preparation, young leaves were pretreated with 2 mM 8-hydroxyquinoline at room temperature for 3 h, fixed in 9:1 ethanol/glacial acetic acid at 4 • C for 4 h, and then stored in 70% ethanol at −20 • C. The leaves were then washed three times in distilled water and digested in an enzyme solution composed of 2% (w/v) cellulase Onozuka R-10 (YaKult, Japan) and 1% (w/v) pectolyase Y-23 (YaKult, Japan) (pH 5.5) at 37 • C for 2.5 h. Young inflorescences were directly fixed in 9:1 ethanol/glacial acetic acid at 4 • C for 4 h and stored in 70% ethanol at −20 • C. Anthers were digested in an enzyme solution of 2% (w/v) cellulase Onozuka R-10 (YaKult, Japan) and 1% (w/v) pectolyase Y-23 (YaKult, Japan) (pH 5.5) at 37 • C for 4 h. Digested leaves and anthers were rinsed with 70% ethanol and macerated into fine suspensions. The cells were resuspended in glacial acetic acid, and a drop of the suspension was added to a glass slide. Then, the slides were screened for well-spread chromosome preparations under an Olympus IX73 microscope (Olympus, Tokyo, Japan).

Probe development and FISH
Single-copy sequences with a length >10 000 bp were screened using BLASTn against the genome sequence of M. notabilis. Repeat sequences were removed through primer design using Primer3. Six out of the 11 sequences (Mn-chr1A, Mn-chr1B, Mn-chr1C, and Mn-chr1D; Mn-chr5A and Mn-chr5B) are shown in Supplementary  Table 23. Candidate centromere tandem repeats from M. notabilis and M. alba were screened using Tandem Repeats Finder (TRF) with parameters = [2,7] (weights for matches, mismatches, and indels, respectively) [70]. A minimum alignment score of 80 and a maximum period size of 1000 were used in the identification of tandem repeats. A plasmid vector containing two copies of Mn-175 nt, was synthesized by TsingKe Biological Technology (Beijing, China). Single-copy, Mn-175 nt, and 25S rDNA sequence probes were labeled, and FISH was performed as described previously [9]. The primers used in this study are listed in Supplementary Table 23. Images were captured with a DP80 charge-coupled device camera attached to an Olympus IX73 microscope (Olympus, Tokyo, Japan). The images were processed with Adobe Photoshop CS6 and Adobe Illustrator CS6. out comparative genomics analyses, Y.X. and Y.T. performed FISH, B.M. and D.L. carried out metabolomics analysis, B.M. and Y.X. prepared the figures and tables and wrote the draft manuscript, and N.H. revised the manuscript. All authors read and approved the final manuscript.

Data availability
All the data generated during the present study have been deposited in the Genome Sequence Archive (GSA) and the Genome Warehouse (GWH) database (PRJCA004004).

Conflicts of interest statement
The authors declare that there are no conflicts of interest.