The Chrysanthemum lavandulifolium genome and the molecular mechanism underlying diverse capitulum types

Abstract
Cultivated chrysanthemum (Chrysanthemum × morifolium Ramat.) is a beloved ornamental crop due to the diverse capitulum types among varieties, but the molecular mechanism of capitulum development remains unclear. Here, we report a 2.60 Gb chromosome-scale reference genome of C. lavandulifolium, a wild Chrysanthemum species found in China, Korea and Japan. The evolutionary analysis of the genome revealed that only recent tandem duplications occurred in the C. lavandulifolium genome after the shared whole genome triplication (WGT) in Asteraceae. Based on the transcriptomic profiling of six important developmental stages of the radiate capitulum in C. lavandulifolium, we found genes in the MADS-box, TCP, NAC and LOB gene families that were involved in disc and ray floret primordia differentiation. Notably, NAM and LOB30 homologs were specifically expressed in the radiate capitulum, suggesting their pivotal roles in the genetic network of disc and ray floret primordia differentiation in chrysanthemum. The present study not only provides a high-quality reference genome of chrysanthemum but also provides insight into the molecular mechanism underlying the diverse capitulum types in chrysanthemum.


Introduction
Cultivated chrysanthemum (Chrysanthemum × morifolium Ramat.) is a well-known ornamental crop showing very diverse flower morphologies. The flower of a chrysanthemum is actually a capitulum that comprises inner disc florets and peripheral ray florets. The flower types of chrysanthemum are determined by the morphology and relative numbers of disc and ray florets on a capitulum [1]. Understanding the molecular mechanism of disc and ray floret differentiation under the same genetic background will provide not only a foundation for the clarification of complex capitulum morphology in chrysanthemum but also insight into the floral development mechanism in Asteraceae. However, studies on the mechanism of flower type in chrysanthemum are hindered by the complex genetic background of chrysanthemum, making it difficult to directionally breed for flower types, which restricts the utilization of rich flower type resources in chrysanthemum. To date, the genomes of 16 Asteraceae species have been sequenced, which provides insights into the Asteraceae genome [2]. However, most of these species have a relatively distant relationship with chrysanthemum. The genome sequencing quality of some Asteraceae species was relatively low due to the limitation of sequencing technology and the high heterozygosity of Asteraceae (Supplementary Table 1). The genomes of C. seticuspe and C. nankingense, belonging to the genus Chrysanthemum, have been sequenced by the Illumina sequencing platform and Oxford Nanopore long-read technology, respectively, without reaching the chromosome level [3,4]. Therefore, it is necessary to obtain a high-quality chromosome-scale genome of the genus Chrysanthemum and study the origin of chrysanthemum at the whole-genome level.
Flower development is a complex process that involves a complex gene regulatory network [5,6]. The expression patterns of abundant genes could be detected during flower development using transcriptomic and genomic sequencing technology. Studies on single flowers have revealed that ABCE-class and CYC2-LIKE genes are involved in floral organ identity and the regulation of floral symmetry [6,7]. The development of next- and third-generation sequencing technologies enables the generation of high-quality genomes, and subsequently, these reference genome data can provide more information to study the molecular mechanism of flower development [8]. Previous studies showed that the expression patterns and copy numbers of these genes could regulate floral organ morphology to modify flower types [5,7,9]. Comparative genomic analysis can also help to reveal the copy numbers and evolution of floral development-related genes [10][11][12]. To date, several transcriptome sequencing profiles have been carried out in C. lavandulifolium, C. nankingense and C. × morifolium "Jinba" [3,13]. However, the analysis of these transcriptomes lacked either a reference genome or the transcriptomic profiles of disc and ray floret differentiation stages. Moreover, studies in Gerbera hybrida, Cosmos bipinnata and Senecio vulgaris found additional floral-related genes involved in capitulum development [14][15][16]. In conclusion, more transcriptomes and reference genomes of chrysanthemum are needed to explore hub genes that participate in the molecular mechanism underlying the diverse capitulum types.
The diploid species C. lavandulifolium (2n = 2x = 18) is often regarded as one of the ancestral species of chrysanthemum [17,18]. It is also used as a model plant to study the diverse capitulum types in chrysanthemum due to its simple capitulum type, which possesses only one round of peripheral ray florets and many rounds of inner disc florets [19,20]. Here, we sequenced and anchored 2.60 Gb of sequences to 9 pseudochromosomes of C. lavandulifolium. Phylogenetic analysis showed that C. lavandulifolium was an important donor species for chrysanthemum [17]. Transcriptomic profiles of different developmental stages in radiate (C. lavandulifolium, Chrysanthemum indicum, Chrysanthemum vesticum, C. × morifolium "28", Erigeron breviscapus, Helianthus annuus), discoid (Hippolytia alashanensis and Helenium aromaticum) and ligulate (Lactuca sativa and Taraxacum kok-saghyz) capitula were also analyzed to explore the hub genes involved in diverse capitulum development. Our study not only provides a high-quality reference for the assembly of chrysanthemum genomes but also sheds light on the regulatory mechanism of capitulum development in Asteraceae.
With the aid of high-resolution chromosome conformation capture (Hi-C) technology, 2.93 Gb (94.46%) of contigs were anchored onto the 9 chromosomes, forming pseudomolecules (Figure 1 and Supplementary Table 3).
After the removal of redundant and heterozygous sequences by the heatmap signal of Hi-C, a 2.60 Gb chromosome-level genome for C. lavandulifolium was obtained, with a contig N50 of up to 497 kb (Figure 1a, Table 1 and Supplementary Figure 2). Relative to the C. lavandulifolium genome size estimated by k-mer distribution analysis, 98.48% of the C. lavandulifolium sequences were successfully assembled and anchored onto the nine chromosomes (Supplementary Table 3).
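The contig N50 reported above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that calculation, not the tool used in this study; the function name `n50` and the toy contig lengths are hypothetical and serve only as an illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until
    at least half of the total length is covered; the length of the
    sequence that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (not data from this study).
toy_contigs = [800, 500, 300, 200, 100, 100]
print(n50(toy_contigs))  # 500
```

In the toy example the total length is 2000 kb, and the two longest contigs (800 + 500 kb) are the first to cover half of it, so the N50 is 500 kb.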
This C. lavandulifolium genome is the first chromosome-level genome in the genus Chrysanthemum. The protein-coding genes were annotated by ab initio prediction, homology and transcriptome-based approaches, and the results were then integrated by EVidenceModeler. A total of 64 257 protein-coding genes were predicted (Supplementary Table 4), and the number of genes supported by homology prediction and transcriptome prediction was 53 714, accounting for 83.59% (Supplementary Figure 3). Furthermore, 54 203 (84.35%) genes in the present reference genome were annotated by functional databases (Supplementary Table 5). We identified 1417 noncoding RNAs classified into 44 families: 16 families of miRNA, 4 families of rRNA, and 24 families of tRNA (Supplementary Table 6). A total of 12 097 pseudogenes were identified using GeneWise. Benchmarking Universal Single-Copy Orthologs (BUSCO) evaluation showed that 89.02% and 92.36% of complete genes were obtained in genome mode and protein mode, respectively, which suggested the high quality of our assembled C. lavandulifolium genome (Supplementary Table 7).
Based on the high-quality reference genome in this study, 1.76 Gb of repetitive sequences of C. lavandulifolium were predicted, with 60.25% being retrotransposons and 3.5% DNA transposons (Supplementary Table 8). Retrotransposons are the main components of transposons, with Copia and Gypsy (37.92% and 29.06%, respectively) being the most common. Compared with other Asteraceae species, LTR expansion of C. lavandulifolium was detected at ∼1.25 Mya, which was very close to that of C. nankingense (∼1.45 Mya) (Figure 1c). Bursts of Copia and Gypsy occurred at ∼1.25 Mya, which was consistent with the LTR expansion time of C. lavandulifolium (Figure 1d). This result indicated that LTR expansion in C. lavandulifolium was mainly driven by bursts of Copia and Gypsy.

Evolution of the C. lavandulifolium genome
To study the conservation and specificity of the genomic structure of C. lavandulifolium, we clustered the predicted proteins in the C. lavandulifolium genome with those from 4 other representative Asteraceae species, which showed 11 419 gene families shared by the 5 species and 2750 gene families specific to C. lavandulifolium (Supplementary Figure 4). Gene Ontology (GO) enrichment analysis revealed that C. lavandulifolium-specific genes were mainly enriched in cellular process (GO:0044763), metabolic process (GO:0044710) and catalytic activity (GO:0003824) (Supplementary Table 9). Furthermore, a phylogenetic tree constructed with 166 single-copy genes from C. lavandulifolium and 10 other species showed that C. lavandulifolium diverged from C. nankingense at approximately 7.2 Mya (Figure 1b). We also compared gene family expansion and contraction among the 11 species to examine the evolution of the C. lavandulifolium genome (Figure 1b). The results showed that 1305 and 453 gene families were expanded and contracted in C. lavandulifolium, respectively (Figure 1b). The gene families that were expanded in the C. lavandulifolium genome were enriched in flower development-related GO terms and cell synthesis-related GO terms (Supplementary Table 10).
Whole-genome duplication (WGD) is one of the most important driving forces in the evolution and divergence of flowering plants [23]. The collinearity of the C. lavandulifolium and Vitis vinifera genomes showed a 3 vs. 1 syntenic relationship between C. lavandulifolium and V. vinifera, which provides evidence for the pan-core eudicot γ event and whole-genome triplication (WGT-1) in C. lavandulifolium (Supplementary Figure 6c). The synonymous substitution rate (Ks) for paralogous genes in C. lavandulifolium revealed a WGD peak with an approximate Ks value of 0.8 (Supplementary Figure 5). We also calculated the Ks values of C. lavandulifolium, Helianthus annuus, and Cynara cardunculus, which showed that the WGT event was shared with asterids II in Asteraceae species (Figure 1e). Compared with H. annuus, which had another WGD event after divergence, the C. lavandulifolium genome only experienced tandem duplication [24,25] (Figure 1e, Supplementary Figure 6 and Supplementary Figure 7). GO enrichment analysis showed that tandemly duplicated genes were mainly enriched in biological processes (metabolic process, cellular process and single-organism process) (Supplementary Table 11 and Supplementary Figure 8).

Hub genes involved in the differentiation of disc and ray floret primordia
The above studies provide a high-quality genome for C. lavandulifolium, which is one of the important ancestors of chrysanthemum. As this species has a simple capitulum structure, its capitulum has been considered to represent the original capitulum morphology, and it is often used as a model plant to study the molecular mechanism underlying the diverse capitula in chrysanthemum [18,19]. Here, we selected six important developmental stages of the capitulum in C. lavandulifolium that differ significantly in their developmental morphology (Figure 2a, Supplementary Tables 12 and 13). Notably, only one round of disc floret primordia initiated on the capitulum during stage 5 (S5), and peripheral ray floret primordia began to appear on the outer round of disc floret primordia during stage 6 (S6), at which stage the disc and ray floret primordia began to differentiate (Figure 2a). The differentially expressed genes among the six important developmental stages, especially S5 and S6, will provide us with a large number of candidate genes involved in capitulum development.
A total of 149.22 Gb of clean transcriptomic reads were generated from different developmental stages of the capitulum and were mapped to the C. lavandulifolium genome. The mapping ratio ranged from 89.32% to 90.40% in different samples, which indicated the high quality of the transcriptomic profiling (Supplementary Table 13). To investigate the transcriptional differences across six developmental stages of the capitulum, differentially expressed genes (DEGs) among sequenced samples were analyzed by using a Venn diagram (Supplementary Figure 9a). The numbers of DEGs between each pair of samples are shown in Supplementary Figure 9b. Genes with an adjusted P value < 0.01 found by DESeq were considered to be differentially expressed (Supplementary Figure 9b). To further identify the hub genes in the gene regulatory network of ray and disc floret differentiation, we used weighted gene coexpression network analysis (WGCNA) (Figure 2b). A total of nine gene modules were identified, including the magenta module, whose genes were enriched in stage 5, and the gray module, whose genes were enriched in stage 6 (Figure 2c and 2d, Supplementary Figure 10). The eigengene connectivity (KME) of genes in the magenta and gray modules was calculated, and the top 150 genes were selected by KME value. Finally, 21 hub genes were differentially expressed in stage 5 and stage 6, including the boundary genes NAM (EVM0043444) and LOB30 (EVM0010025) (Supplementary Figure 11). Duplication of NAM/CUC genes occurred in Asteraceae species (Supplementary Figures 12 and 13). There were six NAM-LIKE genes in the C. lavandulifolium genome, four of which were expressed in the early capitulum developmental stage. The hub gene NAM (EVM0043444) not only showed different expression patterns between stages 5 and 6 but was also highly expressed in stage 6. The expression patterns of the other three NAM-LIKE genes were different from those of NAM (EVM0043444) (Figure 2e, Supplementary Figures 11 and 12). Among the three CUC-LIKE genes, CUC2 (EVM0038344) and CUC3 (EVM0059978) showed expression patterns similar to that of NAM (EVM0043444) (Figure 2e, Supplementary Figures 11 and 13). Three LOB30-LIKE genes were identified in the C. lavandulifolium genome, and only LOB30 (EVM0010025) was mainly expressed in stage 6 (Supplementary Figures 11 and 14). The above studies suggested the hub roles of the NAM, CUC2/3 and LOB30 genes during capitulum development, especially for the differentiation of disc and ray florets. The distribution of these genes is shown in Supplementary Table 14. To further study the effects of NAM/CUC and LOB30 on capitulum development, the expression patterns of these genes in different capitulum types (stage 6 of discoid, radiate and ligulate capitula) were examined (Supplementary Figure 15 and Figure 3). We found that the expression of NAM and LOB30 was mainly concentrated in the radiate capitulum, consistent with a role in regulating the differentiation of disc and ray florets (Figure 3). CUC2 and CUC3 showed different expression patterns in the discoid and ligulate capitula, indicating that CUC2/CUC3 may function in the development of different types of capitula in Asteraceae (Figure 3).
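The DEG criterion used above (adjusted P value < 0.01) relies on a multiple-testing correction. As a hedged illustration, the sketch below applies the Benjamini-Hochberg procedure, which is assumed here to be the adjustment behind the reported values; the gene names and raw P values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values never increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for five genes (illustration only).
raw_p = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
         "geneD": 0.041, "geneE": 0.6}
adjusted = dict(zip(raw_p, benjamini_hochberg(list(raw_p.values()))))

# Apply the threshold stated in the text: adjusted P value < 0.01.
degs = [gene for gene, padj in adjusted.items() if padj < 0.01]
print(degs)  # ['geneA']
```

Only the first toy gene passes the threshold: its adjusted value is 0.005, whereas the second gene, with a raw P value of 0.008, is adjusted to 0.02.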

The expression of ABCE-class and CYC2-LIKE genes in capitulum development
In addition to the hub genes in the magenta and gray modules, we also found ABCE-class and CYC2-LIKE genes enriched in blue and brown modules. In recent decades, the ABCE model has been shown to be the most influential regulatory model involved in floral organ identity [9,12]. A total of 16 ABCE-class genes were identified, and all of them expanded during the evolutionary history of the C. lavandulifolium genome (Supplementary Figure 16 and Supplementary Table 15). In our studies, B- and C-class genes were highly expressed when the second and third whorls of disc and ray florets began to initiate (Figure 4a, Supplementary Figures 17 and 19). We identified three A-class genes and four E-class genes expressed during early capitulum development (Figure 4a, Supplementary Figures 17 and 18). Notably, the A-class gene CAULIFLOWER (CAL, EVM0046680) and E-class gene SEPALLATAa (SEPa, EVM0006753) were highly expressed at stage 5, when only disc floret primordia began to initiate (Figure 4a and Supplementary Figure 17). CAL was located on chr05, and SEPa was located on chr02 (Supplementary Table 14). The A-class gene FRUITFULL (FUL, EVM0011418) was mainly expressed at stages 5 and 6, especially at stage 6, when ray floret primordia began to initiate on the capitulum (Figure 4a and Supplementary Figure 17). These results indicated that CAL, SEPa and FUL might regulate the differentiation of disc and ray florets.
CYC2-LIKE genes are known to regulate the symmetry of flowers [26]. Homology analysis showed that there were eight CYC2-LIKE genes in the C. lavandulifolium genome; these genes were mainly distributed on chr06, chr07 and chr08 (Supplementary Figure 20, Supplementary Tables 14 and 16). Most of the CYC2-LIKE genes showed increasing expression with the development process, except CYC2a1 (EVM0062019), which was mainly expressed at vegetative stage 1 (Figure 4a and Supplementary Figure 21). Moreover, tandem duplication events in the C. lavandulifolium genome drove the duplication of CYC2c/2d/2e/2f, and this gene set might have undergone subfunctionalization during the evolution of the C. lavandulifolium genome (Supplementary Figure 22). CYC2c and CYC2d showed different expression patterns, which indicated that they were subfunctionalized (Figure 4a and Supplementary Figure 23). The expression patterns of CYC2e and CYC2f were similar, which indicated that these two genes had redundant functions in regulating capitulum development (Figure 4a and Supplementary Figure 21). Notably, the duplication of CYC2a was unique in the genus Chrysanthemum, and both copies were expressed during the early capitulum development stage, especially CYC2a2 (EVM0076812), which was highly expressed at stage 5 (Figure 4a, Supplementary Figures 21 and 22). This result suggested that CYC2a2 might be a unique gene in regulating capitulum development in the genus Chrysanthemum. In summary, five ABCE-class and CYC2-LIKE genes were mainly expressed at stages 5 and 6, which indicated that they might be involved in disc and ray floret differentiation (Figure 4a). However, homologous genes of ABCE-class and CYC2-LIKE in other Asteraceae species did not show obvious expression differences among different capitulum types (Supplementary Figure 23).
Protein interactions among these candidate genes were predicted by STRING and previous studies (Figure 4b). In addition to ABCE-class and CYC2-LIKE genes, inflorescence meristem-related genes (FT, TFL1 and LFY) are also involved in capitulum development. However, the expression levels of FT, TFL1 and LFY at stage 5 and stage 6 were relatively lower than those at other developmental stages (Supplementary Figure 24). In the discoid capitulum, CUC2 interacted with LFY and AG to regulate the initiation of disc floret primordia (Figure 4b). When CUC2 interacted with CUC3, the expression of these two genes could promote the development of floret primordia into ray florets (Figure 4b). During radiate capitulum development, the existence of NAM and LOB30 contributed to the differentiation of ray and disc floret primordia. LOB30 could interact with LFY, TFL1, CUC2, CUC3 and NAM, indicating its hub role in the genetic regulatory network (Figure 4b). Overall, NAM and LOB30 interacted with not only inflorescence meristem-related genes (LFY) but also floral organ identity genes and CYC2-LIKE genes during capitulum development, indicating the hub roles of NAM and LOB30 in the differentiation of disc and ray florets on the radiate capitulum.

Discussion
To date, the genomes of 16 Asteraceae species have been sequenced, including L. sativa, E. breviscapus, Artemisia annua, C. nankingense, C. seticuspe, H. annuus, and Mikania micrantha (Supplementary Table 1). In this study, a high-quality chromosome-scale reference genome of C. lavandulifolium was successfully obtained using Nanopore and PacBio technologies with assistance from a Hi-C heatmap. The reference genome of C. lavandulifolium that we obtained displayed higher completeness and accuracy than that for C. nankingense at the chromosome level [3]. Compared with the assembled genomes of other Asteraceae species, the scaffold N50 of the C. lavandulifolium genome was the longest, at up to 300 Mb.
Flower crops with new and unique flower types could have great economic value, and flower type modification is an important goal of ornamental breeding. Genetic manipulation of floral development-related genes is an effective method to directionally modify flower types. Previous studies have found that modifying the expression of floral-related genes, such as MADS-box and TCP transcription factors, could directionally improve the floral types [15]. These studies mainly concentrate on ABCE-class and CYC2-LIKE genes to influence the identity of floral organs and the symmetry of a single flower [25]. However, the flowers of some plants condense to form an inflorescence called a "pseudanthium", which usually has higher ornamental value. The regulation of this complex inflorescence remains to be clarified. Chrysanthemums, as one of the most valuable ornamental plants, are famous for their diverse capitulum types. The modification of capitulum types in chrysanthemum will also provide insights for other inflorescence type modifications. However, the study of the molecular mechanism of capitulum development is hindered by the complexity of flower types and the lack of genomic data.
The expression patterns of ABCE-class genes in the capitulum developmental process are different from those in a single flower, but their functions are relatively conserved and mainly play roles in regulating the identity of the four floral organ whorls during disc and ray floret development [15]. The present study found that most of the ABCE-class genes began to be highly expressed during stages 9 and 10, the stages in which the floral organs initiate on the disc and ray florets. The CYC2-LIKE genes in Asteraceae expanded significantly. CYC2-LIKE genes have been subfunctionalized and neofunctionalized in regulating the differentiation of disc and ray florets [27]. Previous transgenic studies of CYC2c and CYC2d in C. lavandulifolium have changed the length of ray florets to some extent, which indicates that CYC2-LIKE regulates floret development via the control of dorsal petal elongation [27,28]. Combined with our wide survey of CYC2-LIKE genes during the key developmental stage of C. lavandulifolium capitulum, CYC2e/2f may be redundant with CYC2c/2d, and CYC2a may have evolved a new function in the genus Chrysanthemum. Overall, the gene expression and evolution at the genome level showed that the ABCE-class and CYC2-LIKE genes contributed to the capitulum development of C. lavandulifolium.
However, transgenic studies of ABCE-class and CYC2-LIKE genes in Gerbera, Senecio vulgaris and C. lavandulifolium did not alter the identity of disc and ray florets, although some of these genes could change the morphology of the two kinds of florets or cause the complete loss of the inflorescence meristem [15]. It appears that ABCE-class and CYC2-LIKE genes in Asteraceae are not the hub genes in the regulatory network of disc and ray floret differentiation. WGCNA showed that the hub genes regulating disc and ray floret differentiation in the capitulum were NAM and LOB30, and NAM/CUC and LOB30 subfamily genes not only regulate the initiation and orientation of organ primordia in early flower development but also respond to inflorescence meristem-related genes and floral identity genes [29][30][31]. Based on the hub roles of NAM and LOB30 in stages 5 and 6, we concluded that the two genes might regulate the differentiation of disc and ray floret primordia in early capitulum development and that they could interact with downstream genes to regulate the floral organ identity and development of each floret on the capitulum. Duplication of NAM and its chromosome position in the C. lavandulifolium genome indicated that NAM might be a special key gene in regulating the diverse capitulum types of chrysanthemum. Protein interactions showed that these hub genes could interact with LFY, which has been proven to regulate ray floret development in Gerbera [31]. Previous studies supported the idea that the capitulum was derived from a cyme in which peripheral branches were inhibited [32]. The interaction of NAM/CUC and LOB30 regulated the expression of LFY to prevent the development of peripheral branches and promote peripheral ray floret primordia in different capitulum types. In conclusion, the C. lavandulifolium genome presented in this study provides a powerful reference for the further assembly of complex genomes, especially for the deciphering of chrysanthemum genomes. Based on comparative genomic and transcriptomic analyses, we identified hub genes that might be involved in the identity of disc and ray florets, which could serve as candidate genes for further genome editing to modify the capitulum types in chrysanthemum.

Plant materials and sequencing
The C. lavandulifolium G1 line was collected and cultured at Beijing Forestry University for genomic sequencing [33]. The genomic DNA of the C. lavandulifolium G1 line was extracted using a standard CTAB protocol. The paired-end libraries were sequenced on the Illumina HiSeq 4000 sequencing platform (Illumina, San Diego, CA, USA). The 27-mer frequencies were generated using 135.61 Gb of high-quality PE reads (51.43×), and a total of $1 \times 10^{11}$ k-mers were obtained by a customized Perl script. The main peak value representing the average k-mer depth was 38. A modified formula was used to estimate the C. lavandulifolium genome size, $G = n_{k\text{-mer}} / d_{k\text{-mer}}$, where $G$ represents the genome size, $n_{k\text{-mer}}$ represents the total number of k-mers, and $d_{k\text{-mer}}$ is the average k-mer depth. Genome sequencing was performed using SMRT sequencing on a PacBio RS II sequencer (Pacific Biosciences, Menlo Park, CA, USA) following the manufacturer's standard protocol, and 201.85 Gb of PacBio data were acquired using SMRT analysis software v1.2 [34]. To further improve the genomic assembly quality, 193.88 Gb of clean data were generated using a PromethION sequencer (Oxford Nanopore Technologies, Oxford, UK).
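As a worked illustration of the k-mer estimate described above, the sketch below applies the plain ratio of total k-mer count to average k-mer depth, using the rounded values quoted in the text. It does not reproduce the modified formula or the customized Perl script used in this study; the function name `estimate_genome_size` is hypothetical.

```python
def estimate_genome_size(total_kmers, average_depth):
    """Estimate genome size (bp) as total k-mer count / average k-mer depth."""
    if average_depth <= 0:
        raise ValueError("average k-mer depth must be positive")
    return total_kmers / average_depth


# Rounded values quoted in the text: about 1e11 k-mers, main peak depth 38.
size_bp = estimate_genome_size(1e11, 38)
print(f"{size_bp / 1e9:.2f} Gb")  # 2.63 Gb

# Cross-check from the read data: 135.61 Gb of reads at 51.43x coverage.
print(f"{135.61 / 51.43:.2f} Gb")  # 2.64 Gb
```

Both rounded calculations give roughly 2.63 to 2.64 Gb, which is in line with the 2.60 Gb assembly being reported as 98.48% of the estimated genome size.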

Genome assembly and quality assessment
For the PacBio RS II platform data, longer subreads were selected by the error correction module of Canu v1.5 [35]. Raw overlapping subreads were detected through the highly sensitive overlap detection program MHAP v2.1 [36], and the error correction of these data was carried out by the Falcon sense method v0.40 ("correctedErrorRate = 0.025") [37]. The error-corrected subreads were used to generate a draft assembly in WTDBG v2.5 (https://github.com/ruanjue/wtdbg). Iterative polishing by Pilon v1.22 [38] was achieved by aligning adapter-trimmed and paired-end Illumina reads to the PacBio draft genome. Clean Nanopore data were acquired via sequencing on the PromethION platform, and these data were corrected with the same method described above. The draft genome assembled by WTDBG v2.5 (https://github.com/ruanjue/wtdbg) was corrected three times by Racon v1.3.3 [39] by aligning adapter-trimmed and paired-end Illumina reads to the Nanopore draft genome. Then, the PacBio draft genome as a query input was aligned against the Nanopore draft genome using MUMmer v4.0.0 [40]. The PacBio draft genome and Nanopore draft genome were then merged using quickmerge v0.3.0 [21]. This merged draft genome was polished by Racon v1.3.3 [39] and Pilon v1.22 [38]. The mapping depth was obtained by aligning corrected Nanopore sequencing data to the merged assembly by minimap2 v2.17 [41] with default parameters. Then, Purge Haplotigs v1.0.4 [22] was used to eliminate redundancy according to the coverage depth and obtain the purged haplotig genome. Ultimately, a C. lavandulifolium genome with a total length of 3.10 Gb was obtained. The second-generation sequencing data, core gene completeness and BUSCOs were evaluated to verify the accuracy of the genome assembly.

Hi-C sequencing and assistant assembly
Hi-C is a technology derived from chromosome conformation capture technology that utilizes high-throughput sequencing data and is mainly used to assist in genome assembly. We constructed Hi-C fragment libraries with insert sizes of 300-700 bp, as described by Rao et al. [42], and sequenced them using the Illumina platform. Before chromosome assembly, we first performed a preassembly for the error correction of scaffolds, which required splitting the scaffolds into segments of 50 kb, on average. Then, the Hi-C data were mapped to these segments using BWA aligner v0.7.10-r789 [43]. We retained the uniquely mapped data to assemble the genome using LACHESIS [44] with the following parameters: CLUSTER_MIN_RE_SITES = 80; CLUSTER_MAX_LINK_DENSITY = 2; CLUSTER_NONINFORMATIVE_RATIO = 2; ORDER_MIN_N_RES_IN_TRUNK = 16; and ORDER_MIN_N_RES_IN_SHREDS = 16. To further address the redundant sequences, we manually checked any two segments that showed inconsistent connections with the raw scaffold. The detailed workflow schema for the assembly pipeline of the chromosome-scale C. lavandulifolium genome is shown in Supplementary Figure 24.

Transcriptome sequencing
The reproductive-stage leaf and developmental series of capitula (stage 1, stage 2, stage 5, stage 6, stage 9 and stage 10) of the C. lavandulifolium G1 line were sampled to establish transcriptomic profiling (Figure 2a, Supplementary Table 12). The reproductive-stage leaves, vegetative buds and reproductive buds of nine other species of Asteraceae were also sampled to construct libraries for RNA-seq (Figure 2b, Supplementary Table 13 and Supplementary Figure 15). Total RNA was extracted using a Plant RNA Rapid Extraction Kit (HUAYUEYANG Biotechnology, Beijing, China) and treated with RNase-free DNase I to digest the DNA. After assessing the purity and integrity of RNA using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and the ABI StepOnePlus Real-Time PCR System (Applied Biosystems, Waltham, MA, USA), the constructed libraries were sequenced on an Illumina HiSeq 2500 sequencing platform (Illumina, San Diego, CA, USA) [45]. After adapters and low-quality sequences were removed from the raw reads, the clean reads were aligned to our de novo genome of C. lavandulifolium using TopHat2 (version 2.0.7) [46] and then assembled using Cufflinks [47]. The protein-coding genes were annotated against the NCBI NR (http://www.ncbi.nlm.nih.gov), SwissProt, GO, COG, KOG, eggNOG, and KEGG databases. Gene expression was calculated using Cuffquant and Cuffnorm in Cufflinks [47].

Gene family identification, genome evolution analysis and species tree construction
The alignments of protein sequences were performed using Diamond v0.9.29.130 (http://www.diamondsearch.org/index.php) [61] with an E-value of 0.001. Orthologous and paralogous gene families were identified by OrthoFinder v2.3.7 [62] with default parameters. A phylogenetic tree based on the concatenated sequence alignment of 166 single-copy gene families from C. lavandulifolium and 10 other plant species was constructed using IQ-TREE with the selected optimal sequence evolution model (-m JTT+F+R5) and with ultrafast bootstrapping [63]. MCMCTREE of PAML (v4.9) was used to estimate the divergence times [64]. Ks-based age distributions were analyzed by using PAML to calculate the synonymous substitution rate (Ks) values. LTR sequences were identified and filtered using LTR_FINDER v1.07 (score = 6) [65]. Then, the flanking sequences of the LTRs were extracted and compared using MAFFT (parameters: --localpair --maxiterate 1000) [66]. The distance $K$ was calculated by the Kimura model using EMBOSS v6.6.0 [67]. The formula for calculating time is $T = K/(2r)$ with the molecular clock $r = 7 \times 10^{-9}$ mutations per site per year.
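The LTR dating step above can be illustrated with a short sketch. It assumes that "the Kimura model" refers to the Kimura two-parameter distance and computes it directly from a pair of pre-aligned sequences rather than calling EMBOSS; the function names and the toy sequences are hypothetical, and only the clock rate of 7 × 10^-9 and the formula T = K/(2r) come from the text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def ltr_insertion_time(k, rate=7e-9):
    """Insertion time in years, T = K / (2 * r), with r from the text."""
    return k / (2 * rate)


# Hypothetical LTR pair: 100 aligned sites with a single transition.
ltr_5prime = "A" * 100
ltr_3prime = "G" + "A" * 99
k = kimura_2p_distance(ltr_5prime, ltr_3prime)
print(f"K = {k:.4f}, T = {ltr_insertion_time(k) / 1e6:.2f} Mya")
```

Under this clock, a distance of K = 0.0175 corresponds to 1.25 million years, the scale of the LTR burst reported for C. lavandulifolium.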

WGCNA of flower development in C. lavandulifolium
To analyze genes involved in the six capitulum developmental stages of C. lavandulifolium, weighted correlation network analysis (WGCNA) was performed using the WGCNA R package [68]. The soft thresholding power was set to 7 to construct an adjacency matrix of genes with different expression patterns, and the adjacency matrix was then transformed into a topological overlap matrix (TOM) to reduce noise and false correlations. All DEGs were then hierarchically clustered based on TOM similarity, and modules were delimited with the Dynamic Hybrid Tree Cut method [69]. Each colored module was summarized by its module eigengene, the first principal component of the expression profiles of the genes in that module. The different capitulum development stages in C. lavandulifolium were also correlated with the eigengenes of each module to find the key module associated with capitulum development. The expression heatmap of candidate genes was constructed using TBtools [70].
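The adjacency and TOM steps can be sketched in plain Python as follows. This is an illustration only: it assumes an unsigned network, a_ij = |cor(x_i, x_j)|^β with β = 7, which the text does not specify, and the function names are hypothetical rather than those of the WGCNA R package.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sd_x * sd_y)


def adjacency(expr, power=7):
    """Unsigned soft-threshold adjacency; expr is one profile per gene."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]  # diagonal left at 0
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** power
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # k_i, diagonal excluded
    tom = [[1.0] * n for _ in range(n)]       # TOM_ii = 1 by convention
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            denom = min(connectivity[i], connectivity[j]) + 1 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom
```

Raising the correlation to the power β suppresses weak correlations, and the TOM additionally weights each gene pair by the neighbors they share, which is why it is less sensitive to spurious pairwise correlations than the raw adjacency.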

Evolution and expression analysis of key candidate gene families
The evolution and expression of genes related to capitulum development in C. lavandulifolium were investigated. Based on the transcriptome analysis results described above, we chose key gene families for further analysis. The protein sequences of those candidate gene families were scanned using BLASTP and HMMER. For both BLASTP and HMMER, initial gene sets were filtered with a default cutoff E-value of 1e-5. Then, phylogenetic trees were established using FastTree (v2.1) [71] and modified by iTOL (https://itol.embl.de/).
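The E-value filtering step can be sketched as below. The sketch assumes that the BLASTP hits are available in the standard 12-column tabular format (-outfmt 6), in which the E-value is the 11th column; the output format actually used is not stated in the text, and the function name and gene identifiers are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-5):
    """Keep (query, subject, evalue) for hits at or below the cutoff.

    Assumption: each line is a tab-separated BLAST tabular record
    (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend,
    sstart, send, evalue, bitscore).
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical records for illustration only
example = [
    "queryA\tgene001\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "queryA\tgene002\t35.0\t80\t52\t2\t5\t84\t10\t89\t0.002\t32",
]
candidates = filter_hits(example)
```

In this example, only the first record passes the 1e-5 cutoff.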

Figure 1. Overview and genome evolution of the C. lavandulifolium genome. a, Genomic features of the C. lavandulifolium genome and the expression profile data during capitulum development. I, density of repetitive sequences; II, GC content; III, gene density; IV, heterozygosity; V, expression of differentially expressed genes in stage 5 versus stage 6 of capitulum development (stage 5, the formation stage of disc floret primordia, in which the floret primordia began to initiate; stage 6, the formation stage of ray floret primordia, in which the disc and ray floret primordia began to differentiate on the capitulum); and VI, collinearity block between different pseudochromosomes. b, Phylogenetic gene tree of C. lavandulifolium with 10 other plant species (Amborella trichopoda, Arabidopsis thaliana, Coffea arabica, Cynara cardunculus, Lactuca sativa, Taraxacum kok-saghyz, Helianthus annuus, Artemisia annua, Erigeron breviscapus, and C. nankingense). c, Analysis of intact LTR numbers and insertion time in 7 Asteraceae plants. d, Analysis of Copia and Gypsy copy numbers in C. lavandulifolium. e, Synonymous substitution rate (Ks) distribution for pairs of syntenic paralogs in C. lavandulifolium and two other plants (H. annuus and C. cardunculus).

Figure 2. Transcriptomic profiling analysis of six important stages of capitulum development in C. lavandulifolium. a, The morphology of six developmental stages across samples. Stage 1 (S1), vegetative stage; stage 2 (S2), doming stage; stage 5 (S5), the initiation stage of disc floret primordia; stage 6 (S6), the initiation stage of ray floret primordia, at which stage disc and ray floret primordia began to differentiate; stage 9 (S9), the middle stage of corolla primordia differentiation; and stage 10 (S10), the final stage of corolla primordia differentiation. FP, foliage primordia; Br, bract; DFP, disc floret primordia; and RFP, ray floret primordia. b, Weighted gene coexpression network analysis of six developmental stages of flowers and leaves in C. lavandulifolium. c, Expression patterns of genes in gray modules that might be involved in the development of stage 6. d, Candidate hub genes involved in the genetic regulatory networks of stage 6 (gray). Yellow triangles represent the cis-regulatory motifs of those genes, green circles represent the gene ID number, and blue hexagons represent the GO terms that were enriched. e, Heatmap of NAM/CUC-LIKE and LOB30-LIKE at different developmental stages of C. lavandulifolium.

Figure 3. The expression patterns of NAM/CUC and LOB30 homologous genes in ten Asteraceae species. Notes: Transcriptomic profiling of development stages in radiate, discoid and ligulate capitula. Stage 6 of the radiate capitulum has both disc and ray floret primordia, stage 6 of the discoid capitulum only has disc floret primordia, and stage 6 of the ligulate capitulum only has ray floret primordia. RF, ray florets; DF, disc florets; FP, foliage primordia; Br, bract; DFP, disc floret primordia; and RFP, ray floret primordia.

Figure 4. The probable gene regulation mechanism in the development of different capitula types. a, The expression patterns of ABCE-class and CYC2-LIKE genes in C. lavandulifolium during capitulum development. b, Gene and protein interactions involved in different capitula types. The dark and red solid lines represent protein interactions predicted by STRING. The blue solid lines represent the protein interaction verified by experiments in Wen et al. (2019). The red dotted lines represent protein interactions during capitulum development in Asteraceae.

Table 1. Assembly summary for the C. lavandulifolium genome at the chromosomal level

heterozygosity) for C. lavandulifolium based on the results from the K-mer distribution (Supplementary Figure 1). In total, 269.39 Gb (102.04 × coverage) and 193.88 Gb (62.50 × coverage) of data were generated using the PacBio RS II platform (Pacific Biosciences, Menlo Park, CA, USA) and Oxford Nanopore sequencing technologies (Oxford Nanopore Technologies Limited, Oxford Science Park, Oxford, UK), respectively. These sequences were independently assembled by WTDBG v2.5 (https://github.com/ruanjue/wtdbg), and the two draft genomes were merged and redundant reads were removed by Quickmerge v0.3.0