The whole-genome assembly of an endangered Salicaceae species: Chosenia arbutifolia (Pall.) A. Skv

Abstract

Background
As a fast-growing tree species, Chosenia arbutifolia has a unique but controversial taxonomic status in the family Salicaceae. Despite its importance as an industrial material, in ecological protection, and in landscaping, C. arbutifolia is seriously endangered in Northeast China because of artificial destruction and its low reproductive capability.

Results
To clarify its phylogenetic relationships with other Salicaceae species, we assembled a high-quality chromosome-level genome of C. arbutifolia using PacBio High-Fidelity reads and Hi-C sequencing data, with a total size of 338.93 Mb and contig N50 of 1.68 Mb. Repetitive sequences, which accounted for 42.34% of the assembly length, were identified. In total, 33,229 protein-coding genes and 11,474 small noncoding RNAs were predicted. Phylogenetic analysis suggested that C. arbutifolia and poplars diverged approximately 15.3 million years ago, and a large interchromosomal recombination between C. arbutifolia and other Salicaceae species was discovered.

Conclusions
Our study provides insights into the genome architecture and systematic evolution of C. arbutifolia, as well as comprehensive information for germplasm protection and future functional genomic studies.


Hi-C scaffolding
Hi-C technology was utilized to assist the initial assembly to generate a chromosome-scale genome of C. arbutifolia.

First, to filter the raw Hi-C reads, the program HiC-Pro v2.11.1 (RRID:SCR_017643) [29] was used to map the Illumina short reads onto the polished temporary genome with the default parameters. Then, invalid, non-ligated, and self-ligated reads were discarded. Subsequently, the genomic contigs were clustered into potential chromosomal

LTR insertion time estimation
The program LTR_FINDER v1.06 (RRID:SCR_015247) [34] was applied to detect LTRs in the C. arbutifolia genome to estimate insertion times, with parameter settings '-D 15000 -d 1000 -L 7000 -l 100 -p 20 -C -M 0.9'. Then, using the LTR_retriever (RRID:SCR_017623) pipeline, the results were integrated, and the false positives were removed from the primitive predictions. The insertion time was calculated as T = K / (2r), where K and r represent the divergence rate and neutral mutation rate (r = 2.5 × 10⁻⁹), respectively.
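The formula T = K / (2r) can be illustrated with a minimal Python sketch. The function name and the example divergence value (K = 0.01) are hypothetical and chosen only for illustration; the rate r = 2.5 × 10⁻⁹ is taken from the text, and its unit (substitutions per site per year) is an assumption, since the text does not state it.

```python
# Neutral mutation rate r from the text; the per-site-per-year unit is an assumption.
NEUTRAL_MUTATION_RATE = 2.5e-9


def insertion_time_years(divergence, rate=NEUTRAL_MUTATION_RATE):
    """Return T = K / (2r) for an LTR pair with divergence K.

    The factor 2 reflects that both LTRs of one element accumulate
    substitutions independently after insertion.
    """
    if divergence < 0:
        raise ValueError("divergence K must be non-negative")
    if rate <= 0:
        raise ValueError("mutation rate r must be positive")
    return divergence / (2.0 * rate)


# Hypothetical example: K = 0.01 substitutions per site between the two LTRs.
example_time = insertion_time_years(0.01)
print(f"Estimated insertion time: {example_time / 1e6:.2f} million years")
```

Under these assumptions, a divergence of 0.01 corresponds to an insertion roughly 2 million years ago; smaller K values indicate more recent insertions.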

Genome assembly
In total, 34.22 Gb of HiFi reads (~101× coverage) were generated through whole-genome sequencing of C. arbutifolia using the PacBio Sequel platform (Supplementary Table S1). [...] core genes were identified in the OrthoDB embryophyta database, accounting for 96.6% of the total 1,440 core genes, among which single-copy and duplicated genes represented 85.1% and 11.5%, respectively (Supplementary Table S3). The features of assembled genomes of different Salicaceae species are illustrated in Table 1.
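The reported read depth can be checked with simple arithmetic, sketched below in Python. This assumes that depth is total sequenced bases divided by genome size, and that the 338.93 Mb assembly size from the abstract serves as the genome size; the text does not state which size was actually used. The function name is hypothetical.

```python
def sequencing_depth(total_bases, genome_size_bases):
    """Return average read depth as total sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bases


# Values from the text: 34.22 Gb of HiFi data, 338.93 Mb assembly (assumed genome size).
hifi_bases = 34.22e9
assembly_size = 338.93e6

depth = sequencing_depth(hifi_bases, assembly_size)
print(f"Approximate HiFi depth: {depth:.1f}x")

# The single-copy and duplicated fractions should sum to the complete fraction.
complete_fraction = 85.1 + 11.5
print(f"Complete core genes: {complete_fraction:.1f}%")
```

Under these assumptions the depth rounds to 101×, and the single-copy and duplicated percentages sum to the reported 96.6%.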

Gene annotation
Through a combined prediction strategy of ab initio, homologous protein, and transcriptome, 33,229 protein-coding

After the speciation between C. arbutifolia and Arabidopsis thaliana (4DTv = 0.64), a common salicoid WGD event occurred (4DTv = 0.13). The divergence between C. arbutifolia and P. trichocarpa emerged at the peak of 4DTv ~ 0.05, followed by C. arbutifolia and S. purpurea (4DTv = 0.02), which is consistent with the results of phylogenetic analysis (Fig. 2d). After the differentiation of the Salicaceae species, there was no obvious evidence of a C. arbutifolia-specific WGD.
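For readers unfamiliar with the 4DTv statistic, the following Python sketch shows one common way to compute the raw transversion rate at fourfold degenerate third-codon sites for a pair of aligned coding sequences. It is not the pipeline used in this study, which the text does not describe; the function name and example sequences are hypothetical, the standard genetic code is assumed, and no correction for multiple substitutions is applied.

```python
# Codon prefixes whose third position is fourfold degenerate (standard genetic code):
# Leu (CT), Val (GT), Ser (TC), Pro (CC), Thr (AC), Ala (GC), Arg (CG), Gly (GG).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def four_dtv(seq_a, seq_b):
    """Return the uncorrected 4DTv for two codon-aligned coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        # Only codons sharing a fourfold degenerate prefix are informative.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        if base_a not in BASES or base_b not in BASES:
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1
    if sites == 0:
        raise ValueError("no fourfold degenerate sites found")
    return transversions / sites


# Hypothetical toy alignment: three fourfold sites, one transversion (A vs C).
print(four_dtv("CTAGTCGGA", "CTCGTCGGG"))
```

Because these third-position changes do not alter the encoded amino acid, 4DTv is often treated as a roughly neutral distance, so peaks in its distribution across gene pairs can be read as duplication or speciation events.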

Genome collinearity analysis
Genome collinearity among C. arbutifolia, S. purpurea, S. suchowensis, and P. trichocarpa was analyzed. The syntenic regions showed that most chromosomes were highly conserved among the Salicaceae species, except for a large interchromosomal recombination between chromosomes one and sixteen (Fig. 3a). Furthermore, the whole chromosomes of C. arbutifolia and the two Salix species were highly collinear (Fig. 3b). Together, these results [...] comprised the remaining part of chromosome one and the entire chromosome sixteen of P. trichocarpa [17]. [...] chromosome sixteen of P. trichocarpa, and chromosome one of C. arbutifolia comprised the remaining part of P. trichocarpa chromosome one. This difference was confirmed by collinearity analysis (Fig. 3b).