A chromosome-scale genome assembly for the holly (Ilex polyneura) provides insights into genomic adaptations to elevation in Southwest China

Abstract Southwest China is a plant diversity hotspot. The near-cosmopolitan genus Ilex (c. 664 spp., Aquifoliaceae) reaches its maximum diversity in this region, with many narrow-range and a few widespread species. Divergent selection on widespread species leads to local adaptation, with consequences for both conservation and utilization, but is counteracted by geneflow. Many Ilex species are utilized as teas, medicines, ornamentals, honey plants, and timber, but variation below the species level is largely uninvestigated. We therefore studied the widespread Ilex polyneura, which occupies most of the elevational range available and is cultivated for its decorative leafless branches with persistent red fruits. We assembled a chromosome-scale genome using approximately 100x whole genome long-read and short-read sequencing combined with Hi-C sequencing. The genome is approximately 727.1 Mb, with a contig N50 size of 5 124 369 bp and a scaffold N50 size of 36 593 620 bp, for which the BUSCO score was 97.6%, and 98.9% of the assembly was anchored to 20 pseudochromosomes. Out of 32 838 genes predicted, 96.9% were assigned functions. Two whole genome duplication events were identified. Using this genome as a reference, we conducted a population genomics study of 112 individuals from 21 populations across the elevation range using restriction site-associated DNA sequencing (RADseq). Most populations clustered into four clades separated by distance and elevation. Selective sweep analyses identified 34 candidate genes potentially under selection at different elevations, with functions related to responses to abiotic and biotic stresses. This first high-quality genome in the Aquifoliales will facilitate the further domestication of the genus.


Introduction
Tropical and subtropical Southwest China is a global hotspot of plant species diversity due to its topographic complexity, relative climatic stability, and large continuous land area at this latitude in Asia [1,2]. Ilex (Aquifoliaceae), the hollies, is one of the largest and most characteristic woody genera in this region, which supports almost one-third of the known species in this near-cosmopolitan genus [3]. Similar to many plant types, most Ilex species have relatively narrow distributions, with some species known to come from only a single location, while a few are widespread and occupy a range of habitats. Divergent selection in widespread woody species is expected to lead to local adaptation, with consequences for both utilization and conservation, and for predicting responses to climate change [4,5]. However, local adaptation can be hindered by high levels of gene flow between populations, which is expected in Ilex because of its unspecialized, beepollinated flowers, and small, bird-dispersed fruits [3].
Variation below the species level is of more than academic interest in Ilex because many different species throughout the range of the genus have been utilized as teas and medicines, ornamentals, honey plants, timbers, and other minor uses, with a total global market value of billions of US dollars annually [3]. Subspecific variation has been utilized in horticulture for the development of new cultivars and hybrids, but not yet in the huge market for holly teas and medicines. Breeding programs to produce novel ornamental cultivars, caffeine-free teas, or improved medicines would be facilitated by the availability of a high-quality reference genome, which is currently lacking not only for the genus Ilex but also for the entire order, Aquifoliales, to which it belongs.
In this study, we therefore investigated subspecific variation and local adaptation in I. polyneura, a widespread diploid member of the deciduous East Asian clade [3] that occupies most of the elevational range available in the region. It has considerable potential as a source of cut, leafless branches with persistent red fruits, similar to the commercially important Ilex decidua (possumhaw) and I. verticillata (winterberry) in North America and, increasingly, in eastern China. Hong et al. [6] reduced this taxon to a synonym of I. micrococca, but these taxa are usually distinct in our study area, so we followed Flora of China [7] to keep them separate. First, we sequenced and assembled a high-quality chromosome-scale reference genome using PacBio longread sequencing, Illumina HiSeq sequencing, and Hi-C techniques. Then, using this high-quality genome as a reference, we conducted a population genomics study of 112 individuals collected from 21 populations across the whole elevation range of the species using restriction site-associated DNA sequencing (RADseq). We looked for evidence of a genetic structure that is likely to be the outcome of spatially divergent selection and identified candidate genes potentially involved in adaptation to local differences in elevation.

DNA extraction and genome sequencing
Young trees of Ilex polyneura were transplanted from a wild population in Xishuangbanna, Yunnan Province, China (N21.5621 • , E100.3631 • ), into the Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences. Fresh plant tissues, including leaves, flowers, fruits, stems, and roots, were collected in the garden from a single female plant, frozen in liquid nitrogen, and then used for total DNA and RNA extractions. After extraction, sequencing libraries with 350 bp insert sizes were constructed using the library construction kit (Illumina, San Diego, CA, USA), and these libraries were then sequenced using an Illumina HiSeq X Ten platform. These short reads were used to assess the genome size and for genome survey and correction. The genome was assembled using whole genome long-read and short-read sequencing combined with Hi-C sequencing.

Genome assembly
The genome size of Ilex polyneura was estimated using flow cytometry with Carica papaya as the reference [8]. After removing the adapters, reads with more than 10% "N" nucleotides, and reads with more than 20% lowquality nucleotides, the clean reads were used for calculating K-mers (K = 17) by SOAPdenovo2 [9] to confirm the sequencing depth and corresponding K-mer count by determining the highest peak value of the frequency curve of the K-mer occurrence distribution. The genome size was then estimated by the K-mer count/sequencing depth. NextDenovo v2.3.0 (https://github.com/Nextomics/Ne xtDenovo) was applied to correct the single base errors of the PacBio long reads using the Illumina short reads and then to assemble the preliminary genome using default parameters except for minimap2_options_cns = −x avapb -t 10 -k17 -w17, seed_cutoff = 32 210, and min-imap2_options_raw = −x ava-pb -t 10. The preliminary genome assembly was further polished using NextPolish v1.2.4 [10] and Purge Haplotigs v1.0.4 [11]. Again, default parameters were used except lgs_minimap2_options = −x map-pb and lgs_options = −min_read_len 1 k -max_read_ len 100 k -max_depth 100 in NextPolish and -a 84 in Purge Haplotigs. Three methods were used to evaluate the quality of the assembled genome. First, we mapped the Illumina short reads onto the assembled genome using BWA-MEM [12] to check the mapping rate of reads, coverage, and average sequencing depth. Second, we mapped core eukaryotic genes onto the assembled genome using CEGMA (Core Eukaryotic Genes Mapping Approach) [13]. Third, we used the BUSCO (Benchmarking Universal Single-Copy Orthologs) test to evaluate the assembled genome's quality and searched the assembly for gene content that was conserved among all plants with the Embryophyta_odb9 database based on default parameters [14].

Pseudochromosome construction using hi-C
For the Hi-C experiments, approximately 3 g of fresh young leaves was ground into powder in liquid nitrogen. The Hi-C library was constructed following a previously published protocol for plant tissues [15] with chromatin extraction and digestion and DNA ligation, purification, and fragmentation. We constructed chromosome-level scaffolds of the Ilex polyneura genome using ALLHiC v0.9.8 software [16] with the following parameters: -CLUSTER 20, −-MaxLinkDensity 3, −-shortest 150, −-minREs 50, −-format Sanger, −-enz DpnII, −-longest 800, and -NonInformativeRatio 0. The order and directions of the contigs on the pseudochromosomes were then adjusted by examining their interactions in the Hi-C heatmap. The completeness and quality of the final assembled genome was assessed with BUSCO tests using gene content from the Embryophyta_odb9 database [14].

Repeat annotation
A combination of homology alignment and de novo search was used for repeat annotation. RepeatModeler [17] and LTR_FINDER [18] were used for de novo repeat family identification. RepeatScout [19] was used to build the de novo repeat library, only using repeats longer than 100 bp and with a string of "N" nucleotides less than 5%. The repeat library was then combined with the Repbase library (http://www.girinst.org/repbase/) for repeat detection using RepeatMasker and RepeatProteinMask [17]. Tandem repeats were extracted using TRF [20] by ab initio prediction.
Protein-coding genes were identified and annotated using a combination of ab initio prediction, homologybased prediction, and RNA-Seq assisted prediction. Different databases for genome annotation have different algorithms for detecting genes or motifs [21]. Additionally, some incompatible transcripts cannot be transferred to different databases [22], some databases include data from other databases (e.g. Pfam shares data with interPro), but others do not [23], and some databases include only completed genome sequences, while others also include genes or proteins without a completed genome [24], so annotations based on different databases give different results. Thus, several databases were used for ab initio prediction here, including Augustus v3.2.3 [25], Geneid v1.4.4 [26], Genescan [27], GlimmerHMM v3.0.4 [28], and SNAP [29]. Sequences of homologous proteins from Helianthus annuus, Lactuca sativa, Handroanthus impetiginosus, Nicotiana sylvestris, and Solanum lycopersicum were downloaded from Ensembl [30] and NCBI (https://www.ncbi.nlm.nih.gov/protein/) for homology-based prediction. These five species are either campanulids with completed genomes or model plant species. Protein sequences were aligned to the assembled Ilex polyneura genome using TblastN v2.2.26 (a maximal E-value of 1e-5), and the matching proteins were then aligned to the homologous genome sequences with GeneWise v2.4.1 [31], which was used to predict the gene structure in each protein region. For RNA-Seq assisted prediction, the RNA-Seq reads from different tissues, including leaves, flowers, fruits, stems, and roots, were aligned to the genome fasta file using TopHat v2.0.11 [32] to identify exon regions and splice positions with the default parameters. These results were used as inputs for Cufflinks v2.2.1 [33] for genome-based transcript assembly, again with default parameters. The nonredundant reference gene set was generated by merging the genes predicted by these three methods with EvidenceModeler (EVM, v1.1.1) using PASA (Program to Assemble Spliced Alignment) terminal exon support and including masked transposable elements as input into the gene prediction [34].
Gene functions were assigned by aligning the protein sequences to those in the Swiss-Prot database using Blastp (with a threshold of a maximal E-value of 1e-5) and choosing the best match. The motifs and domains were annotated using InterProScan (v5.31) by searching against publicly available databases, including Swiss-Prot [35], NCBI (http://www.ncbi.nlm.nih.gov/protein), Pfam [36], KEGG [37], and InterPro [38]. The Gene Ontology (GO) IDs for each gene were assigned according to the corresponding InterPro entry. We predicted the protein functions by transferring annotations from the closest BLAST hits (a maximal E-value of 1e-5) in the Swiss-Prot and NCBI databases. We also mapped the gene set to KEGG pathways and identified the best match for each gene.

Noncoding RNA annotation
The tRNAs were predicted using the program tRNAscan-SE [39]. For rRNAs, which are highly conserved, we used the rRNA sequences of related species in BLAST. Other noncoding RNAs (including miRNAs and snRNAs) were identified by searching against the Rfam database with default parameters using infernal software [40].

Whole-genome duplication (WGD) analysis
First, the collinearity of protein-coding genes within the genome of Ilex polyneura, as well as between I. polyneura genome and those of five species, Daucus carota, H. annuus, L. sativa, Mimulus guttatus, and S. lycopersicum, was identified using BLASTP (https://blast.ncbi.nlm.ni h.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSea rch&LINK_LOC=blasthome) and MCscanX with the default parameters [41]. Then, the fourfold degenerate synonymous sites were located, and the 4DTV values were calculated. The WGD events in the I. polyneura genome were determined by plotting the distribution frequency of the 4DTV values.

Positively selected genes in the genome of Ilex polyneura
The single-copy orthologous genes identified above were aligned using MUSCLE v.3.8.31 [42] and then used for calculating the ratios of the nonsynonymous (Ka) and synonymous (Ks) substitution rates using the branch-site model in the program Codeml with the default parameters in the package PAML v4.9 [43]. Ilex polyneura was set as the foreground branch, while five species (S. lycopersicum, D. carota, H. annuus, L. sativa, and M. guttatus) were set as the background branches in the analyses. The positively selected single-copy orthologous genes were identified based on the likelihood ratio test (LRT).

Restriction site-associated DNA sequencing and SNP calling
A total of 112 individuals were collected from 21 populations at the tropical-subtropical transition (21.79-26.99 • N) in Yunnan Province, Southwest China, to sample the whole elevational range of I. polyneura (675-2362 m above sea level; Supplementary Table 5). RADseq data were produced from each individual following a previously reported protocol [44]. Approximately 5 g of leaves dried in silica gel per individual was used for genomic DNA extraction. Approximately 200 ng of highquality genomic DNA per individual was used for RADseq library preparation. Genomic DNA was digested with the restriction enzyme EcoR1 and fragmented randomly. The digested DNA fragments were ligated with an adaptor for Illumina sequencing and a unique barcode. Then, polymerase chain reaction was conducted with the prepared libraries after pooling and random shearing to select the DNA within a certain length range (between 200 and 400 bp). Subsequently, paired-end sequencing (2 x 150 bp) was performed on an Illumina NovaSeq 6000 (Illumina, USA).
The raw paired-end reads for each individual were filtered with FastQC (https://www.bioinformatics.babra ham.ac.uk/projects/fastqc/) with default parameters to examine the read quality. Trimmomatic was used to remove the adaptors and bases with low quality (Phred score < 20) at the beginning or end of reads [45]. Filtered reads were mapped with the new genome as a reference using BWA-MEM with default parameters [12]. Sites in repetitive sequences were examined and removed using SAMtools [46]. SNP calling was run in SAMtools [46] and further filtered in GATK [47]. We only kept high-quality SNPs that were biallelic and had a sequencing depth > 5, minor allele frequency > 0.05, and missing call rate < 0.3.

Phylogenetic and genetic structure analyses
TreeBeST v1.9.2 (http://treesoft.sourceforge.net/treebest. shtml) was used for phylogenetic analysis based on the neighbor-joining (NJ) method. Bootstrap values were calculated based on 1000 replications. Principal component analysis (PCA) was conducted based on the filtered SNPs using GCTA [48]. The input file to the tool was produced using the population genotype file in PLINK [49]. The PED file produced by PLINK was also used for population genetic structure analysis in FRAPPE [50]. The best number of clusters (K value) was determined by the lowest cross-validation error rate, which was tested from 2 to 8.

Linkage disequilibrium and selective sweep analyses
Linkage-disequilibrium (LD) decay was calculated based on the squared correlation coefficient (r 2 ) values between pairwise SNPs using PopLDdecay [51]. In the selective sweep analyses, two approaches were used for the identification of genome-wide selective sweeps: estimating the fixation index (Fst) and the nucleotide diversity (π ) for 40 kb sliding windows with a 20 kb step size in highelevation and low-elevation populations using VCFtools v.0.1.14 [52]. Sites over the top 5% threshold of Fst and π were selected for identification of their functions using GO and KEGG pathway analyses.

Genome assembly and construction of pseudochromosomes
We first produced 70.32 Gb of Illumina paired-end short reads. The 17-mer frequency of the Illumina short reads with the highest peak occurred at a depth of 70 ( Supplementary Fig. 1). We then estimated the heterozygosity rate and percentage of repeat content, which were 1.18% and 52.95%, respectively. The genome size was estimated to be approximately 729.31 Mb, which is close to the size estimated by flow cytometric analyses (741.48 Mb).
The SMRT sequencing technology generated 120.82 Gb reads. These were corrected and assembled to produce a 727.10 Mb genome with a contig N50 size of 5 124 369 bp, contig N90 size of 575 529 bp, scaffold N50 size of 36 593 620 bp, and scaffold N90 size of 25 719 167 bp ( Table 1). The GC content of the assembled genome based on the SMRT reads was 36.08%. The total contig length was 727 102 167 bp, and the number of contigs was 483. The longest contig was 23 338 240 bp. The total scaffold length was 727 144 267 bp, and the number of scaffolds was 62. The longest scaffold was 64 863 080 bp. All three approaches to assessing the completeness of the genome supported high completeness. A total of 97.61% of the short reads could be mapped onto the assembled genome, the average sequencing depth was 90.25, and the coverage of the mapped short reads on the assembled genome was 99.18%. A total of 95.56% of the CEGMA genes were found in the assembled genome. A total of 97.6% of the universal single-copy orthologs in the BUSCO scores (1614 complete orthologs) were detected in the assembled genome ( Table 1), suggesting that the I. polyneura genome is nearly complete and of high quality.
We then anchored all these scaffolds from the SMRT reads to 20 pseudochromosomes using the Hi-C data. The Hi-C technique generated 1.60 Gb raw data, which was filtered to 1.48 Gb clean data for reconstructing physical maps by reordering and clustering the assembled scaffolds. We anchored 98.87% of the assembly onto 20 pseudochromosomes. Chromatin interaction data were used to assess the quality of the Hi-C assembly and showed that it was high ( Supplementary Fig. 2). The lengths of the pseudochromosomes ranged from 23 672 964 bp to 64 863 080 bp, with a scaffold N50 length of 36 593 620 bp (Supplementary Table S1).

Repeat and gene annotations
The method combining ab initio and homology-based approaches identified 57.62% of the assembled genome as repetitive sequences, including DNA transposons (1.6%), retrotransposons (50.04%), and other kinds. Long terminal repeat (LTR) retrotransposons accounted for 49.23% of the genome (Table 2; Supplementary Table S2). A total of 32 838 genes were annotated by combining de novo, homology-based, and transcriptome-based approaches (

Whole-genome duplication (WGD) analysis
Two peaks were identified in the distribution plot of the 4DTv values in the Ilex polyneura genome, one at 4DTv = 0.30 and the other at 4DTv = 0.66 (Fig. 1). This suggests that the ancestors of the genus Ilex underwent one ancient and one recent WGD (Fig. 1). The later WGD event occurred between the times of two WGD events in the D. carota genome, at 4DTv = 0.24 and 4DTv = 0.39 (Fig. 1). We found two peaks that were shared between I. polyneura and S. lycopersicum (4DTv = 0.42 and 0.66), while only one was shared with M. guttatus (4DTv = 0.48) (Fig. 1), suggesting that I. polyneura and S. lycopersicum shared one WGD event not present in M. guttatus. Three peaks were found in S. lycopersicum (4DTv = 0.30, 0.69, and 0.78), suggesting that S. lycopersicum experienced three WGD events.

Positively selected genes
A total of 178 of the single-copy orthologous genes were identified as positively selected in I. polyneura compared to the other five representative species using the LRT method. Chromosome 1 had the highest number of positively selected genes (16), followed by chromosome 4 (14), chromosome 2 (12), and chromosome 6 (12) (Supplementary Fig. 3). Chromosome 17  had the smallest number (2) (Supplementary Fig. 3). KEGG pathway analysis of the positively selected genes suggested that they were enriched in pathways related to resistance to herbivores and pathogens (e.g. map00240 and map00350) and photosynthesis (e.g. map00710) (Supplementary Table S5). GO enrichment analysis of the positively selected genes suggested that they were enriched in genes related to pollination (e.g. GO:0010183), protein methylation (e.g. GO:0006479), transmembrane transport (e.g. GO:0071918), DNA structure (e.g. GO:0006310), and protein activity (e.g. GO:0004824) (Supplementary Table S6).

Population genetic diversity and structure
The population genetic analyses using RADseq produced 2 486 218 696 clean reads, with a mean of 22198381.21 clean reads, from the 21 populations (Supplementary Fig.  4). These reads produced genomes with a total length of 372932.80 Gb and a mean length of 3329.76 Gb. Additionally, 21037588.55 clean reads were mapped with the reference genome on average, with a mean length of 3061.83 Gb. The maximum mapping rate was 96.09%, the  Table S7; Supplementary Fig. 4). After filtering, 862 714 SNPs remained for population genetic analyses. In the NJ phylogeny, four clades were supported, reflecting geographic proximity and elevation. Populations in Clades A and B were from low elevations in southeastern-most Yunnan. Populations in Clade C were from medium elevation environments slightly northwest of those of Clades A and B. Populations in Clade D were from medium elevations in southeastern Yunnan and high elevations in central to northern Yunnan (Fig. 2a, b; Supplementary Table S7). Most populations were monophyletic, but six individuals (out of 112) in four populations (out of 21) were not clustered with the other individuals in their populations.
The population genetic structure analysis identified four clusters (K = 4) ( Fig. 2c and f) as the best based on tests of K = 1-8 ( Supplementary Fig. 5). There were two clusters each at low and high elevations, plus a few individuals with mixed ancestry from similar elevations. All individuals from medium elevation populations had mixed ancestry from both low and high elevations (Fig. 2c). In total, 58 out of 112 individuals had mixed ancestry. The PCA also showed that medium elevation populations were intermediate between those from low and high elevations ( Fig. 2d and e).

Genomic adaptation of Ilex polyneura to different elevations
In the LD analyses, high elevation populations had the highest mean r 2 value (0.230) at 0.1 kb, followed by low elevation populations (mean r 2 = 0.277), medium elevation populations (mean r 2 = 0.275), and all populations (mean r 2 = 0.198) (Fig. 2g). Subsequently, the mean r 2 value dropped rapidly to 0.151 in high elevation populations, 0.141 in low elevation populations, 0.164 in medium elevation populations, and 0.121 in all populations at 20 kb. Overall, medium elevation populations had the highest mean r 2 value, followed by high elevation populations, low elevation populations, and all populations (Fig. 2g). A total of 1042 and 1332 genes were identified as candidate genes in the genome-wide scan of Fst and π values, respectively ( Fig. 3a, b; Supplementary Tables S8 and S9). Only 34 candidate genes were identified by both scans (Supplementary Fig. 6; Supplementary Table S10). Subsequently, 50 terms and 15 pathways were identified in the GO and KEGG pathway analyses (Supplementary Tables S11 and S12; Supplementary Figs. 7 and 8).
Chromosome a b Figure 3. Plots of selection signature distributions on the genome, identified by Fst value (a) and nucleotide diversity (θπ) reduction (π highπ low ) (b). The dashed line indicates the threshold value (the top 5%) for the identification of selection signatures.

Discussion
Using a combination of Illumina short reads, PacBio single-molecule real-time sequencing technology, and Hi-C techniques, we assembled the whole genome of Ilex polyneura. The assembled genome was approximately 727.10 Mb with a contig N50 size of 5 124 369 bp and a scaffold N50 size of 36 593 620 bp. A BUSCO analysis produced 97.6% universal single-copy orthologs, indicating high contiguity and completeness of the assembly, and 98.9% of the genome assembly was anchored to 20 pseudochromosomes (Table 1). This is the first published genome at chromosome-scale in the huge genus Ilex as well as in the order Aquifoliales. Moreover, this is the only genome available for a woody member of the campanulids (asterids II), which includes six additional orders, and it is one of the highest quality genomes on the basis of contig size, BUSCO score, and completeness of the annotation for functional analysis [53][54][55] Table S4). Population genetic analyses revealed evidence of divergent selection leading to genomic adaptation of Ilex polyneura to different elevations. Population genetic structure analysis showed that both low-and highelevation populations consisted of two genetic clusters, while medium-elevation populations were genetic mixtures between low-and high-elevation populations (Fig. 2c, d, e). The two low-elevation genetic clusters had relatively longer phylogenetic branches than the highelevation clusters in the NJ tree (Fig. 2b). Populations in Clades A and B, corresponding to the two low-elevation genetic clusters in the population genetic structure result, are separated by the Tanaka Line, which runs from northwest to southeast and separates the Sino-Japanese and Sino-Himalayan floristic subregions [56,57]. Its significance as a biogeographical boundary seems to largely reflect divergent climatic conditions associated with the monsoons. The LD analysis provided further evidence that populations at low elevations and high elevations have experienced different environmentally selective pressures (Fig. 2g).
Selective sweep analyses identified some candidate genes that were positively selected at different elevations ( Fig. 3a and b) and had functions possibly relating to elevation adaptation in the GO and KEGG pathway analyses (Supplementary Tables S11 and S12; Supplementary Figs. 7 and 8). Some of these functions were also identified in plant adaptations to the Qinghai-Tibetan Plateau (>4000 m) [58], e.g. protein phosphorylation (GO:0006468) (Supplementary Table S11). However, most of the functions were not identified in previous studies of adaptation to the QTP, but they are important in responses to environmental stresses [59,60] and to defense responses against insects [61] and pathogens [62] (Supplementary Tables S11 and S12).
This study thus provides evidence for genetic divergence and significant local adaptation in the face of the expected high levels of gene flow between populations in Ilex. This result has consequences for conservation planning for I. polyneura and other widespread species in the genus, since it will be necessary to protect multiple populations across the range to conserve the full genetic and ecological potential of such species. None of the candidate genes identified as potentially involved in adaptation to local differences in elevation are of direct relevance to the farming of this species for decorative fruiting branches, but the amount of genetic variation suggests that it will be worthwhile to screen multiple wild populations for useful characteristics, including climatic tolerance, phenology, and fruit display. Some of the other commercially important holly species have narrow natural ranges, but others, such as the important traditional medicine Isotachis chinensis, are widespread and would probably also benefit broader searches for useful traits.

Conclusion
Using a combination of Illumina short reads, singlemolecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio), and Hi-C techniques, a chromosome-scale genome of Ilex polyneura of approximately 727.10 Mb in length is reported here. This is the first published genome at the chromosome scale in the genus Ilex as well as in the order Aquifoliales and is the only genome available for a woody member of the campanulids. The BUSCO score was 97.6%, and 98.9% of the genome assembly was anchored to 20 pseudochromosomes. Out of the 32 838 genes predicted, 31 826 (96.9%) were assigned functions in the gene annotation. The studied populations clustered into four monophyletic clades in the NJ tree in relation to geographic location and elevation. Populations at low and high elevations had simpler genetic structures, while populations at medium elevations had mixed ancestry from both low and high elevations. The selective sweep analyses identified 34 candidate genes potentially under selection at different elevations, with functions related to responses to abiotic and biotic stresses. Our results confirm the high contiguity, completeness, and annotation of the genome and demonstrate how it can provide a basis for further research into the evolution of the genus, as well as the selection of useful cultivars for horticulture and the production of holly teas and medicines.