The chromosome-level rambutan genome reveals a significant role of segmental duplication in the expansion of resistance genes

© The Author(s) 2022. Published by Oxford University Press on behalf of Nanjing Agricultural University. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Horticulture Research, 2022, 9: uhac014

Sapindaceae, two from Rutaceae, one from Meliaceae, one from Burseraceae, and three from Anacardiaceae. Arabidopsis thaliana was included as the outgroup in the phylogeny (Figure 1A).
Whole genome duplications (WGDs) were analyzed based on synonymous substitution rates (Ks) of paralogous gene pairs identified with a reciprocal best hit (RBH) approach. Unlike A. thaliana, which is known to have undergone two additional WGDs after the γ event shared by all dicot plants (Figure 1B, red vertical line), the Sapindaceae genomes show no additional recent WGD events. However, all four Sapindaceae genomes have a much higher peak at Ks < 0.1 (Figure 1B, blue vertical line), which represents recent local duplications.
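The RBH step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name, the (query, subject, score) input format, and the toy gene names are assumptions made for illustration, and in practice the scores would come from an all-vs-all protein search within one genome.

```python
def reciprocal_best_hits(hits):
    """Return paralogous pairs that are each other's best non-self hit.

    hits: iterable of (query, subject, score) tuples (hypothetical format).
    """
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # ignore self-hits
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)

    pairs = set()
    for query, (subject, _score) in best.items():
        # Keep the pair only if the subject's best hit points back to the query.
        if subject in best and best[subject][0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


# Toy example: g1 and g2 are reciprocal best hits; g3 hits g1 best, but not vice versa.
example_hits = [
    ("g1", "g2", 100.0),
    ("g2", "g1", 90.0),
    ("g1", "g3", 50.0),
    ("g3", "g1", 80.0),
    ("g2", "g3", 10.0),
]
rbh_pairs = reciprocal_best_hits(example_hits)
```

Ks would then be estimated for each retained pair with a separate tool; that step is outside the scope of this sketch.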
Local duplications are therefore of greater interest than WGDs in Sapindales genomes. Unlike WGDs, local duplications often occur within the same chromosome. We identified two types of locally duplicated intra-chromosomal segments in the five genomes with chromosome-level assemblies (Nla, Aya, Xso, Cma, and Min): (i) colinear gene syntenic blocks identified by WGDI [7] (Figure S2) with Ks < 0.2, which represent segmentally duplicated segments, and (ii) co-localized homologous genes with Ks < 0.2 and a gene distance <= 10 (the two homologous genes are no more than 10 genes apart on the chromosome), which represent tandemly duplicated segments.
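The tandem duplication criterion in (ii) can be sketched as a simple filter. This is an illustrative example only, not the authors' code: the function name, the input structures, and the interpretation of gene distance as the difference in gene rank along a chromosome are assumptions.

```python
def tandem_pairs(pairs, position, max_ks=0.2, max_distance=10):
    """Select homologous pairs that satisfy the tandem duplication criterion.

    pairs: iterable of (gene_a, gene_b, ks) tuples (hypothetical format).
    position: dict mapping gene -> (chromosome, rank along that chromosome).
    """
    selected = []
    for gene_a, gene_b, ks in pairs:
        chrom_a, rank_a = position[gene_a]
        chrom_b, rank_b = position[gene_b]
        same_chromosome = chrom_a == chrom_b
        close_enough = abs(rank_a - rank_b) <= max_distance
        if same_chromosome and close_enough and ks < max_ks:
            selected.append((gene_a, gene_b))
    return selected


# Toy data: only the first pair is on one chromosome, nearby, and has Ks < 0.2.
toy_position = {
    "g1": ("chr1", 5), "g2": ("chr1", 9),
    "g3": ("chr1", 40), "g4": ("chr2", 6),
}
toy_pairs = [
    ("g1", "g2", 0.05),  # tandem: distance 4, Ks 0.05
    ("g1", "g3", 0.10),  # too far apart: distance 35
    ("g1", "g4", 0.10),  # different chromosomes
    ("g2", "g1", 0.50),  # Ks too high
]
tandem = tandem_pairs(toy_pairs, toy_position)
```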
In total, 10 286 (20.6% of 49 959, Figure 1C) Nla genes were found to be homologous gene pairs in 13 511 intrachromosomal syntenic blocks with Ks < 0.2, indicating that they were derived from segmental duplications. By contrast, only 2010 (4% of 49 959, Figure 1D) tandemly duplicated genes (Ks < 0.2 and gene distance <= 10) were identified in Nla. Moreover, 1516 (75.4%) of these tandemly duplicated genes were also among the 10 286 segmentally duplicated genes that reside in syntenic blocks. The other four Sapindales genomes (Aya, Xso, Cma, and Min) had lower percentages (Figure 1C) of segmental gene duplications than Nla. All of them had higher percentages of segmental gene duplications (Figure 1C, blue bars) than tandem gene duplications (Figure 1D, cyan bars). Clearly, segmental gene duplications have had a larger impact on the evolution of Sapindales genomes than tandem gene duplications.

We performed a systematic search for plant resistance genes (R genes) in the 11 Sapindales genomes using RGAugury [9]. Based on the classifications defined in RGAugury, these R genes included members of based on their other domains [8]. Here, we have adopted the RGAugury classification.
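As a quick arithmetic check on the genome-wide Nla duplication counts reported above (10 286 segmental, 2010 tandem, 1516 in both sets, out of 49 959 genes), the percentages can be recomputed as follows. The combined count of locally duplicated genes is not stated in the text; it is derived here by inclusion-exclusion, on the assumption that the 1516 shared genes are the full overlap between the two sets.

```python
total_genes = 49_959
segmental = 10_286   # genes in intrachromosomal syntenic blocks with Ks < 0.2
tandem = 2_010       # genes with Ks < 0.2 and gene distance <= 10
both = 1_516         # tandem genes that also lie in syntenic blocks


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


segmental_pct = percent(segmental, total_genes)      # share of all genes
tandem_pct = percent(tandem, total_genes)            # share of all genes
shared_pct = percent(both, tandem)                   # share of tandem genes

# Derived value (assumption: 'both' is the complete overlap of the two sets).
locally_duplicated = segmental + tandem - both
locally_duplicated_pct = percent(locally_duplicated, total_genes)

print(segmental_pct, tandem_pct, shared_pct, locally_duplicated, locally_duplicated_pct)
```

The first three values reproduce the 20.6%, 4%, and 75.4% given in the text.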
Interestingly, 40.8% (Figure 1C, orange bars) of the R genes in Nla are located in segmentally duplicated syntenic blocks, and 7.1% (Figure 1D, red bars) of the R genes are tandemly duplicated. Altogether, 47.3% of the R genes are locally duplicated (Figure 1E). These results indicate that local duplications have played a very significant role in the expansion of plant resistance gene families in Nla and that segmental gene duplications have been more important than tandem duplications. Between 20% and 28% of the R genes in other Sapindales genomes (Figure 1C) are located in segmentally duplicated syntenic blocks, much lower than the 40.8% in Nla. This finding suggests that segmental gene duplications have had a larger impact on the evolution of the Nla genome and have contributed more to the expansion of plant resistance genes in Nla than in other Sapindales genomes. It should be noted that the rate of duplication is likely to have been underestimated, given that more ancient duplicated genes are often lost. As the duplicated syntenic blocks in Sapindales genomes have very small Ks values (Figure S2 and Figure 1B), this phenomenon may be less of an issue in the present study.
Among the 11 RGAugury gene families, TNL and CNL contain the highest percentages (70.6% for TNL and 63.0% for CNL) of locally duplicated genes (Figure 1E) in the Nla genome. There are 197 TNL genes and 397 CNL genes, which are also the most abundant among the 12 genomes. The phylogeny of NBS-containing RGAugury genes (Figure 1F) shows that, compared with Ath, Nla contains a few unique and significantly expanded CNL and TNL subfamilies. Locating all R gene families on the chromosomes revealed an uneven distribution (or clustering) of R genes in the genome (Figure 1G). Chromosomes 1, 2, and 3 contain the most R genes, especially in the NBS-containing families such as NBS, CNL, NL, and TNL, whereas RLK and RLP genes are more widely distributed across all chromosomes. Clearly, local duplication has played a major role in the lineage-specific expansion of NBS-LRR genes in Nla. Comparison of the total R gene repertoires among different Sapindales genomes revealed that Nla has the largest number (2870) of R genes and the second highest R gene percentage (5.8%) of the 12 genomes (Figure 1A), probably as a result of recent local gene duplications. These findings are consistent with and significantly expand upon what has been reported in a previous study [10] of NBS-LRR gene evolution in three Sapindaceae genomes (Aya, Dlo, and Xso).
In summary, the chromosome-level assembly of the Nla genome provides a useful reference for the study of Sapindaceae plants. The draft genome will also be a great resource for the discovery of molecular markers (e.g. SNPs) for characterizing different rambutan cultivars throughout the world. The Nla resistance genes identified in this study will help researchers to develop better strategies for improving rambutan resistance to pathogens and diseases.

a start-up grant from UNL [2019-YIN] to Y.Y. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. The USDA is an equal opportunity provider and employer.

Author contributions
YY, DZ, LWM, RG, and TM conceived and designed the project. DZ, LWM, and RG collected the plant materials and generated the sequencing data. JZ performed all the data analysis under the supervision of YY. JZ and YY drafted the manuscript. All authors contributed to and approved the final manuscript.

Data availability
The raw DNA sequencing reads and the assembled genome of N. lappaceum cultivar R-162 have been submitted to NCBI. The BioProject ID is PRJNA766632, the BioSample ID is SAMN21855570, and the Whole Genome Shotgun accession number will be available upon publication of the paper. The genome and annotation data can also be accessed at https://bcb.unl.edu/Nla/.