Genomes and demographic histories of the endangered Bretschneidera sinensis (Akaniaceae)

Abstract Background Bretschneidera sinensis is an endangered relic tree species in the Akaniaceae family and is sporadically distributed in eastern Asia. As opposed to its current narrow and rare distribution, the fossil pollen of B. sinensis has been found to be frequent and widespread in the Northern Hemisphere during the Late Miocene. B. sinensis is also a typical mycorrhizal plant, and its annual seedlings exhibit high mortality rates in absence of mycorrhizal development. The chromosome-level high-quality genome of B. sinensis will help us to more deeply understand the survival and demographic histories of this relic species. Results A total of 25.39 Gb HiFi reads and 109.17 Gb Hi-C reads were used to construct the chromosome-level genome of B. sinensis, which is 1.21 Gb in length with the contig N50 of 64.13 Mb and chromosome N50 of 146.54 Mb. The identified transposable elements account for 55.21% of the genome. A total of 45,839 protein-coding genes were predicted in B. sinensis. A lineage-specific whole-genome duplication was detected, and 7,283 lineage-specific expanded gene families with functions related to the specialized endotrophic mycorrhizal adaptation were identified. The historical effective population size (Ne) of B. sinensis was found to oscillate greatly in response to Quaternary climatic changes. The Ne of B. sinensis has decreased rapidly in the recent past, making its extant Ne extremely lower. Our additional evolutionary genomic analyses suggested that the developed mycorrhizal adaption might have been repeatedly disrupted by environmental changes caused by Quaternary climatic oscillations. The environmental changes and an already decreased population size during the Holocene may have led to the current rarity of B. sinensis. Conclusion This is a detailed report of the genome sequences for the family Akaniaceae distributed in evergreen forests in eastern Asia. Such a high-quality genomic resource may provide critical clues for comparative genomics studies of this family in the future.

I have the opinion it is almost acceptable, but require a few additional clarifications/corrections. Especially, the author suggested a high retention rate of the duplicated genes could explain the large number of coding genes in B. sinensis, but they did not provide any supporting data or analysis. Nor they described how they calculated the LTR burst time. Reply: We have added the description about the duplicated genes in B. sinensis at Lines 243-246 and Table S11. The method of LTR burst time calculating was also added in revised manuscript (Lines: 174-175).
-Lines 71-83: the authors stated they used a Nextera kit for library prep, which is specific to the Illumina platform, but later they indicated the sequencing was performed on the MGI platform. They should clarify this point; Reply: We have proved the corrected library prepared kit after confirmed with BGI (MGIEasy Kit). (Line: 81) -Line 108: please use capital X for coverage; Reply: Done. -Line 122: the guanine+cytosine content is expressed as a ratio; Reply: Done. -Line 129: I guess the authors used Merqury, not Merquery; Reply: Done.
-Lines 292-293: Please clarify this sentence, as it does not make sense to me; Reply: We want to emphasize the population size after LGM is nearly to zero. So we have changed the sentence to "After the LGM, B. sinensis showed an extremely low historical Ne, which approximately reached to zero in spite of a very small recovery" Figure S6b: Please use histograms, not curves, as the species used represent independent observations. Reply: Done. Figure S8a: Where is the diagonal that should be observed when plotting a species against itself? Reply: Among the self-vs-self synteny block plotting, the diagonal usually represents the tandem duplication. We have deleted such results in the previous version. While, adding the diagonal maybe more suitable for displaying all the duplications. So, we have showed the raw results that contained the red diagonal according your suggestion.
References: Please carefully revise the reference list, as there is many typos and confusions between first and last names (for example ref 8; 34; 40; …); also, please check the journal format requirements, and comply to them. Reply: We are grateful for your suggestion and we have revised the reference list, which are suitable for GigaScience.  greatly decreased the population size of endangered species and due to lack of beneficial genetic 47 variations, they could not recover the original distribution at the end of the glacial period [1,[3][4][5]. 48 In addition, some species that may have developed specific adaptations to special habitats through 49 environmental interactions will likely become endangered when such suitable habitats are disrupted 50 [6][7][8]. This may be especially true for species with specialized endotrophic mycorrhizal adaptation 51  (Table S1).  The Pacbio Sequel II platform was used to produce 25.39 Gb long clean reads (Table S1). 95 The Hi-C technology was further performed to anchor contigs into pseudo-chromosomes. 96 Fresh young leaves of the same tree were used to build Hi-C libraries according to the custom 97 procedure [24]. The MboI-digested chromatin was end-labeled with dATP and then used for DNA 98 ligation. Next, the prepared DNA was purified and sheared using Qiagen MinElute PCR Purification 99 Kit (QIAGEN, Hilden, Germany). The purified concentration was detected by Qubit® dsDNA HS 100 Assay Kit (Thermo Fisher Scientific, MA, USA). After tailing, pulldown, and adapter ligation, the 101 DNA library was sequenced on an Illumina HiSeq X Ten System (Illumina HiSeq X Ten, 102 RRID:SCR_016385), and a total of 109.17 Gb raw Hi-C reads were generated (Table S1). 103 104

Estimate of genome size 105
The k-mer based method [25] was used to perform the genome size inference with clean short reads.  Figure S2 and Table S3). 124 The longest and shortest chromosomes were 166.61 and 89.86 Mb respectively in our final 125 chromosome-level assembly (Table S3). 126 To evaluate the quality of our assembly, the guanine cytosine (GC) ratio of B. sinensis was first 127 calculated, and it was found to be similar to the GC ratio of other closely related species (Table S2, 128  (Table S4) (Table S5). Compared to the other recently published plant genomes, we found that the 154 average CDS length, exon length and exon number were highly conserved in B. sinensis and other 155 species (Table S6). Moreover, 1576 (97.6%) BUSCO genes could be completely matched to our 156 predicted B. sinensis gene set (Table S4). 157 Gene functionality was predicted using BLASTP v.2.7.1+ (E-value ≤ 1e−5) by best matching 158 the protein sequences annotated in COG, KOG, NCBI's NR, SwissProt and TrEMBL databases.  (Table S7).  (Table S8). Among LTRs, copia and gypsy were the dominant types and occupied 17.81% 185 and 31.75% genome sequences, respectively. We further inferred the insertion time of the complete 186 peak at ~2 Mya, which represented a recent wave of TE burst (Fig. 2c). The other major types of 189 TEs, such as short interspersed nuclear elements (SINEs), long interspersed nuclear elements 190 (LINEs) and DNA transposons, respectively occupied 0.02%, 2.06% and 2.72% (Table S8). In 191 addition, TEs were unevenly distributed in the genome and were accumulated more in the intergenic 192 regions rather than genic regions, and accumulation was high towards introns compared to exons 193 (Fig. 2d). Furthermore, we identified that 15,426 genes have the TEs insertion. The functional 194 enrichment analyses showed that these genes were mainly involved in plant growth and 195 development (including biological process, cellular component and molecular function) ( Figure S4). and displayed in R. We found these expanded genes were mainly associated with response to auxin, 237 response to endogenous stimulus, organic transport and other process involved in plant development 238 and reproduction ( Figure S7 and Table S10). sinensis that represent the recent WGD, and many small and fragmented collinear blocks were also 255 identified that represented the ancient γ event (Fig. 3 and Figure S8). Basing on the DupGen_finder 256 [60] analyses, we found that 56.86% genes were originated from the WGD events (Table S11), 257 which showed the higher retention of WGD genes and this maybe the major reason for the larger 258 We mainly compared the gene numbers between B. sinensis and its two closely related non-278 mycorrhizal species: Moringa oleifera and Carica papaya. We found that except IPT gene family, 279 the other 12 gene families both showed an obviously expanded gene number in B. sinensis than that 280 in the other two species. MLP and NBS both play an integral role in defending plants [62,63], and 281 we identified 22 and 205 genes in B. sinensis, respectively, which is nearly double than that present 282 in the other two species (Table S12). RBOH is the main producer of reactive oxygen species (ROS), 283 which is the key molecule involved in plant growth and development, and disease resistance 284 signaling [64,65]. A total of 11 RBOH genes were identified in the B. sinensis genome, while the 285 other two species showed a conserved copy number RBOHs of seven ( Figure S10 and Table S12). 286 All the nine auxin-responsive genes families were expanded in B. sinensis and SAURs showed the 287 largest gene number change in our investigated three species (Table S12) very small recovery (Fig. 4). expansion seems to be common in other Tertiary relict trees in eastern Asia [11][12][13]. We found that 322 except for the shared whole-genome triplication for all core eudicots, this species experienced an 323 additional species-specific WGD, which generated more genes that may enhance the survival ability 324 of this species and may contribute to the historical prosperous ( Fig. 4 and Figure S9). The WGD 325 event may not be the main factor causing genome expansion in B. sinensis, as it was nearly six times 326 larger than M.oleifera and three times larger than C. papaya. Therefore, we further focused on the 327 TE activities, which have been proven to take primary responsibility for change in genome size 328 which suggest that TE activities is another possible factor for the large genome size of B. sinensis. 331 A total of 12,959 genes with TE insertions were also detected, and their functions were mainly 332 associated with growth and development in B. sinensis ( Figure S4). It should be noted that TEs 333 could change gene expression and function [76,77] and are usually considered as mildly deleterious 334 [78]. The LTR burst for B. sinensis started ~5 Mya and reached a peak around 2 Mya, and this burst 335 corresponded to contrasted demographic histories of this species inferred from the PSMC analyses. 336 It is likely that these TE insertions may partly account for the special demographic histories of this 337 endangered species although the underlying mechanisms remain unclear. based demographic analyses of this species has recovered its special Ne dynamics (Fig. 4) its closely related two species (Table S12)

Competing interests 383
The authors declare that they have no competing interests. 384

Data Availability 385
All the raw sequence reads used in this study have been deposited in the NCBI Sequence Read