Chromosome-scale haplotype-phased genome assemblies of the male and female lines of wild asparagus (Asparagus kiusianus), a dioecious plant species

Abstract
Asparagus kiusianus is a disease-resistant dioecious plant species and a wild relative of garden asparagus (Asparagus officinalis). To enhance A. kiusianus genomic resources, advance plant science, and facilitate asparagus breeding, we determined the genome sequences of the male and female lines of A. kiusianus. Genome sequence reads obtained with a linked-read technology were assembled into four haplotype-phased contig sequences (∼1.6 Gb each) for the male and female lines. The contig sequences were aligned onto the chromosome sequences of garden asparagus to construct pseudomolecule sequences. Approximately 55,000 potential protein-encoding genes were predicted in each genome assembly, and ∼70% of the genome sequence was annotated as repetitive. Comparative analysis of the genomes of the two species revealed structural and sequence variants between the two species as well as between the male and female lines of each species. Genes with high sequence similarity to the male-specific sex determinant gene in A. officinalis, MSE1/AoMYB35/AspTDF1, were present in the genome assemblies of the male line but absent from those of the female line. Overall, the genome sequence assemblies, gene sequences, and structural and sequence variants determined in this study will help reveal the genetic mechanisms underlying sexual differentiation in plants, and will accelerate disease-resistance breeding in garden asparagus.


Introduction
Asparagus kiusianus is a wild relative of garden asparagus (Asparagus officinalis). While garden asparagus is a cultivated species belonging to the Asparagaceae family and is consumed as a vegetable crop around the world, A. kiusianus is native to the coastal regions of Japan. 1 Therefore, A. kiusianus might exhibit tolerances and/or resilience to abiotic and biotic stresses. Although A. kiusianus has been identified as a potential donor of stem-blight disease resistance in asparagus breeding programs, 2 neither the genetic mode nor the genetic loci of resistance have been elucidated to date.
Asparagus officinalis is a dioecious species and is widely recognized as a model for sex determination in plants. Recent studies indicate that the male-specific MYB-like gene, MSE1/AoMYB35/AspTDF1, located at the masculinization-promoting M locus of the Y-specific region in asparagus, functions in sex determination. [3][4][5] Since A. kiusianus is also a dioecious plant species, it is possible that both species share the same system of sex determination. Therefore, comparative genome sequence and structure analyses between the two species could provide insights into the molecular mechanisms underlying sex determination in Asparagaceae and the evolutionary processes involved therein.
Advances in sequencing technologies have enabled the whole-genome sequencing of various plant species, thus providing fundamental information required for understanding plant biology and accelerating breeding programs. Nevertheless, while the genome sequence data of garden asparagus 3 and transcriptome data of A. kiusianus 6,7 have been made publicly available, no whole-genome sequence data have been released for A. kiusianus to date. Owing to the dioecious nature of A. kiusianus, which leads to allogamy, its genome is predicted to be highly heterozygous. Therefore, haplotype-phased genome sequence data would be useful for dissecting the allelic sequence and structural variations in A. kiusianus. In this study, we employed a linked-read technology (10X Genomics, Pleasanton, CA, USA) to construct haplotype-phased genome sequence assemblies of the male and female lines of A. kiusianus. The genome sequence assemblies were then used for gene prediction and sequence and structural variant discovery. Overall, the genome sequence information of A. kiusianus obtained in this study could accelerate studies on plant sex determination and facilitate asparagus breeding programs.

Plant materials
Male (K1) and female (K2) lines of A. kiusianus cultivated at Kagawa Prefectural Agricultural Experiment Station (Kagawa, Japan) were used in this study. Genomic DNA was extracted from the stems of young seedlings using the modified cetyltrimethylammonium bromide method. 8

Genome sequencing and assembly
Genomic DNA libraries of the male and female lines were prepared using the Chromium Genome Library Kit v2 (10X Genomics), and sequenced on a NovaSeq 6000 (Illumina, San Diego, CA, USA) in paired-end, 150 bp mode. The sequence reads were assembled with Supernova (10X Genomics) to construct the contig sequences, scaffold the contigs, and resolve haplotype phases. DNA library preparation, sequencing, and assembly were conducted by Takara Bio (Shiga, Japan) as an outsourcing service.
The genome sizes of male and female lines were estimated based on short reads using Jellyfish. To construct pseudomolecule sequences at the chromosome level, the assembled contigs were aligned against the sequence of 10 A. officinalis chromosomes (reference) using RaGoo.
The software tools used for data analyses are listed in Supplementary Table S1.
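The k-mer-based genome size estimation mentioned above can be illustrated with a minimal Python sketch. It assumes a two-column k-mer histogram (depth, number of distinct k-mers), which is the kind of table a k-mer counter such as Jellyfish can produce; the function names, the `min_depth` cutoff, and the "total k-mers divided by peak depth" estimator are illustrative assumptions, not the exact procedure used in this study. In a highly heterozygous genome the tallest peak may correspond to heterozygous k-mers, so the peak should be inspected before trusting the result.

```python
def parse_histogram(lines):
    """Parse 'depth count' lines into a {depth: count} dictionary."""
    histogram = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        histogram[int(fields[0])] = int(fields[1])
    return histogram


def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    K-mers below min_depth are ignored because they are assumed to be
    dominated by sequencing errors (a hypothetical cutoff).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences among the retained depths.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = parse_histogram(["1 1000", "2 300", "10 50", "20 100", "30 40"])
    print(estimate_genome_size(toy))
```

The toy histogram is invented for illustration only; real histograms contain thousands of depth bins.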

Repetitive sequence analysis and gene prediction
Repetitive sequences in the assemblies were identified with RepeatMasker, using repeat sequences registered in Repbase and a de novo repeat library built with RepeatModeler.
RNA-Seq reads of A. kiusianus and A. officinalis were obtained from a public DNA database (GenBank Sequence Read Archive accession number: SRA1003110). 6,7 The RNA-Seq reads, from which adapter sequences were trimmed with fastx_clipper in the FASTX-Toolkit, were aligned against the assembled sequences with HISAT2. Gene prediction was performed with BREAKER2 using the positional information of the repeats, RNAs, and peptide sequences of the predicted genes of A. officinalis (V1.1) 3 released in Phytozome.

Comparative genome structure analysis of Asparagus kiusianus and Asparagus officinalis
Chromosome-level genome sequence assemblies of A. kiusianus (this study) and A. officinalis (V1.1) 3 were compared with Minimap2, and the resultant Pairwise mApping Format files were visualized with pafr.
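As an illustration of how Pairwise mApping Format (PAF) output can be summarized before visualization, the following Python sketch sums the aligned target span per reference chromosome. It relies only on the 12 mandatory tab-separated PAF columns; the function name and the `min_mapq` filter are hypothetical, and overlapping alignments are counted more than once, so the totals are an upper bound rather than a precise coverage figure.

```python
from collections import defaultdict


def aligned_span_per_target(paf_lines, min_mapq=0):
    """Sum (target end - target start) per target sequence.

    PAF columns used (0-based): 5 = target name, 7 = target start,
    8 = target end, 11 = mapping quality.
    """
    totals = defaultdict(int)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete PAF record
        if int(fields[11]) < min_mapq:
            continue  # drop low-confidence alignments
        totals[fields[5]] += int(fields[8]) - int(fields[7])
    return dict(totals)


if __name__ == "__main__":
    # Two invented records: query, qlen, qstart, qend, strand,
    # target, tlen, tstart, tend, matches, block length, mapq.
    demo = [
        "ctg1\t1000\t0\t900\t+\tch01\t50000\t100\t1000\t850\t900\t60",
        "ctg2\t500\t0\t400\t-\tch01\t50000\t2000\t2400\t390\t400\t10",
    ]
    print(aligned_span_per_target(demo, min_mapq=20))
```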

Haplotype-phased genome assembly
Short-read sequences of the male (143.7 Gb) and female (140.0 Gb) lines of A. kiusianus were obtained in this study, and their genome sizes were estimated at 1,563.8 Mb and 1,729.4 Mb, respectively (Fig. 1).
The short-read sequences of the male line were assembled into raw contigs (total length = 3,724.6 Mb, N50 = 7.5 kb), which included gaps and all homologous sequences of the diploid genome (Supplementary Table S2). Then, the homologous sequences were flattened, and the gaps were filled by joining the sequence to its flanking sequence, thus producing megabubble sequences (total length = 1,811.6 Mb, N50 = 170.6 kb) (Supplementary Table S2). Finally, two haplotype-phased genome assemblies (each containing 111,443 sequences) were generated from the megabubble sequences (Table 1). Haplotype 1 spanned 1,618.9 Mb in total with an N50 length of 155.5 kb, while haplotype 2 spanned 1,618.5 Mb in total with an N50 length of 155.3 kb. Complete Benchmarking Universal Single-Copy Orthologs (BUSCO) scores were 88.4% and 88.6% for haplotypes 1 and 2, respectively (Table 1). The male genome assemblies for haplotypes 1 and 2 were designated as AKIK1p1 and AKIK1p2, respectively.
The four sets of genome sequence assemblies of A. kiusianus (haplotypes 1 and 2 of male and female lines) were aligned against the chromosome-scale genome assembly of A. officinalis. A total of 96,224 sequences (1,535.9 Mb) for haplotype 1 and 96,224 sequences (1,535.3 Mb) for haplotype 2 in the male line, and 107,875 sequences (1,491.7 Mb) for haplotype 1 and 107,864 sequences (1,489.7 Mb) for haplotype 2 in the female line, could be aligned to the 10 chromosome sequences of A. officinalis (Table 2). Complete BUSCO scores ranged from 91.2% (AKIK1p1) to 91.8% (AKIK2p2). The nomenclature of the pseudomolecule sequences was based on the chromosome names of A. officinalis (ch01-ch10), where chromosome 1 is the sex chromosome. 3 Sequences that were unassigned to the A. officinalis genome were connected and termed chromosome 0 (ch00).
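The N50 values quoted for the raw contigs, megabubble sequences, and haplotype assemblies follow the usual definition: the length of the sequence at which the cumulative length, counted from the longest sequence downward, first reaches half of the total assembly length. A minimal Python sketch of that definition is given below; the function name is illustrative, and the assembler's own reporting may apply additional filters (for example a minimum sequence length) that are not modeled here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total length defines the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy lengths: total is 20, so half is 10; 6 + 5 = 11 reaches it.
    print(n50([2, 3, 4, 5, 6]))
```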
A total of 404.2 million RNA reads for 18 samples were mapped to the genome sequences. The mapping rates of A. kiusianus RNA-Seq reads were 93.7-94.1%, while those of A. officinalis reads were 82.4-82.6%. Based on the positions of RNA-Seq reads on the genome sequences, a total of 59,208, 56,706, 57,523, and 58,694 potential protein-coding genes were predicted in AKIK1p1, AKIK1p2, AKIK2p1, and AKIK2p2, respectively (Table 2), of which 365, 380, 472, and 505 genes contained premature termination codons in their internal sequences. Complete BUSCO scores ranged from 90.1% (AKIK2p1) to 91.4% (AKIK1p2).
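One way to flag predicted genes with premature termination codons is to scan each coding sequence for an in-frame stop codon that precedes the final codon. The Python sketch below shows this check under stated assumptions: the standard genetic code, a coding sequence that starts in frame at its first base, and a hypothetical function name. It is not a description of the screening actually performed in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def has_internal_stop(cds):
    """Return True if an in-frame stop codon occurs before the last codon.

    Trailing bases that do not form a complete codon are ignored, and
    the last complete codon is treated as the expected terminal codon.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


if __name__ == "__main__":
    print(has_internal_stop("ATGAAATAA"))     # stop only at the end
    print(has_internal_stop("ATGTAAAAATAA"))  # stop in the second codon
```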
Next, we compared the sequences of predicted genes with MSE1/AoMYB35/AspTDF1, which has been reported as the male-specific sex determinant gene in A. officinalis. [3][4][5] Two genes, K1p1ch01g28074 and K1p2ch01g47751, identified in the genomes of the male line exhibited high sequence similarity with the query; however, none of the genes in the female genome assemblies showed significant sequence similarity with the query.
SNPs [transition (Ts)/transversion (Tv) ratio = 3.0] and 46,007 insertions/deletions (indels) were identified between the two haplotypes of the male line (Table 4). On the other hand, 293,196 SNPs (Ts/Tv = 3.0) and 35,732 indels were identified between the two haplotype sequences of the female line (Table 4). The haplotype sequences of male and female lines were also compared, and 321,334 SNPs and 49,921 indels on average were identified across the four haplotype combinations (Table 4).
While the chromosome structures were conserved within A. kiusianus lines and between A. kiusianus and A. officinalis (Fig. 2), genomic rearrangements were observed at the local level. For instance, at the sex-related region including the male-specific gene MSE1/AoMYB35/AspTDF1 of A. officinalis, sequence collinearity was disrupted by inversions and translocations between the male and female lines (Fig. 3). Although sequence similarity was low between the male haplotypes of A. kiusianus and A. officinalis, sequence collinearity was moderately conserved (Fig. 3).
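The Ts/Tv ratios reported for the haplotype comparisons are ratios of transition counts (purine to purine or pyrimidine to pyrimidine changes) to transversion counts (purine to pyrimidine changes or the reverse). The following Python sketch computes the ratio from a list of (reference, alternate) base pairs; the function names and the toy input are illustrative assumptions and do not reproduce the variant-calling workflow of this study.

```python
BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Compute the Ts/Tv ratio from (ref, alt) single-base pairs.

    Records that are not single-base substitutions between A, C, G,
    and T are skipped.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


if __name__ == "__main__":
    # Invented SNPs: three transitions and one transversion.
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]))
```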

Conclusion and future perspectives
We present the chromosome-level haplotype-phased genome assemblies of the male and female lines of A. kiusianus, a wild relative of garden asparagus. The genome size of A. kiusianus was estimated to be ∼1.6 Gb (Fig. 1), which was approximately 300 Mb larger than that of garden asparagus (ca. 1.3 Gb). 3 This estimation was reflected in the difference between the assembly sizes of A. kiusianus (1.6 Gb) (Table 1) and garden asparagus (1.2 Gb). Of the 1.6 Gb assembly, 1.5 Gb could be aligned to the pseudomolecule sequence of asparagus, without any structural rearrangements (Fig. 2 and Table 2). Since we determined haplotype-phased genome sequences for the male and female lines of A. kiusianus, it was possible to compare the sequence and structure of the M locus between the Y-specific region of the male line and the corresponding region of the female line (Fig. 3). The result suggested dynamic genome rearrangements between the male and female lines, similar to those reported in jojoba, 9 which might lead to presence/absence variation of the male-specific gene MSE1/AoMYB35/AspTDF1 between the male and female lines. [3][4][5]
The genome of A. kiusianus harbours valuable genes that could be used for the breeding of elite garden asparagus cultivars. Because of cross-compatibility between the two species, 2,10 important genetic loci, such as those imparting resistance to stem blight, 2 which causes considerable production losses, could be transferred from A. kiusianus into garden asparagus. However, while DNA markers linked to the genes would facilitate the selection of disease-resistant lines in breeding programs, the genetic loci responsible for disease resistance have not been reported so far. The chromosome-level genome sequence of A. kiusianus presented in this study could serve as a reference for genetic mapping and the identification of resistance genes, as well as for transcriptome analysis and the determination of gene functions and mechanisms underlying the resistant and susceptible phenotypes. 6,7
Although the plant genomics era started with the whole-genome sequencing of Arabidopsis thaliana, 11 an undomesticated species, the advanced approaches of plant genomics have been applied more frequently to agronomically important crops than to wild plant species. 12 Wild plants have the potential to accelerate the pace of breeding programs and to further the field of plant science. 13,14 The genome sequence information of A. kiusianus generated in this study will help to reveal the genetic mechanisms underlying sexual differentiation in plants and will accelerate disease-resistance breeding in asparagus.

Data availability
The genome sequence information generated in this study is available at Plant GARDEN (https://plantgarden.jp).