Expanding the genetic variation of Brassica juncea by introgression of the Brassica rapa genome

Abstract Brassica juncea is an important vegetable and oil crop cultivated worldwide. To increase its genetic variation, we introgressed the A genome of Brassica rapa into B. juncea. We used three each of heading and semi-heading B. juncea accessions as recipient parents and a B. rapa line, B9008, as the donor parent. We obtained 101 BC1S1 lines in total with expanded phenotypic variations such as leafy head shapes. We developed 132 single-nucleotide polymorphism (SNP) markers that could distinguish the A genome of B. juncea from the B. rapa genome, and tracked the introgression of B. rapa segments in the new B. juncea germplasm. On average, 59.2% of the B. juncea A genome in the B. juncea introgression lines was covered by the donor segments. We also identified three markers whose donor genotype frequencies were significantly lower than the theoretical value, suggesting strong selection of the recipient genotype during the introgression process. We provide an effective strategy to evaluate the diversity of the new germplasm based on the combination of parental resequencing data and marker genotyping results. Further genetic analysis of 1642 SNPs showed that the genetic diversity of the new B. juncea germplasm with the introgressed B. rapa genome was significantly increased. This study illustrates the potential for expanding the genetic diversity of B. juncea through the introgression of the B. rapa genome.


Introduction
There are six species of cultivated Brassicas, three of which are diploid species, B. rapa (AA, n = 10), B. nigra (BB, n = 8), and B. oleracea (CC, n = 9), and three of which are allotetraploid species, B. juncea (AABB, n = 18), B. napus (AACC, n = 19), and B. carinata (BBCC, n = 17). The three allotetraploid Brassica species were derived from the hybridization and polyploidization of two of the three diploid Brassica species. The relationship of Brassica species has been described using the 'triangle of U' model [1]. B. juncea, also known as mustard, has been widely cultivated as an important vegetable, condiment, and oilseed crop in China and some Southern Asian countries [2]. Long-term natural and human selection has resulted in many B. juncea subspecies that possess a variety of root, stem, leaf, and seed stalk forms [3,4]. B. juncea var. capitata Hort (heading B. juncea) is a subspecies with a unique leafy head that is consumed as a fresh or pickled vegetable.
The leafy head is an important nutrient storage organ that contains substantial dietary fiber and vitamins [5]. In addition to heading B. juncea, a leafy head is also a characteristic trait of other Brassica species, including Chinese cabbage (B. rapa ssp. pekinensis) and cabbage (B. oleracea var. capitata). Even though a leafy head is a common trait for these three species, B. juncea shares its unique characteristic of leafy head structure with B. rapa and B. oleracea. The petiole of heading B. juncea is short and wide. As the plant grows, the central leaves overlap and wrap into the leafy head. The plants are often cracked because of the thickened petiole, and poorly developed heads have reduced the commercial value [6]. The formation and improvement of the leafy head have been the focus of many studies [7]. The genetic base of B. juncea is restricted by polyploidy, breeding activities, and a short evolutionary history [8][9][10][11]. The narrow genetic base has also limited the improvement and productivity of this crop [12]. It would be useful to enlarge the genetic base of B. juncea through introgression with a related species [13].
Introgression is the transfer of genes from one species to the gene pool of another by repeated hybridization [14,15]. It is an important method for the transfer of favorable genes and to develop improved varieties. Introgression has been used for generating novel allotetraploid Brassica germplasm [16,17], improving the yield of B. napus [18,19], and expanding the genetic base of B. carinata [20]. Diploid progenitors can be used as genetic resources for the introgression of beneficial genes to improve allotetraploid plant traits [21]. Chinese cabbage is a commonly consumed vegetable in China and East Asia [22]. It has been selected for high yield and a compact leafy head [23]. Consequently, Chinese cabbage has great potential for improving the leafy head of B. juncea. The identification of introgressed segments in hybrids is critical in molecular breeding. The widespread use of molecular markers has provided breeders with a powerful tool to precisely select the desired genotype [24]. Genetic introgression can also be assayed using molecular markers [25][26][27]. Single-nucleotide polymorphism (SNP), as a third-generation marker system with high polymorphism and high density across the genome, has been used for crops such as B. rapa [28] and B. napus [29]. Competitive allele-specific PCR (KASP) is currently the most popular SNP genotyping technology. It has high throughput and high accuracy, and can be used in marker-assisted selection [30]. However, KASP markers suitable for assisting large-scale interspecific introgression in Brassicaceae species have not been developed. In addition, the genetics of fragments introgressed from one of the contributor diploid species to the allotetraploid species is largely unknown.
In this study, we describe the development of new B. juncea germplasm by introgressing the B. rapa genome (B. juncea introgression lines) to broaden the genetic variation of heading B. juncea. We developed a set of 132 polymorphic SNP markers that were used to distinguish the A genome of B. juncea from that of B. rapa based on resequencing data of the parents. We demonstrate B. juncea introgression lines carrying chromosome segments of B. rapa using newly developed KASP markers. The introgression breeding approach increased the morphological variation of the leafy head in B. juncea. Based on the genotyping results of KASP markers, the genomic polymorphisms of the B. juncea introgression lines were further evaluated, providing new insights into the diversity assessment of the new germplasm.

Generation of Brassica juncea introgression lines
The initial interspecific crossing between B. rapa and B. juncea required embryo rescue to generate F 1 hybrids. The authenticity of the hybrid combination (AAB) was verified by the low number of seeds per silique. Six true F 1 plants were backcrossed to the recipient parents to generate a BC 1 population. The surviving BC 1 plants were self-pollinated and a total of 101 lines were produced (Table 1). Based on the phenotype and growth status of the derived lines in the field, we selected 107 plants for subsequent analysis.

Morphological variation was expanded in the Brassica juncea introgression lines
We divided the B. juncea parents into three categories according to the shape of the leafy head. The leafy head of the heading category was approximately spherical. The other two categories were both semi-heading, and the petiole of semi-heading II was wider and thicker than semi-heading I. The progenies of heading B. juncea that had been introduced into the B. rapa genome retained the leafy head. In the progenies of semi-heading I, the formation of the leafy head has been improved, which was evident as the heading leaves around the shoot apexes continued to fold upward and inward to form a more compact head. The leaf shapes of the progenies generated by semi-heading II B. juncea changed more profoundly than those of leafy heads. Compared with the B. juncea accessions, the B. juncea introgression lines exhibited a wide range of phenotypic variations and showed similar characteristics to B. rapa (Fig. 1).

Detection of B genome in Brassica juncea introgression lines
To detect B chromosomes in the introgression lines, a strategy of screening B-genome markers was adopted. SNPs were detected by mapping clean data of B. rapa    Fig. S1). These eight KASP markers were then used to genotype all the B. juncea introgression lines, showing that all of them could amplify the fragments from both the A genome and the B genome. This result indicated that the eight B chromosomes very likely existed in the introgression lines (Fig. 2, Supplementary Fig. S2).

Development of a set of SNP markers evenly distributed on 10 chromosomes of a genome
To distinguish the A genome of B. juncea (BjuA) from the A genome of B. rapa (BraA), SNP markers were developed. The clean reads of the parental lines were aligned to the B. rapa reference genome. A total of 186 484 SNP loci were obtained, of which 17 368 specific SNPs met the selection criteria of being homozygous and polymorphic between BjuA and BraA. For evaluation of the introgression lines, 203 SNPs were selected from 3121 loci with no variation in the flanking region; they were evenly distributed across 10 chromosomes of the A genome. A total of 132 SNP markers showed expected genotypes between B9008 and the B. juncea parents. These  Figure 5. Schematic diagram of determination of genome-wide SNPs genotype based on marker genotyping results. Red, gray, and blue squares represent genotyping results of the markers, which were A r , missing, and A j . Circles represent the SNPs of the parents used for substitution. markers also met the requirement that they could not be amplified in the B. nigra genome (KASP primer sequences are shown in Supplementary Table S2). The number of specific SNP markers on each chromosome ranged from 10 on A04, A05, and A08 to 17 on A03. The genome-wide average density of specific SNPs was 2.142 Mb, with a range of 1.481 Mb on A10 to 2.958 Mb on A05. Since physical distance could not fully ref lect the genetic recombination rate, these SNP markers were further anchored to the genetic linkage map of B. rapa. These SNP markers were first determined based on their corresponding binmarkers on the linkage map, whereupon the genetic position of the binmarker was used to represent the genetic position of SNPs (Fig. 3). Using this method, we evaluated the genetic distribution of the 132 SNP markers. The genome-wide average genetic interval was 6.72 cM, with a range of 4.11 cM on A01 to 10.66 cM on A05 (Table 2). In general, the distribution of 132 SNP markers among the chromosomes was reasonable, although there were several large gaps (>20 cM), including gaps located on both ends of A01, one on A9, and one on A10.

Brassica rapa genome segments tended be retained in introgression lines
In total, a set of 132 polymorphic SNP markers was genotyped to evaluate the introgression of B. rapa in 107 B. juncea introgression lines (Fig. 4a). The genotyping results were recorded as the following four types: genotypes consistent with B. rapa and B. juncea [A (A r A r , orange in Fig. 4) and B (A j A j , blue), respectively], heterozygous genotype [H (A r A j , green)], and the missing genotype (gray).
A total of 14 124 loci were genotyped, among which 1472 were A r A r , 5440 were A j A j , 6890 were A r A j , and 322 were missing.    indicating that the progenies tended to retain the B. rapa genome in the process of distant hybridization of B. juncea and B. rapa, and existed in the form of heterozygosity.
A violin plot was constructed to show the proportions of three genotypes in 107 B. juncea introgression lines (Fig. 4b). The proportion range of genotypes A, B, and H were 0 to 29.55%, 6.06 to 70.45%, and 13.64 to 89.39%, respectively. The loci showing the A r genotype (A r A r + A r A j ) were identified as the segments replaced by the B. rapa genome. To determine whether the rate of introgression of BraA was affected by B. juncea parental lines, the proportions of A r genotype (A r A r + A r A j ) in progenies from three B. juncea categories were compared. The average proportions of A genotype corresponding to the progenies obtained from heading, semi-heading I, and semi-heading II were 57.31, 59.57, and 60.34%, respectively. There was no obvious difference in the B. rapa segment retaining rate between heading and semiheading B. juncea parental lines. On average, 59.2% of the A genome segments in the B. juncea introgression lines were covered by the B. rapa genome.
The theoretical A r genotype frequency was 25% in the BC 1 S 1 population. The actual A r genotype frequency was analyzed by the following calculation formula: A r genotype frequency = A × 2 + H/(A + B + H) × 2. The A r genotype frequency of 129 out of 132 markers was greater than the theoretical value, indicating that 97.73% of the markers were positively selected. In contrast, the actual A r genotype frequencies of the three remaining markers were 9% for A01_364, 1% for A04_22, and 11% for A08_215 (Fig. 4d). As shown by graphical genotyping of the progenies, a large number of B. juncea segments (A j A j ) appeared in these three positions (Fig. 4a), suggesting that these B. juncea genome regions were not readily replaced by B. rapa segments.

Genetic diversity in Brassica juncea introgression lines compared with traditional accessions
To evaluate the genetic diversity of the B. juncea introgression lines, phylogenetic analysis was performed at the genome-wide level using a collection of representative B. juncea and B. rapa accessions. Each of the 10 or ten accessions were selected from each of heading mustard, potherb mustard (B. juncea var. multiceps), root mustard (B. juncea var. megarrhiza), and Chinese cabbage. These 40 accessions were resequenced for SNP calling.
Genome-wide SNPs were obtained from the alignment of clean reads of all parents and 40 accessions with the B. rapa reference genome. We divided the reference genome into 2000 regions with an interval of ∼148 kb. Overall, 1642 common SNPs from chromosomes A01 to A10 with homozygous genotype were identified in all parents and 40 accessions. The missing genotype of each B. juncea introgression line was filled according to its adjacent locus genotype, and the genotyping data of 132 markers of 107 B. juncea introgression lines was then replaced by 1642 SNPs of their parents (Fig. 5).
A dendrogram of 1642 SNPs of 154 investigated lines showed that all B. juncea introgression lines could be divided into three major types (introgression types I, II, and III), and each type was associated with its B. juncea parents. There was almost no diversification between heading mustard accessions (Hm), and the genetic diversity was much narrower than that of potherb mustard (Pm) and root mustard (Rm) accessions. The phylogenetic tree revealed that the B. juncea introgression lines were separated from the heading B. juncea accessions and exhibited rich genetic diversity (Fig. 6).
The population structure of these investigated materials was analyzed, and the minimum value of the coefficient of variation (CV) was obtained at K = 6 (Supplementary Fig. S3). At K = 6, the B. juncea introgression lines were classified into three groups, consistent with the result of the phylogenetic tree (Fig. 6).
For further comparison of the heading mustard accessions and the B. juncea introgression lines, we calculated the genetic distance. The maximum genetic distance detected in heading mustard accessions (Hm) was only 0.03. However, the maximum genetic distances of the progenies of semi-heading I B. juncea (introgression_type I), heading B. juncea (introgression_type II), and semiheading II B. juncea (introgression_type III) were 0.33, 0.31, and 0.30, respectively. Compared with heading mustard accessions, the maximum genetic distance of the introgression lines was significantly increased (Fig. 7, Supplementary Tables S3 and S4).

Discussion
We report a breeding approach for B. juncea through the introgression of one of its diploid contributor species, B. rapa. The genotyping of B. juncea introgression lines indicated large-scale segmental introgressions involving B. rapa genome segments substituting for the A genome of B. juncea. The B. juncea introgression lines exhibited expanded morphological variation and rich genetic diversity, proving that B. rapa has great potential for increasing the genetic diversity of B. juncea. The new B. juncea germplasm with known B. rapa genetic composition is a valuable resource that could be used to improve B. juncea varieties.
We developed a set of 132 specific SNP markers that could distinguish B. rapa and B. juncea parents, and these were evenly distributed across all 10 chromosomes. Some chromosomes, such as A01, which has gaps at both ends, would cause a deviation in the assessment of the introgression rate, but in general the evaluation by SNP markers could reflect the segmental introgression ratio in these introgression lines. Our work demonstrated that the analysis of BC 1 S 1 segregating families with specific SNP markers was an effective and high-throughput solution to trace the introgression segments. Different proportions of B. rapa segments were detected regardless of categories of recipient parents. It has been suggested that the A genome of B. rapa and B. juncea showed visible differentiation [31]. Therefore, this set of polymorphic SNP markers might be universal and beneficial for future relevant research.
We observed strong selection in the genomic region for the B. juncea parent genotypes. Three markers (A01_364, A04_22, A08_215) were significantly different from the other markers, and their actual genotype frequencies were much lower than the theoretical value of 25%. This may be caused by telomeres on chromosomes. During the generation of the BC 1 S 1 , we collected seeds from plants that produced more seeds without artificial pollination. This is obviously strong selection for self-compatibility. We speculated that the skewed ratios of these markers might be associated with self-compatibility selection. The reported Al S-locus on chromosome 4 of Arabidopsis thaliana has three collinear fragments in B. rapa, which were located at A01, A03, and A08 in subgenomes LF, MF1, and MF2 [32]. However, variation in these markers was not seen among B. juncea and B. rapa parents, indicating the possibility of other unknown SI (self-incompatibility) genes, but we cannot exclude other factors that may be involved.
We used three different heading types of B. juncea accessions as recipient parents, and we expected the B. juncea introgression lines to be classified into three categories related to the recipients. Grouping based on the KASP markers allowed distinct separation of the B. rapa and B. juncea accessions, while the B. juncea introgression lines were in between these two parental species. The B. juncea introgression lines from three types of parents intersected each other and showed no group corresponding to their recipient parents ( Supplementary  Fig. S4). These results indicated that our rule to ensure the universality of the markers in distinguishing B. rapa from B. juncea decreased their power in resolving the variation within the two species. To properly assess the diversity of the derived progenies, we used a set of 1642 genome-wide SNPs. Their genotypes in the introgression lines were obtained based on the sources of the fragments determined by the KASP markers and the parental genotypes of the corresponding fragments.  Fig. S5). Our strategy for evaluating genome-wide diversity avoided the need to sequence all used lines by referring to the genotype of flanking KASP markers. This strategy considerably decreased the expense of whole-genome diversity evaluation.
Several studies have noted the importance of creating more variability and broadening the genetic base of B. juncea by making crosses between genetically diverse parents [33][34][35]. However, to our knowledge, no other report has adopted the strategy of introducing genetic variations from their diploid contributor species. The present study created a set of novel B. juncea germplasms and systematically evaluated the expanded diversity at the whole-genome level. Using a set of genome-wide SNPs, we demonstrated that the genetic distance of the heading B. juncea was enhanced greatly from 0.03 to an estimated maximum of 0.33. Corresponding to increased genetic diversity, we also observed obviously increased phenotypical variations among the introgression lines. This introgression strategy can be extended to the improvement of other Brassica allotetraploid species.

Plant materials
Brassica rapa B9008 is a line that has been assembled de novo and exhibits desirable characteristics such as selfcompatibility, high seed-setting rate, early f lowering, and a compact leafy head [36]. The BC 1 S 1 population of the B. juncea introgression lines was derived from crosses between B. rapa B9008 and six accessions of heading B. juncea (three accessions of B. juncea were semi-heading; Table 3). All parent materials were provided by the Molecular Genetics Group, Institute of Vegetables and Flowers (IVF), and the Chinese Academy of Agricultural Sciences (CAAS).
Brassica juncea was used as the female parent, and B9008 was used as the male parent to produce the F 1 generation through embryo culture [37]. The F 1 hybrid plants were backcrossed with the recipient parents to generate the BC 1 population. The plants in the backcrossed population were further self-pollinated and harvested to generate B. juncea introgression lines (Fig. 8). In August 2020, 10 seedlings per line were planted in the experimental field in Beijing. The plants were selected for the experiments according to agronomic traits such as the leafy head and resistance.

Morphological observation
The leafy head of the B. juncea introgression lines and parent materials was photographed 3 months after transplanting. The outer leaves of the plants were removed for observation of the heading characteristics.

DNA extraction and sequencing
Total genomic DNA was isolated from young leaves following the modified CTAB method [38]. The DNA samples for genotyping were diluted to a final concentration of 50 ng/μl. DNA from seven parental lines was sequenced using a BGISEQ-500 sequencer and the Illumina NovaSeq 6000 platform with paired ends. We used fastp [39] to filter low-quality reads, reads with >10% undetected bases, and reads with adapters to obtain high-quality reads (clean data).

SNP calling for KASP marker design
SNP calling was conducted by aligning the clean data of each parent material to the B. rapa reference genome 'Chiifu', the B. nigra reference genome, and the B. juncea reference genome (www.brassicadb.cn) with the GATK software toolkit [40,41].
The A genome of B. rapa and that of B. juncea were named BraA and BjuA, respectively, and the B genome of B. juncea was named BjuB. The acquisition of polymorphic SNPs between the A genome (BraA and BjuA) and BjuB was based on the SNP set uniform in BraA and BjuA of the B. juncea reference genome, but polymorphic between BraA and BjuB of the B. juncea reference genome, while the B. juncea parental lines were not polymorphic in either the A or the B genome of the B. juncea reference genome.
The selected 'BraA/BjuA' allele-specific SNPs from custom-written Python scripts had the following criteria: (i) homozygous for both parents; (ii) no polymorphism between BraA and the B. rapa reference genome; (iii) polymorphic between BjuA and BraA; (iv) uniform in B. juncea parental lines; (v) no sequence variation within the 100-bp f lanking region; (vi) no homologous region in the B. nigra genome; and (vii) relatively even distribution on chromosomes.
The f lanking sequence of each selected SNP was further used to search for the unique hit to the reference genomes of B. rapa and B. juncea. Forward primers (A1/A2) and common reverse primers (C) were designed according to the manufacturer's instructions (LGC Genomics) using Primer3 software.

SNP genotyping through KASP
The KASP genotyping assay was used to detect specific SNPs. KASP amplifications were performed in a 384-well format using the Scientific QuantStudio™ 12K Flex Real-Time PCR System (Thermo Fisher Scientific, USA). A final reaction volume of 5 μl was used and contained 2.5 μl 2 × KASP V4.0 Master Mix (LGC Genomics), 2.5 μl DNA template, and 0.07 μl KASP assay mix. Non-template controls were included in each SNP reaction plate. The PCR program was performed as described by LGC. The temperature of the f luorescence measurement was 30 • C and the time was 1 min. The genotype clusters were viewed with SNPviewer software (LGC Genomics, https:// www.biosearchtech.com/support/tools/genotyping-so ftware/snpviewer).

Genetic analysis
For genetic map construction, the raw data of Chinese cabbage DH lines were aligned to the B. rapa reference genome for calling SNPs by BWA and SAMtools [42][43][44][45]. Recombination bins were constructed as genetic markers and imported into JoinMap 4.0 software, and the physical distance of the binmarkers against the genetic distance was obtained. The genetic map and graphical genotyping analysis were performed using R software. To observe the phylogenetic relationship of the B. juncea introgression lines, the phylogenetic tree was constructed by applying the neighbor-joining method and the Interactive Tree of Life (iTOL) [46]. The population structure was further inferred with ADMIXTURE. The number of subpopulations analyzed ranged from K = 1 to K = 7 [47].