Long-read assembly of the Brassica napus reference genome Darmor-bzh

Abstract
Background
The combination of long reads and long-range information to produce genome assemblies is now accepted as a common standard. This strategy not only allows access to the gene catalogue of a given species but also reveals the architecture and organization of chromosomes, including complex regions such as telomeres and centromeres. The Brassica genus is not exempt, and many assemblies based on long reads are now available. The reference genome for Brassica napus, Darmor-bzh, which was published in 2014, was produced using short reads and its contiguity was extremely low compared with current assemblies of the Brassica genus.
Findings
Herein, we report the new long-read assembly of Darmor-bzh genome (Brassica napus) generated by combining long-read sequencing data and optical and genetic maps. Using the PromethION device and 6 flowcells, we generated ∼16 million long reads representing 93× coverage and, more importantly, 6× with reads longer than 100 kb. This ultralong-read dataset allows us to generate one of the most contiguous and complete assemblies of a Brassica genome to date (contig N50 > 10 Mb). In addition, we exploited all the advantages of the nanopore technology to detect modified bases and sequence transcriptomic data using direct RNA to annotate the genome and focus on resistance genes.
Conclusion
Using these cutting-edge technologies, and in particular by relying on all the advantages of the nanopore technology, we provide the most contiguous Brassica napus assembly, a resource that will be valuable to the Brassica community for crop improvement and will facilitate the rapid selection of agronomically important traits.

Recently, Pacific Biosciences (PACBIO) and Oxford Nanopore (ONT) sequencing technologies were commercialized with the promise of sequencing long DNA fragments (on the order of kilobases to megabases). The introduction of these long-read technologies significantly increased the quality of genome assemblies in terms of contiguity and completeness [1][2][3].
Although it is possible to assemble chromosomes into a single contig in the case of simple genomes [4], the complexity of plant genomes is such that these methods are still insufficient to obtain the chromosome architecture. The resulting contigs need to be ordered and oriented with long-range information based on chromatin interactions or optical maps. This combined strategy is now a common standard. To date, several genomes of the Brassica genus have been sequenced using long reads and brought to the chromosome scale. The first two, B. rapa Z1 and B. oleracea HDEM, were published in 2018 and generated using ONT reads and optical maps [2,4]. At the beginning of 2020, two genomes of B. rapa (Chiifu) and B. oleracea (D134), sequenced using PACBIO long reads, were released [5,6]. A first ONT-based long-read assembly of the other diploid species, B. nigra NI100, has been available since the beginning of 2020 [7]. The earlier availability of the diploid genomes may be explained by their smaller genome size (<600Mb) compared to their tetraploid derivatives. The first genome of a tetraploid plant was published in 2014 [8] and was sequenced using a short-read strategy (referred to here as B. napus Darmor-bzh v5). Although it is a great resource for the scientific community, this genome remains very fragmented and incomplete. In 2017, the pseudomolecules of Darmor-bzh were improved [9] using genotyping by sequencing (GBS) data (referred to here as Darmor-bzh v8), and a new gene prediction was made. Unfortunately, this annotation was not used by the Brassica community as it was considered incomplete [10]. As an illustration, only 89.6% of the core Brassicales genes were found in the annotation compared to 97.7% in the previous version (Table 1). In the first months of 2020, nine B. napus genomes based on PACBIO sequencing were published [11,12], and three of them were organized using long-range information. These genomes have a better contiguity than the current reference (Darmor-bzh), but interestingly their N50 at the contig level is lower than that of the Brassica genomes that have been generated using ONT sequencing (Tables 2 and 3). Here, we report the genome sequence of the B. napus (NCBI:txid3708) reference Darmor-bzh (referred to here as Darmor-bzh v10), produced using ONT and Illumina reads supplemented by optical and genetic maps. In addition to chromosome architecture, genes were predicted by sequencing native RNA molecules using the ONT PromethION, with a focus on resistance genes. The contiguity and gene completion of this new genome assembly are among the highest in the Brassica genus (Tables 3 and S9).
After the Illumina sequencing, an in-house quality control process was applied to the reads that passed the Illumina quality filters, as described in [14]. These trimming and removal steps were achieved using Fastxtend tools [15]. This processing resulted in high-quality data and improved the subsequent analyses (Table S1).

Genome size estimation
The Brassica napus genome size was estimated using flow cytometry in a publication from Johnston et al. [22] that reported a haploid genome size of 1,132Mb. In addition, we ran GenomeScope (GenomeScope, RRID:SCR_017014) version 1 using Illumina reads and a k-mer size of 31, and obtained an estimated genome size of 862Mb (Figure S1) and a very low heterozygosity rate (0.0526%). The two estimates are highly discordant; however, the genotype used for the flow cytometry experiment is not known. In addition, we observed a high divergence in assembly size between the ten B. napus assemblies (Table 3), and a significant variation in genome size among different accessions of B. rapa has recently been described [23].
Then, we selected one of the assemblies based not only on contiguity metrics such as N50 but also on cumulative size. The Flye (longest reads) and SMARTdenovo (all reads) assemblies were very similar in terms of contiguity (N50 and N90), but we decided to keep the Flye assembly as its cumulative size was higher. The Flye assembler using the longest reads resulted in a contig N50 of 10.0Mb and a cumulative size of 937.9Mb.
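As an illustration of the metrics used to compare candidate assemblies, the following Python sketch computes the cumulative size, N50 and N90 from a list of contig lengths. It is a minimal example and not the tool used in this study; the function names and the toy lengths are hypothetical.

```python
def nx(lengths, fraction):
    """Return the Nx value: the largest length L such that contigs of
    length >= L together contain at least `fraction` of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0  # empty input


def summarize(lengths):
    """Collect the metrics used to compare assemblies."""
    return {
        "cumulative_size": sum(lengths),
        "n50": nx(lengths, 0.5),
        "n90": nx(lengths, 0.9),
    }


# Toy contig lengths (invented values, for illustration only).
toy_contigs = [100, 80, 60, 40, 20]
print(summarize(toy_contigs))
```

Two assemblies with similar N50 and N90 values can then be ranked on their cumulative size, which is the criterion that favoured the Flye assembly here.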

Long range genome assembly
The genome map assemblies of B. napus Darmor-bzh were generated using the Bionano Solve Pipeline version 3.3 and Bionano Access version 1.3 (Tables S4-S6). The assemblies were performed using the parameters "non haplotype without extend and split" and "add Pre-assembly". The latter parameter produces a rough assembly that is used as a reference for a second assembly. We filtered out molecules smaller than 180Kb and molecules with fewer than nine labeling sites (Tables S4 and S5). The Bionano scaffolding workflow was launched with the nanopore contigs and the two Bionano maps (DLE and BspQI). As already reported [2], we found in several cases that the nanopore contigs were overlapping (based on the optical map) and that this overlap was not managed by the hybrid scaffolding procedure. We corrected these negative gaps using the BiSCoT software [30] with default parameters (Table S7 and Figure S3).

Validation and anchoring of the B. napus Darmor-bzh assembly
A total of 39,495 markers deriving from the Illumina 8K, 20K and 60K arrays were genetically mapped on the integrated map, totalling 2,777.7 cM. The sequence contexts of all SNP markers that were genetically mapped were blasted against our B. napus Darmor-bzh assembly in order to validate the quality of our assembly and to help order and orient the scaffolds. Of these 39,495 genetically mapped markers, 36,391 were physically anchored on the final Darmor-bzh assembly (Figure 1 and Table S8). The genetic and physical positions were discordant for only 618 markers (0.02%) due to an inaccurate position on the genetic map (variation of a few centimorgans in almost all cases). [32]. Nucmer was used with default parameters, followed by delta-filter with the `-i 98 -l 1000` parameters. Using the output of nucmer and delta-filter described above, we subsequently identified Large Misassembled Inverted (LMI) regions between the v5 and v10 genome assemblies. We used show-coords from Mummer with the parameters `-H -d -r -T`, followed by a specific method to extract LMIs. Since the show-coords output consists of many small alignments from which we need to infer LMIs, we used an iterative process based on merging the coordinates of small consecutive inversions into larger ones, followed by filtering out the small inversions. The parameters were validated by visually and manually checking the resulting LMI coordinates against the chromosome-by-chromosome alignment scatter plots (Figure 2). More precisely, we extracted inversions from the show-coords output and used `bedtools merge -d 100000` from bedtools (BEDTools, RRID:SCR_006646, version 2.29.2) [33] to merge the coordinates of inversions separated by 100Kb or less. We then selected inversions larger than 100Kb, merged the inversions separated by 300Kb or less with bedtools, selected inversions larger than 700Kb, and finally merged inversions separated by 1Mb or less with bedtools.
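The iterative merge-and-filter procedure described above can be sketched in Python as follows. This is a minimal illustration and not the script used in this study: it mimics the effect of `bedtools merge -d` on in-memory intervals, the function names are hypothetical, and the toy coordinates are invented.

```python
def merge_intervals(intervals, max_gap):
    """Merge (chrom, start, end) intervals on the same chromosome that
    overlap or are separated by at most `max_gap` bases."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def keep_larger_than(intervals, min_size):
    """Keep intervals strictly longer than `min_size` bases."""
    return [iv for iv in intervals if iv[2] - iv[1] > min_size]


def extract_lmis(inversions):
    """Apply the three merge rounds and two size filters from the text."""
    step = merge_intervals(inversions, 100_000)
    step = keep_larger_than(step, 100_000)
    step = merge_intervals(step, 300_000)
    step = keep_larger_than(step, 700_000)
    return merge_intervals(step, 1_000_000)


# Toy inverted alignments (invented coordinates).
toy = [
    ("C07", 1_000_000, 1_060_000),
    ("C07", 1_100_000, 1_500_000),
    ("C07", 1_700_000, 2_000_000),
    ("A01", 5_000_000, 5_020_000),  # too small, dropped after round 1
    ("A03", 0, 400_000),            # below 700 Kb, dropped after round 2
]
print(extract_lmis(toy))
```

Alternating merging and filtering lets fragmented alignments of one large inversion coalesce before the size threshold is raised, so that isolated small inversions are discarded without losing the large ones.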

Detection of modified bases
The nanopore technology has the advantage of reading native DNA molecules and can distinguish modified from natural bases because they affect the electrical current flowing through the pore differently [34] . The nanopore reads were mapped using minimap2 [35] (Minimap2, RRID:SCR_018550, version 2.17-r941 with the '-x map-ont' parameter). We  (Table S11).

Transposable elements annotation
Transposable elements (TE) were annotated using RepeatMasker (RepeatMasker, RRID:SCR_012954, version 4.1.0 with default parameters) [38] and transposable element libraries from [8] . We masked nearly 54% of the genome and LTR Copia and Gypsy elements are the most abundant (15.5% and 13.0% respectively). The three Darmor-bzh assembly releases and the fourteen other Brassica genome assemblies were masked using the same procedure (Table 5).

Gene prediction
Gene prediction was performed using several proteomes: eight from other genotypes of B. napus [11] (Westar, Zs11, QuintaA, Zheyou73, N02127, GanganF73, Tapidor3 and Shengli3), Arabidopsis thaliana (UP000006548), the B. napus pan-annotation [39], resistance gene analogs (RGA) from Brassica [40] and the 2014 annotation of Darmor-bzh [8]. Low-complexity regions in the genomic sequences were masked with the DustMasker algorithm (version 1.0.0 from the blast 2.10.0 package) [37]. Proteomes were then aligned on the genome using a two-step strategy. First, BLAT [41] (BLAT, RRID:SCR_011919, version 36 with default parameters) was used to rapidly locate the putative regions corresponding to these proteins on the genome. The best match and the matches with a score greater than or equal to 90% of the best match score were retained. Second, alignments were refined using Genewise [42] (GeneWise, RRID:SCR_015054, version 2.2.0 with default parameters), which is more accurate for detecting intron/exon boundaries. Alignments were kept if more than 75% of the length of the protein was aligned on the genome.
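The two selection rules of this strategy (matches scoring at least 90% of the best match, then alignments covering more than 75% of the protein) can be sketched in Python as follows. This is a minimal illustration that assumes the BLAT and Genewise outputs have already been parsed; the tuple layout, function names and toy values are hypothetical.

```python
from collections import defaultdict


def select_blat_matches(matches, ratio=0.9):
    """For each protein, keep the loci whose score is at least
    `ratio` times the best score observed for that protein.

    `matches` is an iterable of (protein_id, locus, score) tuples.
    """
    by_protein = defaultdict(list)
    for protein, locus, score in matches:
        by_protein[protein].append((locus, score))
    kept = {}
    for protein, hits in by_protein.items():
        best = max(score for _, score in hits)
        kept[protein] = [locus for locus, score in hits if score >= ratio * best]
    return kept


def passes_coverage(aligned_residues, protein_length, min_fraction=0.75):
    """True if more than `min_fraction` of the protein is aligned."""
    return aligned_residues > min_fraction * protein_length


# Toy matches (invented identifiers and scores).
toy_matches = [
    ("p1", "A01:100", 1000),
    ("p1", "C01:200", 950),
    ("p1", "A05:300", 850),
    ("p2", "C03:50", 400),
]
print(select_blat_matches(toy_matches))
print(passes_coverage(80, 100))
```

Keeping near-best matches rather than only the top one is a natural choice for a polyploid genome, where a protein is expected to have homoeologous copies on both subgenomes.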
In addition, we used the error-corrected direct RNA reads produced using the PromethION device. As already reported [43] , the detection of splicing sites using raw reads is difficult.
For example, on a subset of 1,000 reads, the proportion of GT-AG splicing sites is 6% lower in the raw reads compared to the corrected reads ( were used as input for the gene prediction. All the transcriptomic and proteins alignments were combined using Gmove (Gmove, RRID:SCR_019132) [45] which is an easy-to-use predictor with no need of pre-calibration  Figure S3).

Correspondence between the genes of the v5 and the v10 assemblies
We calculated a correspondence table between the predicted genes on the v5 and the v10 assemblies. For this purpose, we aligned all the v5 proteins against all the v10 proteins using diamond (DIAMOND, RRID:SCR_016071, version 0.9.24 with the following parameters: "--more-sensitive -e 0.00001"). The best reciprocal hits were selected if they came from the same chromosome in both assemblies, and used as anchors. We then enriched the anchored genes using synteny and by filtering hits based on percent identity (>80%) and sequence coverage (>80% of the target or the query). Using this strategy (M1), we were able to find a correspondence for 73,560 of the 101,040 genes of the v5 assembly (72.8%). Due to the importance of such information for the Brassica community, we decided to use a second method (M2) to increase the number of genes with a correspondence. For this second method, we first performed a reciprocal BlastP (NCBI BLAST, RRID:SCR_004870, blastp v2.9.0 with the following parameters "-evalue 1e-20") between the v5 and v10 proteins, then only retained the blast hits with a minimum percentage of identity of 95%, an alignment length of 50%, and located on the same chromosome (scaffolds accepted). Using this second method, we identified about the same number of genes with a correspondence (72,314 v5 proteins). When comparing the results from these two methods, we observed that 85% were common to both methods (62,514), about 14% were found by only one method (11,032 and 9,800 specific to the first and second method respectively) and 0.6% were discordant (450). In this paper, we decided to provide the most complete correspondence
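The anchor selection step of method M1 (best reciprocal hits located on the same chromosome) can be sketched in Python as follows. This is a minimal illustration that assumes the alignment output has been reduced to (query, target, bitscore) tuples; the gene identifiers, the `chrom_of` mapping and the function names are hypothetical, and the synteny-based enrichment is not reproduced.

```python
def best_hits(rows):
    """Map each query to its highest-scoring target.

    `rows` is an iterable of (query, target, bitscore) tuples.
    """
    best = {}
    for query, target, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (target, bitscore)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(v5_vs_v10, v10_vs_v5, chrom_of):
    """Return (v5_gene, v10_gene) pairs that are each other's best hit
    and that lie on the same chromosome in both assemblies."""
    forward = best_hits(v5_vs_v10)
    reverse = best_hits(v10_vs_v5)
    anchors = []
    for v5_gene, v10_gene in forward.items():
        same_chrom = chrom_of[v5_gene] == chrom_of[v10_gene]
        if reverse.get(v10_gene) == v5_gene and same_chrom:
            anchors.append((v5_gene, v10_gene))
    return sorted(anchors)


# Toy alignments (invented identifiers and scores).
v5_vs_v10 = [("v5_a", "v10_a", 500), ("v5_a", "v10_b", 300),
             ("v5_b", "v10_b", 400), ("v5_c", "v10_c", 350)]
v10_vs_v5 = [("v10_a", "v5_a", 500), ("v10_b", "v5_b", 400),
             ("v10_c", "v5_c", 350)]
chrom_of = {"v5_a": "A01", "v10_a": "A01", "v5_b": "C02",
            "v10_b": "C02", "v5_c": "A03", "v10_c": "C03"}
print(reciprocal_best_hits(v5_vs_v10, v10_vs_v5, chrom_of))
```

In this toy example the pair (v5_c, v10_c) is reciprocal but rejected because the two genes are on different chromosomes, which is the kind of case the chromosome constraint is meant to exclude.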

Localization and annotation of pericentromeric regions
We

Annotation of the Resistance Genes Analogs
We identified the Resistance Genes Analogs (RGA) in Darmor-bzh v5 and v10 gene annotations using RGAugury (commit 57d58f887ad8c70e5d4c619a3d5e207158822819) [49] and subdivided them into eight different categories [51]. Finally, all the putative RGA proteins identified in Darmor-bzh v5 were blasted against those identified in this new assembly in order to establish a correspondence between these two versions, and also to identify newly annotated RGA genes in Darmor-bzh v10.

Comparison with existing assemblies and annotations
The two published releases of Darmor-bzh [8,9] were generated using 454 and Illumina reads and have a low contiguity (Table 1). These fragmented assemblies were difficult to organize at the chromosome level, and as a result only 553Mb and 690Mb of sequences were respectively anchored on the 19 chromosomes. In comparison, the 19 chromosomes of the ONT assembly contain 849Mb. The gene completion (BUSCO score) of the first release and of this one is similar (97.7% and 98.6%), showing that long-read assemblies mainly impact the repetitive compartment of the genome. However, 98.8% of the genes are now placed on their respective chromosomes rather than on unplaced scaffolds, whereas only 80% of the predicted genes were located on pseudomolecules in Darmor-bzh v5 (Figure 3B). These improvements will greatly help the Brassica community to identify the genes underlying agronomic traits of interest found using quantitative genetics.
We compared the short- and long-read assemblies and observed large newly assembled regions comprising centromeric sequences (Figure 2, highlighted in orange on chromosome A06). By aligning the flanking markers of B. napus centromeres [47], we were able to identify the approximate positions of the pericentromeric regions in Darmor-bzh v5 and the present Darmor-bzh v10 assembly (Table S14 and Figure 1). Thereafter, we compared the lengths and gene contents of these regions between Darmor-bzh v5 and v10 (Figure 3A). On average, we observed an 80-fold increase of assembled sequence in pericentromeric regions, with some extreme cases like the A06 pericentromere that is 1,180 times larger in  (Table S15).
All these analyses highlight the improvements of the Darmor-bzh v10 genome assembly thanks to the use of ONT long-reads sequencing. Overall, this high-quality genome assembly of B. napus Darmor-bzh will be particularly useful to the Brassica community to decipher genes underlying agricultural traits of interest.

Comparison of available Brassica genome assemblies
We downloaded the 14 available Brassica long-read assemblies (and annotations) and computed usual metrics. For each assembly, contigs were generated by splitting sequences at each gap and the gene completion was evaluated on predicted genes using BUSCO [46] and the conserved genes from the Brassicales database (N=4,596 genes). Not surprisingly the gene content is high, between 97.9% and 98.8% for all the ten B. napus assemblies (Tables 3 and S9), and is lower for the diploid species (due to the presence of a single-genome), except for B. rapa Chiifu that has a surprising number of genes (>60k).
Likewise, the repetitive content of these ten genomes is similar, between 54% for Darmor-bzh and 60% for no2127. Concerning the diploid genomes and as already reported [52] , B. oleracea (C) genome assemblies contain more repetitive elements than the B. rapa (A) and B. nigra (B) genome assemblies ( Table 5).
The main observed differences are the contig length that affects the number of gaps in each assembly (from 268 gaps for Darmor-bzh to 5,460 gaps for Zs11) and the number of anchored bases (from 765 Mb for Express617 to 961 Mb for Zs11). We further investigated these two differences that we believe to be related to the technologies used.
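To make the gap counts above concrete, the following Python sketch splits a scaffold sequence at each run of undetermined bases (N) to recover contigs and counts the gaps between them. It is a minimal illustration with hypothetical function names and a toy sequence, not the script used to produce the reported metrics.

```python
import re


def split_at_gaps(sequence):
    """Split a scaffold at every run of N (or n) and return the contigs."""
    return [piece for piece in re.split(r"[Nn]+", sequence) if piece]


def count_gaps(sequence):
    """Count the runs of N that separate two contigs.

    Leading or trailing runs of N do not separate contigs and are ignored.
    """
    return max(len(split_at_gaps(sequence)) - 1, 0)


# Toy scaffold (invented sequence) with two internal gaps.
toy_scaffold = "ACGTNNNNACGTACNNGT"
print(split_at_gaps(toy_scaffold))
print(count_gaps(toy_scaffold))
```

Because each gap splits one sequence into two, an assembly with fewer gaps necessarily has fewer and longer contigs for the same anchored size.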
We observed a significant difference in contiguity when comparing ONT and PACBIO assemblies. The eleven Brassica genomes sequenced using PACBIO have a contig N50 between 1.4Mb and 3.6Mb whereas the four ONT assemblies have a contig N50 higher than 5.5Mb. The ability of the nanopore technology to sequence large fragments of DNA appears to be an advantage for assembling complex genomes. In the case of Darmor-bzh, we were able to obtain more than 50,000 reads longer than 100Kb (representing 6X of coverage).
This dataset allowed us to generate a contiguous assembly with only 268 gaps. As a comparison, we found 5,460 gaps in the PACBIO assembly of the Zs11 genotype. This difference is observed in all the A and C genomes and subgenomes that have been sequenced using long reads, and may be directly related to the longer length of ONT reads ( Figure 4). Although the number of gaps is higher in the PACBIO assemblies, the total number of N's is lower at least for the assemblies that have been organized with Hi-C data.
Indeed, the Hi-C pipelines generally order contigs and add a fixed gap size between two contigs. We investigated the 500bp gaps in the Zs11 assembly and aligned their flanking regions on the Darmor-bzh assembly using blat (version 36 with default parameters) [41] .
We applied stringent criteria (alignment on the same chromosome, score > 4000, alignment of at least 1 Kb of each of the flanking regions and alignment covering less than 100Kb on the Darmor-bzh assembly) and we found a location for 367 of the 5,460 gaps of Zs11. These 367 regions covered more than 19Mb of the Darmor-bzh assembly with an average size of 52 Kb. Even if these regions could be different between the two genotypes, we examined their content in transposable elements and found a high proportion of bases annotated as repetitive elements (82.9%) and a different distribution of the classes of elements compared to the whole genome (Table S13). We observed a higher proportion of Copia elements, but especially LINE and Satellite elements. For example, we observed large LINE elements (>50Kb) as shown in Figure 5, where ultralong nanopore reads covered this element and avoided a contig breakage in this area.
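The stringent criteria used to locate a Zs11 gap on the Darmor-bzh assembly can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two flanking alignments are assumed to be already parsed into a hypothetical `FlankHit` record, the score threshold is applied to each flank, and the "less than 100Kb" criterion is interpreted as the span between the outer ends of the two flank alignments.

```python
from dataclasses import dataclass


@dataclass
class FlankHit:
    """Hypothetical summary of one flanking-region alignment."""
    chrom: str
    start: int
    end: int
    score: int
    aligned_bases: int


def locate_gap(zs11_chrom, left, right, min_score=4000,
               min_aligned=1000, max_span=100_000):
    """Return (chrom, start, end) of the region on Darmor-bzh matching
    a Zs11 gap, or None if any criterion fails."""
    hits = (left, right)
    if any(hit.chrom != zs11_chrom for hit in hits):
        return None  # flanks must align on the same chromosome
    if any(hit.score <= min_score for hit in hits):
        return None  # alignment score must exceed the threshold
    if any(hit.aligned_bases < min_aligned for hit in hits):
        return None  # at least 1 Kb of each flank must align
    start = min(left.start, right.start)
    end = max(left.end, right.end)
    if end - start >= max_span:
        return None  # the located region must cover less than 100 Kb
    return (zs11_chrom, start, end)


# Toy flanking alignments (invented coordinates and scores).
left = FlankHit("A06", 1_000_000, 1_010_000, 9000, 9500)
right = FlankHit("A06", 1_060_000, 1_070_000, 8800, 9300)
print(locate_gap("A06", left, right))
```

Requiring both flanks to pass every criterion favours precision over recall, which is consistent with only a subset of the gaps receiving a location.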
Interestingly, A chromosomes seem more difficult to assemble (Figure 4). Even though the A genome is smaller than the C genome and contains fewer transposable elements (Table 5), the number of contigs (and consequently the number of gaps) in the A genome, or subgenome, is twice the number of contigs in the C genome, or subgenome (gap track on Figure 1 and Figure 4).
Chromosomes A05, A06 and A09 appear to be the most challenging to assemble due to the high content of transposable elements in their centromeric regions. They alone contain 43% of the gaps (and 51% of the undetermined bases) present on the 19 chromosomes.
However, these chromosomes have been shown to be highly variable in length across B.
The second difference observed between all these assemblies is the proportion of anchored sequence which is higher for the genome assembly that has been organized using Hi-C data (95.0%, Zs11) although it is the most fragmented (Table S10). Indeed, the organization of small contigs is complicated using optical or genetic maps because it requires a sufficient number of restriction sites or markers. This aspect probably explains the small proportion of the Express617 assembly (82.7%) that has been anchored on the 19 chromosomes. The other genomes have been organized using comparative genomics (synteny with the Zs11 assembly). Although it is a convenient method, it may generate false organization and the proportion of anchored bases is variable, from 89.1% to 92.7%, depending on the conservation between the two genomes. We think that if the assembly is highly contiguous, optical maps have the important advantage of estimating the size of gaps, which remains a limitation when using Hi-C data.

Sequencing of native RNA molecules
The sequencing of RNA is traditionally performed using the Illumina technology, where the RNA molecules are isolated and then reverse transcribed (RT) to cDNA, which is more stable and allows amplification prior to sequencing. The Oxford Nanopore technology is the first to offer the sequencing of native RNA molecules without the RT step. One of the main advantages is a better quantification of gene expression when compared to methods that require an RT step [43]. Moreover, the sequencing of native RNA molecules preserves modified bases and theoretically allows their detection [53,54]. Here, we sequenced a mix of leaf and root samples using the PromethION device and generated 10,416,515 reads with an average size of 559 bp. This dataset was corrected using TALC as described previously and corrected reads were used in the gene prediction workflow (Table S11).

Alternative splicing events
Independently, we detected alternative splicing (AS) events, such as exon skipping and intron retention. Aligning noisy reads makes it difficult to accurately detect splice sites and therefore 5' and 3' alternative splice sites. Raw reads were aligned on the genome using a two-step strategy. First, BLAT (version 36 with default parameters) [41] was used to rapidly locate the putative regions corresponding to these RNA reads on the genome. The best match for each read was selected and a second alignment was performed using Est2Genome [43] (version 5.2 with default parameters). Alignments with a percent identity higher than 90% were retained and we used bedtools to extract intron retention and exon skipping events. We searched for predicted introns that were entirely covered by an exon of a nanopore read.
Using this conservative approach, we were able to identify 30,397 events distributed across 18,204 genes (16.8%). Chalhoub et al. [8] detected an intron retention in 29% of the 108,190 annotated genes. Using the same approach and a coverage of at least 10 Illumina reads, we found intron retention in 21.2% of the genes. This proportion increased to 24.4% and 30.3% when lower coverage thresholds were used (8 and 5 respectively). We compared the intron retention events predicted by Nanopore and Illumina reads and found only 56% in common; therefore 44% of the events are specific to Nanopore reads. However, using lower coverage thresholds for prediction with Illumina data increased the proportion of common events (62% with coverage >8 and 73% with coverage >5), indicating that Nanopore reads can also detect rare events. Nevertheless, the smaller number of events detected using the Nanopore reads may indicate that sequencing 10M reads is not sufficient and that the RNA sample needs to be sequenced more deeply. In addition, we detected exon skipping (reads with one exon that is entirely covered by an intron, smaller than 3Kb to avoid mapping errors) in 3% of the 108,190 annotated genes. For example, by inspecting mutually exclusive exons, we were able to find an already described event that is conserved between Brassica and Arabidopsis thaliana [55] (Figure 6). Long-read RNA sequencing should make it possible to detect the co-occurrence of splicing events, which is difficult if not impossible with short reads (Figure S4).
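The containment tests behind these two event types can be sketched in Python as follows. This is a minimal illustration on in-memory intervals rather than the bedtools commands used in this study; the function names and toy coordinates are hypothetical, and the 3Kb limit is assumed to apply to the read intron that covers the skipped exon.

```python
def contained_in(features, containers, max_container_size=None):
    """Return the features (chrom, start, end) that lie entirely inside
    at least one container, optionally ignoring containers whose length
    is not below `max_container_size`."""
    hits = []
    for chrom, start, end in features:
        for c_chrom, c_start, c_end in containers:
            too_long = (max_container_size is not None
                        and c_end - c_start >= max_container_size)
            if too_long or c_chrom != chrom:
                continue
            if c_start <= start and end <= c_end:
                hits.append((chrom, start, end))
                break
    return hits


def retained_introns(predicted_introns, read_exons):
    """Predicted introns entirely covered by an exon of a read."""
    return contained_in(predicted_introns, read_exons)


def skipped_exons(predicted_exons, read_introns, max_intron_size=3000):
    """Predicted exons entirely covered by a read intron under 3 Kb."""
    return contained_in(predicted_exons, read_introns, max_intron_size)


# Toy annotations and read alignments (invented coordinates).
introns = [("A01", 100, 200), ("A01", 500, 600)]
read_exons = [("A01", 50, 250), ("A01", 520, 580)]
exons = [("A01", 300, 350), ("A01", 700, 800)]
read_introns = [("A01", 250, 450), ("A01", 600, 5000)]
print(retained_introns(introns, read_exons))
print(skipped_exons(exons, read_introns))
```

Requiring full containment rather than partial overlap keeps the detection conservative, at the cost of missing events supported only by truncated reads.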

Comparison of the Resistance Genes Analogs catalogs
Using the RGAugury pipeline [47], we annotated 2,788 and 2,952 Resistance Genes Analogs (RGA) in the gene catalogs of Darmor-bzh v5 and v10 respectively (Tables S16 and S17  Table 4 and Figure 7A). However, we observed that the annotation of 108 RGAs shifted from one RGA class to another between the two versions due to the identification of additional domains in these proteins (Figure 7A). As the assembly of pericentromeric regions was greatly improved in Darmor-bzh v10, we compared the number of RGAs in these regions between the two assemblies and found 74 RGAs in Darmor-bzh v10 compared to none in Darmor-bzh v5 (Figure 7D). To further highlight the interest of assembling such regions using ONT technologies, we compared the size and gene content (including RGAs) of a resistance QTL to phoma stem canker overlapping the A01 centromere [47,50]. In this new assembly, we were able to identify 45 candidate RGAs compared to 29 in Darmor-bzh v5 (Figure 7E). This finding opens new ways to understand the mechanism underlying this QTL.

Conclusions
In this study, we generated the most contiguous B. napus genome assembly thanks to ONT long reads and Bionano optical maps. In addition, the ONT dataset allows the detection of modified bases, such as 5-methylcytosine (5mC), without any specific library preparation. We have shown that the generation of ultralong reads is a game-changer for assembling complex regions composed of transposable elements, which are common in plant genomes. Consequently, the combination of ultralong reads and optical maps is today a method of choice for generating assemblies of complex genomes. Moreover, nanopore technology is in constant evolution, in particular base-calling software, which makes it possible to improve an assembly starting from existing data. In addition, we have predicted genes in this new assembly using direct RNA sequencing and have used this original dataset to detect splicing events and show that this technology can be used to discover complex events, such as the co-occurrence of events like mutually exclusive exons.
This improved version of the B. napus Darmor-bzh reference genome and annotation will be valuable for the Brassica community, which has now been working with the short-read version for almost six years, particularly to decipher genes underlying agricultural traits of interest.

Additional files
All the supporting data are included in six additional files which contain a) Tables S1-S13 and Figures S1-S4, b) Table S14, c) Table S15, d) Table S16, e) Table S17 and f) Table S18.

Figure 2. Genome-wide alignments of B. napus Darmor-bzh v10 (Y axis) with Darmor-bzh v5 and Zs11 genome assemblies (X axis). Each dot corresponds to syntenic regions of the genomes that align with high confidence. Blue dots correspond to regions aligned in the correct orientation (+) while red dots represent regions aligned in an inverted orientation (-). (Peri)centromeric regions are displayed by black boxes on the X and Y axes. Some (peri)centromere flanking markers were not found in Zs11 due to polymorphisms. The new assembly of a centromeric region in Darmor-bzh v10 compared to Darmor-bzh v5 is highlighted in orange for the A06 chromosome. An example of a large misassembled inverted region whose orientation has been corrected within the new Darmor-bzh v10 genome assembly is highlighted in green for the chromosome C07.

Each dot represents the number of gaps in a given chromosome and genome assembly. PACBIO assemblies are in blue and ONT assemblies in orange.

Figure 5. Example of a genomic region of the B. napus Darmor-bzh assembly that corresponds to a region that contains a gap in the Zs11 genome assembly. The first track represents the GC content. ONT reads (longer than 100Kb) are in the second track (blue boxes), and a 115 Kb read that spans the whole region is surrounded by a black box. Transposable elements are shown as black boxes, and the region absent from Zs11 contains a large LINE element (80,707 bp). The alignment of the Zs11 sequence (20Kb around the 500bp gap) is represented by red boxes, with thin red lines representing missing sequences in the Zs11 genome assembly that are present in Darmor-bzh v10 (gaps). Predicted genes are in the last track, in purple.

Figure 6. Example of a splicing event detected using long reads. The exon in the third intron of the gene prediction (C03p53740.1_Bna_DAR, Gene prediction track) is mutually exclusive with the third exon. This alternative splice form is detected by a single nanopore read (blue track), and maintained by the TALC error-correction (red track).

Figure 7. Resistance Genes Analogs (RGA) annotation improvement in B. napus Darmor-bzh v10 using long-read direct RNA sequencing. A. Conservation of the RGA categories between Darmor-bzh v5 and v10 genome annotations. Shifting RGAs are genes whose sequences match, but that have been annotated in a different RGA class by RGAugury between the v5 and v10 genome annotations. Such shifting RGAs only represent a small fraction of the annotated RGAs. B. Total number of RGAs plotted against the ratio of shifting RGAs over total RGAs by RGA class. CN and NL RGAs are the categories whose annotation shifted the most between Darmor-bzh v5 and v10. Details of the raw data used to compute the ratio are available on the right part of the plot. C. Detail of the RGA class shifts between Darmor-bzh v5 and v10 made using ggalluvial [56]. The size of the arcs is proportional to the number of shifting RGAs from one RGA class to another between the two annotations. D. Number of pericentromeric RGAs found in Darmor-bzh v10 compared to the v5 genome annotation. E. Genome browser snapshot using pyGenomeTracks [57] of the phoma canker resistance QTL (grey) in Darmor-bzh v5 (top) and Darmor-bzh v10 (bottom). RGA genes (blue) and pericentromeric regions (orange) are displayed.

Table 2. General information about the available Brassica long-read and older Darmor-bzh assemblies.