New High-Quality Draft Genome of the Brown Rot Fungal Pathogen Monilinia fructicola

Abstract Brown rot is a worldwide fungal disease of stone and pome fruit that is caused by several Monilinia species. Among these, Monilinia fructicola can cause severe preharvest and postharvest losses, especially for stone fruit. Here, we present a high-quality draft genome assembly of M. fructicola Mfrc123 strain obtained using both Illumina and PacBio sequencing technologies. The genome assembly comprised 20 scaffolds, including 29 telomere sequences at both ends of 10 scaffolds, and at a single end of 9 scaffolds. The total length was 44.05 Mb, with a scaffold N50 of 2,592 kb. Annotation of the M. fructicola assembly identified a total of 12,118 genes and 13,749 proteins that were functionally annotated. This newly generated reference genome is expected to significantly contribute to comparative analysis of genome biology and evolution within Monilinia species.


Introduction
Monilinia fructicola (phylum Ascomycota, family Sclerotiniaceae) is one of the most important causal agents of brown rot, which is one of the main diseases of pome and stone fruit. Brown rot can cause severe yield losses during both field production and postharvest processing (Mari et al. 2012; Karaca et al. 2014; Oliveira Lino et al. 2016; Abate, Pastore, et al. 2018). Several species of the Monilinia genus are responsible for this disease, including Monilinia laxa and Monilinia fructigena (Villarino et al. 2013; Rungjindamai et al. 2014), although M. fructicola is considered the most destructive of these pathogens on stone fruit. Indeed, M. fructicola is a quarantine pathogen in the European Union and is included in List A2 of the European and Mediterranean Plant Protection Organisation (https://www.eppo.int/ACTIVITIES/plant_quarantine/A2_list; last accessed October 1, 2019). Monilinia fructicola was introduced into Europe in 2001 (Lichou et al. 2002), and then rapidly spread and became prevalent over the former indigenous species M. laxa and M. fructigena (CABI 2018; Abate, Pastore, et al. 2018). This fungus overwinters on mummified fruit or in infected plant tissues. In spring, conidia are produced under moist conditions and dispersed by wind; they infect blossoms, causing blossom blight. However, the most severe yield losses occur in the postharvest phase, during fruit storage and transport (Feliziani et al. 2013; Martini and Mari 2014).
The development of high-throughput sequencing technologies is quickly improving biological studies on fungal pathogens (Shendure et al. 2017). Whole-genome knowledge provides a complete overview of their potential virulence mechanisms, with the aim of designing improved disease-control approaches (Möller and Stukenbrock 2017). The availability of complete genomic resources can be very useful in the study of evolutionary dynamics and genetic adaptations to highly diverse environments (Mitchell-Olds et al. 2007). The availability of complete genomic data of M. fructicola gives the scientific community the opportunity to perform studies aimed at exploring the population biology of the pathogen and its interactions with host plants in more detail (Dowling et al. 2019). This should help in understanding the reasons underlying the fitness of the pathogen and its rapid spread to new geographical areas. For these reasons, we have generated a high-quality draft genome and annotated protein-coding genes of M. fructicola, which represents a great improvement over the draft genomes already available in public repositories, which are highly fragmented with no gene annotation (Rivera et al. 2018). A hybrid and hierarchical de novo assembly strategy was used to obtain the genome of the M. fructicola Mfrc123 strain through a combination of Illumina short-read next-generation sequencing and Pacific Biosciences (PacBio) long-read third-generation sequencing, as previously reported for de novo assembly of the M. fructigena genome (Landi et al. 2018).

Materials and Methods
Sample Collection, Library Construction, and Sequencing
The M. fructicola Mfrc123 strain has been described previously (Abate, Pastore, et al. 2018). The strain was grown in potato dextrose broth (infusion from 200 g peeled and sliced potatoes kept for 1 h at 60 °C, with 20 g dextrose, adjusted to pH 6.5, per liter) for 48 h at 25 ± 1 °C, in darkness and under shaking (150 rpm). The strain was characterized at both the phenotypic and molecular levels (Abate, De Miccolis Angelini, et al. 2018; Abate, Pastore, et al. 2018; De Miccolis Angelini et al. 2018). Genomic DNA was extracted using Gentra Puregene tissue kits (Qiagen, Milan, Italy), according to the manufacturer's instructions. The genomic DNA quality and quantity were determined using a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Wilmington, DE) and a Qubit 2.0 fluorimeter (Life Technologies Ltd., Paisley, UK). DNA integrity was analyzed (Bioanalyser 2100; Agilent Technologies, Santa Clara, CA). Genome sequencing for short 2 × 100-bp paired-end reads (HiSeq 2500 platform; Illumina Sequencing Technology) and long 20-kb reads (RSII platform; PacBio Sequencing Technology) were both performed by an external service (Macrogen Inc., Next-Generation Sequencing Service, Geumcheon-gu, Seoul, South Korea). The Illumina short reads were analyzed for quality using FastX-tools and trimmed with the Trimmomatic (version 0.36) software (Bolger et al. 2014) with the following parameters: 1) LEADING and TRAILING = 3, which removes bases from the two ends of the reads if they are below a threshold quality of 3, 2) SLIDINGWINDOW = 4:2, which cuts the reads when the average quality within the window composed of four bases falls below a threshold of 2, and 3) MINLEN = 50, which removes the reads shorter than 50 bp. Adaptor-free PacBio subreads were used for the genome assembly.
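The three trimming rules described above can be illustrated with a minimal Python sketch. This is not Trimmomatic's own implementation: the function name `trim_read`, the representation of qualities as a list of Phred integers, and the order in which the steps are applied (the order in which they are listed in the text) are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def trim_read(
    seq: str,
    quals: List[int],
    leading: int = 3,
    trailing: int = 3,
    window: int = 4,
    window_q: float = 2,
    minlen: int = 50,
) -> Optional[Tuple[str, List[int]]]:
    """Simplified quality trimming; returns None when the read is discarded."""
    # LEADING / TRAILING: strip end bases whose quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]

    # SLIDINGWINDOW: scan from the 5' end and cut the read at the first
    # window whose mean quality falls below the threshold.
    cut = len(quals)
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            cut = i
            break
    seq, quals = seq[:cut], quals[:cut]

    # MINLEN: discard reads that end up shorter than the minimum length.
    if len(seq) < minlen:
        return None
    return seq, quals
```

With the thresholds reported here (3, 4:2, and 50), only very low-quality bases are removed, so the trimming is deliberately permissive and relies on the later consensus step for accuracy.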

De Novo Assembly of the Genome
The genome of M. fructicola Mfrc123 strain was assembled by a hybrid and hierarchical assembly strategy, using all the reads from both the Illumina and PacBio sequencing to produce scaffolds through the DBG2OLC pipeline (available from https://github.com/yechengxi/DBG2OLC; last accessed October 1, 2019). SparseAssembler (Ye et al. 2012; https://github.com/yechengxi/SparseAssembler; last accessed October 1, 2019) was used to preassemble the cleaned Illumina short reads into contigs, using the following parameters: LD, 0; NodeCovTh, 1; EdgeCovTh, 0; k, 61; g, 15; PathCovTh, 100; and GS, 50000000. The overlap and layout were performed with the DBG2OLC module (https://github.com/yechengxi/DBG2OLC; last accessed October 1, 2019) on the output contigs from SparseAssembler and the PacBio long subreads, to construct the assembly backbone from the best overlaps of reads using the following parameters: AdaptiveTh, 0.001; KmerCovTh, 2; and MiniOverlap, 20. In the final step, all of the related reads were aligned to each backbone using BLASR (Chaisson and Tesler 2012; https://github.com/PacificBiosciences/blasr; last accessed October 1, 2019), and the most likely sequence was calculated as the polished assembly using the consensus module Sparc (Ye and Ma 2016; https://github.com/yechengxi/Sparc; last accessed October 1, 2019), both with the default settings.
The quality assessment of the draft assembly was performed by mapping RNA-Seq reads from the same M. fructicola Mfrc123 strain using the CLC Genomics Workbench v. 7.0.3 software (CLC Bio, Aarhus, Denmark), with the default parameters. Unmapped reads were then assembled using the CLC de novo assembler, and the contigs obtained were submitted to BlastN searches against the GenBank nucleotide collection (downloaded, January 10, 2018) with local BLAST+ v. 2.3.0 (Camacho et al. 2009), setting the E-value cutoff at 10⁻³. Ribosomal DNA sequences were identified and BlastN aligned to the PacBio subreads. The subread -s1p0/78570/ 021991 RQ = 0.891 was selected for the best homology and used as reference for the Illumina reads alignment using the SeqMan NGen software (Lasergene v. 15.0.1; DNASTAR Inc., Madison, WI) to improve the sequence accuracy. Here, we obtained a 28,000-bp scaffold made up of three rDNA repeat unit sequences. Benchmarking Universal Single-Copy Orthologs (BUSCO v. 3.0.2) (Simão et al. 2015; http://gitlab.com/ezlab/busco; last accessed July 30, 2019) was used with the default parameters to evaluate the completeness of the de novo genome assembly based on the 290 BUSCO groups in the fungi_odb9 lineage data set (http://busco.ezlab.org/; last accessed July 30, 2019) and Botrytis cinerea as the reference species for gene finding with Augustus v. 3.3.2 (http://bioinf.uni-greifswald.de/augustus/; last accessed July 30, 2019).

Repeats Annotation
Simple repeats and low complexity elements were searched using RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker; last accessed July 30, 2019) against the Repbase-derived RepeatMasker library of repetitive elements, with the default parameters. Telomere sequences were identified by searching both ends of the scaffolds for high copy number repeats using BlastN searches and manual inspection.
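As an illustration of the end-of-scaffold telomere search, the following Python sketch counts tandem copies of the hexamer TTAGGG (or its reverse complement CCCTAA) in a window at each scaffold end. It is a regular-expression stand-in for the BlastN searches and manual inspection actually used; the function names, the 300-bp window, and the minimum of five tandem copies are hypothetical choices, not values from this study.

```python
import re
from typing import Dict, Tuple

# Telomeric hexamer and its reverse complement.
MOTIFS = ("TTAGGG", "CCCTAA")


def longest_tandem_run(segment: str, motif: str) -> int:
    """Return the largest number of consecutive copies of motif in segment."""
    pattern = re.compile(f"(?:{motif})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(segment.upper())]
    return max(runs, default=0)


def telomere_repeats_at_ends(seq: str, window: int = 300) -> Tuple[int, int]:
    """Longest telomeric run found within the first and last `window` bases."""
    left, right = seq[:window], seq[-window:]
    left_n = max(longest_tandem_run(left, m) for m in MOTIFS)
    right_n = max(longest_tandem_run(right, m) for m in MOTIFS)
    return left_n, right_n


def count_telomeres(scaffolds: Dict[str, str], min_repeats: int = 5, window: int = 300) -> int:
    """Count scaffold ends carrying at least `min_repeats` tandem copies."""
    total = 0
    for seq in scaffolds.values():
        total += sum(n >= min_repeats for n in telomere_repeats_at_ends(seq, window))
    return total
```

A scaffold with runs at both ends contributes two telomeres and a scaffold with a run at one end contributes one, which is how a total over all scaffolds is tallied.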

Protein-Coding Gene Prediction and Functional Annotation
Gene prediction was performed with Augustus implemented in the Blast2GO PRO package (v.5.2.5) using B. cinerea as the model species and the RNA-Seq reads as a guide. Blast2GO PRO was used to obtain gene functional classification based on Gene Ontology (GO) terms in the "biological process," "molecular function," and "cellular component" categories.

Library Construction and Sequencing
A total of 11,014 Mb of Illumina reads (109,054,254 total reads with 90.16% of bases with quality score >30) were generated with an estimated 250× average depth of sequencing coverage (http://identifiers.org/ncbi/insdc.sra:SRR8599895). A total of 1,057 Mb of PacBio long reads (polymerase read N50 of 19,107, quality of 0.858; average subread length of 9.8 kb and subread N50 length of 14.34 kb) were generated with an estimated 24× average depth of sequencing coverage (http://identifiers.org/ncbi/insdc.sra:SRR8608008).
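The depth estimates follow from dividing the total sequenced bases by the genome size. A minimal Python check is sketched below, assuming that the 44.05-Mb assembly length is the genome size used for the estimate (the text does not state which size was used); the function name `mean_depth` is illustrative.

```python
def mean_depth(total_bases_mb: float, genome_size_mb: float) -> float:
    """Average sequencing depth = total sequenced bases / genome size."""
    return total_bases_mb / genome_size_mb


# Values taken from the text; the assembly length is assumed as genome size.
illumina_depth = mean_depth(11014, 44.05)
pacbio_depth = mean_depth(1057, 44.05)
print(f"Illumina: {illumina_depth:.1f}x, PacBio: {pacbio_depth:.1f}x")
```

Under this assumption the two values round to 250× and 24×, in agreement with the reported estimates.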

De Novo Assembly of the Genome
A complete draft genome without gaps of 44.05 Mb was obtained with a scaffold N50 of 2,592.8 kb (table 1). It was assembled in 16 large scaffolds (from 4.24 to 1.89 Mb; table 2), three very small scaffolds (from 0.36 to 0.26 Mb), and one scaffold made up of three repeat unit sequences of ribosomal DNA. Both telomeric sequences were detected in half of the large scaffolds, with only one telomere in the remaining half (table 2). The number of large scaffolds reconstructed in the M. fructicola Mfrc123 strain corresponded to the 16 large chromosomes of the closely related species B. cinerea and Sclerotinia sclerotiorum (Amselem et al. 2011;Derbyshire et al. 2017). This is in accordance with the short phylogenetic distances existing among these species within the Sclerotiniaceae family (Amselem et al. 2011).
BUSCO analysis of genome completeness showed that 88% of the BUSCO orthologs were complete and single copy in the assembled genomic sequences, whereas only a few were fragmented (10%) or missing (2%). RNA-Seq reads from the same M. fructicola Mfrc123 strain were mapped on the draft assembled genome, and about 82% of them mapped in pairs, whereas 10% of them mapped in broken pairs on the final genome draft version.
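For readers unfamiliar with the contiguity statistic reported in this section, the scaffold N50 is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch follows; the function name is illustrative, and the individual scaffold lengths of the assembly are listed in table 2 and are not reproduced here.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50() requires at least one scaffold length")
```

Because the statistic depends only on the sorted lengths, a few very small scaffolds, such as the three small scaffolds and the rDNA scaffold described above, have little influence on it.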

Repeats Annotation
The genome showed 33,751 elements (4.13% of the genome size) that were identified as simple repeats, whereas 3,131 elements (0.41% of the genome size) were recognized as low complexity regions (table 2). Our assembly captured a total of 29 stretches of telomeric sequences (5′-TTAGGG-3′) at both ends of 10 scaffolds and at a single end of 9 scaffolds, with repeat numbers ranging from 9 to 23 (table 2), suggesting a good integrity of the assembled genome.

Conclusion
The hybrid and hierarchical assembly strategy, using both Illumina and PacBio sequencing technologies, successfully produced a new genomic draft of M. fructicola Mfrc123 strain that greatly improves on those of other strains of the same fungus available in GenBank (BR-32 strain, GCA_002909715.1; and LMK125 strain, GCA_002162545.1) in terms of contiguity (scaffold N50 values), gap-free sequences, and read mappability, and that also provides a structural and functional annotation of the genome.
The genome assembly has a size and number of genes comparable to those of other annotated genomes of fungi belonging to the Sclerotiniaceae family available in public databases, such as B. cinerea strains B05.10 (GCA_000143535.4; van Kan et al.  Derbyshire et al. 2017). Moreover, the number of large scaffolds reconstructed in the M. fructicola Mfrc123 strain corresponds to the 16 large chromosomes of the closely related species B. cinerea and S. sclerotiorum (Amselem et al. 2011; Derbyshire et al. 2017).
The new genomic resource will allow deeper knowledge to be gained on the genome biology of M. fructicola and provides valuable information for further evolutionary studies within the Monilinia genus and the Sclerotiniaceae family, which contain phytopathogenic fungi of great economic relevance worldwide.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.