Genome Assembly of the Ragweed Leaf Beetle: A Step Forward to Better Predict Rapid Evolution of a Weed Biocontrol Agent to Environmental Novelties

Rapid evolution of weed biological control agents (BCAs) to new biotic and abiotic conditions is poorly understood and so far, only little considered both in pre-release and post-release studies, despite potential major negative or positive implications for risks of non-targeted attacks or for colonizing yet unsuitable habitats, respectively. Provision of genetic resources, such as assembled and annotated genomes, is essential to assess potential adaptive processes by identifying underlying genetic mechanisms. Here, we provide the first sequenced genome of a phytophagous insect used as a BCA, i.e. the leaf beetle Ophraella communa, a promising BCA of common ragweed, recently and accidentally introduced into Europe. A total 33.98 Gb of raw DNA sequences, representing c. 43-fold coverage, were obtained using the PacBio SMRT-Cell sequencing approach. Among the five different assemblers tested, the SMARTdenovo assembly displaying the best scores was then corrected with Illumina short reads. A final genome of 774 Mb containing 7,003 scaffolds was obtained. The reliability of the final assembly was then assessed by benchmarking universal single-copy orthologous genes (> 96.0% of the 1,658 expected insect genes) and by remapping tests of Illumina short reads (average of 98.6% ± 0.7% without filtering). The number of protein-coding genes of 75,642, representing 82% of the published antennal transcriptome, and the phylogenetic analyses based on 825 orthologous genes placing O. communa in the monophyletic group of Chrysomelidae, confirm the relevance of our genome assembly. Overall, the genome provides a valuable resource for studying potential risks and benefits of this BCA facing environmental novelties.


Introduction
Rapid evolution of weed biological control agents (hereafter: BCAs) to both biotic and abiotic environments is poorly understood and so far only little considered both in pre-release and post-release studies of weed biological control programs (Müller-Schärer, et al. 2020;Roderick, et al. 2012;Szucs, et al. 2019), nor in predictions of future occurrences under climate change using species distribution models ). This is unfortunate as consequences in term of risks and benefits may potentially be far reaching, such as for nontarget attacks of agricultural crops or native endangered species in case of the evolutionary potential for host range expansion and shifts, or for colonizing so far unsuitable habitats for the BCA that are heavily infested by the target plant invader (Messing and Wright 2006).
Since the emergence of high throughput sequencing methods, population genomic approaches have gained in popularity for studying rapid evolutionary changes in introduced species (Bock, et al. 2015;Chown, et al. 2015;Dlugosch, et al. 2015). This has become possible by the availability of reference genomes of high quality and subsequent bioinformatics processing. For instance, the recently-published genome of the Colorado potato beetle, Leptinotarsa decemlineata, highlighted a high rate of transposable elements compared to other beetle species, high levels of nucleotide diversity and recent expansion of introduced populations that might explain why and how this pest species rapidly evolves on novel host plants, under climate change and/or to overcome management practices (Schoville, et al. 2018). Unlike as for introduced pest species, genomic applications used to predict or track rapid evolutionary changes remains uncommon in weed BCAs, and therefore, the availability of genomic resources is almost non-existent. In the present work, we describe the first genome assembly of a phytophagous insect used as a BCA, i.e. the leaf beetle Ophraella communa LeSage 1986 (Coleoptera: Chrysomelidae) for the control of common ragweed, Ambrosia artemisiifolia.
The North American common ragweed, Ambrosia artemisiifolia L. (Asteraceae), is one of the most prominent plant invaders worldwide, causing economic losses due to severe impacts on human health resulting from its huge amount of highly allergenic pollen and as an important and hard-to-control agricultural crop weed (Essl, et al. 2015;Müller-Schärer, et al. 2018). Sharing the same native range as common ragweed, the oligophagous chrysomelid beetle O. communa is already successfully used as a BCA in Asia (Kim and Lee 2019), and was first reported for Europe in 2013 in the south of the European Alps. Since its arrival in Northern http://mc.manuscriptcentral.com/gbe Italy, local aerial pollen concentrations of A. artemisiifolia have significantly dropped by 80% compared to beetle-free areas (Bonini, et al. 2015;Mouttet, et al. 2018). Combining with its recent spreading, this insect became a promising candidate for long-term management of this notorious plant invader also in Europe (Müller-Schärer, et al. 2014;Müller-Schärer, et al. 2018).
However, as oligophagous, the leaf beetle was reported feeding on various plants species belonging to the Heliantheae (Asteraceae) tribe (Futuyma and McCafferty 1990), including Helianthus annuus L. (sunflower). Its presence on this important economical crop, on which the leaf beetle can even complete its life cycle, was a reason to reject O. communa as BCA for Australia (Palmer and Goeden 1991). The risk of this leaf beetle becoming a serious agricultural pest in case of rapid adaptation to crop species remains however unclear (Dernovici, et al. 2006;Lommen, et al. 2017;Zhou, et al. 2011), an uncertainty for which the supply of genomic resources and ensuing population genetic studies will provide a most helpful tool for its clarification.
The present study describes the first genome assembly of O. communa using sequencing of long-reads. We used several assemblers to get the best genome assembly and then assessed its quality by checking for bacterial contamination and for the presence of conserved insect representative genes. In addition, we evaluated the reliability of this genome assembly to perform population genomic studies by re-mapping tests, performed a preliminary annotation of functional proteins, and replace the species in the phylogeny of Polyphaga Emery, 1886 (Coleoptera). So far, mitochondrial sequences were the single genetic resources available (e.g. Song et al. 2018). Recently, the antennal transcriptome of O. communa has been sequenced to study the molecular basis of olfaction recognition in this insect (Ma, et al. 2019). The O. communa genome will be used to track the migration and spread history, and to detect potential rapid selection to both novel biotic and abiotic conditions in introduced ranges using genome scans and subsequent outlier analyses.

Material and methods
Tissue sampling, DNA isolation and sequencing http://mc.manuscriptcentral.com/gbe A small population of ca. 50 individuals of O. communa was collected in May 2018 in the Milano area in Italy (GPS coordinates: 45°34'14.63''N, 8°47'7.66''E) and reared under random mating, in the quarantine facilities at the University of Fribourg (Switzerland). After 9 months, corresponding to approximately 6 generations, four alive adults (two males and two females) were selected for genomic DNA (gDNA) isolation and preserved in Silica gel. The entire individuals were first dried at 70°C during 1h and then pooled before grinding to increase the gDNA content at the end of the DNA isolation step. We assumed that the individuals selected were more genetically similar compared to the individuals of the original population, although this similarity is not expected to be high after 6 generations of rearing under random mating.
This expectation is dependent to the homozygosity rate in the original population and the reproductive skew in the insect species. Under random-mating, the gain of homozygosity from the original population is expected to be around 6% after 6 generations of rearing. However, evidences for assortative mating have been reported in this chrysomelid species under controlled conditions (Zhou, et al. 2012), suggesting that the level of homozygosity in our reared population may thus have been reinforced according to the extent of the reproductive skew. A DNeasy® Plant Mini Kit (Qiagen, Germany) was used to extract gDNA from the pool according to the manufacturer's protocol.
Five µg of the DNA pool was then used to prepare a single library with the PacBio SMRTbell Express Template Prep kit 2.0 (Pacific Biosciences, Menlo Park, CA, USA) according to the manufacturer's protocol. The resulting library was size selected on a BluePippin system (Sage Science, Inc. Beverly, MA, USA) for molecules larger than 15 kilobases (kb). The final library was then sequenced on a PacBio Sequel II instrument (Pacific Biosciences, Menlo Park, CA, USA) using five SMRT cells v3, to reach an expectation of ca. 50-fold coverage for a genome size estimated at 600 Mb (Petitpierre, et al. 1993).

Genome assembly
To obtain the most reliable genome assembly, five different assemblers, commonly used for long-reads sequence data (Jayakumar and Sakakibara 2019), were employed: Canu, Flye, MECAT, SMARTdenovo and wtdbg (Supplementary Table 1 with references and versions used herein). The best assembly corresponds to the one of SMARTdenovo (see result and discussion part) for which no corrections was applied during the assemblage proceeding. Three rounds of scaffold polishing were therefore performed to improve the accuracy of the genome http://mc.manuscriptcentral.com/gbe assembly: for the two first rounds, we used Arrow from the GenomicConsensus package (https://github.com/pacificbiosciences/genomicconsensus) to correct sequence data into consensus sequences with long reads; for the last polishing, we employed Pilon (Walker, et al. 2014) to identify and correct inconsistencies between the genome assembly and a set of

Quality assessment
As entire individuals were used for DNA extraction, bacterial contamination of the genome assembly was first investigated using Kraken v0.10.6 (Wood and Salzberg 2014). The completeness of the genome assembly was then assessed by checking for the presence of conserved representative genes. For this purpose, we used BUSCO v3.0.2 (Simao, et al. 2015) to benchmark the scaffolds against 1,658 single-copy orthologs of the Insecta_obd9 database.
For comparison, the same analyze was performed on the three other chrysomelid genome (Table 1).
In addition, the reliability of using the genome assembly for population genomic studies was evaluated by checking for mapping quality of short reads (i.e. low-coverage wholegenome data, obtained as described above). For this, we used bwa v0.7.17 (Li and Durbin 2009)  Prior to the mapping, the raw sequences were trimmed using trimmomatic v0.36 (Bolger, et al. 2014), and only PE reads, representing ca. 97% of the trimmed data, were retained for the following analyses. The mapped reads were then filtered on their mapping quality at Q10 and Q20 values using samtools v1.8 ).
In order to estimate the proportion of transposable elements (TEs) in the genome assembly, a species-specific repeat library was first built using RepeatModeler v1.0.11 Hubley 2008-2015). This repeat library was then used with RepBase (http://www.girinst.org/repbase/) to determine the overall TEs content the genome assembly using RepeatMasker (Smit, et al. 2013(Smit, et al. -2015.

General characteristics of the final assembly
In total, 33.98 Gb of raw data were sequenced with an average length of 14.96 kb per read.
Excepted for Canu, which failed during the processing, all assemblers generated a genome assembly summarized in the Supplementary Table 2. The SMARTdenovo assembly, displaying the best scores (i.e. the lowest number of scaffolds, highest average size and N50 values and http://mc.manuscriptcentral.com/gbe longest minimum and maximum scaffold sizes), was considered as the most reliable assembly for the following proceedings.
After the polishing procedure (results in the Supplementary Table 3), the final genome assembly of the ragweed leaf beetle encompasses 7,003 scaffolds for a genome size of 774 Mb. This final genome size is slightly larger compared to the previous estimation of 600 Mb (Petitpierre, et al. 1993), a difference already reported for the Colorado potato beetle genome (Schoville, et al. 2018). Most of the global statistics of the O. communa genome, summarized in Table 1, are in the middle range of the three other chrysomelid genomes (e.g. scaffold N50, Table 1).

Quality assessment
The taxonomic classification made by Kraken assigned 94.37% of the 7003 scaffolds to the Colorado potato beetle genome, 4.71% to bacteria genomes, 0.75% to various genomes and the remaining 0.17% was unassigned. This strong majority of leaf beetle sequences supports an (almost) absence of bacterial contamination in our genome assembly.
The BUSCO assessment indicated an excellent representation of Insect genes, with 96.0% of the 1,658 single-copy orthologs identified as completed (i.e. duplicate and single-copy), while only 1.4% and 2.6% were considered as fragmented and missing, respectively, in the assembly. These percentages are similar to the three other chrysomelid genomes ( Figure 1A).
Disregarding the population level, the remapping tests showed high percentages of mapping ( Figure 1B), suggesting a high reliability of the genome assembly to map short reads.
These percentages however dropped down after filtering on mapping quality with averages of 67.3% ± 1.3 and 63.7% ± 1.5 for Q10 and Q20 filters, respectively. This decreasing is explained firstly by the multimapping of reads that generated low values of mapping quality (i.e. average of 28.0% ± 1.1 of PE reads with a mapping quality at 0) and secondly by the stage of fragmentation of the genome that prevented the mapping of reads overlapping with the borders of scaffolds. At the population level, reads of the Italian samples displayed higher mapping qualities (e.g. for Q20, average of 65.6% ± 0.9) than those of Chinese and North-American samples (e.g. averages for Q20 of 63.0 % ± 0.5 and 62.1% ± 0.6, respectively).
Knowing that the genome assembly used DNA extracted from Italian samples, this obvious difference strengthens the reliability of our genome assembly for bioinformatic proceedings. http://mc.manuscriptcentral.com/gbe

Gene prediction and phylogenetic analysis
Overall, 75,642 protein-coding genes were predicted in the genome assembly of O. communa.
This number represents 82% of the number of unigenes (92,259) predicted by Ma et al. (2019) based on antennal transcriptome analysis, which suggests a correct representation of proteincoding genes in our genome assembly. This number is however much larger compared to other chrysomelid genomes (Table 1), suggesting an overestimation of protein-coding genes probably explained by the large number of TEs in our genome assembly (see results on TEs below). Among the 75,642 protein-coding genes, OrthoDB assigned 24.6% (18,619) in orthologous clusters corresponding to those found in Polyphaga beetles proteomes, and 21.0% of them (3,904) clustered with those found in the Colorado potato beetle proteome ( Figure 1C).
A high fraction of TEs (58.2%, details per category are provided in the Supplementary Table   4) was identified in our genome assembly. Among them, 68.0% could not be attributed to any repeat sequence categories. This high percentage, already reported for the cowpea weevil genome (Sayadi, et al. 2019), suggests a high number of repeat sequences unreported in the database, probably explained by long evolutionary distances to previously known repeats.
The maximum likelihood tree obtained with 825 single-copy orthologous proteins shows that O. communa form a monophyletic group with the three other chrysomelid species included in the phylogenetic analysis ( Figure 1D). The following phylogenetic relationship with the other Polyphaga beetles is congruent with the recent update of evolutionary history of Coleoptera based on 95 protein-coding genes (Zhang, et al. 2018).

Conclusion
We provide the first genome assembly of a phytophagous insect used as a BCA, i.e. the leaf beetle Ophraella communa, a promising biocontrol candidate against common ragweed, one of the world's most noxious plant invaders. The various quality assessments showed that the genome assembly described in this study is well suited to be used as a reference genome for O. communa and probably also for the other thirteen Ophraella species. In addition to the transcriptome recently published, the preliminary annotation carried out on the present genome assembly will supply the first insights into molecular mechanisms involved in rapid evolution. Overall, this new genome provides a valuable resource to determine potential risks and benefits of weed BCA facing environmental novelties.  http://mc.manuscriptcentral.com/gbe