ARTICLE TYPE: Genome Report for Genome Biology and Evolution Comprehensive transcriptome of the maize stalk borer, Busseola fusca , from multiple tissue types, developmental stages, and parasitoid wasp exposures

Busseola fusca (Fuller) (Lepidoptera: Noctuidae), the maize stalk borer, is a widespread crop pest in sub-Saharan Africa that has been the focus of biological research and intensive management strategies. Here, we present a comprehensive annotated transcriptome of B. fusca (originally collected in the Western Province of Kenya) based on ten pooled libraries including a wide array of developmental stages, tissue types, and exposures to parasitoid wasps. Parasitoid wasps have been used as a form of biocontrol to try and reduce crop losses with variable success, in part due to differential infectivities and immune responses among wasps and hosts. We identified a number of loci of interest for pest management, including genes potentially involved in chemoreception, immunity, and response to insecticides. The comprehensive sampling design used expands our current understanding of the transcriptome of this species and deepens the list of potential target genes for future crop loss mitigation, in addition to highlighting candidate loci for differential expression and functional genetic analyses in this important pest species.


Introduction
The maize stalk borer, Busseola fusca (Fuller) (Lepidoptera: Noctuidae; Figure 1), is an important pest of cereal crops in sub-Saharan Africa. Due to its abundance and distribution, it represents a major constraint to food production where infestation rates are high (Kfir et al. 2002). Crop losses resulting from B. fusca feeding activity vary by region, but can result in a total loss in some areas (Van den Berg et al. 1991. Its impact on the food security and economic well-being of people has led to a number of studies into the physiology and ecology of the species (reviewed in Calatayud et al. 2014). With the release of the genome (from the same lineage used in this study; Hardwick et al. 2019) and the publication of the first transcriptome (derived from neonate tissue from a genotype collected in South Africa; Peterson et al. 2019), we now have basic genomic information for this species. With pest species, in particular, the identification of candidate genes related to important traits such as developmental timing, reproduction, insecticide resistance, adaptation to plant defense mechanisms, immunity, and chemoreception requires sampling multiple time points, exposures, and populations. This is particularly relevant in B. fusca, where certain strains are known to be differentially vulnerable to infection by parasitoid wasps (Kfir 1995). This information, in turn, can inform or enhance management strategies (e.g., those developed for Plutella xylostella [You et al. 2013]).
Female B. fusca typically deposit eggs ( Figure 1A) between the stem and leaf sheet of the host plant. Larvae hatch ( Figure 1B), feed on young leaves, and penetrate the plant stem during the third and fourth instar ( Figure 1C) where they remain until pupation. Feeding during the larval stage, which is also when the animals are vulnerable to parasitoid wasps ( Figure 1D), damages the host plant and reduces yield or kills the plant. After pupation, adult moths ( Figure 1E) use chemosensory cues and receptors to attract and find mates, food, and suitable places to lay eggs. Pest management strategies can include, for example, introducing substances that can interrupt sending or receiving chemosensory cues to/from mature adults or can include the introduction of biocontrol species to which the focal species is unable to mount a defense response. Because it involves fewer chemicals, biological control is a pest management strategy that has been employed with B. fusca over the last several decades. Beginning in the 1990s, a biocontrol program was launched in Kenya using the indigenous larval parasitoid of B. fusca, Cotesia sesamiae (Cameron; Hymenoptera: Braconidae; Figure  1D). Prevalence of parasitism by C. sesamiae can vary (ranging from <5% to 75% [Kfir 1995;Sallam et al. 1999;Jiang et al. 2006;Songa et al. 2007]), in part due to differences among wasp strains (Mochiah et al. 2002;Gitau et al. 2010;Branca et al. 2011). Busseola fusca are resistant to infection by C. sesamiae from Mombasa (coastal Kenya), referred to hereon as Cs-Coast, but vulnerable to C. sesamiae from Kitale (a site in inland Kenya; referred to hereon as Cs-Inland; Ngi-Song et al. 1995).
Here, we present a comprehensive transcriptome of B. fusca based on data from 10 libraries representing different tissues, developmental stages, and parasitoid-exposure treatments. Phylogeographic studies (Sezonlin et al. 2006;Dupas et al. 2014) and reports of dominant inheritance of field-evolved resistance to Bt maize (Campagne et al. 2013) represent the current frontier, in terms of studies of individual genes. We use the transcriptome to identify important candidate genes and gene families, as well as investigating the transcriptomic profiles of non-genic regions, for future research and ongoing management efforts aimed at curbing the effects of this devastating crop pest.

Methods
See Supplemental Methods for complete details. Briefly, sample collection, preparation, and sequencing were performed at the International Centre of Insect Physiology and Ecology (icipe) and Biosciences eastern and central Africa genomics facility at the International Livestock Research Institute in Nairobi, Kenya (BecA-ILRI Hub). Specimens were obtained from a colony initiated from larvae collected in maize fields from the Western Province of Kenya in 2008, and maintained on an artificial diet under laboratory conditions at icipe. RNA extractions were performed on 10 different B. fusca tissues types, (thoraces, antennae [both sexes], and female ovipositor), developmental stages (eggs, neonates, fourth instar larvae, and adults [both sexes]), and parasitoidexposure conditions (unexposed and exposed to two strains of wasp; Table S1). We used a TruSeq RNA Library Prep Kit (Illumina) to prepare libraries and performed nextgeneration sequencing using Illumina MiSeq technology, resulting in 10 libraries of 300 bp paired-end reads (Table S2). Sequence data are available on NCBI SRA (BioProject: PRJNA553865). We used the RNA-seq de novo assembly program Trinity (version 2.4; Grabherr et al. 2011) to assemble the transcriptome (Table 1), and generated a set of non-redundant unigenes from our transcripts using the program CD-HIT (version 4.7; Li and Godzick 2006;Fu et al. 2012). The transcriptome assembly is available on NCBI (BioProject: PRJNA553865). We also assembled each library individually, and generated library-specific sets of non-redundant unigenes with CD-HIT. We ran BUSCO (version 3; Waterhouse et al. 2017) to assess the completeness of our assembly and Trinotate (v. 3.0.2; Bryant et al. 2017) to annotate the transcriptome. We quantified the distribution of genes within different gene ontology (GO) categories using WEGO (version 2.0; Ye et al. 2018; Figure S1). Finally, we used OrthoFinder (version 2.3.3; Emms and Kelly 2019) to identify orthologous gene clusters among Lepidopteran transcriptomes and identified the 20 largest gene families in B. fusca. We identified the top 25 most highly expressed transcripts for each RNA library by mapping reads from each library to the pooled transcriptome assembly unigenes using the program STAR (version 2.7; Dobin et al. 2013) and generated unique lists of the 25 transcripts with the highest RPKM values for each.

Results
Our B. fusca transcriptome (Table 1 and Table S2; BioProject PRJNA553865), which contained >99% of the conserved single-copy orthologs used by BUSCO to assess completeness, contains approximately 1/3 more transcripts than previously known for this species, greatly expanding the number of genes available for further investigation (Peterson et al. 2019). The median number of isoforms per gene was one for all individual libraries, with library 5_S5_L001 (infection treatment: exposed Cs-Inland) having the highest maximum number of isoforms for a single transcript (transcript TRINITY_DN8760_c0_g1, with 45 isoforms reported by Trinity). Furthermore, the antenna libraries (4_S4_L001 and 3_S3_L001) had the highest percentage of genes with greater than one isoform per gene (21.1% and 19.4%, respectively), while the egg library (2_S2_L001) had the lowest (6.6%). We annotated the assembled B. fusca transcriptome using Trinotate and identified 22,707 protein-coding genes and generated a set of 39,445 proteins overall. We identified more protein-coding genes than were predicted in the previously published B. fusca whole genome assembly (19,417;Hardwick et al. 2019). Of the 22,707 predicted protein-coding genes found in this study, 19,401 aligned to the genome assembly. The additional 4,306 genes identified in this study may represent alternative transcripts or fusion transcripts that the transcriptome annotation software does not recognize as variants produced by a given locus or loci. Alternatively, it possible these sequences were unrecognized by the genome annotation software because the depth of sequencing coverage for the whole genome sequencing project was insufficient to fully assemble the genome given its size (~500 Mb) and repeat content (~50%). Figure S1 gives an overview of the GO terms associated with all annotated transcripts in the B. fusca transcriptome assembly after 20,118 of genes were assigned to gene ontology categories using Trinotate. Using protein sets from Bombyx mori, Manduca sexta, and Plutella xylostella, we searched for orthologs in B. fusca and characterized gene families. The top 20 largest gene families identified by OrthoFinder are listed in Table S3. We report the most abundant transcripts across all libraries (Table S4) and in each individual tissue type, developmental stage, and infection treatment (Tables S5-7) and between male and female antennae (Table S8) and thoraces (Table S9). We also characterize the non-genic transcriptomic profiles by reporting the diversity of transposable elements (TEs) transcribed in each library (Table S10) and the evidence for transcription of putatively horizontally-transferred regions of the genome based on high identity between sequences in B. fusca and Cotesia sesamiae (both strains; Table  S11).

Discussion
Because food security is an essential component of public health worldwide, managing pests that target major crop species is crucial (World Bank 2008). Here, we sequenced, assembled, and annotated the transcriptome of B. fusca to try and identify candidate genes for pest management strategies and to understand more about the biology of this species. This study expands on previously published transcriptomic resources in several ways. In particular, we analyzed data from 10 different RNA-Seq libraries spanning multiple life stages, tissue types, and sexes. The inclusion of these groups allowed us to identify additional transcripts relative to previous studies. Our full transcriptome assembly includes 240,022 sequences. In contrast, the only other B. fusca transcriptome published to date using the larval stage only identified 170,756 transcript sequences (Peterson et al. 2019). The current study also expands on the geographical range of B. fusca populations surveyed; while the previously published transcriptome is based on tissues from individuals that originated in South Africa, our work focuses on individuals originating from Western Province, Kenya. The inclusion of these additional populations, developmental stages, and tissues will aid in the development of targeted pest control based on life cycle and behavior in this destructive species.
Based on our transcriptome (see Supplemental Methods and Results for complete details), we identified genes of interest putatively involved in chemoreception. A number of the transcripts with potential chemosensory function were abundant in the antennae relative to other tissue types. Specifically, 10 out of 25 transcripts uniquely highly expressed in the antennae had significant homology to General odorant-binding proteins (Table S7). General odorant-binding proteins are distributed in the sensilla, and function to bind odorants and transport them to odorant receptors. General odorantbinding proteins have been shown to be involved in the detection of plant volatiles (Vogt et al. 2002;Nardi et al. 2003;Maida et al. 2005;Liu et al. 2015). We also observed differences between male and female libraries in genes expressed in the antennae, which may be of interest for determining sex-specific chemosensory genes. For example, while both males and females showed high expression of General odorantbinding proteins, we identified suites of uniquely abundant General odorant-binding proteins that were distinct for each sex (Table S8). In addition, the female antenna library exhibited elevated abundance of two transcripts with high similarity to Pheromone-binding protein. Pheromone binding proteins are typically associated with the detection of female sex pheromones by males, but some studies have identified high levels of expression of these genes in female antennae of moths in the family Noctuidae (e.g., Callahan et al. 2000). The abundance of these transcripts in females may suggest a role of Pheromone-binding protein genes in detecting odorants other than sex pheromone in B. fusca.
We also sought to better understand immune response in B. fusca, by identifying transcripts expressed in larvae infected by the parasitoid C. sesamiae. Busseola fusca exhibits differential susceptibility to infection by C. sesamiae from coastal and inland subpopulations. To shed light on the biological mechanisms underlying this difference, we assessed gene expression in larva exposed to Cs-Inland and Cs-Coast, with the goal of determining immune genes highly expressed in each treatment. Fourteen of the 25 most highly expressed transcripts in the Cs-Coast treatment had significant sequence similarity to the gene Arylphorin (Table S5). Arylphorin was also the largest B. fusca gene family identified via our analysis of orthologous gene clusters (Table S4). Arylphorin is a storage protein produced throughout larval instars in Lepidoptera (Riddiford 1995), and has also been proposed to play a role in humoral immune defense in response to bacterial infection (Freitak et al. 2007). In contrast to the Cs-Coast treatment, larvae in the Cs-Inland treatment showed high expression of the genes Gloverin and Lebocin, both of which have antimicrobial function in Lepidoptera (reviewed in Yi et al. 2014). These results indicate that Arylphorin, Gloverin, and Lebocin may be good targets for future research to understand why Cs-Inland are able to successfully parasitize B. fusca larva, while Cs-Coast are not.
We identified several large gene families potentially of interest for understanding B. fusca's response to insecticides. One of the largest B. fusca gene clusters we identified was the Xanthine dehydrogenase family, encompassing five genes and nine isoforms. Xanthine dehydrogenase is known to be involved in the production of nitrogenous excretory compounds (Bergmann and Dikstein 1955;Cochran 1985), and has also been implicated in phytochemical detoxification (Slansky 1993). Previous research has found that duplication of the Xanthine dehydrogenase gene is related to resistance to organophosphorus insecticide resistance in mosquitoes (Coleman and Hemingway 1997). We also identified a large B. fusca gene family corresponding to the Transient receptor potential channel pyrexia gene. This family encompassed five genes and six isoforms. Transient receptor potential (TRP) genes are related to environmental perception in insects (Venkatachalam and Montell 2007;Fowler and Montell 2013), and are common targets of insecticides (Nesterov et al. 2015;Kandasamy 2017;Kwon et al. 2010). Additional research to identify the specific functions of the large Xanthine dehydrogenase and TRP pyrexia gene families in B. fusca may shed light on the potential for insecticide resistance in this species, and inform future control efforts. Future studies may also utilize our transcriptomic resources to better understand Cry toxin resistance by looking at tissue-specific expression of genes such as alkaline phosphatase, aminopeptidase N, cadherin Cry toxin receptor, ATP-binding cassette transporters, and mitogen-activated protein kinases (Peterson et al. 2017).
We identified a number of highly abundant transcripts in the antennae which were annotated as Replicase polyproteins (Table S7). Upon further investigation, these sequences had significant BLAST hits to iflavirus sequences. Iflaviruses are singlestranded RNA viruses that to date have been shown to exclusively infect arthropods (Van Oers 2010). RNA derived from iflaviruses occur at high concentrations in the antennae of infected individuals (Dos Santos et al. 2019). To our knowledge, iflavirus infection has not previously been observed in B. fusca and the significance of infection should be investigated further. To characterize the non-genic components of the transcriptome, we used the previously curated repeat library (from Hardwick et al. 2019) to search against the transcriptomes (Table S10). The majority of transposable elements in B. fusca are expressed in each of the individual transcriptomes, with the greatest diversity transcribed in male antennae. We also used low coverage draft sequences generated from the two parasitoid wasp strains (C. sesamiae Kitale and Mombasa) to identify candidate sequences that may have been horizontally-transferred to or from B. fusca (Supplemental Data File S6) and are transcribed (14 of 17 identified candidates; Table S11). Interestingly, three of the candidate regions appeared to be of bracoviral origin, suggesting that, as with many other insects (e.g., Gasmi et al. 2015), viruses may play an instrumental role in both the genomic and transcriptomic profile of B. fusca.

Conclusion
By sequencing a comprehensive set of ten libraries representing different developmental stages, tissue types, and parasitoid exposures, our transcriptome contains the largest number of expressed transcripts available to date and has allowed us to identify a number of loci of interest for potential pest management. Namely, loci potentially involved in chemoreception, response to infection, and response to insecticides could serve as promising targets for future research. We hope insights from the B. fusca transcriptome will aid ongoing efforts to develop control measures that can be deployed as part of an integrated pest management strategy in order to reduce B. fusca's impact on food security across the continent.