Odorant Binding Proteins and Chemosensory Proteins in Episyrphus balteatus (Diptera: Syrphidae): Molecular Cloning, Expression Profiling, and Gene Evolution

Abstract
Aphidophagous syrphids (Diptera: Syrphidae) are important insects in agroecosystems for pollination and biological control. Insect chemoreception is essential for these processes and for insect survival and reproduction; however, its molecular determinants are not well understood for these beneficial insects. Here, we used recent transcriptome data for the common hoverfly, Episyrphus balteatus, to characterize key molecular components of chemoreception: odorant-binding proteins (OBPs) and chemosensory proteins (CSPs). Six EbalCSPs and 44 EbalOBPs were cloned from this species, and sequence analysis showed that most share the characteristic hallmarks of their protein family, including a signal peptide and a conserved cysteine signature. Some regular patterns and key conserved motifs of OBPs and CSPs in Diptera were identified using the online tool MEME. Motifs were also compared among the three OBP subgroups. Quantitative real-time PCR (qRT-PCR) showed that most of these chemosensory genes were expressed in chemosensory organs, suggesting that these genes have chemoreceptive functions. To analyze the evolution of these olfactory genes, we compared the Ka/Ks values of orthologous genes in E. balteatus and another predatory hoverfly species; the comparison showed that OBPs and CSPs are under strong purifying selection. Overall, our results provide a molecular basis for further exploring the chemosensory mechanisms of E. balteatus and, consequently, may help us to understand the tritrophic interactions among plants, herbivorous insects, and natural enemies.

Aphidophagous syrphids are an economically important insect group: their larvae are efficient control agents of crop aphids, and the adults are well-known pollinators of many plant species (Chambers and Adams 1986, Smith and Chaney 2007, Ssymank et al. 2008, Hopper et al. 2011, Jauker et al. 2012, Rader et al. 2016). Like other natural enemies of insect pests, aphidophagous hoverflies can use a range of environmental cues, such as prey-derived volatiles [(E)-β-farnesene], herbivore-induced plant volatiles (monoterpenes and sesquiterpenes), or naturally occurring general leaf volatiles (GLVs; alcohols, aldehydes, and esters), to locate their prey and oviposition sites, and these processes rely heavily on their chemosensory systems (Turlings and Tumlinson 1992; Pare and Tumlinson 1997; Gilbert 2000a, 2000b; Francis et al. 2005; Harmel et al. 2007; Verheggen et al. 2008). In this context, a detailed study of chemoreception in hoverflies would help us to understand plant-herbivore-natural enemy interactions and thereby maximize the function of hoverflies as natural enemies and pollinators.
In insects, two groups of polypeptides are known to be involved in chemoreception: membrane-bound receptors (olfactory receptors [ORs], gustatory receptors [GRs], and ionotropic receptors [IRs]) (Buck and Axel 1991, Clyne et al. 1999, Carraher et al. 2015, He et al. 2018) and soluble binding proteins (odorant-binding proteins [OBPs] and chemosensory proteins [CSPs]) (Pelosi et al. 1981; Vogt and Riddiford 1981; Pelosi et al. 1982; He et al. 2019a, 2019b). The latter are highly concentrated in the olfactory organs and have long been considered to participate in the first step of chemical communication, binding and solubilizing odorant molecules and carrying them across the aqueous lymph of the sensilla toward the corresponding membrane-bound receptors (Calvello et al. 2003, Honson et al. 2005, Pelosi et al. 2006, Sachse and Krieger 2011, Pelosi et al. 2014).
Insect OBPs display a typical conserved cysteine pattern, and based on the number of cysteines present, they can be further divided into five distinct subgroups: classical OBPs (six conserved cysteines), minus-C OBPs (four conserved cysteines), plus-C OBPs (eight conserved cysteines), atypical OBPs (9-10 conserved cysteines and a long C-terminus), and dimer OBPs (12 conserved cysteines) (Xu et al. 2003, Zhou et al. 2004, Pelosi et al. 2006, Zhou et al. 2010). Since the first OBPs were discovered in 1981 in the giant moth Antheraea polyphemus (Vogt and Riddiford 1981), they have been identified and characterized in numerous insect species across multiple orders using transcriptome and genome sequencing (Pelosi et al. 2014). Their indispensable role in insect chemoreception has also been demonstrated in vivo and in vitro for several species (e.g., Xu et al. 2005; Matsuo et al. 2007; Swarup et al. 2011; Siciliano et al. 2014; Wu et al. 2016; Zhu et al. 2016; He et al. 2017, 2019a). The chemosensory proteins (CSPs) were first identified in Drosophila melanogaster and have since been characterized from insects in a wide range of orders; they have variously been called olfactory specific-D-like (OS-D-like) proteins (McKenna et al. 1994), A-10 proteins (Pikielny et al. 1994), or sensory appendage proteins (SAPs) (Robertson et al. 1999). Compared with OBPs, the proteins in this family are much smaller (around 120 amino acids), have only four conserved cysteines, and share a higher amino acid identity across insect species (Pelosi et al. 2006, 2018). For this second class of small soluble binding proteins, growing experimental evidence indicates that the role of some members is not confined to chemical perception and that they may, in fact, participate in other physiological functions such as growth and development, immune response, and dietary processes (for a review, see Pelosi et al. 2018).
To better understand the molecular basis of syrphid olfaction, in the current study we focused on the two classes of odorant carrier proteins, OBPs and CSPs, in the hoverfly Episyrphus balteatus (Diptera: Syrphidae), one of the best-known predatory hoverflies worldwide. We first tested the validity of the 49 putative OBP genes and six putative CSP genes obtained from previously reported E. balteatus antennal transcriptome data (Wang et al. 2017a) by molecular cloning and sequencing, and then systematically characterized the encoding genes with respect to biochemical characteristics, phylogenetic analysis, conserved cysteines, and motif patterns. We also analyzed the expression patterns of these genes in different body parts using quantitative real-time PCR (qRT-PCR), and their molecular evolution in this species relative to another predatory hoverfly species by calculating the rate of sequence evolution (the ratio of nonsynonymous to synonymous changes [Ka/Ks]).
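The cysteine-based subgrouping of OBPs described above can be written down as a small lookup. The following Python sketch is illustrative only: the function name `classify_obp` and the `long_c_terminus` flag are hypothetical, and the conserved-cysteine count is assumed to have been determined beforehand (for example, from a multiple sequence alignment).

```python
def classify_obp(n_conserved_cys: int, long_c_terminus: bool = False) -> str:
    """Map a conserved-cysteine count to the OBP subgroup named in the text.

    Hypothetical helper: the count itself must come from a prior alignment.
    """
    if n_conserved_cys == 4:
        return "minus-C"
    if n_conserved_cys == 6:
        return "classical"
    if n_conserved_cys == 8:
        return "plus-C"
    if n_conserved_cys in (9, 10) and long_c_terminus:
        return "atypical"
    if n_conserved_cys == 12:
        return "dimer"
    # Anything else does not match one of the five subgroups listed above.
    return "unclassified"
```

For example, `classify_obp(6)` returns `"classical"`, whereas a protein with nine conserved cysteines but no long C-terminus is left as `"unclassified"` under this simplified rule.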

Insect and Tissue Collection
Adult individuals of E. balteatus used in this study were originally caught in cotton fields at the Xinxiang Experimental Station of the Chinese Academy of Agricultural Sciences, Henan Province, China (35.18°N, 113.52°E) in September 2017, and were then reared in our laboratory on an artificial diet at 23 ± 1°C, 65 ± 5% relative humidity (RH), and a 14:10 (L:D) h photoperiod (Jia et al. 2019).
Various tissues, including 300 antennae, 50 heads (excluding antennae), 30 thoraxes, 20 abdomens, and 100 legs of mixed sexes, were separated from newly emerged adults (<24 h old) on ice under a microscope and immediately stored at −80°C for further processing. All treatments were performed three times.

RNA Isolation and cDNA Synthesis
Total RNA was extracted from each specimen using the Trizol Reagent (Ambion, Life Technologies, Carlsbad, CA) following the manufacturer's recommended protocol. RNA sample purity and concentration were determined using a spectrophotometer (NanoDrop-2000, Thermo Scientific, Wilmington, DE). Only RNA preparations with an A260/A280 ratio between 1.9 and 2.05 and an A260/A230 ratio > 1.8 were used in the following experiments.
For each sample, cDNA was synthesized from 1 µg of RNA using the FastQuant RT kit with gDNA Eraser (TianGen, Beijing, China) following the manufacturer's instructions, in a 20 μl final volume. The cDNA was diluted to 200 ng/µl with nuclease-free water for use in the subsequent quantitative real-time PCR (qPCR) reactions.
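The purity criteria stated above amount to a simple acceptance rule. The following Python sketch shows one way to express it; the function name `rna_passes_qc` is hypothetical, and treating the 1.9 and 2.05 bounds as inclusive is an assumption, since the text does not specify this.

```python
def rna_passes_qc(a260_a280: float, a260_a230: float) -> bool:
    """Return True if an RNA preparation meets the stated purity thresholds.

    Assumption: the A260/A280 bounds (1.9 and 2.05) are inclusive.
    """
    ratio_280_ok = 1.9 <= a260_a280 <= 2.05
    ratio_230_ok = a260_a230 > 1.8
    return ratio_280_ok and ratio_230_ok
```

Under these assumptions, a preparation reading 2.00 (A260/A280) and 2.10 (A260/A230) would be accepted, whereas one reading 2.10 and 2.10 would be rejected.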

Experimental Validation of Identified OBPs and CSPs
The transcripts encoding novel EbalOBPs/CSPs were derived from the E. balteatus antennal transcriptome data set, which was constructed previously by Prof. Wang Gui-Rong's group in our laboratory (Wang et al. 2017a). To examine whether these putative OBPs and CSPs were actually expressed in this species, antennal cDNA was used as a template to clone the intact or partial sequences with gene-specific primer pairs (Supp Table 1 [online only]).
DreamTaq DNA polymerase (Thermo Fisher Scientific, Waltham, MA) was used to amplify individual sequences under the following conditions: an initial denaturation step (95°C for 3 min); followed by 38 cycles at 95°C for 1 min, 55°C for 30 s, and 72°C for 1 min; and a final extension of 10 min at 72°C. PCR products were gel-purified and subcloned into the pEasy-T3 vector (TransGen, Beijing, China). The positive inserts were then sequenced at the Beijing Genomics Institute (Beijing, China).

Basic Bioinformatics Analysis
The open reading frames (ORFs) of the identified chemosensory genes were obtained with the ORF Finder Tool at the NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Putative signal peptides and their cleavage sites were predicted with the SignalP 4.1 Server (http://www.cbs.dtu.dk/services/SignalP/) (Petersen et al. 2011). The molecular mass (MW) and isoelectric point (pI) of predicted proteins were computed using the Compute pI/MW online program at the ExPASy proteomics server (http://www.expasy.ch/cgi-bin/pi_tool). Multiple sequence alignments were done using DNAMAN 6.0 (Lynnon Biosoft, Canada) with default gap penalty parameters, and the results were then visualized with WebLogo 3.0 (Crooks et al. 2004).

Motif Analysis
The MEME online tool (version 4.12.0, http://meme-suite.org/tools/meme) was used to discover and analyze the OBP and CSP protein motifs (Bailey et al. 2015), following previous similar reports (Xu et al. 2009, Gu et al. 2015, Zhao et al. 2018), with the parameters minimum width = 6, maximum width = 10, and maximum number of motifs to find = 8. All OBP and CSP sequences used in this study have intact full-length ORFs, and the translated proteins have lengths similar to those of insect OBPs and CSPs. The OBP signal peptide was removed (using PrediSi software; Hiller et al. 2004) before the alignment.

Tissue Expression Analysis
Expression in different tissues of these 49 OBPs and five CSPs was evaluated by real-time quantitative PCR (RT-qPCR). Primer pairs were designed with Beacon Designer 7.90 (Premier Biosoft International, Palo Alto, CA). The specificity of all primers was confirmed by visualization of a single-band amplicon of the expected size after 2% agarose gel electrophoresis and by a single peak in the qPCR melting curve; the efficiency was then calculated by analyzing standard curves with a 10-fold cDNA dilution series. In all experiments, all primers gave amplification efficiencies of 90-100%. Primer pairs selected for qPCR analyses and the results of efficiency tests are presented in Supp Table 4 (online only). The qPCR experiments were subsequently conducted on an ABI 7500 Real-Time PCR System (Applied Biosystems, Carlsbad, CA) using SYBR Green SuperReal PreMix Plus (TianGen, Beijing, China) in a 20 μl reaction volume. The cycling parameters were one cycle of 95°C for 15 min, then 40 cycles of 95°C for 10 s and 62°C for 32 s. For data reproducibility, each reaction was done in three technical replicates on three independent biological replicates. The ribosomal protein S3 gene (rps3), which exhibits stable expression across all tissue types, was selected as the reference gene for normalizing target gene expression (Wang et al. 2017a).
The relative transcript level of each target gene among the various tissues was calculated using the 2^(−ΔΔCT) method (Livak and Schmittgen 2001), and the resulting data were then compared for significant differences (P < 0.05) with a one-way nested analysis of variance (ANOVA), followed by Tukey's honestly significant difference (HSD) test, using SPSS Statistics 18.0 software (SPSS, Chicago, IL).
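The efficiency check on a 10-fold dilution series can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes the widely used relation E = 10^(−1/slope) − 1, where the slope comes from a least-squares fit of Ct against log10 of the relative template amount. The function name and the example Ct values are hypothetical.

```python
import math

def amplification_efficiency(relative_amounts, ct_values):
    """Return (slope, efficiency in %) from a qPCR standard curve.

    relative_amounts: relative template inputs, e.g. 1, 0.1, 0.01 (10-fold series).
    ct_values: mean Ct measured at each dilution.
    Assumes the standard relation E = 10**(-1/slope) - 1.
    """
    xs = [math.log10(a) for a in relative_amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    # Ordinary least-squares slope of Ct versus log10(input).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    efficiency_pct = (10 ** (-1.0 / slope) - 1.0) * 100.0
    return slope, efficiency_pct

# Illustrative (made-up) Ct values for a four-point 10-fold dilution series.
slope, eff = amplification_efficiency([1, 0.1, 0.01, 0.001],
                                      [18.0, 21.4, 24.8, 28.2])
```

With these made-up values the fitted slope is about −3.4 cycles per 10-fold dilution, which corresponds to an efficiency of roughly 97%, inside the 90-100% range reported for the primers.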
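The 2^(−ΔΔCT) calculation can be illustrated with a short Python sketch. It assumes, as stated in the text and the Fig. 4 legend, that rps3 is the reference gene and that the leg sample is the calibrator; the function name and the Ct values in the example are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative transcript level by the 2^(-ddCt) method (Livak and Schmittgen 2001).

    ct_target / ct_reference: Ct of the target gene and of the reference gene
        (rps3 in this study) in the tissue of interest.
    *_calibrator: the same two Ct values in the calibrator tissue (leg here).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

# Made-up example: a target gene in antennae (Ct 20) versus leg (Ct 26),
# with rps3 at Ct 18 in both tissues.
fold_antennae = relative_expression(20.0, 18.0, 26.0, 18.0)
```

In this made-up example ΔΔCT is −6, so the antennal transcript level is 2^6 = 64 times that of the leg; the calibrator compared with itself always gives 1.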

Evolutionary Analysis
We analyzed the molecular evolution of OBPs and CSPs in two common predatory hoverflies, E. balteatus and Eupeodes corollae, by estimating three quantities: nonsynonymous substitutions per nonsynonymous site (Ka), synonymous substitutions per synonymous site (Ks), and the ratio between them (Ka/Ks). KaKs_Calculator software with the MS model was used to estimate Ka, Ks, and their ratio Ka/Ks for the putative orthologous pairs between the two species (Zhang et al. 2006). The input files for KaKs_Calculator were prepared by ParaAT (parallel alignment and back-translation) with default settings (Zhang et al. 2012). Orthologous genes in the two closely related species were determined based on our previous phylogenetic analyses (Jia et al. 2019).

OBP and CSP Genes in E. balteatus
In total, 49 assembled transcripts encoding putative OBPs and six encoding putative CSPs were obtained from the previously generated E. balteatus antennal transcriptome databases (Wang et al. 2017a). To confirm the validity of these sequences, we first designed specific full-length primers to amplify the ORF of each gene. As a result, almost all the putative genes, 44 OBPs and six CSPs (Table 1), were successfully amplified and have been deposited in GenBank under the accession numbers MT247210 to MT247259. The remaining OBP transcripts, which could not be cloned from E. balteatus tissues, were all partial sequences (<300 bp).
Physicochemical properties such as molecular mass, isoelectric point, and signal peptide were used to further characterize the identified proteins. Bioinformatic analysis revealed that 38 of the 44 EbalOBPs had intact ORFs encoding 124-260 amino acids. All identified full-length EbalOBPs had a signal peptide at the N-terminus, ranging from 15 to 27 amino acids. The full-length deduced protein sequences had theoretical molecular masses of 13.85-29.96 kDa and isoelectric points of 4.45-9.1. In the CSP family, all EbalCSPs except EbalCSP1 contained a complete ORF. EbalCSP5 had the shortest ORF (112 amino acids), with a molecular mass of 12.70 kDa, and EbalCSP4 had the longest ORF (321 amino acids), with a molecular mass of 35.34 kDa. All the full-length EbalCSPs were predicted to possess signal peptides, varying from 18 to 26 amino acids. These results match what has been described for other insect species (Vieira and Rozas 2011).
The validity of 44 EbalOBPs and six EbalCSPs was verified by cloning and sequencing. Almost all the deduced proteins had the typical characteristics of the insect OBP or CSP families, such as the presence of N-terminal signal peptides and conserved cysteines. Moreover, this species may have larger repertoires of OBPs and CSPs than identified here, because some studies have proposed that some OBP and CSP genes may be expressed primarily in non-antennal tissues (Sparks et al. 2014, Sun et al. 2017).

Motif Pattern Analysis of OBPs and CSPs
Conserved motifs are frequently used in analyses of insect OBPs and CSPs because of their importance for functional domains (Xu et al. 2009, Gu et al. 2015, He et al. 2017, Zhao et al. 2018). Hence, we carried out a motif-pattern analysis to compare the motif patterns among OBPs and among CSPs in dipteran insects, using 296 OBPs (from nine dipteran species) and 51 CSPs (from 12 dipteran species), respectively (Supp Table 3 [online only]).
A total of 68 different motif patterns were discovered in the 296 dipteran OBPs; the 11 most common patterns, present in 212 dipteran OBPs (71.62%), are listed in Fig. 2. As described earlier, three classes of OBPs (classic, minus-C, and plus-C) were found; thus, we also analyzed the motif differences between these classes. The results showed that the motif patterns were quite different among these three classes: motif patterns 3-1-8-2 and 3-1-2 were the most common and were detected in 34 and 15 OBPs, respectively. For plus-C OBPs, 53% had three different motif patterns, with the most common being 1-7. We also found some regular patterns: only the first, second, and third motifs were found in most OBPs; the fourth and sixth motifs were predominantly found in minus-C OBPs, whereas the seventh appeared only in plus-C OBPs. These structural differences might imply a functional difference between these classes. Thus, although further functional studies are required, we speculate that these structural differences might be one reason that OBPs can bind ligands of diverse sizes and shapes (Hekmat-Scafe et al. 2002; Zhou et al. 2004, 2010; Pelosi et al. 2006; Xu et al. 2009).
Consistent with what has been reported for insects in Diptera (Xu et al. 2009, Zhao et al. 2018) and many other orders (Gu et al. 2015, He et al. 2017), the motif patterns found in CSPs were more conserved than those in OBPs among different insect species. In the 51 dipteran CSPs (from nine species), only three motif patterns were

Fig. 4. Mean (±SE) transcript levels of OBP and CSP genes in different tissues of E. balteatus as evaluated by qPCR. An: antennae; He: head; Th: thorax; Ab: abdomen; Le: leg. Here, the leg was taken as the calibrator or 1× sample. Means are from three biological replicates; the different letters above the bars in a graph indicate a significant difference in expression among tissues (P < 0.05).
The antennae-enriched chemosensory proteins are thought to account for odorant recognition and perception (Pelosi et al. 2014), whereas genes that are highly expressed in gustatory organs, such as mouthparts, legs, and proboscis, may be involved in gustatory behaviors. Indeed, such gustatory functions for OBPs in legs have been clearly demonstrated in some insects, such as Drosophila melanogaster and Adelphocoris lineolatus (Galindo and Smith 2001; Jeong et al. 2013; Sun et al. 2016, 2017). Therefore, the antennae-specific genes obtained here could be potential targets for investigating the molecular mechanisms of chemosensation in adult E. balteatus, while the three leg-biased genes might be associated with the detection of nonvolatile substances.

Evolution Analysis of OBP and CSP Orthologs
An investigation of the evolution of chemosensory genes in these closely related species may provide clues about the differentiation and host preference of the predatory syrphids. Selection acting on putatively orthologous copies was analyzed in two common predatory hoverflies, E. balteatus and E. corollae. Because OBPs share low sequence similarity, we followed the approach used in other insects (Campanini et al. 2016, Jiang et al. 2017) and performed independent evolutionary analyses on each of the three OBP subfamilies identified (data not shown).
We analyzed three quantities that reflect selection pressure: the nonsynonymous substitution rate (Ka), the synonymous substitution rate (Ks), and their ratio ω (Ka/Ks). The estimated ratios of nonsynonymous to synonymous substitutions are listed in Table 2. In accordance with similar analyses (Campanini et al. 2016, Jiang et al. 2017), all the Ka/Ks ratios were far less than 1.0, indicating that these genes are under strong purifying selection pressure.
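The way the Ka/Ks ratio is read can be summarized in a short Python sketch. It is a simplified illustration of the conventional interpretation (ω < 1 purifying, ω = 1 neutral, ω > 1 positive selection), not a reimplementation of KaKs_Calculator; the function names and the example values are hypothetical and are not taken from Table 2.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return omega = Ka/Ks for one orthologous gene pair."""
    if ks <= 0:
        # The ratio is undefined without synonymous substitutions.
        raise ValueError("Ks must be positive for Ka/Ks to be defined")
    return ka / ks

def selection_regime(omega: float) -> str:
    """Conventional reading of omega; real analyses also test significance."""
    if omega < 1.0:
        return "purifying"
    if omega > 1.0:
        return "positive"
    return "neutral"

# Made-up example values for a single orthologous pair.
omega = ka_ks_ratio(0.05, 0.5)
regime = selection_regime(omega)
```

Here ω is 0.1, well below 1.0, which is the pattern described above for the OBP and CSP orthologs.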

Conclusions
In this work, 44 OBPs and six CSPs from E. balteatus, based on previously reported antennal transcriptomic data, were directly cloned. Sequence alignment, phylogenetic analysis, and conserved motif identification indicated that all identified genes have typical characteristics of the insect OBP or CSP family. More importantly, some antennae-specific genes that may be involved in chemical cue recognition were identified through RT-qPCR. The next step, therefore, is functional analysis of these candidate OBPs/CSPs through electrophysiological studies and gene expression modification studies to reveal their roles in chemoreception and to further explore the molecular mechanisms of chemoreception in E. balteatus. Such studies should contribute to understanding the tritrophic interactions among plants, the herbivorous insects that feed on the plants, and the natural enemies of the pests.

Supplementary Data
Supplementary data are available at Journal of Insect Science online.