De novo transcriptome assemblies of four xylem sap-feeding insects

Abstract
Background: Spittlebugs and sharpshooters are well-known xylem sap-feeding insects and vectors of the phytopathogenic bacterium Xylella fastidiosa (Wells), a causal agent of Pierce's disease of grapevines and other crop diseases. Specialized feeding on nutrient-deficient xylem sap is relatively rare among insect herbivores, and only limited genomic and transcriptomic information has been generated for xylem-sap feeders. To develop a more comprehensive understanding of the biochemical adaptations and symbiotic relationships that support survival on a nutritionally austere dietary source, transcriptome assemblies for three sharpshooter species and one spittlebug species were produced.
Findings: Trinity-based de novo transcriptome assemblies were generated for all four xylem-sap feeders using raw sequencing data originating from whole-insect preps. Total transcripts for each species ranged from 91 384 for Cuerna arida to 106 998 for Homalodisca liturata, with transcript totals for Graphocephala atropunctata and the spittlebug Clastoptera arizonana falling in between. The percentage of transcripts comprising complete open reading frames ranged from 60% for H. liturata to 82% for C. arizonana. Bench-marking universal single-copy orthologs analyses for each dataset indicated quality assemblies and a high degree of completeness for all four species.
Conclusions: These four transcriptomes represent a significant expansion of data for insect herbivores that feed exclusively on xylem sap, a nutritionally deficient dietary source relative to other plant tissues and fluids. Comparison of transcriptome data with insect herbivores that utilize other dietary sources may illuminate fundamental differences in the biochemistry of dietary specialization.


Background
Resource partitioning among herbivorous insects spans a continuum from specialists that feed on one or a few plant species to generalists that are able to utilize hundreds of species belonging to multiple plant families. A further element of plant partitioning involves the particular location on a plant or tissue type from which an insect feeds [1]. The diversity of plant feeding strategies has evolved along with specialized anatomical features such as mouthparts and digestive systems, unique enzyme complements for processing plant compounds, and partnerships with symbiotic microbiota that contribute to nutritional gain of the host insect. The transcriptome assemblies presented here include four species that feed exclusively on sap from xylem vessels, a relatively rare form of plant feeding from a source that among plant tissues is the most deficient in nitrogen and carbon content [2]. Three of the transcriptomes represent sharpshooter species that are members of the subfamily Cicadellinae (Cicadellidae), which belongs to the superfamily Membracoidea (leafhoppers, sharpshooters, treehoppers). The fourth transcriptome represents a spittlebug (Clastopteridae) that belongs to the superfamily Cercopoidea (spittlebugs, froghoppers). All are members of the hemipteran suborder Auchenorrhyncha [3]. Their piercing-sucking mouthparts tap into xylem vessels, from which sap is consumed in copious quantities to compensate for its low nutritional value. Sharpshooters are recognized for their efficient assimilation of limited nutrients in xylem sap [4], but the putative biochemical mechanisms that enable specialization on xylem sap are unknown. Also unclear is whether the respective roles in host nutrition played by the dual primary endosymbionts are consistent among xylem feeders [5].
Comparison of transcriptomes of four xylem-feeding insects will provide additional knowledge and insight into the survival of ecological specialists on a nutritionally impoverished dietary source.

Samples
The

Data filtering
The total number of reads, data quantity, and short read archive numbers for each of the four xylem-feeding insects are shown in

Transcriptome assembly
All raw data for each insect transcriptome were run through the following pipeline. Prior to assembly, the three replicate samples were concatenated and read abundance was normalized to 50× coverage using the in silico normalization tool in Trinity v. 2.0.6 [6] to improve assembly time. Each of the datasets was assembled in Trinity using the default parameters, with the addition of the '--jaccard_clip' flag to reduce the generation of transcript fusions from non-strand-specific data. Open reading frames were predicted using TransDecoder with all run parameters set to default [6]. The transcriptomes were filtered, sorted, and prepared for NCBI transcriptome shotgun assembly (TSA) submission as previously described [7]. To comply with NCBI TSA submission requirements, all transcripts originating from endosymbionts or other bacteria were removed from the final assembly prior to submission.
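The steps above can be sketched as a set of command lines. The following Python sketch only builds the commands and does not run them. It assumes paired-end FASTQ input, which the text does not state, and all file names, the memory setting, and the CPU count are hypothetical placeholders. The TransDecoder two-step invocation reflects common usage and may differ from the version used by the authors.

```python
import shutil


def concatenate_replicates(replicate_paths, merged_path):
    """Append replicate FASTQ files to one merged file, in the order given."""
    with open(merged_path, "wb") as merged:
        for path in replicate_paths:
            with open(path, "rb") as replicate:
                shutil.copyfileobj(replicate, merged)
    return merged_path


def normalization_command(left, right, max_cov=50, memory="50G", cpu=8):
    """Trinity in silico normalization to a target coverage (50x in the text)."""
    return [
        "insilico_read_normalization.pl",
        "--seqType", "fq",
        "--JM", memory,              # assumed memory budget
        "--max_cov", str(max_cov),
        "--left", left,
        "--right", right,
        "--pairs_together",          # assumes paired-end reads
        "--CPU", str(cpu),
    ]


def trinity_command(left, right, memory="50G", cpu=8, out_dir="trinity_out"):
    """Default Trinity assembly plus --jaccard_clip, as described in the text."""
    return [
        "Trinity",
        "--seqType", "fq",
        "--max_memory", memory,
        "--left", left,
        "--right", right,
        "--CPU", str(cpu),
        "--jaccard_clip",
        "--output", out_dir,
    ]


def transdecoder_commands(transcripts_fasta):
    """ORF prediction with default parameters (assumed two-step invocation)."""
    return [
        ["TransDecoder.LongOrfs", "-t", transcripts_fasta],
        ["TransDecoder.Predict", "-t", transcripts_fasta],
    ]


# Hypothetical file names, used only to show the shape of each command.
for command in (
    normalization_command("merged_1.fq", "merged_2.fq"),
    trinity_command("normalized_1.fq", "normalized_2.fq"),
    *transdecoder_commands("trinity_out/Trinity.fasta"),
):
    print(" ".join(command))
```

Keeping each command as a list of arguments makes it straightforward to pass to `subprocess.run` once real paths are substituted.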

Annotation
Functional annotation for each of the transcriptomes was performed at the peptide level using a custom pipeline [7] that defines protein products and assigns transcript names. Predicted proteins and peptides were analyzed using InterProScan 5 [8], using the '-iprlookup' and '-goterms' flags, to search all available databases, including Gene Ontology. Each transcriptome was annotated using BLASTp against the UniProt Swiss-Prot database (downloaded 11 February 2015). Annie [9], a program that cross-references Swiss-Prot BLAST and InterProScan 5 results to extract qualified gene names and products, was used to generate the transcript annotation file. The resulting .gff3 and .tbl files were further annotated with functional descriptors in Transvestigator [10].
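To illustrate what cross-referencing BLAST and InterProScan results involves, the sketch below pairs each predicted protein with its best Swiss-Prot hit and its Gene Ontology terms. It is not Annie's actual logic. It assumes BLAST tabular output (`-outfmt 6`, with the bit score in column 12) and InterProScan TSV output (with pipe-separated GO terms in column 14), and it picks the best hit by bit score, which is an illustrative choice. All identifiers in the toy records are made up.

```python
import io
from collections import defaultdict


def best_blast_hits(blast_lines):
    """Map each query protein to its highest-scoring subject (outfmt 6)."""
    best = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def go_terms_by_protein(interpro_lines):
    """Collect GO terms per protein from InterProScan TSV (column 14)."""
    terms = defaultdict(set)
    for line in interpro_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 14 and fields[13] not in ("", "-"):
            terms[fields[0]].update(fields[13].split("|"))
    return terms


def cross_reference(blast_lines, interpro_lines):
    """Combine the best Swiss-Prot hit and sorted GO terms for each protein."""
    hits = best_blast_hits(blast_lines)
    terms = go_terms_by_protein(interpro_lines)
    return {
        protein: {
            "swissprot": hits.get(protein),
            "go": sorted(terms.get(protein, ())),
        }
        for protein in sorted(set(hits) | set(terms))
    }


# Toy records with made-up identifiers, for illustration only.
toy_blast = io.StringIO(
    "prot1\tsp|X00001|TOY1\t55.0\t200\t90\t0\t1\t200\t1\t200\t1e-60\t210\n"
    "prot1\tsp|X00002|TOY2\t40.0\t180\t108\t0\t1\t180\t1\t180\t1e-20\t95\n"
)
toy_interpro = io.StringIO(
    "prot1\tmd5\t250\tPfam\tPF00000\ttoy domain\t10\t200\t1e-20\tT\t"
    "01-01-2015\tIPR000000\ttoy entry\tGO:0000001|GO:0000002\n"
    "prot2\tmd5\t120\tPfam\tPF00000\ttoy domain\t5\t100\t1e-10\tT\t"
    "01-01-2015\tIPR000000\ttoy entry\tGO:0000002\n"
)
for protein, record in cross_reference(toy_blast, toy_interpro).items():
    print(protein, record["swissprot"], ",".join(record["go"]))
```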

Transcriptome Quality and Comparisons
Assembled transcriptome metrics showed a high percentage of reads mapping back to each transcriptome (Table 2), indicating successful assemblies. TransRate [11] scores ranging from 0.16 to 0.42 were used for quality assessment, and bench-marking universal single-copy orthologs (BUSCO) v. 1.1.b1 results using the arthropod gene set (downloaded December 19, 2015) [12] indicated that the four transcriptomes have a moderate to high level of completeness. It should be noted that both the TransRate value (0.16) and the BUSCO results for H. liturata suggest that this transcriptome may contain more partial transcripts than the other three assemblies. Each of the assembled transcriptomes was used in a reciprocal tBLASTx search to identify similarities between the four species and their transcriptome assemblies. The final, filtered transcriptomes were made into nucleotide BLAST databases using the NCBI BLAST+ (v. 2.2.30) makeblastdb tool, and all tBLASTx searches were performed using an e-value cutoff of 1e-3. The tBLASTx results (Table 3) indicate similarities between the four xylem-feeder transcriptomes, with the lowest (38%) occurring between members of the two superfamilies (spittlebug and all sharpshooters) and the highest (84%) between H. liturata and C. arida, members of the same subfamily [13].
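A reciprocal tBLASTx comparison of this kind can be sketched as follows. The text does not define how the similarity percentages were computed, so the sketch assumes one plausible reading: the share of query transcripts with at least one hit at or below the e-value cutoff. File and database names are hypothetical, and the tabular output format (`-outfmt 6`, with the e-value in column 11) is an assumption about how results were stored.

```python
import io


def blast_commands(query_fasta, subject_fasta, db_name, out_tsv, evalue="1e-3"):
    """Build (not run) makeblastdb and tblastx command lines."""
    return [
        ["makeblastdb", "-in", subject_fasta, "-dbtype", "nucl", "-out", db_name],
        ["tblastx", "-query", query_fasta, "-db", db_name,
         "-evalue", evalue, "-outfmt", "6", "-out", out_tsv],
    ]


def count_fasta_records(fasta_lines):
    """Count sequences in a FASTA file by counting header lines."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def queries_with_hits(blast_lines, max_evalue=1e-3):
    """Return the set of query IDs with a hit at or below the e-value cutoff."""
    hits = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def percent_with_hits(blast_lines, total_queries, max_evalue=1e-3):
    """Percentage of query transcripts that have a qualifying hit."""
    return 100.0 * len(queries_with_hits(blast_lines, max_evalue)) / total_queries


# Toy data with made-up identifiers: four query transcripts, three with hits,
# of which two pass the 1e-3 cutoff.
toy_fasta = io.StringIO(">t1\nACGT\n>t2\nACGT\n>t3\nACGT\n>t4\nACGT\n")
toy_hits = io.StringIO(
    "t1\ts1\t80.0\t60\t12\t0\t1\t180\t1\t180\t1e-50\t120\n"
    "t2\ts2\t35.0\t30\t19\t0\t1\t90\t1\t90\t0.5\t25\n"
    "t3\ts3\t50.0\t40\t20\t0\t1\t120\t1\t120\t1e-3\t40\n"
)
total = count_fasta_records(toy_fasta)
print(percent_with_hits(toy_hits, total))  # 50.0
```

Running the search in both directions for each species pair, with each transcriptome serving once as query and once as database, gives the reciprocal comparison described above.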

Availability of supporting data
The filtered and annotated transcriptomes have been deposited in GenBank as a TSA under the accessions and BioProject numbers found in Table 1. Datasets further supporting the results of this article are available in the GigaScience repository, GigaDB [13].