Functional Genomics of a Symbiotic Community: Shared Traits in the Olive Fruit Fly Gut Microbiota

Abstract
The olive fruit fly Bactrocera oleae is a major pest of olives worldwide and harbors a specialized gut microbiota dominated by the obligate symbiont “Candidatus Erwinia dacicola.” Candidatus Erwinia dacicola is thought to supplement dietary nitrogen to the host, although the evidence for this hypothesis has so far been only indirect. Here, we sought to investigate the contribution of the symbiosis to insect fitness and to explore the ecology of the insect gut. To this end, we examined the composition of bacterial communities associated with Cretan olive fruit fly populations and analyzed several genomes and one transcriptome assembly. We identified a novel component of the gut microbiota, Tatumella sp. TA1, which is stably associated with Mediterranean olive fruit fly populations, and reconstructed its genome. We also reconstructed several pathways related to nitrogen assimilation and interactions with the host. The results show that, despite variation in the taxonomic composition of the gut microbial community, core functions related to the symbiosis are maintained. Functional redundancy between different microbial taxa was observed for genes involved in urea hydrolysis. In the obligate symbiont genome, urea hydrolysis is encoded by a conserved urease operon that, based on phylogenetic evidence, was likely acquired by horizontal gene transfer. A potential underlying mechanism is the action of mobile elements, which are especially abundant in the Ca. E. dacicola genome. This finding, together with the identification in the studied genomes of extracellular surface structure components that may mediate interactions within the gut community, suggests that ongoing and past genetic exchanges between microbes may have shaped the symbiosis.


Introduction
Many insects house gut microbial communities that perform essential functions related to their diet or lifestyle (Dillon and Dillon 2004). Insect species whose fitness depends on a specialized gut microbial community often have specialized alimentary structures to house the microbes, which are also often vertically transmitted from mother to offspring to ensure inoculation of subsequent generations (Salem et al. 2015). The gut microbiota of wild populations of the olive fruit fly Bactrocera oleae (Tephritidae) is numerically dominated by a single Gammaproteobacterium, "Candidatus Erwinia dacicola" (Capuzzo et al. 2005; Estes et al. 2009; Kounatidis et al. 2009; Estes et al. 2012; Ben-Yosef et al. 2015). The symbiont Ca. E. dacicola resides in the digestive tract throughout the insect's life cycle, within the midgut caeca in larvae and within a specialized foregut diverticulum, the esophageal bulb, in adults, and it is maternally transmitted between generations by egg smearing (Estes et al. 2009). Candidatus Erwinia dacicola has not been detected outside the insect by environmental sampling, and efforts to culture it in vitro have so far been unsuccessful (Capuzzo et al. 2005). The bacterium therefore qualifies as a specific obligate symbiont of B. oleae, but it is also challenging to study experimentally. In this context, sequence-based methods are particularly valuable for generating hypotheses about the roles of Ca. E. dacicola in host fitness.
In common with many holometabolous insects, the adult and juvenile stages of B. oleae feed on different food sources (Tzanakakis 2003); the Ca. E. dacicola symbiosis is therefore hypothesized to perform different functions during these developmental stages. In larvae, the symbiosis has been hypothesized to allow the insect to use ripening fruit by detoxifying defensive plant phenolic compounds, such as oleuropein (Soler-Rivas et al. 2000; Ben-Yosef et al. 2015), and by sequestering amino acids for protein synthesis (Ben-Yosef et al. 2015). During adulthood, the native gut microbiota is thought to provision nitrogen to the host: Females harboring their native microbiota have a higher reproductive output on diets lacking essential amino acids (EAAs) and on diets whose only nitrogen sources cannot be metabolized by B. oleae itself, such as urea (Ben-Yosef et al. 2010). Many insects house microbes that increase the quantity or quality of dietary nitrogen by performing novel metabolic functions (Douglas 2009). For example, the intracellular symbiont Blochmannia, housed in the bacteriocytes of Camponotus ants, encodes a urease that hydrolyzes dietary urea to ammonia, from which it synthesizes essential and nonessential amino acids that are subsequently transported to the hemolymph for host use (Feldhaar et al. 2007). A similar process is proposed to occur in B. oleae adults, which consume bird droppings containing ammonia and urea as part of their omnivorous diet (Ben-Yosef et al. 2014) but do not encode endogenous ureases (B. oleae genome accession GCF_001188975.1; Djambazian et al. 2018). However, the specific pathways and enzymes involved in the metabolism and uptake of nitrogenous substrates in B. oleae remain to be elucidated.
Previous experimental approaches aiming to evaluate symbiotic function have tested the native microbial community of the olive fruit fly, which is dominated by, but not restricted to, Ca. E. dacicola (Estes et al. 2009; Ben-Yosef et al. 2015; Blow, Vontas, et al. 2016; Blow et al. 2017; Koskinioti et al. 2019). In this study, we took advantage of recently available -omic resources for the symbiont (Blow, Gioti, et al. 2016; Pavlidi et al. 2017; Estes et al. 2018b) and generated new sequence data from single-culture and metagenomic samples of the B. oleae gut community in order to (1) investigate whether Ca. E. dacicola encodes the full gene repertoire required to provide dietary nitrogen to B. oleae and (2) identify other members of the B. oleae gut microbiota potentially capable of performing these functions, and thereby assess the functional significance of the previously observed variation in B. oleae community composition. To this end, we used comparative genomic and phylogenetic analyses, considering the function of the gut community in the context of the symbiosis.

Insect Material
Bactrocera oleae adults were obtained by collecting infested olives from trees in the grounds of the University of Crete (Heraklion, Greece) in October and November 2014. Infested olives were suspended on a wire mesh over sterile sand, and third-instar larvae were allowed to emerge from the fruit. Larvae pupated in the sand, and pupae were placed in Petri dishes inside 10 cm³ plastic cages before adult emergence. Flies were maintained at 25 °C and 60% relative humidity and were supplied with an artificial diet (19% hydrolyzed yeast, 75% icing sugar, 6% egg yolk). Each cage was provided with Milli-Q water in a clean plastic container and with wax cones for oviposition.
Preparation and Sequencing of 16S rRNA Gene Amplicon Libraries
DNA was extracted from a total of 52 whole B. oleae adults using the Qiagen DNeasy Blood and Tissue kit, following the manufacturer's protocol for Gram-positive bacteria (Qiagen, UK), with an additional bead-beating step using 3 mm carbide beads (Qiagen, UK) in a Qiagen TissueLyser (Qiagen, UK) at 25 Hz for 30 s. To assess the diversity and composition of the bacterial communities, the V4 region of the bacterial 16S rRNA gene was amplified by PCR using the universal primers F515 (5′-GTGCCAGCMGCCGCGGTAA-3′) and R806 (5′-GGACTACHVGGGTWTCTAAT-3′; Caporaso et al. 2011). Samples were dual-indexed for sequencing following the method of D'Amore et al. (2016). PCR reactions were performed in a total volume of 20 µl containing 5 ng of template DNA, 10 µl NEBNext 2× High-Fidelity Master Mix (New England Biolabs), 0.3 µM of each primer, and 3.4 µl PCR-clean water. Thermal cycling conditions were 98 °C for 2 min; 10 cycles of 98 °C for 20 s, 60 °C for 15 s, and 70 °C for 30 s; and a final extension at 72 °C for 5 min. PCR products were purified with Agencourt AMPure XP beads (Beckman Coulter Genomics) and used as template for the second PCR reaction. Purified first-round PCR products were combined with 10 µl NEBNext 2× High-Fidelity Master Mix (New England Biolabs) and 0.3 µM of each barcoding primer containing adapters and indexes, in a total volume of 20 µl. Thermal cycling conditions were 98 °C for 2 min; 15 cycles of 98 °C for 20 s, 55 °C for 15 s, and 72 °C for 40 s; and a final extension at 72 °C for 60 s. PCR products were purified with Agencourt AMPure XP beads (Beckman Coulter Genomics) and quantified with the Qubit dsDNA High-Sensitivity assay. […] Sickle (Joshi and Fass 2011) was used to trim reads based on quality: Reads with a window quality score <20, or shorter than 10 bp after trimming, were discarded. BayesHammer (Nikolenko et al. 2013) was used to correct sequencing errors based on quality scores.
Paired-end reads were merged with a minimum overlap of 50 bp using PandaSeq (Masella et al. 2012). All subsequent analyses were conducted in QIIME version 1.8.0 (Caporaso, Kuczynski, et al. 2010). Sequences were clustered into operational taxonomic units (OTUs) by de novo OTU picking with USEARCH (Edgar 2010). Chimeras were detected and removed with UCHIME (Edgar et al. 2011) using the QIIME-compatible version of the SILVA 111 release database (Quast et al. 2012). The most abundant sequence was chosen as the representative for each OTU, and taxonomy was assigned to representative sequences by BLAST (Altschul et al. 1990) against the SILVA 111 release database, supplemented with several reference sequences for Ca. E. dacicola. OTUs matching chloroplast or mitochondrial sequences in the SILVA 111 database were removed from the data set. OTU representative sequences were aligned against the Greengenes core reference alignment (DeSantis et al. 2006) using PyNAST (Caporaso, Bittinger, et al. 2010). Previously published 16S rRNA gene amplicon data from flies collected in Israel (Ben-Yosef et al. 2015) were used for comparison with the above data. Because data generation methods for the samples from Israel differed slightly, all data analysis steps from read-error correction onward were standardized across both data sets in this study. Tatumella sp. TA1 prevalence was calculated as the proportion of individuals in which Tatumella sp. TA1 16S rRNA gene copies made up >0.01% of the total 16S rRNA gene copies in that sample, and relative abundance was calculated as the proportion of the total 16S rRNA gene copies.
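The prevalence and relative-abundance definitions above can be made concrete with a short sketch. It assumes a hypothetical OTU table held as a Python dict (sample → {OTU name: read count}) with the illustrative OTU label `Tatumella_TA1`; it is not the QIIME workflow used in the study, only the arithmetic of the two definitions, including the strict >0.01% detection threshold.

```python
"""Sketch: per-sample relative abundance and prevalence of one OTU.

Assumes a hypothetical OTU table: {sample_id: {otu_name: 16S read count}}.
The OTU name "Tatumella_TA1" is illustrative only.
"""
from __future__ import annotations

DETECTION_THRESHOLD = 0.0001  # 0.01% of a sample's total 16S rRNA gene reads


def relative_abundance(counts: dict[str, int], otu: str) -> float:
    """Fraction of a sample's 16S reads assigned to `otu` (0.0 for empty samples)."""
    total = sum(counts.values())
    return counts.get(otu, 0) / total if total else 0.0


def prevalence(otu_table: dict[str, dict[str, int]], otu: str,
               threshold: float = DETECTION_THRESHOLD) -> float:
    """Proportion of samples in which `otu` exceeds `threshold` relative abundance."""
    present = sum(
        1 for counts in otu_table.values()
        if relative_abundance(counts, otu) > threshold  # strictly greater than 0.01%
    )
    return present / len(otu_table)


if __name__ == "__main__":
    # Toy counts, not data from this study.
    table = {
        "fly1": {"Ca_E_dacicola": 9_990, "Tatumella_TA1": 10},
        "fly2": {"Ca_E_dacicola": 10_000},
        "fly3": {"Ca_E_dacicola": 5_000, "Tatumella_TA1": 5_000},
        "fly4": {"Ca_E_dacicola": 99_999, "Tatumella_TA1": 1},
    }
    for sample, counts in table.items():
        print(sample, round(relative_abundance(counts, "Tatumella_TA1"), 6))
    print("prevalence:", prevalence(table, "Tatumella_TA1"))
```

In the toy table, fly4 carries a Tatumella read but falls below the 0.01% threshold, so it does not count towards prevalence.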
Isolation, Culture and Identification of Tatumella sp. TA1 from B. oleae
Tatumella sp. TA1 was isolated from a two-day-old male fly using the isolation and culture procedures below, all conducted under sterile conditions using aseptic technique or in a laminar flow cabinet. Individual two-day-old adult B. oleae collected in Crete were surface-sterilized with 70% ethanol and rinsed twice in distilled water before homogenization with a plastic pestle in 20 µl nuclease-free water. Ten microliters of the homogenate was spotted onto Columbia agar (Oxoid, UK) supplemented with 5% defibrinated horse blood (TCS Biosciences, UK), and plates were incubated at 25 °C for 48 h. Single colonies were picked and streaked onto Brain Heart Infusion (BHI) agar (Oxoid, UK) to establish pure cultures. Pure liquid cultures were generated by inoculating single colonies from BHI agar plates into BHI liquid medium and incubating them at 25 °C for 48 h. Liquid cultures were cryopreserved by adding glycerol as a cryoprotectant (20% [v/v] final concentration) and storing them at −80 °C. To identify isolates, single colonies from BHI agar plates were picked into 10 µl nuclease-free water in a PCR tube, and tubes were incubated at 95 °C for 5 min to lyse cells and release DNA. An approximately 1,500 bp region of the 16S rRNA gene was amplified with the universal primers 8F (5′-AGAGTTTGATCMTGGCTCAG-3′) and 1492R (5′-CCCCTACGGTTACCTTGTTACGAC-3′). Reactions were performed in a total volume of 25 µl containing 12.5 µl 2× MyTaq Red (Bioline, UK), 0.5 µl of each 10 µM primer stock, 1 µl template DNA, and 10.5 µl nuclease-free water. Thermal cycling conditions were 95 °C for 5 min; 30 cycles of 95 °C for 30 s, 56 °C for 45 s, and 72 °C for 90 s; and a final extension at 72 °C for 7 min. PCR products were Sanger sequenced with the forward primer 8F by GATC (GATC, Cologne), and the resulting sequences were compared by BLAST (Altschul et al. 1990) against the GenBank database (http://www.ncbi.nlm.nih.gov/; last accessed April 25, 2016) for taxonomic identification by similarity.
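As a sanity check on the colony-PCR recipe above, the final concentration of each component follows from the dilution relation C_final = C_stock × V_added / V_total. The sketch below uses illustrative component names in a hypothetical dict layout; it confirms that the listed volumes sum to 25 µl and give 1× MyTaq Red and 0.2 µM of each primer.

```python
"""Sketch: check the 25 µl colony-PCR recipe described above.

Uses C_final = C_stock * V_added / V_total. Component names and the
dict layout are illustrative only.
"""

# name: (volume added in µl, stock concentration or None, unit or None)
RECIPE = {
    "2x MyTaq Red": (12.5, 2.0, "x"),
    "primer 8F": (0.5, 10.0, "µM"),
    "primer 1492R": (0.5, 10.0, "µM"),
    "template DNA": (1.0, None, None),
    "nuclease-free water": (10.5, None, None),
}


def total_volume(recipe):
    """Sum of all component volumes (µl)."""
    return sum(volume for volume, _, _ in recipe.values())


def final_concentrations(recipe):
    """Final concentration of every component that has a defined stock concentration."""
    v_total = total_volume(recipe)
    return {
        name: (stock * volume / v_total, unit)
        for name, (volume, stock, unit) in recipe.items()
        if stock is not None
    }


if __name__ == "__main__":
    print("total volume (µl):", total_volume(RECIPE))
    for name, (conc, unit) in final_concentrations(RECIPE).items():
        print(f"{name}: {conc:g} {unit}")
```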
Library Preparation and Sequencing of the Tatumella sp. TA1 Genome
DNA from Tatumella sp. TA1 was extracted as follows: Cryopreserved isolates were revived by streaking onto BHI agar and incubating at 25 °C for 72 h. Single colonies were inoculated into BHI broth and incubated at 25 °C until cultures reached an OD600 of 0.3. Cultures were pelleted by centrifugation at 6,000 × g for 6 min, the supernatant was removed, and cells were resuspended in DNA elution buffer at a concentration of 1 × 10⁵ colony-forming units (CFU) ml⁻¹. DNA was extracted using the Zymo Quick-DNA Universal Kit (Zymo, UK) following the manufacturer's instructions for biological fluids and cells, with the following amendment to the protocol: Samples were incubated with proteinase K at 55 °C for 30 min rather than 10 min. DNA was purified with AMPure beads (Agencourt) at a 1:1 ratio and stored at 4 °C until library preparation. DNA was sheared to 10 kb using Covaris g-TUBEs following the manufacturer's guidelines, and library preparation was performed with the SMRTbell library preparation kit (Pacific Biosciences) following the manufacturer's instructions. The Qubit dsDNA HS assay (Life Technologies, UK) was used to quantify the library, and the average fragment size was determined with the Agilent Bioanalyzer HS assay (Agilent). Size selection was performed with the BluePippin (Sage Science) using a 0.75% agarose cassette and the S1 marker. The final SMRTbell library was purified with Agencourt AMPure XP beads (Beckman Coulter Genomics) and quantified with the Qubit dsDNA High-Sensitivity assay (Life Technologies) and an Agilent Bioanalyzer High-Sensitivity DNA chip (Agilent). The SMRTbell library was annealed to sequencing primers at values predetermined by the Binding Calculator (Pacific Biosciences), and sequencing was performed on two SMRT cells using 360-min movie times. Library preparation and Pacific Biosciences sequencing of the Tatumella sp. TA1 isolate were performed at the University of Liverpool Centre for Genomic Research on a Pacific Biosciences RS II sequencer.

Genome Assembly and Annotation
The Tatumella sp. TA1 draft genome was assembled from 365,445 PacBio subreads (mean length 6,926 bp) using the Hierarchical Genome Assembly Process (HGAP) workflow (Chin et al. 2013) and is available at NCBI under accession numbers CP033727-CP033728. The HGAP pipeline comprised pre-assembly error correction of subreads based on read length and quality, assembly with the Celera Assembler, and assembly polishing with Quiver.
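The subread statistics above allow a back-of-the-envelope estimate of raw sequencing depth: total sequenced bases divided by genome size (the Lander-Waterman coverage formula). The sketch assumes the combined chromosome and pTA1 length as the genome size. It gives roughly 736×, a pre-assembly estimate that need not match the ~800× read coverage reported for the final assembly, since read filtering, error correction, mapping, and plasmid copy number all affect the latter.

```python
"""Sketch: expected raw subread depth from read count and mean read length."""


def expected_depth(n_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Lander-Waterman coverage C = N * L / G (total sequenced bases / genome size)."""
    return n_reads * mean_read_length / genome_size


if __name__ == "__main__":
    chromosome_bp = 3_389_139  # Tatumella sp. TA1 chromosome
    plasmid_bp = 49_211        # pTA1 plasmid
    depth = expected_depth(365_445, 6_926, chromosome_bp + plasmid_bp)
    print(f"~{depth:.0f}x raw subread depth")
```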
The assembled genome and plasmid were annotated with PROKKA version 1.5.2 (Seemann 2014). To further investigate the extracellular membrane structures and mobile elements encoded in the Ca. E. dacicola, Tatumella sp. TA1, and Enterobacter sp. OLF assemblies, these genomes were reannotated with RAST (Overbeek et al. 2014). The annotations are available in supplementary table S1A-E, Supplementary Material online.
To determine whether the pTA1 plasmid of Tatumella sp. TA1 is also present in Ca. E. dacicola, we also assembled the plasmid from the reads used to generate the Ca. E. dacicola Oroville assembly, which contains no Tatumella sp. TA1 chromosomal DNA (SRA accession SRP155530; Estes et al. 2018b). To assemble the pTA1 plasmid, reads were trimmed with Trimmomatic version 0.39 (Bolger et al. 2014) and mapped to the Tatumella sp. TA1 chromosomal and plasmid PacBio assemblies with Bowtie2 (Langmead et al. 2009). Read mapping coverage was estimated with the depth command of SAMtools (Li et al. 2009), using a mapping quality cutoff of 30.
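Per-contig coverage from a SAMtools depth run can be summarized as below. This is a minimal sketch assuming the standard tab-separated output (contig, position, depth) produced by, for example, `samtools depth -a -Q 30 aln.bam`, where `-a` also reports zero-depth positions and `-Q 30` applies the mapping-quality cutoff; the file name `aln.bam` and the contig names are hypothetical. Near-complete breadth and consistent depth over pTA1 would be expected if the plasmid were present in the mapped reads.

```python
"""Sketch: summarize per-contig coverage from `samtools depth` output.

Assumes tab-separated columns: contig, 1-based position, depth (extra columns,
e.g. from several BAM files, are ignored). Contig names are hypothetical.
"""
from collections import defaultdict


def mean_depth_per_contig(lines):
    """Return {contig: (mean depth, fraction of reported positions with depth > 0)}."""
    totals = defaultdict(int)
    positions = defaultdict(int)
    covered = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        d = int(depth)
        totals[contig] += d
        positions[contig] += 1
        covered[contig] += d > 0  # bool counts as 0 or 1
    return {
        c: (totals[c] / positions[c], covered[c] / positions[c])
        for c in positions
    }


if __name__ == "__main__":
    example = [
        "pTA1\t1\t0\n",
        "pTA1\t2\t4\n",
        "pTA1\t3\t6\n",
        "pTA1\t4\t6\n",
    ]
    print(mean_depth_per_contig(example))
```

Without `-a`, zero-depth positions are omitted, so the breadth value above would then overstate coverage.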

Phylogenetic Analyses
For the species tree, which allowed phylogenetic placement of Tatumella sp. TA1 and Ca. E. dacicola, taxa were selected based on Palmer et al. (2017), who present a robust phylogeny of the Erwinia, Pantoea, and Tatumella genera of the Erwiniaceae using genome-wide orthologs. Where Palmer et al. (2017) included several conspecific taxa, we chose the most biologically relevant of the identical taxa. The western flower thrips (WFT)-associated bacteria BFo1 and BFo2 were included to validate their phylogenetic similarity, based on 16S rRNA gene sequences, to Ca. E. dacicola and Tatumella sp. TA1. We also included several symbiotic taxa associated with Orius that were identified by Chen et al. (2017), selected based on their phylogenetic placement: OLMDLW33, phylogenetically close to BFo1; OPLPL6, close to BFo2; and OLMTSP26 and OLMTSP33, two representatives of the Erwiniaceae-like clade (preliminary phylogenetic analyses showed that most sequences were identical among the seven taxa). A full list of the 50 genomes used in this analysis, including the seven outgroup taxa and the three available Ca. E. dacicola draft genome assemblies (ErwSC, IL, and Oroville), is given in supplementary table S2A, Supplementary Material online. Genomes were annotated with PROKKA version 1.5.2 using default settings, and 287 single-copy orthologs present in all taxa were identified with OrthoMCL version 1.4 using default parameters (Li et al. 2003). The "reference" tree for the urease phylogenetic analysis was estimated from eight genes randomly selected from the set of 287 single-copy orthologs, with a total of 5,461 informative sites; these genes were assumed to evolve neutrally based on their predicted housekeeping functions (supplementary table S2A, Supplementary Material online). For the urease subunit alpha (ureC) gene tree, homologs were retrieved by BlastP queries against the NCBI nr database (restricted to taxid = Bacteria).
We included as many taxa common to the reference and species trees as possible to ensure meaningful comparisons. Additional homologs were retrieved by targeted searches in RefSeq (full list of species names and accession numbers in supplementary table S2B, Supplementary Material online). For all trees, amino acid sequences were aligned with MUSCLE version 3.8.31 (Edgar 2004), and informative sites were selected with Gblocks version 0.91b (Castresana 2000). Maximum-likelihood trees were estimated with IQ-TREE version 1.6.0 after automated model selection (Nguyen et al. 2015), with 300 random trees as burn-in. Node support was assessed with 1,000 ultrafast bootstrap replicates (Minh et al. 2013). The multi-locus, coalescent-based species and reference trees were estimated from individual ortholog trees using ASTRAL-III, with node support calculated from 1,000 bootstrap replicates (Zhang et al. 2018).
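The single-copy ortholog filter and the random draw of loci for the reference tree can be sketched as follows. The input format, a dict mapping hypothetical group IDs to (taxon, protein ID) pairs, is a simplified stand-in for parsed OrthoMCL output; the code does not read OrthoMCL files. A fixed seed makes the random selection of loci reproducible.

```python
"""Sketch: select single-copy orthologous groups shared by every genome,
then randomly sample k of them (e.g. for a reduced "reference" tree).

Input format is a hypothetical simplification of parsed OrthoMCL output:
{group_id: [(taxon, protein_id), ...]}.
"""
import random
from collections import Counter


def single_copy_core(groups, taxa):
    """IDs of groups containing exactly one protein from each taxon in `taxa`."""
    taxa = set(taxa)
    core = []
    for gid, members in groups.items():
        counts = Counter(taxon for taxon, _ in members)
        # every taxon present, no extra taxa, and no paralogs
        if set(counts) == taxa and all(n == 1 for n in counts.values()):
            core.append(gid)
    return sorted(core)


def sample_loci(core, k, seed=None):
    """Randomly choose k core groups without replacement."""
    return sorted(random.Random(seed).sample(core, k))


if __name__ == "__main__":
    taxa = ["ErwSC", "TA1", "OLF"]
    groups = {
        "OG1": [("ErwSC", "a1"), ("TA1", "b1"), ("OLF", "c1")],
        "OG2": [("ErwSC", "a2"), ("TA1", "b2"), ("TA1", "b3"), ("OLF", "c2")],  # paralog
        "OG3": [("ErwSC", "a3"), ("TA1", "b4")],                                # missing taxon
        "OG4": [("ErwSC", "a4"), ("TA1", "b5"), ("OLF", "c3")],
    }
    core = single_copy_core(groups, taxa)
    print(core)
    print(sample_loci(core, 1, seed=0))
```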

A Novel Member of the Gut Microbiota Associated with Mediterranean Populations of B. oleae
To explore the ecology of the B. oleae gut, we examined its bacterial community composition by 16S rRNA gene amplicon sequencing of samples from olive fruit fly populations collected in Crete. This approach, combined with culturing of selected isolates, allowed us to identify and isolate in pure culture a previously uncharacterized, culturable member of the gut microbiota, referred to here as Tatumella sp. TA1. Reanalysis of 16S rRNA gene data from a previous study showed that Tatumella sp. TA1 is also present in wild populations from Israel (Ben-Yosef et al. 2015; fig. 1). We detected Tatumella sp. TA1 in 88.9-90% of individuals from populations in Israel, in both adults and larvae, and in 26.9% of individuals from populations in Crete (fig. 1). Notably, Tatumella sp. TA1 was not detected in any of the culture-independent analyses of the gut microbiota of US olive fruit flies from two distinct geographic regions (Estes et al. 2009, 2012); however, these data could not be reanalyzed here because they consist of individually cloned 16S rRNA gene amplicons, which cannot be compared with community-wide 16S rRNA gene amplicon studies. Similarly, Enterobacter sp. OLF, a member of the gut community identified in US populations of B. oleae (Estes et al. 2009), was not detected in either of the Mediterranean populations studied here. Collectively, these results suggest that Tatumella sp. TA1 and Enterobacter sp. OLF are facultative members of the gut community that are geographically restricted to Mediterranean and US populations of the olive fruit fly, respectively.
In addition to Tatumella sp. TA1, 16S rRNA gene diversity analyses identified three other bacterial genera (all members of the family Enterobacteriaceae) as frequently associated with B. oleae and occasionally present at high relative abundance: Pectobacterium, Klebsiella, and a Sodalis-allied bacterium (Snyder et al. 2011; fig. 1). For example, in one adult fly from Crete, 97% of the 16S rRNA gene sequences originated from a Sodalis-allied bacterium, whereas Ca. E. dacicola accounted for just 1.3% (fig. 1). It is not uncommon for some individuals in a population to harbor bacterial communities dominated by taxa other than Ca. E. dacicola (Estes et al. 2012; Ben-Yosef et al. 2014; Koskinioti et al. 2019), and Ca. E. dacicola may even be absent from some individuals (Kounatidis et al. 2009; Estes et al. 2012, 2014). Of the other taxa that colonize B. oleae, Tatumella sp. TA1 was the most frequent and most abundant after Ca. E. dacicola in B. oleae larvae (fig. 1), suggesting that Tatumella sp. TA1 may be vertically transmitted from mother to offspring or readily acquired from the environment by B. oleae larvae. We therefore compared the genomes of Tatumella sp. TA1 and Ca. E. dacicola to determine whether functional redundancy with the obligate symbiont Ca. E. dacicola may enable Tatumella sp. TA1 to colonize and exploit the B. oleae gut environment.
We used PacBio RS II sequencing to generate a draft genome sequence for Tatumella sp. TA1. The assembly comprised a chromosomal sequence of 3,389,139 bp and a circular plasmid sequence of 49,211 bp, named pTA1 (total genome size 3.4 Mb). Read coverage of both the chromosome and pTA1 was ~800×, and both sequences have been circularized. The full Tatumella sp. TA1 genome encodes a total of 3,309 protein-coding genes, 72 tRNAs, and 22 rRNAs. The pTA1 plasmid carries a total of 69 annotated genes (supplementary table S1D, Supplementary Material online), including Tra and Trb operons for conjugative plasmid transfer and ccdAB and hicAB toxin-antitoxin cassettes, which in other systems enable plasmid transfer and persistence, respectively (Wilkins and Lanka 1993; Syed and Lévesque 2012). The GC content of the Tatumella sp. TA1 genome is 48.6%; both this value and the genome size are comparable to those of previously published Tatumella genomes, many of which are host-associated (Hollis et al. 1981; Marín-Cevada et al. 2010; Chandler et al. 2014). The genome assembly is predicted to be 100% complete following the method of Rinke et al. (2013), which detected 138/138 single-copy marker genes.

Phylogenetic Placement of Obligate and Facultative Members of the B. oleae Gut Microbiota
To resolve the phylogenetic relationships among the taxa newly identified in this and recent studies (Facey et al. 2015; Chen et al. 2017), we performed a phylogenomic analysis of 287 single-copy orthologs from a total of 50 bacterial taxa, including 41 taxa from the genera Erwinia, Pantoea, and Tatumella, two other members of the Erwiniaceae, and seven outgroup taxa (full list in supplementary table S2A, Supplementary Material online). In agreement with the most recent comprehensive phylogeny of the Erwiniaceae (Palmer et al. 2017), the phylogenomic analysis places Tatumella sp. TA1 within the genus Tatumella (fig. 2). It further confirms that Tatumella sp. TA1 and Enterobacter sp. OLF are distinct lineages within the order Enterobacterales, indicating that they are unique components of the B. oleae gut microbiota. In addition, the phylogeny shows that Ca. E. dacicola clusters within the genus Erwinia (fig. 2), despite its evolutionary association with B. oleae and its vertical transmission, which are expected to alter genome characteristics through changed selection pressures and genetic drift (McCutcheon and Moran 2011).
The closest phylogenetic relatives of Tatumella sp. TA1 and Ca. E. dacicola are the thrips symbionts BFo2 and BFo1, respectively (Facey et al. 2015). The clustering of these taxa may reflect comparable lifestyles. In a result analogous to diet experiments in olive flies (Ben-Yosef et al. 2010), the fitness benefits of the WFT gut-lumen symbiont BFo1 were shown to be condition-dependent and apparent only on diets lacking a balanced source of amino acids, when gut bacteria are presumed to synthesize and provision adult WFT with amino acids required for protein synthesis (de Vries et al. 2004). In addition, both BFo1 and BFo2 inhabit the gut lumen and are vertically transmitted between generations by an extracellular route (de Vries et al. 2001), as previously demonstrated for Ca. E. dacicola (Sacchetti et al. 2008; Estes et al. 2009). Interestingly, members of the genus Orius, which are omnivorous anthocorids and predators of the WFT in the Mediterranean (Bosco et al. 2008; Mouden et al. 2017), also harbor culturable symbionts similar to BFo1 and BFo2, named OLMDLW33 and OPLPL6, respectively (Chen et al. 2017); the current phylogenomic analysis indicates that these taxa most probably belong to the genera Erwinia and Tatumella, respectively. The co-occurrence of Erwinia- and Tatumella-allied taxa in these three phylogenetically distinct insect species (olive fruit fly, WFT, and Orius) merits further investigation to determine whether specific traits of the host insect and of its bacterial partners may predispose them to co-association.
We investigated the hypothesis that members of the B. oleae gut microbiota supplement the olive fruit fly diet with nitrogen by focusing on both obligate and facultative bacterial members. The annotated genomes of the obligate symbiont Ca. E. dacicola (Blow, Gioti, et al. 2016) and of the facultative Tatumella sp. TA1 (this study) and Enterobacter sp. OLF (Estes et al. 2018a) all encode enzymes that metabolize urea to ammonia and carbon dioxide (fig. 3). […] In addition to urea hydrolysis, Ca. E. dacicola, Tatumella sp. TA1, and Enterobacter sp. OLF all encode biosynthetic operons for nonessential amino acids and EAAs, with further evidence of expression during larval feeding in Ca. E. dacicola (supplementary table S3B, Supplementary Material online; transcript sequences in supplementary table S1F, Supplementary Material online). Glutamine synthetase (EC 6.3.1.2) is also encoded by all three taxa and is expressed in the Ca. E. dacicola transcriptome data. This enzyme, also encoded by the host genome (RefSeq accession GCF_001188975.1), incorporates nitrogen from ammonia into the nonessential amino acid glutamine (fig. 3), which can be channeled into downstream metabolic processes, including EAA biosynthesis (Feldhaar et al. 2007; Sabree et al. 2009).
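For reference, the two nitrogen-assimilation steps described above can be written as standard textbook reaction equations. These show net stoichiometry only, with ammonia written as NH₃ rather than NH₄⁺ for simplicity; they are not measurements from this study.

```latex
\begin{align*}
\text{urease:}\quad
  & \mathrm{CO(NH_2)_2 + H_2O \longrightarrow 2\,NH_3 + CO_2} \\
\text{glutamine synthetase (EC 6.3.1.2):}\quad
  & \text{L-glutamate} + \mathrm{NH_3} + \mathrm{ATP}
    \longrightarrow \text{L-glutamine} + \mathrm{ADP} + \mathrm{P_i}
\end{align*}
```

Each urea molecule therefore yields two molecules of ammonia, each of which glutamine synthetase can incorporate into one molecule of glutamine.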
We also investigated the hypothesis that host nitrogenous waste, which would otherwise be excreted, is the source of the nitrogen that the gut microbiota assimilates into nonessential amino acids. Urea could derive from recycling of host waste uric acid (Ben-Yosef et al. 2014), as observed in other insect symbionts that upgrade the nitrogen content of the host diet (Potrikus and Breznak 1981; Sasaki et al. 1996; Kashima et al. 2006; Feldhaar et al. 2007; Sabree et al. 2009). We found no genomic evidence that urea can be produced via uricolysis in either the host or any so-far identified and sequenced member of the microbiota: BlastP searches against shotgun sequencing data of olive fruit fly gut samples and the corresponding metagenomic assemblies of the gut microbial community (NCBI BioProject accession PRJNA326914) failed to identify the required enzyme allantoicase (EC 3.5.3.4; supplementary fig. S1 and supplementary table S3A, Supplementary Material online). However, B. oleae does encode an arginase (EC 3.5.3.1), which hydrolyzes the EAA arginine to urea and ornithine and could thus supply ureolytic bacteria with urea (fig. 3). One hypothesis is that B. oleae might enrich its gut microbiota for ureolytic bacteria such as Ca. E. dacicola through inducible delivery of urea produced by arginase activity. Signal peptides were not detected when individual urease proteins or the full operon from Ca. E. dacicola were analyzed with SignalP version 4.0 (Petersen et al. 2011), indicating that the urease encoded by Ca. E. dacicola, if expressed, is intracellular. This suggests that either the uptake of urea and the excretion of its hydrolysis products are coordinated with the host, or microbial cells are lysed after urea hydrolysis, allowing the host to gain the nutritional benefit observed when urea is added to the diet (Ben-Yosef et al. 2010).
The Ca. E. dacicola urease enzyme is encoded, as in many ureolytic bacteria, by a cluster of adjacent genes arranged in an operon, here eight genes denoted ureDABCEJFG, with ureABC encoding the structural subunits of the enzyme complex, ureDEFG accessory proteins, and ureJ the transcriptional regulator. The urease operon appears incomplete at its 3′ end in the ErwSC assembly (Blow, Gioti, et al. 2016), lacking ureG. However, ureG is present in the Ca. E. dacicola genome, as confirmed by BlastP queries against the initial set of metagenomic reads obtained by shotgun sequencing (NCBI BioProject accession PRJNA326914). The urease operon is also present in two additional Ca. E. dacicola assemblies (IL and Oroville), reconstructed with different methods and from different olive fly populations (supplementary table S2A, Supplementary Material online), and in these assemblies it is complete (5,962 bp total length), including ureG. The three "versions" of the operon from the different assemblies are 99% identical at the nucleotide level for the genes they share. One explanation for the absence of ureG from the ErwSC assembly is over-trimming, intended to minimize the risk of including non-Ca. E. dacicola scaffolds, since the metagenomic sequencing data included DNA from several species, such as Tatumella sp. TA1. The differences between the reduced ErwSC assembly and the more complete Oroville assembly are described in detail in Estes, Hearn, Agrawal, et al. (2018). In any case, the presence and similarity of the operon in all three Ca. E. dacicola assemblies, despite their differences and despite the presence of different facultative taxa in the gut microbiome of B. oleae from distinct geographic populations, indicate that the urease operon is highly conserved in Ca. E. dacicola.
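The nucleotide identity between operon copies from different assemblies can be computed from a pairwise alignment, as sketched below. This assumes the sequences have already been aligned with a standard aligner and uses one common definition of identity, matches over ungapped columns; other definitions (for example, dividing by total alignment length) give somewhat different values. The toy sequences are illustrative, not the ure genes.

```python
"""Sketch: percent nucleotide identity between two pre-aligned sequences.

Assumes equal-length aligned strings with '-' for gaps. Identity is computed
over columns where neither sequence has a gap (one of several definitions).
"""


def percent_identity(aln_a: str, aln_b: str) -> float:
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: excluded from the denominator
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not actual urease gene sequences.
    a = "ATGAAACTGACCC-GAAAG"
    b = "ATGAAACTGACTCAGAAAG"
    print(f"{percent_identity(a, b):.1f}%")
```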
In contrast to, for example, the Tatumella sp. TA1 urea carboxylase and allophanate hydrolase, which have orthologs in other Tatumella species, Ca. E. dacicola is, to the best of our knowledge, the only species of the genus Erwinia that encodes urease genes. Among the recently identified Erwiniaceae symbiotic taxa isolated from Orius insects (Chen et al. 2017), seven encode urease genes. However, a phylogenomic analysis including orthologs from two representatives of these nearly identical taxa (OLMTSP26 and OLMTSP33) showed that they do not belong to the genus Erwinia but represent an unknown Erwiniaceae genus (fig. 2). Moreover, the taxon most closely related to Ca. E. dacicola (OLMDLW33, fig. 2) does not encode urease, although it most probably can hydrolyze urea through the urea carboxylase and allophanate hydrolase that it encodes. Examination of the phylogeny of ureC, the longest structural gene of the operon, provided evidence that Ca. E. dacicola acquired the urease operon by horizontal gene transfer (HGT): The Ca. E. dacicola ureC history (fig. 4A) differs from the species history, as shown by comparison with a "reference" tree reconstructed from eight neutral single-copy orthologs (fig. 4B). The reference tree accurately depicts the known phylogenetic relationships between Ca. E. dacicola and other Proteobacteria, in agreement with the recently published comprehensive phylogeny (Palmer et al. 2017) and the species tree presented in this study (fig. 2). In contrast, the Ca. E. dacicola ureC protein groups with proteins from more distant Gammaproteobacteria of the order Enterobacterales (G. quercinecans, B. goodwinii) and from a distantly related Betaproteobacterium (Lampropedia cohaerens), and not with the Erwiniaceae taxa OLMTSP33 and OLMTSP26 from Orius, which form a well-supported outgroup to all of the above (fig. 4A).
This latter observation indicates potential convergent evolution within the Erwiniaceae in symbionts of two different insect systems. Hacker and Kaper (2000) defined genomic islands (GEIs) in bacteria as small (<10 kbp), syntenic blocks of genes acquired by HGT. GEIs share several common characteristics, such as encoding genes that offer a selective advantage to the host bacterium, being flanked by mobile genetic elements (MGEs) such as transposases, insertion close to tRNA genes, and a GC content different from that of the rest of the genome (Juhas et al. 2009). The ureolytic capacity conferred on Ca. E. dacicola by the urease operon is potentially an example of an acquired selective advantage, since it increases the availability of nutrients to the symbiont itself or to the insect host on which it depends. It is therefore tempting to argue that the urease operon represents a GEI in Ca. E. dacicola. Like other GEIs, it has a distinct (lower) GC content compared with the rest of the genome (mean GC: urease = 50.2%, genome = 53.5%), and this difference is statistically significant (Mann-Whitney P < 0.00001; Welch P < 0.00001). A significantly lower GC content was also observed when only the variable third codon position was compared between all urease genes of the operon and all predicted CDS of Ca. E. dacicola (mean GC3: urease = 45.7%, all CDS = 59.7%; Mann-Whitney P < 0.001; Welch P < 0.0001). In contrast, the first and second codon positions did not differ significantly between the urease genes and the remaining CDS of Ca. E. dacicola, indicating that the difference in GC content is driven by the third codon position. Moreover, the distributions of both GC and GC3 values sampled from the urease operon differ significantly from the whole-genome distributions (Kolmogorov-Smirnov P < 0.0001 in both tests). Unfortunately, proximity of the operon to MGEs and tRNAs (another "signature" of GEIs) could not be confirmed in Ca. E. dacicola, because its genomic architecture remains unresolved: the operon appears on a separate scaffold in all three available assemblies (e.g., IL assembly accession number LJAM02000124.1), none of which has been further extended and ordered using long-insert libraries. MGEs are frequently associated with fragmented assemblies, and in the ErwSC assembly we detected a transposase fragment upstream of ureD, the first gene of the operon, in the scaffold containing the urease. However, mate-pair sequencing data from Cretan olive fruit fly guts (accession number SRX1896451) did not support the presence of the transposase upstream of ureD and did not allow identification of the operon's flanking scaffolds.
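The GC and GC3 comparisons above reduce to counting G and C bases in each coding sequence, overall and at each codon position, and then comparing two samples of per-sequence values. The sketch below assumes in-frame CDS strings and uses only the Python standard library, so it reports test statistics (Welch's t and the two-sample Kolmogorov-Smirnov D) rather than P values; in practice `scipy.stats.ttest_ind(..., equal_var=False)`, `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp` return P values directly. All function names are hypothetical, and whether the reported tests were run on per-gene or per-window values is an assumption.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, variance


def gc_percent(seq: str) -> float:
    """GC content (%) of a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """GC content (%) at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("CDS shorter than one codon")
    values = []
    for pos in range(3):
        # Every third base starting at this position; a trailing partial codon is ignored.
        bases = cds[pos:3 * n_codons:3]
        values.append(100.0 * sum(b in "GC" for b in bases) / n_codons)
    return values[0], values[1], values[2]


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic (unequal variances); needs at least two values per sample."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def ks_statistic(x: list[float], y: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in set(xs) | set(ys):
        # Empirical CDFs are right-continuous step functions, so the
        # maximum gap is reached at one of the observed values.
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        d = max(d, gap)
    return d


# Hypothetical usage: GC3 for two toy gene sets, then the two statistics.
set_a = [gc_by_codon_position(s)[2] for s in ["ATGGCCAAA", "ATGAAATTA", "ATGTTTGCA"]]
set_b = [gc_by_codon_position(s)[2] for s in ["ATGGCCGGC", "ATGCCGGCG", "ATGGGCTTC"]]
print(welch_t(set_a, set_b), ks_statistic(set_a, set_b))
```

Computing position-specific GC separately, as here, is what allows a difference to be attributed to the third codon position rather than to the protein-coding constraints on positions 1 and 2.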
The present data allow us to determine neither the date of acquisition of the urease operon nor its taxonomic origin; a further complication is that HGT events often recur among bacterial lineages. The observation that the GC content of the urease operon differs from that of the rest of the genome may argue in favor of a relatively recent acquisition, as previously proposed for a different Ca. E. dacicola GEI (Estes, Hearn, Agrawal, et al. 2018), but this cannot be concluded without dating the symbiosis. Similarly, regarding the donor, the low bootstrap support for the clades linking the groups comprising Ca. E. dacicola, L. cohaerens, and Brenneria to the rest of the tree indicates that the original donor of the urease cluster might be extinct or not yet sequenced. The donor could have been a free-living bacterium encountered by Ca. E. dacicola before its transition to an obligate association with B. oleae, or a yet-uncharacterized gut bacterium. In line with the latter, there is abundant evidence that members of gut microbial communities in vertebrates and invertebrates undergo HGT, and that this can subsequently influence functional traits (Petridis et al. 2006; Smillie et al. 2011; Stecher et al. 2012; Baker et al. 2018).
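The HGT argument rests on topological conflict between the ureC tree and the reference tree. One common way to quantify such conflict is the Robinson-Foulds distance, sketched here for rooted trees written as nested tuples: each internal node defines a cluster of leaf names, and the distance counts the clusters found in only one of the two trees. This is an illustrative swap-in, not the procedure used for fig. 4; the function names and taxon labels are placeholders, and because the metric ignores bootstrap support, a non-zero distance only flags incongruence that still has to be checked against well-supported nodes.

```python
from typing import Tuple, Union

# A rooted tree as nested tuples: leaves are taxon names, internal nodes are tuples of children.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> tuple[frozenset[str], set[frozenset[str]]]:
    """Return the leaf set and the non-trivial clusters (one per internal node) of a rooted tree."""
    found: set[frozenset[str]] = set()

    def walk(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root cluster is shared by any two trees on the same taxa
    return all_leaves, found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of clusters present in exactly one of two rooted trees on the same leaf set."""
    leaves1, c1 = clusters(t1)
    leaves2, c2 = clusters(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must have identical leaf sets")
    return len(c1 ^ c2)


# Placeholder taxa: the gene tree groups A with C, the reference groups A with B.
reference = ((("A", "B"), "C"), "D")
gene_tree = ((("A", "C"), "B"), "D")
print(robinson_foulds(reference, gene_tree))  # prints 2
```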

Exchanges of Genetic Material between Members of the B. oleae Gut Microbiota
HGT events are commonly driven by MGEs; we therefore searched the annotated symbiont genomes for such elements. MGEs, including transposases, repetitive DNA regions, and phages, are abundant in all three Ca. E. dacicola assemblies, representing on average 34% of the obligate symbiont's genome, in contrast to the distinctly lower MGE content of the Tatumella sp. TA1 and Enterobacter sp. OLF genomes (table 1). However, these percentages should not be taken at face value, owing to the inherent difficulties of computationally annotating MGEs, especially from metagenomic samples (Jørgensen et al. 2014), and they might be overestimates because of the draft, and often fragmented, nature of the assemblies. Still, this result is in line with observations of high transposase content in bacteria that recently adopted a symbiotic lifestyle (Gil et al. 2008; Ran et al. 2010) and with expression of MGEs in B. oleae larvae feeding on green and black olives (Pavlidi et al. 2017). Overall, these findings support the idea that genetic exchange between Ca. E. dacicola and other members of the B. oleae gut microbial community may occur.
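A genome-wide MGE percentage depends on how overlapping annotations are handled: a transposase annotated inside a prophage region, for instance, should not be counted twice. A minimal sketch, assuming MGE annotations are available as hypothetical `(contig, start, end)` half-open intervals and that the denominator is the total assembly length, merges overlapping intervals per contig before summing.

```python
from collections import defaultdict


def covered_fraction(intervals: list[tuple[str, int, int]], genome_size: int) -> float:
    """Fraction of an assembly covered by the union of (contig, start, end) intervals.

    Intervals are 0-based and half-open; overlapping or nested annotations on the
    same contig are merged so that no base is counted twice.
    """
    by_contig: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for contig, start, end in intervals:
        by_contig[contig].append((start, end))

    covered = 0
    for spans in by_contig.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and open a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_size


# Hypothetical annotations: a transposase nested inside a prophage region on one scaffold.
mge = [("scaffold_1", 1_000, 41_000), ("scaffold_1", 5_000, 6_500), ("scaffold_2", 200, 1_400)]
print(f"{covered_fraction(mge, genome_size=2_000_000):.2%}")
```

The result is only as reliable as the input annotation and assembly, which is why such values are treated as approximate.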
An example of genetic material exchange within the gut symbiotic community possibly concerns plasmid pTA1. The plasmid, reconstructed as part of the genome of Tatumella sp. TA1, which was isolated in the present study from Mediterranean populations of B. oleae (GenBank accession number CP033728), was also detected in the Ca. E. dacicola Oroville assembly, which was generated from B. oleae populations from the US, where Tatumella sp. TA1 has not yet been detected (Estes et al. 2012). We were able to assemble the plasmid sequence from the raw reads used for the Ca. E. dacicola Oroville assembly, and alignment of pTA1 assembled from Tatumella sp. TA1 reads with pTA1 assembled from Ca. E. dacicola Oroville raw reads indicates that the plasmid is shared by these taxonomically distinct and geographically isolated bacterial taxa (supplementary fig. S2, Supplementary Material online). To confirm that the Tatumella sp. TA1 plasmid, and not the chromosome, was present in the community of bacterial cells sequenced to generate the Ca. E. dacicola Oroville assembly, we adopted two approaches. First, we used BlastN to search the contigs of the Oroville assembly for 16S rRNA gene sequences from taxa other than Ca. E. dacicola and, in line with previous analyses by the authors of that study (Estes et al. 2018b), did not find any. Second, we mapped the reads used to generate the Ca. E. dacicola Oroville assembly (SRA accession SRP155530) to the Tatumella  There were some regions of the Tatumella sp. TA1 chromosome with high coverage, which are presumably either highly conserved between the Ca. E. dacicola and Tatumella sp. TA1 chromosomes or duplicated between the chromosome and the plasmid. For example, the ~50 kb region (coordinates 700,000-750,000 bp) of the Tatumella sp. TA1 chromosome with the highest mapping coverage of the Oroville reads (6,382×) encodes the Trb operon, which is also encoded in the pTA1 plasmid (supplementary fig. S3B, Supplementary Material online).
On the basis of these analyses, pTA1 may have been horizontally transferred between Tatumella sp. TA1 and Ca. E. dacicola at some point in their evolutionary history, potentially via another co-occurring member of the gut or environmental microbiota.
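Locating the most heavily covered region of the Tatumella sp. TA1 chromosome amounts to scanning fixed-size windows of read depth. The sketch below assumes read alignments have been reduced to hypothetical 0-based, half-open `(start, end)` reference spans on a single chromosome. It builds per-base depth with a difference array and reports the mean depth of each window, so that a scan with `window=50_000` would report regions on the scale of the ~50 kb interval described above. In practice, per-base depth is usually taken from a tool such as `samtools depth`; the function names here are illustrative only.

```python
def window_mean_depth(alignments: list[tuple[int, int]], chrom_len: int, window: int) -> list[float]:
    """Mean read depth in consecutive fixed-size windows along one chromosome.

    `alignments` are 0-based, half-open (start, end) reference spans of mapped reads.
    """
    diff = [0] * (chrom_len + 1)
    for start, end in alignments:
        diff[start] += 1  # depth rises where a read starts
        diff[end] -= 1    # and falls just past where it ends
    means: list[float] = []
    depth = 0
    window_sum = 0
    for pos in range(chrom_len):
        depth += diff[pos]
        window_sum += depth
        if (pos + 1) % window == 0 or pos == chrom_len - 1:
            size = pos % window + 1  # the last window may be shorter than `window`
            means.append(window_sum / size)
            window_sum = 0
    return means


def top_window(means: list[float], window: int) -> tuple[int, float]:
    """Start coordinate and mean depth of the most heavily covered window."""
    best = max(range(len(means)), key=means.__getitem__)
    return best * window, means[best]


# Toy example: three reads on a 10 bp "chromosome", scanned in 5 bp windows.
reads = [(0, 3), (5, 10), (6, 9)]
print(top_window(window_mean_depth(reads, chrom_len=10, window=5), window=5))
```

The difference array keeps the cost linear in chromosome length plus number of reads, rather than touching every base of every read.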
Exchanges of genetic material between bacteria are facilitated by extracellular surface structures (Thomas and Nielsen 2005), components of which are encoded in the genomes of the B. oleae gut microbiota. Components of the IncF Tra transfer system of conjugative plasmids, encoding F-like pili, are present in the two most complete Ca. E. dacicola assemblies, IL and Oroville (supplementary table S4B and C, Supplementary Material online). In addition, plasmid pTA1 encodes a Trb operon and the four essential components for conjugative transfer: an origin of transfer (oriT), a relaxase, a type IV coupling protein, and a type IV secretion system (T4SS), in this case a Tra operon (Llosa et al. 2002; De La Cruz et al. 2010), which together enable the exchange of genetic material. The presence of these genes indicates that pTA1 is self-transmissible, consistent with its detection in the Ca. E. dacicola Oroville assembly.
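The self-transmissibility argument follows a simple presence/absence rule over the four components listed above. A sketch of that rule follows, assuming plasmid annotations have already been mapped to the hypothetical feature labels `oriT`, `relaxase`, `T4CP` and `T4SS`. The intermediate "mobilizable" category (oriT and relaxase without a complete transfer apparatus) is a common simplification added here for illustration; real classifications also weigh gene completeness and synteny.

```python
# Hypothetical labels for the four components required for conjugative transfer.
CONJUGATION_CORE = frozenset({"oriT", "relaxase", "T4CP", "T4SS"})


def classify_plasmid(features: set[str]) -> str:
    """Simplified transfer-potential call from a set of annotated feature labels."""
    if CONJUGATION_CORE <= features:
        return "conjugative"  # encodes everything needed for its own transfer
    if {"oriT", "relaxase"} <= features:
        return "mobilizable"  # could be moved by another element's transfer apparatus
    return "non-mobilizable"


# Hypothetical feature sets.
print(classify_plasmid({"oriT", "relaxase", "T4CP", "T4SS", "trb_operon"}))
print(classify_plasmid({"oriT", "relaxase"}))
```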
Some extracellular surface structures that mediate the exchange of genetic material between bacteria, such as T4SS, also facilitate adhesion and molecular exchange with eukaryotic cells (Alvarez-Martinez and Christie 2009). T4SS are diverse; VirB-like T4SS encode short and rigid pili, whereas the F-pili T4SS for conjugative transfer tend to be long and flexible (Clarke et al. 2008 (Pavlidi et al. 2017) and for the presence of pili-like structures in the esophageal bulb (Poinar et al. 1975).
Depending on their structure, pili can deliver a range of effector molecules, including proteins and nucleic acids, to both prokaryotic and eukaryotic cells. T4SS are involved in the transition between motility and sessility, for example during biofilm formation (Christie and Vogel 2000), and can also deliver effector proteins directly into the cytoplasm of eukaryotic cells, both during infection by pathogenic bacteria (Cascales and Christie 2003) and during root nodule colonization by symbiotic rhizobia (Deakin and Broughton 2009); these effector proteins can induce regulatory changes in the eukaryotic cell, modifying conditions in the cell or its environment to facilitate growth or invasion (Burns 2003). Therefore, the T4SS pili of Ca. E. dacicola may also play a yet unknown role in the B. oleae-Ca. E. dacicola symbiosis, for example in the establishment of Ca. E. dacicola biofilms (Estes et al. 2009). One potential function to explore in future studies is the coordination of gut lumen colonization and subsequent resource exchange between the host and symbiont.

Conclusions
In this study, we aimed to gain understanding of the advantages of symbiosis for the olive fruit fly B. oleae by integrating various sources of genetic information on its microbial gut community. This led to the identification, by 16S rRNA gene sequencing and culture-dependent methods, of a novel facultative member of the gut microbiota, Tatumella sp. TA1, which associates with Mediterranean populations of B. oleae throughout the lifecycle. Tatumella sp. TA1 is phylogenetically distinct from the US-restricted facultative symbiont Enterobacter sp. OLF, highlighting the population-dependent nature of gut microbiota composition in the olive fruit fly. Comparative genomics indicated that the obligate symbiont Ca. E. dacicola, as well as Tatumella sp. TA1 and Enterobacter sp. OLF, which are all stable components of the B. oleae microbiota throughout the olive fruit fly lifecycle, encode genes that allow the use of urea as a nitrogen source. The hydrolysis of urea to ammonia is encoded by genes with distinct phylogenetic origins in each organism: HGT of a urease operon in Ca. E. dacicola, an endogenous urease operon in Enterobacter sp. OLF, and an endogenous alternative enzymatic machinery (urea carboxylase and allophanate hydrolase) in Tatumella sp. TA1. These findings provide a potential mechanistic basis for previous experimental evidence of gut microbiota-mediated dietary nitrogen provisioning during adulthood, but emphasize the need for experimental validation of metabolic cross-feeding between the host and obligate symbiont. A hypothesis in this direction, warranting focus in future studies, is that a VirB-like T4SS encoded by Ca. E. dacicola, but missing from the genomes of Enterobacter sp. OLF and Tatumella sp. TA1, may facilitate specific interactions with the host at the symbiotic interface. 
Besides T4SS, our study further highlighted extracellular surface structures, encoded in the genomes of the obligate and facultative symbionts, as mediators of DNA transfer. The detection of a plasmid shared by geographically distinct Ca. E. dacicola and Tatumella sp. TA1, together with the horizontal acquisition of genes important to symbiosis function (urease in this study; genes related to amino-acid degradation in Estes, Hearn, Agrawal, et al. 2018) and the abundance of MGEs in the obligate symbiont genome, indicates that past and ongoing genetic exchanges between gut community members are important determinants of symbiotic interactions with B. oleae.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.