Toxins from scratch? Diverse, multimodal gene origins in the predatory robber fly Dasypogon diadema indicate a dynamic venom evolution in dipteran insects

Abstract

Background: Venoms and the toxins they contain represent molecular adaptations that have evolved on numerous occasions throughout the animal kingdom. However, the processes that shape venom protein evolution are poorly understood because of the scarcity of whole-genome data available for comparative analyses of venomous species.

Results: We performed a broad comparative toxicogenomic analysis to gain insight into the genomic mechanisms of venom evolution in robber flies (Asilidae). We first sequenced a high-quality draft genome of the hymenopteran-hunting robber fly Dasypogon diadema, analysed its venom by a combined proteotranscriptomic approach, and compared our results with recently described robber fly venoms to assess the general composition and major components of asilid venom. We then applied a comparative genomics approach, based on 1 additional asilid genome, 10 high-quality dipteran genomes, and 2 lepidopteran outgroup genomes, to reveal the evolutionary mechanisms and origins of the identified venom proteins in robber flies.

Conclusions: While homologues of 15 of the 30 predominant venom proteins were identified in the non-asilid genomes, the remaining 15 highly expressed venom proteins appear to be unique to robber flies. Our results indicate that the venom of D. diadema likely evolves in a multimodal fashion comprising (i) neofunctionalization after gene duplication, (ii) expression-dependent co-option of proteins, and (iii) asilid lineage-specific orphan genes of enigmatic origin. The role of such orphan genes is currently disputed in evolutionary genomics but has not been discussed in the context of toxin evolution. Our results point to an unexpectedly dynamic venom evolution in asilid insects, which contrasts with the findings of the only other insect toxicogenomic evolutionary analysis, in parasitoid wasps (Hymenoptera), where toxin evolution is dominated by single-gene co-option. These findings underline the importance of further genomic studies covering more neglected lineages of venomous taxa and of understanding the role of orphan genes as possible drivers of venom evolution.

specimen of D. diadema; however, to rule out sexual dimorphism in asilid venom systems, this result would need to be supported by a larger sample size per sex before definite conclusions can be drawn. The venom apparatus of D. diadema appears generally similar to the previously described structures of E. rufibarbis [12], except that D. diadema has more complex, elongated, and sub-structured thoracic venom glands (Fig. 1).

Complementing our morphological analysis, we investigated the venom composition of D. diadema by combining venom gland, proboscis, and body tissue transcriptomics with a proteomic analysis of venom gland extracts from both sexes. Apart from its more complex morphology, the venom cocktail of D. diadema showed a number of differences compared with the described venoms of E. rufibarbis and M. arthriticus [12]. The most striking difference is that the venom of D. diadema contained chitinase-like proteins and proteins belonging to the CAP superfamily, both of which were absent from the venoms of E. rufibarbis and M. arthriticus (Fig. 2). Transcripts coding for chitinase-like proteins ranked third (female) and fourth (male) in expression level among all identified venom proteins (male: 4.16 %; female: 3.85 %, each given as a percentage of the summed TPM of all identified venom proteins), while CAP-like proteins were expressed at a comparably low level in both sexes (male: 1.34 %; female: 1.23 %) (Fig. 2). We also identified five families of novel venom proteins among the 30 predominant putative toxins, which we named asilidin11-15 according to existing robber fly toxin nomenclature [12,17] (Fig. 2, Fig. 4

Assessing ancestral gene variants

The protein-coding genomes of D. diadema and P. coquilletti, ten non-robber-fly dipterans, and two lepidopterans were compared and sorted using the OrthoFinder phylogenetic splits (Fig. 3a). The split between the Diptera and Lepidoptera lineages is the oldest one considered in our analyses. These two clades share 84 % (7,471) of the orthogroups assigned to D. diadema (Fig. 3) [22], meaning that the ancestral versions of these protein-coding genes already existed in the last common ancestor (LCA) of Diptera and Lepidoptera. Of the remaining orthogroups, 877 are unique to Diptera, 158 are unique to the clade comprising the gall midge Mayetiola destructor and Brachycera, 246 are unique to Brachycera, and 110 are shared only between the two robber flies (Fig. 3a). Sixteen orthogroups consist of protein-coding genes found exclusively in D. diadema (Fig. 3a).

The venom gland proteins identified via proteomics were assigned to their respective orthogroups.
We then tested whether the non-toxic ancestral version of a putative toxin was already present in the protein-coding genome of the LCA of the compared species, or whether the protein is a novelty unique to a certain clade. In total, 109 orthogroups that were already present in the LCA of Lepidoptera and Diptera are associated with at least one venom protein of the female and male D. diadema. Three orthogroups containing venom proteins were unique to Diptera and another three to Brachycera, while eight orthogroups with putative toxins were shared only between the two robber fly genomes (Fig. 3a). The majority of proteins identified in the venom gland can be assigned to protein-coding genes in orthogroups shared between the Lepidoptera and Diptera clades. Transcripts of venom proteins assigned to orthogroups that arose at node 2, node 3, or node 4 are expressed at low levels in the venom glands of both sexes, whereas putative toxin transcripts from node 1, node 5, and those not assigned to any orthogroup are expressed at high levels in the venom glands of both sexes (Fig. 3b, 3c, Supp. Fig. 3, Supp. Fig. 4).

To prevent an over-interpretation of the data, the process of venom evolution in D.
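The node-based dating above follows the general logic of phylostratigraphy: an orthogroup that contains D. diadema genes is dated to the oldest split at which it also contains genes from a species on the far side of that split. The Python sketch below illustrates that logic and then summarizes venom-protein expression per stratum as a share of the summed venom-protein TPM, mirroring how expression is reported in the Results. It is an illustration under stated assumptions, not OrthoFinder's algorithm or the authors' pipeline: the stratum list simply follows the splits named in the text, gene loss is ignored, and every species name other than `M_destructor` and `P_coquilletti`, as well as all orthogroup, protein, and TPM values, is a hypothetical placeholder.

```python
from collections import defaultdict
from typing import Iterable

FOCAL = "D_diadema"
FOCAL_ONLY = "D. diadema only"
UNASSIGNED = "no orthogroup"

# Splits along the lineage leading to D. diadema, oldest first. Each entry pairs a
# stratum label with the species that branch off on the far side of that split.
# Names other than M_destructor and P_coquilletti are hypothetical placeholders.
STRATA: list[tuple[str, set[str]]] = [
    ("Lepidoptera+Diptera", {"lepidopteran_1", "lepidopteran_2"}),
    ("Diptera", {"nematoceran_1", "nematoceran_2"}),
    ("M. destructor+Brachycera", {"M_destructor"}),
    ("Brachycera", {"brachyceran_1", "brachyceran_2"}),
    ("Asilidae", {"P_coquilletti"}),
]


def assign_stratum(members: Iterable[str]) -> str:
    """Date an orthogroup containing the focal species to the oldest split it spans."""
    species = set(members)
    if FOCAL not in species:
        raise ValueError("orthogroup lacks the focal species")
    for label, far_side in STRATA:
        if species & far_side:  # any species from beyond this split present?
            return label
    return FOCAL_ONLY


def tpm_share_by_stratum(
    venom_tpm: dict[str, float],
    protein_to_orthogroup: dict[str, str],
    orthogroup_members: dict[str, set[str]],
) -> dict[str, float]:
    """Percentage of the summed venom-protein TPM that falls into each stratum."""
    total = sum(venom_tpm.values())
    shares: dict[str, float] = defaultdict(float)
    for protein, tpm in venom_tpm.items():
        og = protein_to_orthogroup.get(protein)
        label = assign_stratum(orthogroup_members[og]) if og else UNASSIGNED
        shares[label] += 100.0 * tpm / total
    return dict(shares)


# Hypothetical toy data, not values from the study.
orthogroups = {
    "OG_a": {"D_diadema", "lepidopteran_1", "brachyceran_2"},
    "OG_b": {"D_diadema", "P_coquilletti"},
    "OG_c": {"D_diadema", "brachyceran_1", "P_coquilletti"},
}
protein_to_og = {"prot_1": "OG_a", "prot_2": "OG_b", "prot_3": "OG_c"}  # prot_4: none
tpm = {"prot_1": 300.0, "prot_2": 500.0, "prot_3": 50.0, "prot_4": 150.0}
print(tpm_share_by_stratum(tpm, protein_to_og, orthogroups))
# {'Lepidoptera+Diptera': 30.0, 'Asilidae': 50.0, 'Brachycera': 5.0, 'no orthogroup': 15.0}
```

Dating by the oldest split reached means an orthogroup found in a lepidopteran is placed in the deepest stratum even if intermediate dipteran lineages lack it. The same percentage arithmetic also reproduces the orthogroup figure reported above: assuming the six categories are exhaustive, 7,471 + 877 + 158 + 246 + 110 + 16 = 8,878 orthogroups, and 7,471 / 8,878 ≈ 84.2 %, consistent with the stated 84 %.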

General aspects of venom biology and composition
Dasypogon diadema is a widely distributed robber fly that is known to hunt honey bees (Apis mellifera) and other hymenopterans (Poulton 1907; Geller-Grimm 1995). To overpower such dangerous prey, venom with neurotoxic components that cause rapid paralysis is advantageous. Trophic specialization has also been shown to affect

In general, the venom of D. diadema shares its major components with those of E. rufibarbis and M. arthriticus. Moreover, the most dominant protein families in the venoms of all three species are asilidin2 and asilidin3, and all three species also express asilidin1 transcripts (Fig. 2). Given the phylogenetic distance between E. rufibarbis and M. arthriticus (members of the larger subfamily Asilinae) and D. diadema (a representative of the subfamily Dasypogoninae) [16,28], these three protein families probably constitute a lineage-specific toxin arsenal of robber flies, a conclusion corroborated by the study of Walker and colleagues [15].

In the present study, the de novo assembly of transcriptome data was performed with a single assembler, Trinity, which is one of the most established programs for assembling transcriptome data sets [29]. Nevertheless, de novo transcriptome assembly is challenging, and different assemblers often construct different sets of transcripts. It has been shown in snakes and scorpions that the number of assembled toxin transcripts can vary depending on the chosen assembler [30]. Thus, relying on a single assembler may mean that our putative toxins include some false positives and that we missed some toxins (false negatives).

To avoid false positives and an over-interpretation of our data, we used only transcripts that were recovered in the proteome and then identified in the whole genome as a baseline to discuss possible toxins.
We also used two additional transcriptome assemblers, rnaSPAdes [31] and Trans-ABySS [32], and assessed their ability to recover the top 30 predominant toxins identified using Trinity. All but a few of the top 30 candidate toxins were recovered with identical or nearly identical sequences in the additional assemblies. We therefore conclude that the pattern of venom protein evolution discussed here for the most highly expressed, and hence ecologically probably most important, putative toxins is reasonably robust (details are given in Supplementary Tables 8 and 9, and visualized alignments comparing the contigs from the different assemblers are provided in the GigaScience data cloud).

The second category of venom proteins includes putative toxins without homologs outside the asilid lineage. Multi-copy genes dominate this category (asilidin2, Peptidase S1), although single-copy genes are also present (asilidin6). Asilidin2 in particular shows a pattern of intense gene duplication, and several transcripts of this family from different orthogroups are secreted in the venom glands. These single- and multi-copy genes are specific to the robber fly lineage, and their ancestry is enigmatic. Intriguingly, we identified transposable elements in 11 venom proteins, including two variants of the highly expressed asilidin2, whereas two thirds of the venom proteins show no evidence of transposable elements. We can only speculate that the evolution of individual toxins might be influenced by transposable elements and that this might explain the diversity of asilidin2 variants. However, a thorough analysis of the influence of transposable elements on venom protein evolution would require an adapted study design and the inclusion of whole-genome and venom protein data from more species.

MiSeq platform.
All raw reads were visually inspected in FastQC [47] and then quality-filtered and trimmed with Trimmomatic v0.33, using a minimum read length of 70 bp and a minimum Phred score of 30 [48]. An overview of the sequenced raw reads and processed transcripts is given in Table 2.
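The filtering criteria above (Phred 30, 70 bp minimum) can be illustrated with a simplified Python sketch. Trimmomatic provides several quality-trimming steps (for example LEADING, TRAILING, and SLIDINGWINDOW) plus a MINLEN filter, and the text does not say through which step the Phred-30 threshold was applied; the sketch therefore assumes a trailing-quality cut followed by a 70 bp minimum-length filter, and it assumes Phred+33 quality encoding as used by Illumina 1.8+ pipelines. It shows the idea only and is not a reimplementation of Trimmomatic; the example reads are made up.

```python
from typing import Optional

MIN_QUALITY = 30   # Phred threshold stated in the text
MIN_LENGTH = 70    # minimum read length (bp) stated in the text
PHRED_OFFSET = 33  # assumption: Phred+33 (Illumina 1.8+) quality encoding


def phred_scores(quality_line: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_line]


def trim_and_filter(seq: str, qual: str) -> Optional[tuple[str, str]]:
    """Cut low-quality bases from the 3' end, then drop reads shorter than MIN_LENGTH.

    A simplified analogue of a trailing-quality trim followed by a minimum-length
    filter. Returns the trimmed (sequence, quality) pair, or None if discarded.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < MIN_QUALITY:
        end -= 1  # remove one low-quality base from the 3' end
    if end < MIN_LENGTH:
        return None
    return seq[:end], qual[:end]


# Made-up reads: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
good = trim_and_filter("A" * 80, "I" * 75 + "#" * 5)
short = trim_and_filter("C" * 80, "I" * 60 + "#" * 20)
print(len(good[0]) if good else None, short)  # 75 None
```

The first read keeps 75 high-quality bases and passes the length filter; the second is cut to 60 bases and is discarded. A trailing cut only removes bases from the 3' end, so a read with a low-quality stretch in its middle would pass this sketch unchanged, which is one reason real pipelines often combine it with a sliding-window step.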

We would like to submit the second revision of our manuscript:

Toxins from scratch? Diverse, multimodal gene origins in predatory robber flies indicate dynamic venom evolution in dipteran insects
We greatly appreciate that in this review round Laurie Goodman stepped in to resolve a miscommunication and provided some editing of our manuscript. We fully accept the editorial changes and phrasing, and we thank her for the effort to bring the issue of false negatives to a conclusion.
We made only one wording change in the Discussion section edited by Laurie Goodman: instead of "missed false negatives that might replace our top 30 candidates", we used the phrasing "might be added to our top 30 candidates". Otherwise, we fully agree with the wording, and we went over the manuscript once more to correct any overlooked typos.
The only change the reviewers insisted on was that we should at least remove the sentences about the validity of our results; these are no longer in this latest version of the manuscript. No other comments were made and no further revisions were requested, so we hope that our manuscript is now in its final form.
Thank you very much and best regards on behalf of all authors,

Cover letter: Drukewitz_etal_CoverLetter_Rev3.doc