Positive and relaxed selection associated with flight evolution and loss in insect transcriptomes

Abstract The evolution of powered flight is a major innovation that has facilitated the success of insects. Previously, studies of birds, bats, and insects have detected molecular signatures of differing selection regimes in energy-related genes associated with flight evolution and/or loss. Here, using DNA sequences from more than 1000 nuclear and mitochondrial protein-coding genes obtained from insect transcriptomes, we conduct a broader exploration of which gene categories display positive and relaxed selection at the origin of flight as well as with multiple independent losses of flight. We detected a number of categories of nuclear genes more often under positive selection in the lineage leading to the winged insects (Pterygota), related to catabolic processes such as proteases, as well as splicing-related genes. Flight loss was associated with relaxed selection signatures in splicing genes, mirroring the results for flight evolution. Similar to previous studies of flight loss in various animal taxa, we observed consistently higher nonsynonymous-to-synonymous substitution ratios in mitochondrial genes of flightless lineages, indicative of relaxed selection in energy-related genes. While oxidative phosphorylation genes were not detected as being under selection with the origin of flight specifically, they were most often detected as being under positive selection in holometabolous (complete metamorphosis) insects as compared with other insect lineages. This study supports some convergence in gene-specific selection pressures associated with flight ability, and the exploratory analysis provided some new insights into gene categories potentially associated with the gain and loss of flight in insects.

We explore what types of protein-coding genes have experienced differing selective 118 pressures associated with the evolution and loss of flight using DNA sequences from a total of 119 1476 nuclear single-copy orthologous protein-coding genes and 13 mitochondrial protein-coding 120 genes obtained from transcriptomes. Firstly, we test for evidence of positive selection during the 121 time when flight originated. Secondly, we test for positive and relaxed selection among multiple 122 evolutionary losses of flight, which provide more recent and naturally replicated evidence for 123 genes potentially associated with the evolution and maintenance of flight. In addition to using 124 multiple evolutionary shifts in a biological or ecological trait to identify common genetic trends 125 associated with that shift (e.g.  Table S1) and 138 assigned transcripts to 1476 single-copy nuclear orthologous genes included in the ortholog set 139 published by Misof et al. [2]. We additionally included the 12 reference species with an official 140 gene set available and used by Misof et al. [2] to infer orthology; thus, data for 113 species were 141 available in total. Orthology assignment of transcripts, alignment, outlier check, alignment 142 refinement, and generation of nucleotide alignments followed the guidelines described in Misof 143 et al. [2] with some modifications (see Methods section). Sequences for the 13 mitochondrial 144 protein-coding genes were obtained from the associated mitochondrial transcriptome sequencing 145 project of BGI, with some substitution of sequences from mitochondrial genomes published on 146 NCBI to increase completeness (species and sources of data provided in [31]  Positive selection associated with the origin of flight 155 Tests of positive selection were performed for each lineage of interest via branch-site 156 models, which estimate dN/dS ratios at each codon site and between branches such that positive 157 selection is detected in the lineage of interest if a subset of codon sites have dN/dS ratios greater 158 than 1, while the other lineages have ratios less than 1 or equal to 1, indicating purifying 159 selection or neutral evolution, respectively. Out of 954 nuclear genes tested in the lineage leading 160 7 to the pterygote insects ('P' in Figure 1), 126 (13%) were detected to be under positive selection; 161 39 of these were uniquely detected to be under positive selection in branch 'P' and not detected 162 in either branch 'U' (upstream) or 'D' (downstream). The 39 unique candidate genes over-163 represented Gene Ontology categories related to 'spliceosome', 'protein binding', 'protease', and 164 'RNA catabolic process' ( Table 1). The candidate gene list included frayed (fray) and NADH 165 dehydrogenase (ubiquinone) 23 kDa subunit , related to wing development and the 166 mitochondrial respiratory chain, respectively, but such functional categories did not contain an 167 over-representation of genes exhibiting evidence of positive selection. When grouping multiple 168 Gene Ontology terms of potential interest related to wing or mitochondrion/ATP-169 binding/OXPHOS-related functions, neither of these grouping were significantly over-or under-170 represented by the 39 candidate genes as compared to the non-candidate genes (wing: 2.6% in 171 candidate list of genes displaying signature of positive selection vs. 3.0% in non-candidate list, 172 pFisher's Exact (1-tailed)=0.69; mitochondrion-related: 15.8% vs. 17.4%, p=0.67; only over-173 representation p values shown). Out of 13 nuclear OXPHOS genes available in the background 174 gene set of 954, only one was in the candidate list (2.6% vs. 1.3%, p=0.42). None of the 13 175 mitochondrial genes were detected to be under positive selection in the 'P' lineage after 176 Benjamini-Hochberg correction. Gene names and descriptions for candidate and background 177 genes for all analyses are provided in [31] 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 215

Exploratory test of positive selection in lineage leading to Pterygota 505
Twenty-eight hexapod species were selected to maximize the number of shared nuclear 506 genes available for analysis as well as the phylogenetic representation of pterygotes and non-507 pterygote hexapods. Not all genes were available for all species in the candidate alignments, and 508 thus species were selected with the trade-off of number of species versus obtaining the largest 509 gene set. For this test, we excluded flightless species or orders from within Pterygota, i.e. 510 representing secondary flight losses. Species selection was performed in a phylogenetically 511 stratified way, with the final list of 28 species being those that gave the maximum gene count: 1) 512 all five apterygote orders were included, with a maximum of three species per order, but 513 allowing up to one missing sequence per gene for this set; 2) one species from Odonata and one 514 species from Ephemeroptera were included, with no missing sequences allowed; 3) one species 515 per each of five orders of Polyneoptera was included, allowing one missing sequence per gene 516 for this set; 4) one species from each of 10 orders in the clade including Thysanoptera and 517 Diptera (Fig. 1) was included, allowing up to three missing sequences per gene (species selected 518 shown in Fig. 1). This resulted in 954 genes out of 1476. Similarly, 27 species representing 519 apterygote and pterygote hexapod orders were selected for the 13 mitochondrial protein-coding 520 genes, with no missing sequences allowed. 521 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 We tested for evidence of positive selection in these nuclear and mitochondrial genes in 522 the lineage leading to Pterygota (Fig. 1 branch 'P'). We used the branch-site method of detecting 523 positive selection [63] in the program PAML codeml version 4.8 (PAML, RRID:SCR_014932) 524 [64], with the fit of models A1 (non-synonymous-to-synonymous (dN/dS) ratio fixed at 1) vs. A 525 (dN/dS ratio free to vary) (each model with four classes of sites, each class allowing a certain 526 combination of dN/dS ratios representing positive selection, purifying selection, or neutral 527 evolution) compared for each gene separately through likelihood ratio tests [65]. For this and 528 subsequent analyses, we corrected for false discovery due to multiple genes being tested by using 529 the Benjamini-Hochberg correction [66] for each gene within a set, with a family-wise alpha of 530 0.05. 531 We repeated the tests on two additional lineages to serve as a null hypothesis to compare 532 to the results for the lineage 'P'. Branch 'U' (upstream) and 'D' (downstream) (Fig. 1) were 533 tested. Using these results, we separated out genes that were uniquely detected as being under 534 positive selection in the lineage leading to pterygote insects. These unique genes were subjected 535 to Gene Ontology (GO) analysis, described in the 'Functional analysis' section below. were available for our analysis. In the case of phasmids, flight loss occurred multiple times 544 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 within the order [7][67]. However, all available species were flightless, and thus the losses could 545 not be represented accurately on the phylogeny; we tested the branch leading to the phasmid 546 clade to approximate the timing of early flight losses in that order. Due to incomplete 547 phylogenetic mapping of some of the flight loss events, the branches tested here likely represent 548 some flying lineage history in addition to flightless lineage history, which may cause 549 underestimation of molecular signal due to flight loss. A qualitative assessment is provided to 550 indicate the likely degree of accuracy in the mapping of each case of flight loss, considering the 551 density of taxonomic sampling in that group and how frequently flight is thought to have been 552 lost in those groups ( Fig. 1 and [ A test for positive selection was performed on each of the 11 branches of interest for each 559 sub-tree and gene separately. Those genes with significant p values (at 0.05 level after 560 Benjamini-Hochberg correction) within a sub-tree were included in further analysis. We 561 identified genes that were detected as evolving under positive selection in three or more of the 11 562 lineages tested. However, in order to eliminate those genes exhibiting a signature of selection in 563 many lineages regardless of flight state, we also tested nine flight-capable lineages that were 564 sister lineages or were closely related to the flightless lineages for positive selection using the 565 same sub-trees as the flightless lineages (trees and results in [31]). There were numerous flight 566 loss events in one sub-tree, and so there were fewer related flight-capable lineages to include, 567 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 resulting in nine flight-capable lineages tested overall (as compared with 11 flightless lineages). 568 The counts of genes exhibiting a significant signature of positive selection (p<0.05 after 569 Benjamini-Hochberg correction) were tallied for the flying lineages, and these counts were 570 subtracted from the list of candidate genes for the flightless lineages. Those remaining genes 571 with three or more counts of positive selection in the flightless lineages were included in 572 functional analysis. This procedure was repeated for the eight cases of full flight loss (i.e. 573 excluding the three cases of female-only flight loss) as compared to seven related flying lineages. 574

575
Exploring genes under relaxed selection with flight loss 576 Nuclear and mitochondrial genes were examined for relaxed selection associated with 577 flight loss using branch models in PAML codeml to estimate dN/dS ratios for lineages of 578 interest. For nuclear genes, the total 113-species tree (Fig. 1) was used, and missing data were 579 allowed. Only genes with data for 80 or more species were included, resulting in 1285 genes 580 tested. Flightless lineages representing full flight loss (not female-only flight loss) were coded 581 one branch rate (red branches in Fig. 1) and the sister or related flight-capable lineages of similar 582 tip number and taxonomic rank were coded together a separate rate (blue branches in Fig. 1), 583 while all other lineages were coded as the background rate. 584 For each gene, a change in selection regime associated with loss of flight was concluded 585 when there was a significantly increased dN/dS ratio (between 0 and 1) in flightless lineages as 586 compared to flying lineages. Likelihood ratio tests between 3-rate trees (flightless [ 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 across genes, with a family-wise α = 0.05. Those genes that had a significantly higher dN/dS 591 ratio in the flightless than flying lineages were examined by functional analysis (below) as 592 compared to the total gene set tested. We interpreted increased dN/dS ratios as signifying relaxed 593 selection. This interpretation of the dN/dS ratios involves the assumption that the majority of 594 non-synonymous changes across a whole gene sequence are selectively neutral or slightly 595 deleterious; by contrast, positive selection is assumed to affect a small minority of sites at which 596 mutations with beneficial effect have occurred [68]. However, given that increased dN/dS ratios 597 can be due to strong positive selection rather than relaxed selection (or in combination, in 598 different parts of the gene), as a precaution we verified whether any genes from this list 599 overlapped with those in the final candidate list for genes under positive selection in both-sexes-600 flightless lineages. 601 For mitochondrial genes, a 66-species tree adopted from Misof et al.  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 We tested for over-representation in Gene Ontology (GO) categories by the genes 613 exhibiting positive or relaxed selection as compared to each total gene set analyzed ('background 614 genes') using the DAVID version 6.  Table S14. 630 Our tests for over-representation of genes under positive or relaxed selection are in relation to 631 each of our available gene sets, i.e. 'background' sets that are each a subset of the total 1476 632 genes. In addition to GO analysis, which provides information on functional terms based on 633 over-representation available in the candidate gene set, we grouped lists of genes in similar 634 DAVID terms (those terms present with default settings in the 'chart' or 'cluster' mode) existing 635   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65 in our background 1476 gene set. The grouped terms were those related to each 1) wing 636 development (4 chart and 1 cluster term for a total of 48 genes) and 2) mitochondrion/ATP-637 binding/respiratory-chain-related functions (7 chart and 11 cluster terms for a total of 237 genes). 638 We acknowledge that these groupings do not include all possible genes related to the wing or 639 mitochondrion-related functions of interest in the dataset but provide two larger, functionally-640
We are glad both reviewers found the manuscript well-written and the methods thorough. We have made the suggested minor changes to the manuscript, outlined further below. The main changes were to add sections of interpretation to the discussion, as suggested by the reviewers, and to provide more detail on the genes of interest. We think the changes have improved the manuscript.
We thank the reviewers and editor for their thoughtful input and hope that you will now find the manuscript acceptable for publication in GigaScience. We look forward to hearing back from you soon.

_________
Dear Dr. Mitterboeck, Your manuscript "Positive and relaxed selection associated with flight evolution and loss in insect transcriptomes" (GIGA-D-17-00053) has been assessed by our reviewers. Based on these reports, and my own assessment as Editor, I am pleased to inform you that it is potentially acceptable for publication in GigaScience, once you have carried out some essential revisions suggested by our reviewers.
The reviewers have no substantial concerns regarding the overall methodology,however, they do point out that the conclusions that can be drawn from your work are rather limited -but we agree that the work is nevertheless a useful contribution and suitable for publication in GigaScience, in principle. Please revise your paper in light of the reviewers recommendations, in particular regarding caveats and limitations of your work mentioned by the referees.
Their reports, together with any other comments, are below. Please also take a moment to check our website at http://giga.edmgr.com/ for any additional comments that were saved as attachments.
If you prepare your revised manuscript, please amend your reference list with a citation to any data that