Phylogeny of the Hawkmoth Tribe Ambulycini (Lepidoptera: Sphingidae): Mitogenomes from Museum Specimens Resolve Major Relationships


Ambulycini are a cosmopolitan tribe of the moth family Sphingidae, comprising 10 genera, 3 of which are found in tropical Asia, 4 in the Neotropics, 1 in Africa, 1 in the Middle East, and 1 restricted to the islands of New Caledonia. Recent phylogenetic analyses of the tribe have yielded conflicting results, and some have suggested a close relationship of the monobasic New Caledonian genus Compsulyx Holloway, 1979 to the Neotropical ones, despite their being found on opposite sides of the Pacific Ocean. Here, we investigate relationships within the tribe using full mitochondrial genomes, mainly derived from dry-pinned museum collection material. Mitogenomic data were obtained for 19 species representing nine of the 10 Ambulycini genera. Phylogenetic trees are in agreement with a tropical Asian origin for the tribe. Furthermore, results indicate that the Neotropical genus Adhemarius Oiticica Filho, 1939 is paraphyletic and support the notion that Orecta Rothschild & Jordan, 1903 and Trogolegnum Rothschild & Jordan, 1903 may need to be synonymized. Finally, in our analysis the Neotropical genera do not collectively form a monophyletic group, because a clade comprising the New Caledonian genus Compsulyx and the African genus Batocnema Rothschild & Jordan, 1903 is placed as sister to the Neotropical genus Protambulyx Rothschild & Jordan, 1903. This finding implies a complex biogeographic history and suggests that the evolution of the tribe involved at least two long-distance dispersal events.

Rothschild and Jordan (1903) were the first to formally recognize a phylogenetic relationship among the genera currently placed in Ambulycini. However, they did not unite them into a single monophyletic group but instead divided them into three groups: 1) the genus Ambulyx Westwood, 1847 (as Oxyambulyx Rothschild and Jordan 1903); 2) a group comprising Amplypterus Hübner, [1819] (as Compsogene Rothschild and Jordan 1903), Akbesia Rothschild and Jordan 1903, and Batocnema Rothschild and Jordan 1903; and 3) a New World group comprising Adhemarius Oiticica Filho, 1939 (as Amplypterus Hübner, [1819]), Orecta Rothschild and Jordan 1903, Protambulyx Rothschild and Jordan 1903, and Trogolegnum Rothschild and Jordan 1903. From the first two of these groups, Rothschild and Jordan (1903) then considered the remaining smerinthine genera to have evolved.
The current concept of Ambulycini is based upon Kitching and Cadiou (2000) and includes 10 genera, of which 3 are restricted to South East Asia (Ambulyx, Amplypterus, and Barbourion Clark, 1934), 4 are Neotropical (Adhemarius, Orecta, Protambulyx, and Trogolegnum), 1 is Middle Eastern (Akbesia), 1 is tropical African/Madagascan (Batocnema), and 1 is restricted to New Caledonia (Compsulyx Holloway 1979). Kitching and Cadiou (2000) diagnosed the Ambulycini based primarily on the shared presence of an anterior, ventral notch on the pupal cremaster. However, they admitted that the presence of this structure had been confirmed in only four of the genera (Akbesia, Ambulyx, Amplypterus, Protambulyx), and associated the remaining genera based on general morphological similarity. Kitching and Cadiou (2000) also indicated that they considered a subgroup excluding Ambulyx, Amplypterus, and Barbourion to be monophyletic but did not provide any supporting evidence (in fact, the synapomorphy was the shared presence of a spinose gnathos in the male genitalia). Recent molecular phylogenetic studies (e.g., Hamilton et al. 2019) have confirmed the monophyly of the tribe, although most did not include all described genera. Kawahara and Barber (2015) included six genera (Adhemarius, Ambulyx, Amplypterus, Batocnema, Compsulyx, and Protambulyx) and sequenced five nuclear genes (pyrimidine biosynthesis; dopa-decarboxylase; elongation factor-1α; Period; and wingless) and one mitochondrial gene (cytochrome c oxidase subunit I, COI), analyzing the data with both maximum likelihood and Bayesian inference methods. Both analyses recovered the same pattern of relationships among the six genera, with the Asian Ambulyx and Amplypterus forming the sister group of the remaining four, which were related as: Adhemarius (Protambulyx (Batocnema + Compsulyx)) (Fig. 1).
Contemporaneously with Kawahara and Barber (2015), Cardoso (2015) undertook a combined molecular and morphological analysis of the Ambulycini, based on the nuclear genes CAD and wingless, the mitochondrial gene COI and 96 characters derived from the adult external morphology. A combined analysis using Bayesian inference (BI) recovered a monophyletic Ambulycini with the following phylogenetic relationships among the 10 genera: Barbourion (Ambulyx (Amplypterus ((Compsulyx (Batocnema (Akbesia + Protambulyx))) (Adhemarius (Orecta, Adhemarius, (Adhemarius, Trogolegnum)))))) (Fig. 1). In contrast, maximum parsimony (MP) analyses under both equal and implied weighting yielded a slightly different topology: Barbourion (Ambulyx (Amplypterus (Protambulyx ((Compsulyx + Akbesia) (Batocnema (Orecta (Adhemarius (Adhemarius (Adhemarius, Trogolegnum))))))))) (Fig. 1). These results differed from those of Kawahara and Barber (2015) in not grouping Ambulyx and Amplypterus together, nor Batocnema and Compsulyx. Cardoso (2015) also found that Orecta and Trogolegnum were both nested within Adhemarius, rendering this genus paraphyletic.
The phylogenetic results of both Kawahara and Barber (2015) and Cardoso (2015) raise interesting questions regarding the biogeography of the African/Madagascan genus Batocnema and New Caledonian genus Compsulyx. Although the former study grouped them together and the latter had them splitting off sequentially, both studies agreed in placing these Old World genera in a clade with the New World Protambulyx (in Cardoso's analysis, this clade also included Akbesia, a genus missing from the study of Kawahara & Barber) and placing these genera together as the sister-group of the New World genus Adhemarius.
In the present study, we use full mitochondrial genomes, derived from dried museum specimens as old as 28 yr, to elucidate further the phylogenetic relationships of the genera of Ambulycini.

Samples, DNA Extraction, and Pooling
Mitochondrial genomes were sequenced from pooled genomic DNA samples (e.g., see Gillett et al. 2014, Timmermans et al. 2015). A single specimen from each of 22 species was selected for sequencing from the dry-pinned Sphingidae collection of the Natural History Museum, London (NHMUK).

Sequencing and Quality Control
Indexed TruSeq Nano Libraries (Illumina, San Diego, CA) were prepared at the NHMUK Sequencing Facility for both gDNA pools. The DNA was expected to be highly fragmented and therefore no further shearing of gDNA was performed. Libraries were sequenced on an Illumina MiSeq (PE; 2x250 bp). Sequencing data were preprocessed using Illumina's MiSeq Control Software (MCS), version 3.1 (Illumina). Further processing largely followed Timmermans et al. (2015), which involved trimming low-quality bases at the start and end of reads (phred quality threshold 20) using TRIMMOMATIC (version 0.32) (Bolger et al. 2014), stitching paired-end reads using PEAR (default settings) (Zhang et al. 2014), and removing all stitched sequences with a minimum quality score of 20 from the dataset using prinseq-lite (Schmieder and Edwards 2011). Files were subsequently converted to fasta format using the Unix stream editor, sed. Finally, data were assembled using the de Bruijn graph assembler idba_ud (--mink=80, --maxk=150) (Peng et al. 2012).
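The FASTQ-to-FASTA conversion mentioned above was carried out with sed. Purely as an illustration of what that step does, a minimal Python sketch is given here; the function name `fastq_to_fasta`, the toy reads, and the assumption of strict four-line FASTQ records are hypothetical and are not taken from the original pipeline.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records.

    Assumes each record is exactly: '@header', sequence, '+', qualities.
    """
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, plus, _qualities = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            yield ">" + header[1:]  # FASTA header replaces '@' with '>'
            yield sequence          # quality line is dropped
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


# Toy input (hypothetical reads, not data from the study).
example = ["@read1", "ACGT", "+", "IIII", "@read2", "TTGA", "+", "IIHH"]
fasta_lines = list(fastq_to_fasta(example))
print("\n".join(fasta_lines))
```

Working on an iterable of lines means the same function could be applied to an open file handle without loading the whole read set into memory.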
Data for Batocnema africanus (LEP31448d) were extracted from the raw Illumina sequencing reads of a previously sequenced anchored hybrid enrichment specimen (see Hamilton et al. (2019) for methodological details). Sequences were mapped onto the newly generated Batocnema coquerelii mitochondrial genome in Geneious (maximum gap size: 50, maximum mismatches per read: 30%).

Phylogenetic Analyses
Mitochondrial genomes were filtered from the assembly data using stand-alone BLAST (Altschul et al. 1997). The blastn searches used an Orecta lycidas COI sequence (GenBank accession number: GU703851) as the query sequence. Geneious was used to manually assemble partial genomes into full ones and to check whether the mitochondrial genomes were circular. Genomes were annotated by aligning them to the publicly available Ampelophaga rubiginosa mitogenome (GenBank accession number: NC_035431) and transferring across the annotations, which were visually inspected to ensure the correct start and stop codons were selected. Sequences for each of the 13 protein-coding genes were extracted and aligned using the codon-based aligner, MACSE (Ranwez et al. 2011) using default settings (gap penalty: 7, gap extension penalty: 1, stop codon penalty: 100, frameshift penalty: 30) and the mitochondrial genetic code. The 13 alignments were then concatenated into a single data string for each species using a custom PERL script (A.P. Vogler, personal communication). To investigate saturation in the dataset, plots of the p-distance against model corrected distance (TN93; Tamura and Nei 1993) were generated for each species pair using the ape package (Paradis et al. 2004) in R (R Core Team 2013). Maximum Likelihood phylogenetic analyses were performed using IQ-TREE (Nguyen et al. 2015) and Bayesian Inference in MrBayes (Ronquist and Huelsenbeck 2003) on a partitioned supermatrix dataset (six partitions; by strand and codon position). IQ-TREE was run with the following command: iqtree -spp partitions.txt -s <FASTA FILE> -m MFP+MERGE -nt AUTO -bb 1000 -alrt 1000. This command structure tells IQ-TREE to find the best model for each partition and subsequently merge partitions until an optimal partition scheme is found. The program then uses this for phylogenetic inference and performs an Ultrafast Bootstrap (Minh et al. 2013) and SH-like approximate likelihood ratio test (SH-aLRT) (Guindon et al. 2010) with 1,000 replicates each. MrBayes analyses were run for 1 million generations (two MCMC with four chains each; GTR+I+G model; unlinked model parameters across partitions). The first 25% of trees were discarded as burn-in and Posterior Probabilities calculated. Finally, the R library phytools (Revell 2012) was used to plot the tips of the Bayesian topology onto a world map.
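The custom PERL script used for concatenation is not reproduced in this article. As a rough sketch of the general idea only, the Python below joins per-gene alignments into one string per species and records where each gene starts and ends, which is the information a partition file needs. The function names, the padding of missing genes with gaps, the strided codon-position notation, and the toy data are all illustrative assumptions rather than the authors' implementation.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # species name -> aligned sequence


def concatenate(
    genes: List[Tuple[str, Alignment]],
) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Concatenate gene alignments; return the supermatrix and 1-based gene boundaries."""
    species = sorted({name for _, aln in genes for name in aln})
    supermatrix = {name: "" for name in species}
    boundaries = {}
    start = 1
    for gene, aln in genes:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        length = lengths.pop()
        for name in species:
            # Assumption: a species lacking this gene is padded with gaps.
            supermatrix[name] += aln.get(name, "-" * length)
        boundaries[gene] = (start, start + length - 1)
        start += length
    return supermatrix, boundaries


def codon_position_ranges(start: int, end: int) -> List[str]:
    """Return three strided ranges (codon positions 1, 2, 3) for one coding region."""
    return [f"{start + offset}-{end}\\3" for offset in range(3)]


# Toy data (hypothetical species and sequences).
genes = [
    ("COX1", {"sp_a": "ATGGCA", "sp_b": "ATGGCT"}),
    ("CYTB", {"sp_a": "ATGAAA"}),
]
matrix, bounds = concatenate(genes)
for gene, (start, end) in bounds.items():
    print(gene, codon_position_ranges(start, end))
```

Grouping the resulting ranges by coding strand and codon position would give a six-partition scheme of the kind described above; how the original partition file was actually laid out is not stated in the text.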

Mitogenome Similarity and Completeness
COI similarity between ambulycine species ranged between 86% (Protambulyx astygonus vs A. dariensis) and 93% (Compsulyx [...]).
A complete, circular mitochondrial genome was obtained for 7 of the 11 species in pool 1 and 9 of the 10 species in pool 2. For one additional species (A. dentoni), a contig of 13,731 bp was assembled. For five samples, no mitochondrial genome sequence was recovered: A. davidi, B. africanus, and C. cochereaui in pool 1, and O. acuminata and the outgroup species, V. kingstoni, in pool 2. These were not consistently the oldest or the smallest samples, and it remains unclear why the sequencing and assembly failed to generate usable data for these specimens.
Of particular interest was the pool 1 specimen, C. cochereaui. Sequencing of this species was repeated using a different specimen in a different sequencing run, and a contig of 13,347 bp was obtained. This contig, like the above-mentioned A. dentoni sequence, contained all the protein-coding genes but lacked the rRNAs and the d-loop region. To confirm correct assembly of the C. cochereaui genome, it was compared to independently derived COI and CYTB sequences. These sequences were aligned to the partial mitogenome (sequence position COI: 1,690-2,346, sequence position CYTB: 11,093-11,475) and shown to be 100% identical (i.e., no mismatch was observed).
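The check described above amounts to extracting a region of the contig by its coordinates and counting mismatches against an independently derived marker sequence. A minimal sketch of that comparison follows, assuming the two sequences are already aligned and of equal length; the function names and the toy sequences are hypothetical, and the actual comparison in the study was presumably made in an alignment program.

```python
def region(contig: str, start: int, end: int) -> str:
    """Extract a region given 1-based, inclusive coordinates."""
    return contig[start - 1:end]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned positions, skipping gaps and N in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # uninformative position
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy example (hypothetical sequences, not data from the study).
toy_contig = "TTACGGATCCGATTA"
toy_marker = "GGATCC"
identity = percent_identity(region(toy_contig, 5, 10), toy_marker)
print(f"{identity:.1f}% identical")
```

With the COI coordinates quoted above, `region(contig, 1690, 2346)` would return a 657 bp slice.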
The lengths of the 16 full mtDNA genomes ranged from 15,304 bp (Ambulyx dohertyi) to 15,676 bp (A. dariensis) (Table 1), slightly longer than the published genome of the sphingid A. rubiginosa (15,282 bp). As expected, gene order was highly conserved and matched the order typically observed in ditrysian Lepidoptera, with one exception in A. dariensis: a translocation of tRNA-Gln was observed from the 'tRNA-Met, tRNA-Ile, tRNA-Gln cluster' to a position in the d-loop region (Supp Fig. 1 [online only]).

Phylogenetic Analysis
Protein-coding genes were extracted, aligned, and concatenated into a single supermatrix of 11,235 bp for each species. The data matrix was supplemented with sequence data for B. africanus LEP31448d that had been extracted from the raw Illumina sequencing reads of a previously sequenced anchored hybrid enrichment specimen (94% complete). Saturation was investigated visually by plotting pairwise p-distances against pairwise TN93-corrected distances. The resulting plot revealed a strong linear relationship, suggesting that saturation is negligible (Supp Fig. 2 [online only]). It was therefore decided not to recode or remove codon positions. Phylogenetic analyses performed on the partitioned dataset, using both Maximum Likelihood and Bayesian Inference, yielded identical topologies (Fig. 2).
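The logic of the saturation check is that uncorrected and model-corrected distances stay close to proportional while few sites have been hit more than once, and diverge as multiple substitutions accumulate. The sketch below illustrates this with the p-distance and, purely as a simpler stand-in for the TN93 correction used in this study, the Jukes-Cantor (JC69) distance. The function names and toy sequences are hypothetical; the published plots were produced with the ape package in R.

```python
import math
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, counting only positions with A/C/G/T in both."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (stand-in here for TN93)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy alignment (hypothetical sequences, not data from the study).
toy = {
    "sp_a": "ACGTACGTAC",
    "sp_b": "ACGTACGTAT",
    "sp_c": "ACGAACGTTT",
}
pairs = {}
for (name_1, seq_1), (name_2, seq_2) in combinations(toy.items(), 2):
    p = p_distance(seq_1, seq_2)
    pairs[(name_1, name_2)] = (p, jc69_distance(p))

for names, (p, corrected) in pairs.items():
    print(names, round(p, 3), round(corrected, 3))
```

Plotting the first value of each pair against the second gives the kind of diagram described above; a curve that flattens at high corrected distances would indicate saturation.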
The Ambulycini are recovered as monophyletic but with very poor support (SH-aLRT support = 53.4/Bootstrap support = 68.0; Posterior Probability = 0.74). The tropical Asian species, B. lemaii, is the first lineage to split from the rest of the samples, but its placement must be considered uncertain given that support values for the tribe as a whole on both trees are very low. In contrast, all other relationships were recovered with high support (BS ≥ 90.0; PP ≥ 0.99), except for the pairing of P. astygonus and P. strigilis (SH-aLRT support = 78.9/Bootstrap support = 79.0; Posterior Probability = 0.99). The remaining two tropical Asian Ambulycini do not form a monophyletic group, as the genus Amplypterus splits off separately and after the genus Ambulyx. Nor were the four Neotropical genera collectively recovered as a monophyletic group. Rather, Compsulyx and Batocnema are together placed as sister to the genus Protambulyx. These three genera are sister to a group comprising the remaining Neotropical genera.

For library construction, DNA extracts were pooled as indicated (Pool 1, Pool 2). For each specimen, only the country of origin is given; further data can be found on the URL links to the specimen image pages on the NHM Data Portal. Species names not in bold face are those for which no mitochondrial genome sequence was recovered. Mitogenome length, length of mitochondrial genome; n.a., not applicable.
* indicates that the mitochondrial genome is partial.

Phylogeny of Ambulycini
Previous phylogenetic analyses of the Ambulycini have yielded conflicting patterns of relationships among the genera. In the present study, Ambulycini are recovered as monophyletic but with only weak support (Fig. 2), though in both analyses, Barbourion is the sister group to the rest of the tribe. In contrast, a clade comprising the remaining genera receives very strong support. This suggests that Barbourion might perhaps be misplaced in Ambulycini, although it could also be an artifact of our limited outgroup sampling. Next to split off is Ambulyx, then Amplypterus. These are followed by Adhemarius, in which Orecta and Trogolegnum are placed. Thus, Adhemarius is paraphyletic relative to the other two genera. The final clade comprises the remaining three ambulycine genera of the present analysis, with Batocnema placed as sister to Compsulyx and these two as sister to Protambulyx. Thus, our results are almost identical to those of the combined molecular and morphological BI analysis of Cardoso (2015; Fig. 1.7), except that Batocnema and Compsulyx are sisters, rather than arising separately, and the placement of Orecta is resolved.

Classification of Adhemarius, Orecta, and Trogolegnum
With regard to the phylogenetic relationships of Adhemarius, Orecta, and Trogolegnum, the present study (Fig. 2) found a topology in which the Adhemarius donysa-group + Trogolegnum split off first, followed by the Adhemarius gannascus-group, leaving a terminal sister-group pairing of the A. sexoculata-group and Orecta. This result contrasts with Cardoso (2015), who found different patterns of relationship among these groups, depending upon the analytical method and data set used: in their IW MP analysis (Cardoso 2015: Fig. 1.6), Orecta is first to branch off, followed by the sexoculata-group, then the gannascus-group, then finally Trogolegnum as sister to the donysa-group. In contrast, in the results of their BI analysis (Cardoso 2015: Fig. 1.7), the sexoculata-group branched off first, followed by a trichotomy comprising Orecta, the gannascus-group and the donysa-group + Trogolegnum. However, all analyses agree that T. pseudambulyx is simply a member of the A. donysa species-group, albeit one with a reduced proboscis and labial palps, and, like Orecta, spinulose abdominal tergites and nonspinose abdominal sternites (Rothschild and Jordan 1903). However, the phylogenetic relationships of Orecta, also considered by Rothschild and Jordan (1903) to be a derivative of Adhemarius, remain obscure, with each of the three analyses suggesting a different placement. We therefore consider it premature to make any formal changes to the classification of the three genera. In addition, although most of the relationships were recovered with high support, we should point out that mitochondrial genomes are maternally inherited, can introgress between hybridizing species and that the genes in the mitochondrial genome are tightly linked (Avise and Ellis 1986). It is therefore possible that the trees obtained here merely represent a deviating gene history and not the actual evolutionary history of the species involved (Ballard 2000). 
Phylogenomic studies currently in progress, which focus on the nuclear genome using anchored hybrid enrichment (Kawahara et al. in preparation) and ultra-conserved elements (Rougerie et al. in preparation), will show whether there is any discrepancy between the mitochondrial and nuclear genomes and are expected to unambiguously resolve the placement and relationships of ambulycine taxa, finally allowing taxonomic decisions to be made.

[...] Table 1), except for 4) B. coquerelii which was taken by Laurel Kaminsky.

Biogeography of Ambulycini
Although this study is not intended to be a formal biogeographical study of the Ambulycini, it is possible to draw some preliminary conclusions based on the results presented here. The first three genera to split from the rest of the ambulycine tree, Barbourion, Ambulyx and Amplypterus, are essentially tropical South-east Asian in distribution (although some Ambulyx species occur in more northern temperate regions) and it is likely that this region is where the tribe originated.
To date, no analysis has recovered a monophyletic group comprising only the four New World genera, Adhemarius, Orecta, Protambulyx, and Trogolegnum. Instead, in all cases, Protambulyx is placed in a clade together with the Old World genera, Akbesia, Batocnema, and Compsulyx, the sister-group of which is a clade comprising Adhemarius, Orecta, and Trogolegnum. If the Ambulycini originated in the Old World, then it is still unclear whether there were two independent dispersal events to the New World (the Protambulyx and Adhemarius/Orecta/Trogolegnum lineages), a single such dispersal event followed by a second back to the Old World by the Akbesia/Batocnema/Compsulyx group, or an even more complex scenario. The ambiguity currently surrounding the phylogenetic relationships of these genera precludes a more objective biogeographical analysis.

Taxonomy and Biogeography of Compsulyx
The monobasic genus Compsulyx is endemic to the main island of New Caledonia in the western Pacific, where it is particularly associated with ultramafic rainforest (Holloway 1979). Its only species, C. cochereaui, was originally described in the genus Compsogene (now Amplypterus), but Viette (1971) noted a resemblance to Ambulyx, although not in perfect agreement with either genus. A more thorough study by Holloway (1979) led to the conclusion that this species belonged in a separate genus, Compsulyx, but was nevertheless of Indo-Malayan origin: "the New Caledonian species bears closer resemblance to Oxyambulyx [Ambulyx] but certainly represents an offshoot of ambulycine [sic] stock prior to the main radiation within the other two genera [Ambulyx and Amplypterus]" (Holloway 1979: 351). Without giving explicit supporting characters, Kitching and Cadiou (2000) placed Compsulyx in a clade that also included Akbesia, Batocnema, and the four New World genera. In fact, the synapomorphy in question was a spinose gnathos, a character that was confirmed by Cardoso (2015) (but also recorded by him in the distantly related outgroup, Parum colligata [Walker 1856]), to which he added two further synapomorphies relating to the relative lengths of the diverticula on the vesica in the male genitalia and the degree of twisting of the antrum in the female genitalia, although both were rather homoplastic.
All analyses so far have recovered this clade of seven Old and New World genera (when included), but the placement of Compsulyx within it remains uncertain. Kawahara and Barber (2015) and the present study place Compsulyx as sister to Batocnema, whereas Cardoso (2015) placed it as either sister to Akbesia (MP analysis) or to a clade comprising Akbesia, Batocnema, and Protambulyx (BI analysis). It is unfortunate that we were not able to recover any mitochondrial genome sequence for Akbesia, as the absence of this genus makes a direct comparison with the results of Cardoso (2015) impossible, but all clearly reject Holloway's (1979) suggestion of a close relationship with Ambulyx and Amplypterus.
Regardless of its precise placement, the phylogenetic relationships of Compsulyx make for a highly enigmatic biogeography. To visualize this, we plotted our Bayesian topology onto a map of the Earth (Fig. 3), which highlights the discrepancy between the geographical distribution of Compsulyx and its phylogenetic placement, in which its closest relative would be B. coquerelii 12,000 km to the west, in Madagascar. Such long-distance sister-group relationships are rare in Lepidoptera, but not unknown. For example, Hundsdoerfer et al. (2017) reported a sister-group pairing in the hawkmoth genus Hyles Hübner, [1819], between Hyles biguttata (Walker 1856) from Madagascar and La Réunion and Hyles livornicoides (Lucas 1892) from Australia. However, further clarification of the biogeography of Compsulyx will require additional resolution of the phylogenetic relationships within the tribe.
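The distance quoted above can be checked with a great-circle calculation. The sketch below uses the haversine formula on a spherical Earth; the coordinates are rough, assumed central points for New Caledonia and Madagascar (they are not specimen localities from this study), and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Assumed approximate central coordinates (latitude, longitude).
NEW_CALEDONIA = (-21.3, 165.5)
MADAGASCAR = (-18.9, 47.5)

distance_km = great_circle_km(*NEW_CALEDONIA, *MADAGASCAR)
print(f"{distance_km:.0f} km")
```

Under these assumptions the result should be on the order of the 12,000 km stated above; the exact figure depends on which points on the two islands are compared.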

Supplementary Data
Supplementary data are available at Insect Systematics and Diversity online.