Phylogenomics and Fossil Data Inform the Systematics and Geographic Range Evolution of a Diverse Neotropical Ant Lineage

Abstract Recent advances in phylogenomics allow for the use of large amounts of genetic information in phylogenetic inference. Ideally, the increased resolution and accuracy of such inferences facilitate improved understanding of macroevolutionary processes. Here, we integrate ultraconserved elements (UCEs) with fossil and biogeographic range data to explore diversification and geographic range evolution in the diverse turtle ant genus Cephalotes Latreille, 1802 (Hymenoptera: Formicidae). We focus on the potential role of the uplift of the Panamanian land bridge and the putative ephemeral GAARlandia land bridge linking South America and the Antilles in shaping evolution in this group. Our phylogenetic analyses provide new resolution to the backbone of the turtle ant phylogeny. We further found that most geographic range shifts between South America and Central America regions were temporally consistent with the development of the Panamanian land bridge, while we did not find support for the GAARlandia land bridge. Additionally, we did not infer any shifts in diversification rates associated with our focal land bridges, or any other historical events (we inferred a single diversification rate regime across the genus). Our findings highlight the impact of the Panamanian land bridge for Cephalotes geographic range evolution as well as the influence of taxonomic sampling on macroevolutionary inferences.

Turtle ants (genus Cephalotes Latreille, 1802 [Hymenoptera: Formicidae]) are a diverse New World lineage of arboreal ants with a range that spans the Neotropics and adjacent areas of arid habitat in North America (de Andrade andBaroni Urbani 1999, Price et al. 2014). There are currently 123 described extant species and 16 fossil species, with close to 20 species found in many Neotropical habitats (de Andrade and Baroni Urbani 1999, Bolton 2021, Oliveira et al. 2021). This present-day range and diversity are associated with remarkable morphological diversity and an iconic morphological caste system that underpins the group's arboreal ecology (de Andrade andBaroni Urbani 1999, Powell et al. 2020). Soldiers defend the colony by using their elaborately armored heads to barricade nest entrances against invasion by nest-site competitors (Powell 2008, Powell 2009, Powell et al. 2020, while workers have body armor with spines and lateral shielding that protects them while they forage in the enemy-rich canopy environment (Coyle 1966, de Andrade andBaroni Urbani 1999). From this foundational ecology and morphology, the group has filled considerable ecological and phenotypic space. Species are found in most tropical habitats with woody vegetation, and up to elevations in excess of 1,300 m (de Andrade and Baroni Urbani 1999, Oliveira et al. 2021). They also use pre-existing wood cavities, a widely used shelter resource often produced by wood-boring beetles, with entrance areas that span two orders of magnitude (Powell 2016). Phenotypically, body length varies by a factor of five across species, and caste phenotypes show exceptional interspecific diversity, including four distinct soldier morphotypes and a four-fold difference in soldier head size across species (Powell et al. 2020).
Previous analyses of the geographic distribution and timescale of turtle ant evolution indicated a likely South American origin of crown Cephalotes in the Eocene around 50 million years ago (mya) (Price et al. 2014, Price et al. 2016). This means that major geologic events in the dynamic geologic and climatic history of the Neotropics over the last 50 million years have possibly influenced the current distribution of turtle ants, especially with respect to range expansion out of South America. These include the formation of the Isthmus of Panama (Coates et al. 1992, Montes et al. 2015, Bacon et al. 2015, O'Dea et al. 2016, the orogeny of the Andes (Allmendinger et al. 1997, Gregory-Wodzicki 2000, and Quaternary climatic and habitat changes (Haffer 1969, Prance 1982. A major research focus in this region has been on how physical connections between South America and the Antilles (GAARlandia land bridge, Iturralde-Vinent and MacPhee 1999, Hedges 2006, Ali 2012) as well as between North and South America (formation of the Isthmus of Panama) have potentially facilitated the dispersal and colonization of taxa to new areas.
The impact of land bridges as drivers of dispersal and diversification in American clades has been well-established for many taxa (Bagley and Johnson 2014), yet there remains a deal of uncertainty surrounding the dates of land bridge formation as well as the geographic complexity of emergence. For example, while there is little disagreement that the uplift of Panama was complete by around 3.5 million years ago (Mya), some geological evidence suggests that island formation could have promoted interchange between the American landmasses as early as 10 Mya (Bacon et al. 2015). Winston et al. (2017) found supporting biological evidence for this "early dynamic colonization" hypothesis with nomadic, flightless Eciton Latreille, 1804 (Hymenoptera: Formicidae) army ants, and support for early interchange has also been documented for Neotropical carnivorous mammals (Eizirik 2012), vipers (Alencar et al. 2016), anoles (Poe et al. 2017), and fungus-farming ants (Branstetter et al. 2017a), among others. The South America-Antilles land bridge is similarly controversial (Ali 2012), with some researchers finding support for the land bridge (e.g., cichlids : Říčan et al. 2013;toads: Alonso et al. 2012; ogre-faced spiders: Chamberland et al. 2018) while others argue that the totality of evidence suggests it never existed at all (Hedges 2006, Graham 2018. The combination of their continental range, diversity, and biological characteristics makes the turtle ants a particularly interesting group for a variety of evolutionary studies that integrate phylogenetic and ecological information (e.g., Kempf 1951, de Andrade and Baroni Urbani 1999, Price et al. 2014, Price et al. 2016, Powell et al. 2020. Each study has helped improve our understanding of the phylogenetic relationships within the turtle ants and the factors shaping the evolution of the lineage. Nevertheless, the best phylogenetic hypothesis to date, based on analyses that use molecular data from six gene fragments and integrate 131 morphological characters (Price et al. 2016), includes some nodes in the backbone of the turtle ant phylogeny that remain unresolved. For example, the placement of the clypeatus group was inferred with below-average support in both Price et al. (2014) and Price et al. (2016), and in each study over 10% of the nodes are inferred with relatively low support (posterior probability < 0.95). A phylogenomic approach, which incorporates significantly more genetic information than gene fragment methods, would allow for the comparison between phylogenies inferred from different amounts of molecular data and could contribute to the resolution of backbone nodes.
Phylogenomic approaches have in many ways advanced the field of phylogenetics due to the relative affordability of high-throughput sequencing, the development of analytical tools for genome-scale datasets, and the application of phylogenomic markers to a broad range of taxa Good 2016, McKain et al. 2018). These logistical advances have aided in the resolution of previously intractable nodes in many lineages (land plants: Qiu et al. 2006;turtles: Crawford et al. 2012;ray-finned fishes: Faircloth et al. 2013a;birds: Jarvis et al. 2014;PACMAD grasses: Cotton et al. 2015;snakes: Streicher and Wiens 2017;smurf-weevils: Van Dam et al. 2017;placental mammals: Esselstyn et al. 2017;parasitoid wasps: Cruaud et al. 2021), although even large increases in genetic information sometimes fail to resolve challenging phylogenetic relationships (Brown and Thomson 2017). The higher degree of confidence in the topology of phylogenies reconstructed with phylogenomic data can lead to more robust inferences about macroevolutionary patterns and processes. These include divergence dates (dos Reis et al. 2012), rates of speciation and extinction (Friedman et al. 2019), morphological trait evolution (Leaché et al. 2014), and geographic range evolution (Pouchon et al. 2018). Thus, with a better-resolved backbone for the turtle ant phylogeny, advances in our understanding of the group's range evolution may also be possible.
Until relatively recently, model-based methods for conducting historical biogeographic analyses were lacking, but now, most analyses are conducted with models that include parameters for dispersal and localized extinction (Ree and Smith 2008, Matzke 2013, Silvestro et al. 2016. However, despite potentially greater confidence in phylogenies reconstructed with phylogenomic data and recently developed models for assessing historical biogeography, performing analyses of ancestral area estimation that only include extant taxa may lead to inaccurate results. In particular, fossil occurrences may extend the ancestral range of a lineage, demonstrating a pattern of localized extinction (Ree and Sanmartin 2009, Matzke 2013, Silvestro et al. 2016, Barden and Ware 2017. Without fossils and an understanding of the true ancestral ranges inhabited by a lineage, the biotic and abiotic factors that facilitate and hinder dispersal may not be recovered. The turtle ants are an exemplary system to test the influence of Neotropical land bridges in a phylogenomic context due to their species richness, geographic distribution, and age of the clade. A particularly appealing aspect of the turtle ant system is its relatively rich amber fossil record (de Andrade and Baroni Urbani 1999). Currently identified fossils are from Dominican and Mexican amber and are dated from 15-20 million years ago (Iturralde-Vinent and MacPhee 1996, Grimaldi andEngel 2005, Kraemer 2007). Seven of the sixteen fossils are assigned with moderate to strong support (posterior probability ≥ 0.9) to extant clades (de Andrade and Baroni Urbani 1999, Price et al. 2016; the placement of the remaining fossils is not well supported). Interestingly, most fossils are from areas where there are no extant species in those clades. For example, Dominican amber contains representatives of up to five clades, but no extant representatives of any of these clades are currently found on Hispaniola. Instead, Hispaniola is now home to extant endemics from two other clades not captured in the amber fossil deposits (de Andrade andBaroni Urbani 1999, Price et al. 2016). Thus, the locations of fossil and extant taxa may shed light on the patterns of turtle ant geographic range evolution and, through fossils, highlight both the colonization history and the role of localized extinction in this group.
In this study, we take a phylogenomic approach using ultraconserved elements (UCEs) to infer the phylogeny and divergence dates of turtle ant evolution. We then use the timetree and available fossil data to better understand the geographic range evolution of turtle ants, with a particular focus on the potential role of emergent land bridges in shaping diversification in this group. More specifically, we address the potential influence of the Isthmus of Panama and the South America-Antilles (GAARlandia) land bridge on the colonization and dispersal of turtle ant species between South America, Central America, and the Antilles, and assess whether either of these potential dispersal events is associated with evolutionary rate shifts. Our hypotheses were that (1) the timing of the majority of inferred geographic range shifts into Central America and the Antilles will coincide with the formation of the Panamanian and GAARlandia land bridges as expected if these land bridges significantly increased Cephalotes dispersal probability, and (2) diversification rate shifts will be inferred at nodes in the Cephalotes phylogeny that are subsequent to the formation of land bridges as expected if novel niche opportunity and significant geographic expansion facilitated by novel land bridges drove speciation in this group.

Taxon Sampling
To maximize the number of species included in our study, we attempted to include specimens of different collection ages (up to seventy years old) and different preservation methods (i.e. stored in ethanol or dried and pinned) in the phylogeny. High-throughput sequencing methods and target enrichment for a large number of UCEs usually lead to successful capture of 100's of loci, even from less than optimally preserved material which previously posed challenges for targeting gene fragments with Sanger sequencing (Blaimer et al. 2016, Prosser et al. 2016. The initial molecular dataset consisted of 172 samples including 87 described species, 26 undetermined Cephalotes taxa, and one outgroup species (Procryptocerus scabriusculus Forel, 1899 [Hymenoptera: Formicidae]). Specimens were designated as undetermined if they did not robustly match the full character list of published species descriptions, such that some are likely to represent new, undescribed species, whereas others likely represent specimens of morphologically variable described species. All undetermined species were, however, readily placed in previously described morphological species groups (de Andrade and Baroni Urbani (1999). P. scabriusculus was used as the outgroup, as Procryptocerus is the well-established sister genus to Cephalotes (Moreau et al. 2006, Moreau and Bell 2013, Price et al. 2014, Ward et al. 2015. After removal of some samples due to an insufficient number of captured UCE loci (see "Concatenated analyses", below), our dataset consisted of 134 in-group samples representing 72 described species (~60% of known extant species) as well as 21 undetermined taxa, incorporating recent species descriptions from our material. Species determinations for this final dataset were verified by SP using de Andrade and Baroni Urbani (1999) as the primary source of keys and character descriptions. Our sampling spanned all recognized Cephalotes species groups, with the exception of three monotypic species groups, and one additional species group with only two described species that was excluded from the final data matrix due to insufficient UCE loci capture. To test species monophyly we included multiple individuals, when possible, from across the geographic range of each species. Specimen codes, voucher information (including specimen age and preservation method), locality data, and the number of UCE loci recovered can be found in Supp Table S1 (online only). Voucher specimens are housed at the Cornell University Insect Collection, the Field Museum of Natural History, the collection of Scott Powell at the George Washington University, the John T. Longino Collection at the University of Utah, and the insect collection at the Universidade Federal de Uberlândia, Brazil.
Library Preparation, Target Enrichment, and UCE Sequencing DNA extractions and library preparation were conducted at the Field Museum's Pritzker DNA Laboratory. When possible, the remaining DNA extracts from Price et al. (2014) were used for UCE library prep (Supp Table S1 [online only]). For specimens new to this study, DNA was extracted in one of three ways: destructively from whole ants preserved in ethanol using the DNeasy Kit (Qiagen), destructively from the gaster of one ant using the DNeasy/MoBio PowerSoil Kit (Qiagen), or nondestructively with the specimen remaining intact (body wall pierced) after extraction using the DNeasy Kit. DNA was quantified using a Qubit fluorometer (Life Technologies Inc.) and sheared to a target size of 500 bp using a sonicator (Covaris). One to 100 ng of DNA was sheared per specimen (average 25 ng) and used as input to a modified genomic DNA library preparation protocol (Kapa Hyper Prep Library Kit, Kapa Biosystems, Faircloth et al. 2015) that used a generic SPRI bead substitute (speedbeads; Rohland and Reich 2012) and custom iTru dual-indexing barcodes (Glenn et al. 2016).
We grouped libraries into pools (4-10 libraries/pool) at equimolar ratios based on similarities of postamplification Qubit values. Values ranged from 0.099-58 ng/uL (with two samples containing concentrations too low to be detected by the Qubit). We then enriched pooled libraries using a set of 9,446 probes (MYcroarray, Inc.) targeting 2,524 ant-specific UCE loci (Branstetter et al. 2017b). We followed library enrichment protocols for the MYcroarray MYBaits kit (Blumenstiel et al. 2010), using a 0.2X concentration of the standard probes. A with-bead protocol was used for PCR recovery of enriched libraries (Faircloth et al. 2015). Reactions were purified using 1.0× speedbeads. Enriched pools were rehydrated in 30 µl of elution buffer. Pools were then run on an Agilent Bioanalyzer 2100 Instrument to assess concentration. Pools with insufficient concentrations (less than 4 nM) were not included in the final pool of pools. We created an equimolar pool of libraries at 4 nM concentration. The pools of pooled libraries were sequenced on two separate 150-bp paired-end Ilumina HiSeq runs (HiSeq 2500 at the DNA Sequencing Core Facility at the University of Utah and HiSeq 4000 at the Genomics and Cell Characterization Core Facility at the University of Oregon).

Processing and Alignment of UCE Data
We used Illumiprocessor (Faircloth et al. 2013b), a parallelwrapper for the package Trimmomatic (Bolger et al. 2014), to trim the demultiplexed FASTQ data of low-quality bases and adaptor indexes. The Phyluce package was used for further data processing (Faircloth 2016). We assembled cleaned reads using parallel wrappers around Trinity (Grabherr et al. 2011). Contig assemblies were aligned to a FASTA file of all baits to identify contigs representing enriched UCE loci from each taxon, and sequence coverage statistics for contigs containing UCE loci were calculated. We created FASTA files for each locus containing sequence data for taxa present at that locus and aligned them using MAFFT (Katoh et al. 2002). Alignments were edge trimmed using a wrapper around GBlocks (Castresana 2000). We then generated a subset of data that was 75% complete, meaning that each alignment (1,326 UCE loci) contained data for 75% of the taxa (following exploratory analyses with varying taxon completeness ranging from 50 to 90%). Each UCE alignment was concatenated into a separate PHYLIP-formatted matrix for concatenated analyses. Raw sequence reads in FASTQ format are deposited in the NCBI Sequence Read Archive (accession PRJNA #725541) and Trinity-assembled contig files (as well as the UCE data matrix, Downloaded from https://academic.oup.com/isd/article/6/1/9/6514763 by guest on 25 January 2022 tree files, and other data files) are available in the Dryad data repository (URL: https://doi.org/10.5061/dryad.4b8gthtcg).

Data Partitioning
To find the best partitioning scheme for the data, we implemented Sliding-Window Site Characteristics (SWSC-EN), which takes into account differing rates and patterns of molecular evolution within each UCE locus (Tagliacollo and Lanfear 2018). SWSC-EN divides each locus into three partitions using a sliding window approach based on the entropy of a site. Entropy serves as a proxy for the rate of evolution of a site in the absence of a phylogeny (Tagliacollo and Lanfear 2018). SWSC-EN is run through a python script (py_script/ analysis.py), the output of which is used as a configuration file for PartitionFinder 2 (Lanfear et al. 2012). In PartitionFinder 2, we used the rcluster search algorithm, the GTR+G model of nucleotide substitution, and model selection by Bayesian information criterion (BIC).

Concatenated Analyses
We performed ML analyses in RAxML 8.2.10 (Stamatakis 2014) through the CIPRES Science Gateway (Miller et al. 2010). We ran a series of analyses on an unpartitioned dataset in order to determine whether certain taxa should be excluded in the subsequent analyses. Though each UCE alignment contained data for 75% of the taxa, some taxa were only represented by small numbers of loci. Analyses with unpartitioned data indicated the most seemingly robust results, in terms of branch lengths, when taxa with fewer than 900 loci were discarded. We thus only retained samples with more than 900 UCE loci sequenced. We then used the partitioning scheme on the resulting dataset of 136 taxa and 995,575 sites, which defined 77 data partitions. For all ML analyses, we implemented the GTRCAT rapid bootstrapping model with 500 bootstrap replicates. We then outputted the best-fitting ML tree with bootstrap replicates.
We implemented Bayesian inference in MrBayes v3.2.6 (Ronquist et al. 2012) using a reduced dataset. We used the program AMAS (Borowiec 2016) to calculate the number of parsimony-informative sites for each locus. We used the 250 most parsimony-informative loci, for a total of 83,385 sites, and to reduce the computational burden, we included only one representative per species, yielding 105 taxa. Thirty-two data partitions were defined based on output from PartitionFinder (partitioned by locus), and we applied the GTR+G model of evolution to all partitions. MrBayes was run twice for 20 million generations, sampling every 2,000 generations, with a burn-in of 25%. A majority rule consensus tree was outputted.

Estimation of Gene Trees and Species Tree
We performed a species tree analysis by inferring a gene tree for each UCE locus. Alignments were filtered for length to retain only loci with more than 200 bp for gene tree analyses, and we included only one representative per species. We partitioned the resulting 1,239 UCE loci using the SWSC-EN algorithm and then reconstructed gene trees by performing analyses in IQ-TREE v1.6.1 (best tree and 1,000 ultrafast bootstraps; Nguyen et al. 2015). We used IQ-TREE rather than RAxML for this step as the former is more efficient at running analyses for this large number of trees. Coalescent analyses were performed in ASTRAL-III v5.6.3 (Zhang et al. 2018), and a summary species tree with quartet support values (local posterior probabilities) was generated using the maximum likelihood best trees from all loci. In a separate analysis, the bootstrap trees generated with IQ-TREE for the 1,239 loci were used to perform the multi-locus bootstrapping procedure available in ASTRAL. We chose to not perform statistical binning (Bayzid et al. 2015) prior to gene tree estimation due to recent concerns about this procedure pertaining to widespread phylogenetic model misspecification that may bias species tree inference (Adams and Castoe 2019).

Divergence Time Estimation
We placed constraints on clade ages using a node-calibration approach. Four nodes, representing seven fossil species, were constrained based on a previous phylogenetic analysis of morphological characters for extant and fossil species (Price et al. 2016). Fossil species were included when their placement was well-supported (posterior probability [PP] ≥ 0.90) in the former analysis. Our current phylogeny includes 12 undetermined species that were not included in the Price et al. (2016) study. However, our species determinations placed them robustly in existing species groups and our subsequent analyses indicate that the phylogenetic placements of the undetermined species are highly supported. In particular, the undetermined species of C. "coff. clade sp. nov.", C. "coff. clade undet.", C. "pin. sp. 15", and C. "pin. dark" have strong support in the clades where the fossils are located, so we included the undetermined species as part of the node calibrations. The following are the node calibrations used in the analysis: (i) Cephalotes bloosi de Andrade and Baroni Urbani, 1999 (Hymenoptera: Formicidae) (Dominican amber) was used to calibrate the node uniting the coffeae group; (ii) Cephalotes caribicus de Andrade and Baroni Urbani, 1999 (Hymenoptera: Formicidae) (Dominican amber) was used to calibrate the node uniting Cephalotes kukulcan Snelling, 1999 (Hymenoptera: Formicidae), Cephalotes scutulatus (Smith, 1867) (Hymenoptera: Formicidae), Cephalotes frigidus (Kempf, 1960) (Vierbergen and Scheven, 1995) (Hymenoptera: Formicidae) (Dominican amber) were used to calibrate the node uniting the multispinosus group. The estimated age of Cephalotes amber is 15-20 million yrs (Iturralde-Vinent and MacPhee 1996, Grimaldi andEngel 2005, Kraemer 2007), so we assigned fossil calibrated nodes an exponential prior distribution with a zero offset value of 15 million yrs, corresponding to the youngest suggested age of the fossils. We assigned a mean of 10 million yrs to account for the fact that fossils represent minimum ages and that the true node age is likely to be older than the fossil.
For our divergence dating inference in MCMCTree, which is part of PAML v4.9e (Yang 2007), we used the "approximate likelihood" method to calculate the likelihood function during MCMC iteration (Reis and Yang 2011). Using this method and given the generally efficient nature of the MCMCTree program, we were able to include all species and all loci in our 75%-taxa-complete UCE dataset. Prior to running the program, we removed the two outgroup samples and reduced sampling to one tip per species or unidentified morphospecies to avoid uneven sampling across the tree (retaining independent tips for currently described species inferred as polyphyletic, e.g., Cephalotes basalis (Smith, 1876)). We used the sister R package "MCMCtreeR" (Puttick 2019), designed to more easily set and visualize intuitive node priors, to assign "skewNormal" node priors based on the same four fossil calibrations described above, as well as a uniform prior on the root age of crown Cephalotes. MCMCTree utilizes hard minimum and soft maximum bounds, with 95% of the prior distribution falling within designated age ranges. For skewNormal fossil calibrations, we assigned a minimum age of 15 Mya and a maximum age of 30 Mya. We assigned the root node a prior age range of 26 Mya to 55 Mya, based on age ranges inferred from previous studies (Price et al. 2014, Ward et al. 2015. These priors were assigned to a starting tree with the topology from our RAxML inference. Due to failure of convergence using the independent rates relaxed clock model, we implemented the correlated rates clock model in MCMCTree. After initial test runs, we set the "rgene_gamma" (α = 2, β = 30, αD = 1) and "sigma2_gamma" (α = 1, β = 10, αD = 1) priors to values with means close to initial preliminary run values for these parameters. To ensure adequate convergence, we ran five separate MCMC iterations for 10 million generations with 10,000 samples, checked ESS values and convergence in Tracer v1.7.1 (Rambaut et al. 2018), and combined the MCMC output from each run into a single file using LogCombiner v2.4.0 (Bouckaert et al. 2019) removing the first 10% from each run as burn-in. We also conducted identical runs with the priors only (i.e., no data) to assess the impact of our data on the posterior distribution.
We used the posterior tree with RAxML-inferred topology and mean node ages from the 95% HPD distribution inferred in MCMCTree for downstream macroevolutionary analyses.

Shifts in Species Diversification
In order to detect significant shifts in diversification across our timescaled Cephalotes phylogeny, we used BAMM v2.5.0 and BAMMtools v2.1.7 (Rabosky 2014). We used a global sampling fraction of 0.6 to reflect incomplete sampling of the entire genus under an expectation that our sampling was relatively random across the phylogeny, and set values for the "expectedNumberofShifts", "lambdaInitPrior", "lambdaShiftPrior", and "muInitPrior" priors using the "setBAMMpriors" function in BAMMtools, which is designed to select appropriate values for priors based on features of the input dataset. All other run parameters were left at the default. We ran the MCMC chain for 10 million generations, with a write frequency of 10,000, and assessed convergence in Tracer. Using BAMMtools, we produced an average phylorate plot displaying the diversification rate(s) inferred by BAMM as well as identified the 95% Credible Set of distinct shift configurations (i.e., the set of distinct shift configurations that account for 95% of the probability of the data).

Historical Biogeographic Range Inference
We inferred the biogeographic history of Cephalotes using BioGeoBEARS v1.1.2 (Matzke 2013) with our time-scaled MCMCTree phylogeny. We implemented the Dispersal, Extinction, Cladogenesis (DEC) model without the jump dispersal/founder-event speciation parameter ("J"), due to concerns about the DEC+J model pertaining to the overweighting of the contribution of the "jump" parameter to model likelihood and increased probability of erroneous inferences that find that the data are entirely explained by cladogenetic events (Ree and Sanmartín 2018). For the broad-scale biogeographic patterns we are focused on here, we defined three biogeographic regions: north of and including the Isthmus of Panama ("Central America"), South of the Isthmus of Panama ("South America"), and the Caribbean including the Antilles ("Antilles"). These regions allowed for the testing of whether the putative land connections of the Isthmus of Panama and the South America-Antilles land bridge may have facilitated the colonization and dispersal of the turtle ants. Species were assigned to each region based primarily on locality records from de Andrade and Baroni Urbani (1999), but also from AntWeb 2019 (antweb.org; accessed June 2019), specimen locality data, and data from S. Powell's collections at the George Washington University. We included undetermined species in the analysis and felt this was robust given that biogeographic regions were so broad and because a large majority of species (>80% spp.) inhabit only one region. We also have a relatively unique system where we can leverage fossil information for several nodes. Because we have fossil placement from the previous molecular + morphological analysis mentioned in the Divergence Time Estimation section above, we can include fossil constraints in our biogeographic reconstructions.
We set up two analyses. In both analyses, we did not constrain what areas a species could inhabit, and we allowed ancestral ranges to occur in all three regions. The first analysis was an unconstrained analysis (dispersal between all areas was equally likely) that did not include fossil constraints. The second analysis did include fossil constraints but, like the first analysis, was otherwise unconstrained in terms of dispersal between regions. We used the same fossil node constraints as with the divergence dating analysis. Cephalotes fossils are from either Dominican or Mexican amber, so they were assigned to either the Antilles region or the Central America region. To qualitatively assess the potential impact of putative land bridges on turtle ant diversification, we identified branches along which expansions into a new region occurs and evaluated if the inferred dates of such expansions consistently correspond to the putative age ranges of land bridge formations.
The dates of putative connections between these regions are somewhat controversial and not fully established, so we used date ranges to account for this uncertainty. The uplift of Panama could have begun as early as 10 Mya, with new island formation around this time, increasing connectivity between the South America and Central America regions and resulting in a well-established complete closure of the land bridge approximately 3.5 Mya (Bacon et al. 2015, Winston et al. 2017). Thus, we use the age range from 10 Mya to the present for the Central America-South America land bridge. The short-lived South America-Antilles land bridge (Iturralde-Vinent and MacPhee 1999), the historical existence of which is controversial (Graham 2018), is purported to have connected South America to the Antilles from around 35 Mya to 32 Mya, and thus we use this date range for our assessment. We conducted historical biogeographic range inference analyses with our MCMCTree phylogeny.

UCE Capture Statistics
Our 75% taxon-complete matrix includes 136 samples representing 72 described Cephalotes species (~60% of all described), 12 undetermined Cephalotes species, and one outgroup Procryptocerus species. The raw sequence data contained a mean of 6.3 million reads per sample with a mean length of 130 bp (Supp Table S2 [online only]). Assembly using Trinity produced a mean of 63,950 contigs (mean length = 391 bp; mean coverage = 8.1X), and through the UCE loci step we captured a mean of 1,801 UCE contigs per sample (mean contig length = 746 bp; mean coverage per UCE loci = 27.5×). The final, concatenated UCE loci data matrix was 995,575 bp long with 1,326 UCE loci alignments containing 181,312 informative sites (with 20.22% missing nucleotides, including gaps).

Phylogenomic Inference and Evolutionary Relationships
Our RAxML topologyinferred with the SWSC-EN partitioned dataset confirms species group monophyly in many species groups Downloaded from https://academic.oup.com/isd/article/6/1/9/6514763 by guest on 25 January 2022 that we sampled and that were previously designated based solely on morphology (de Andrade and Baroni Urbani 1999), while failing to support monophyly in several current species groups (see Fig. 1, which shows the current concept inferred in this study with reference to previous species group designations). Of the 19 sampled species groups, our results support monophyly in ten groups, as well as general monophyly of the basalis species group with the sole exception of C. manni (Hymenoptera: Formicidae), which here forms its own, putatively novel manni species group. As found by Price et al. (2014), we establish support for the division of the pinelli group into South American and Central American clades, with the Central American clade joining the texanus and bimaculatus groups and the South American clade joining the grandinosus group. Also consistent with Price et al. (2014), we find support for an angustus + fiebrigi + bruchi clade as well as a laminatus + pusillus clade. One of the four samples identified as C. minutus (Fabricius, 1804) places in the atratus group rather than the laminatus + pusillus clade. The specimen identity here was rechecked and confirmed, most likely suggesting an untraceable mislabeling of these sequence data. Additionally, after reverifying the identity of the original specimens, the placement of C_betoi_pow390a in the texanus + bimaculatus + pinelli (C. American) clade, C_undescribed_C1050419A in the laminatus + pusillus clade, and C_undescribed_pow420a in the angustus + fiebrigi + bruchi clade are also likely to represent mislabeled sequence data. Overall, our highly supported RAxML topology adds to the evidence of some species group paraphyly and polyphyly reported by Price et al. (2014), further demonstrating a likely need for taxonomic recategorization of current species groups. A couple relationships inferred here also differ from the previous topology inferred using a tip-dating approach and combined molecular and morphological data (Price et al. 2016), namely the placement of C. manni (described previously) and Cephalotes bimaculatus, which here is inferred as sister to Cephalotes scutulatus and Cephalotes betoi De Andrade, 1999 (Hymenoptera: Formicidae) but in Price et al. (2016) is inferred as sister to a larger subclade including Cephalotes texanus (Santschi, 1915) (Hymenoptera: Formicidae).
Thespecies group topologies of our Bayesian inference in MrBayes and our RAxML topology are nearly identical, with only three minor differences (Supp Fig. S1 [online only]): (1) the placement of the Cephalotes jheringi (Emery, 1894) (Hymenoptera: Formicidae)/ Cephalotes specularis (Brandão et al., 2014) (Hymenoptera: Formicidae) sister group within the angustus + fiebrigi + bruchi clade differs, inferred in RAxML as sister to a larger clade including Cephalotes fiebrigi; (2) in the RAxML inference, "tiny ang" clade sp. is sister to C. targionii (Emery, 1894) (Hymenoptera: Formicidae) and "2angcl" sp. whereas in the MrBayes inference the "tiny ang" clade sp. is sister to the other two species; and (3) the placements of the hamulus and clypeatus species groups are rotated, separated by only a very small branch in both inferences. Our gene-tree approach in ASTRAL also resulted in a phylogeny with a highly similar topology to our RAxML inference, with the following exceptions (Supp Fig. S2 (Oliveira et al. 2021) (Hymenoptera: Formicidae) and two undetermined species in ASTRAL but only sister to C. monicaulyssea in RAxML; and (2) the hamulus and clypeatus species groups are inferred as a sister clade.

Divergence Dating
Divergence dating with MCMCTree inferred a crown node age for extant Cephalotes of 53.64 Mya (Fig. 2, Supp Fig. S3 [online only]). This age is relatively similar to the crown age estimated from the most recent Cephalotes phylogenies inferred using a limited set of gene markers (46 Mya: Price et al. 2014;43 Mya/48.2 Mya: Price et al. 2016), but still pushes the origin of the group back approximately 8 My. We also ran our MCMCTree runs without data to assess the impact of priors on our inference. The priors-only runs suggest that the priors have considerable influence on the posterior distribution, but the runs including data confirm that prior distributions are indeed updated and informed by the data, which significantly impact the inferred age of the crown node and influence other node ages throughout the tree. In particular, the data push the age of the crown node (along with other deep nodes) to a significantly older date than that inferred in the priors-only runs (40.85 Mya vs. 53.64 Mya; Supp Figs. S3 and S4 [online only]). Furthermore, while inferred priors-only node ages plotted against inferred with-data node ages show a linear relationship, supporting a significant influence of the priors, there is nevertheless a moderate amount of deviation from a tight straight line, suggesting considerable influence of the data on inferred posterior node ages (Supp Fig. S5 [online only]).

Historical Biogeographic Range Inference and Species Diversification
Our historical biogeographic range inference analyses in BioGeoBEARS demonstrate numerous phylogenetically independent range shifts, contractions, and expansions relative to our three focal biogeographical regions. Across the MCMCTree time-scaled phylogeny, both the unconstrained analysis and the four-fossil constrained analysis inferred 17 range shifts or expansions between the South American region and the Central America region (Fig. 2, Supp Fig. S7 [online only]). In the unconstrained analysis, 14 (~82%) of these shifts/expansions occur on a branch that overlaps the extended putative time period of Panamanian land bridge establishment, and in the four-fossil constrained analysis, 13 (~76%) of shifts overlap with this period. In both analyses, only one shift is associated with a sample (C_minutus_CSM3023a) whose placement is likely due to sequence error as discussed above. In the unconstrained analyses, three shifts or expansions between South America and the Antilles were inferred, while the fossil-constrained analyses identified five such shifts/expansions (including one expansion from Central America into the Antilles). Only one of these shifts (33%) occurs on a branch that overlaps with the putative time period of GAARlandia land bridge establishment in the unconstrained analysis, while in the 4-fossil constrained analysis, all shifts occur outside of this time range. Therefore, our data are consistent with the hypothesis that the Panamanian land bridge significantly impacted Cephalotes diversification and range evolution, but do not support a significant role for the putative GAARlandia land bridge.
Diversification rate shift analyses in BAMM failed to detect any significant rate shifts across the time-scaled Cephalotes phylogeny. The average phylorate plot demonstrates a single diversification rate Downloaded from https://academic.oup.com/isd/article/6/1/9/6514763 by guest on 25 January 2022 regime from the crown to the present (Fig. 3). Out of 901 samples from the posterior, 94% included zero rate shifts, 5.7% included one rate shift, and 0.78% included two rate shifts. Similarly, the 95% Credible Shift Set includes only a single rate configuration with one rate regime (Supp Fig. S6 [online only]). Therefore, we do not find any evidence of shifts in diversification driven by our focal biogeographic range shifts, land bridge formations, or any other potential trait.

Discussion
In this study, we utilized a phylogenomic approach by sequencing ultraconserved elements (UCEs) and integrated these data with fossil information to infer phylogenies of the turtle ant genus Cephalotes with maximum likelihood, Bayesian, and species tree methods. Our inferred topologies are largely consistent with previous studies that integrated molecular data (Price et al. 2014(Price et al. , 2016, but our increased molecular sampling suggests that one additional species group currently designated under the morphology-based classification is not monophyletic (the basalis species group). Furthermore, historical biogeographic range inference using our chronogram of Cephalotes evolution supports the importance of the South America-Central America land bridge in generating numerous independent range shifts within the genus. Nevertheless, these analyses did not support a putative South America-Antilles land bridge as an important factor in turtle ant diversification. Additionally, we did not infer any shifts in diversification rate across the time-scaled phylogeny, including no concordance with our focal land bridge formations. De Andrade and Baroni Urbani (1999) developed a detailed 131-character morphological matrix for all species, both extant and extinct. Their landmark efforts resulted in the first phylogenetic hypothesis for Cephalotes, as well as hypotheses about putative species groups. Yet these analyses did not use a model-based framework and lacked support at many nodes. A molecular phylogeny including approximately half of the described species using five gene fragments was published in 2014 (Price et al. 2014). It showed that many of the previously proposed species groups, based entirely on morphological characters, held up to examination with molecular data, but that the relationships among clades was quite different than originally proposed. That study also inferred an origin of crown Cephalotes in the Eocene (46 Mya). A follow-up phylogenetic analysis combined the molecular data from 2014 and the morphological matrix in a total evidence, tip dating approach to infer a near species-complete phylogeny and the tempo of lineage and phenotypic diversification within the turtle ants (Price et al. 2016). This approach was possible because the morphological matrix included fossil species that could be included as tips in the phylogeny. The resulting phylogeny was mostly congruent with the phylogeny from 2014, in terms of topology and divergence time estimates.
Here, we demonstrate that even with significant increases in the amount of DNA data utilizedper species, and an increase from 61 to 72 described Cephalotes species with molecular data included, most relationships are consistent with previous molecular work. This is especially striking when considering that the UCE approach also captures different kinds of markers rather than exclusively coding gene markers (which are also included as a subset of the total UCE dataset). Two particularly difficult relationships to resolve are the positions of the hamulus and clypeatus species groups, which differed in each of the three inference methods we utilized (maximum likelihood, Bayesian, and gene-tree). More specifically, while the relative position of the hamulus species group within the phylogeny was consistent across our analyses and with the previous two studies that used molecular data (Price et al. 2014(Price et al. , 2016, the position of the clypeatus species group differed substantially from previous analyses and its relationship to the hamulus group changed across each of our analyses. It is not clear why the placement of the clypeatus group is so challenging, but it may result from rapid divergence or introgression, two processes known to complicate phylogenetic inference (Whitfield andLockhart 2007, Leaché et al. 2014) and that may prove fruitful as future areas of investigation. The hamulus group is restricted to the Caribbean island Hispaniola and adjacent islets and the clypeatus group is a South American clade, and both are on long branches in our phylogeny, so it is also plausible that undetectable species extinction over 50 million years (or unusually low rates of extinction after an initial allopatric speciation event) complicate placement of these clades even with considerably increased sampling of both coding and noncoding regions. Furthermore, the clypeatus species group does contain two additional described species that have been collected very rarely but may also shed light on these relationships. Lastly, the goal of the Price et al. (2016) phylogeny was near-complete coverage using all available data (achieving 97.5% of described species at the time of study), and the lower support values for numerous nodes in the resulting phylogeny were likely driven by the lack of molecular data for approximately half of the species. Given that current species group designations are established exclusively from morphology, it is not surprising that the placement of a couple taxa (i.e., C. manni and C. bimaculatus) inferred using any molecular data at all would differ from inferences in Price et al. (2016) inferred using morphological data. Overall, it is possible that within Cephalotes, increased taxonomic sampling is more influential than increased genetic sampling for achieving improved phylogenetic resolution.
Our biogeographic results support the importance of the Panamanian land bridge but find little evidence to support the importance (or existence) of the South America-Antilles land bridge. We infer a large number of independent range shifts or expansions between the South America and Central America regions, and a large majority of these occur on branches that are consistent with the putative age range of the uplift of Panama (Fig. 2). Our data cannot, however, differentiate between an "early dynamic colonization" model and a "complete closure model" as done by Winston et al. (2017), given that most branches where range changes occur span the entire length of the relevant time period (3.5-10 Mya).
Intriguingly, the early origin of a large Central American clade (Price et al. 2014, Fig. 2) and a few other apparently long-distance dispersal events between South America and Central America, were not concordant with the South America-Antilles (GAARlandia) land bridge. As turtle ant species nest in tree cavities found in everything from main limbs to terminal twigs (de Andrade and Baroni Urbani 1999, Powell et al. 2020), it is plausible that, occasionally, a tree stem carrying a living Cephalotes colony traversed the oceanic region separating the land masses. Such rare events have been invoked for other wingless taxa that are highly likely to have dispersed via rare, long-distance dispersal (Gillespie et al. 2012). Reproductives in Cephalotes also have wings and thus may occasionally disperse over longer distances in the air, although average dispersal distances of ant queens and males tend to be low compared to other winged taxa (Chapuisat et al. 1997, Hardy et al. 2008, Suni and Gordon 2010, Türke et al. 2010, Helms 2018. Notably, our inclusion of robust fossil information (which provides strong evidence of historical Cephalotes presence in the Antilles) only increased the number of inferred range shifts/expansions between South America and the Antilles relative to our inference with no fossil information, and also reduced the number of changes occurring within the putative land bridge age range from one to zero. This evidence strongly suggests that fossil data has a significant impact on biogeographic inference in Cephalotes likely due to extinction of species in larger extant clades no longer present in regions like the Caribbean, and while the available fossil record is unusually robust and informative for a single ant genus, it may yet be inadequate to resolve the historical biogeographic dispersal of the group.
We inferred one diversification rate regime (i.e., no rate shifts) across all of Cephalotes diversification (Fig. 3), suggesting no change in diversification rates following dispersal across the Panamanian land bridge or in association with any other historical event. This result contrasts with previous work that did infer a rate shift in a clade endemic to the Chacoan biogeographical region (Price et al. 2014(Price et al. , 2016; the clade corresponds approximately to the previously defined fiebrigi species group that is part of clade IV in Fig. 1). The Chacoan region (Morrone 2006), also known as the "dry diagonal" (Vanzolini 1963), is a distinct environmental region in South America that is composed of the seasonally dry Caatinga, Cerrado, and Chaco biomes with native vegetation thought to have originated less than 10 Mya and as recently as 4 Mya or less for the central Cerrado Biome (Price et al. 2014). The inferred rate shift in the turtle ant "Chacoan clade" was coincident with the young Chacoan biomes, and especially the 4 Mya age of the Cerrado. It was therefore proposed that the historical formation of this distinct regional habitat drove diversification in turtle ants via novel ecological opportunity. Our molecular sampling of the species that are part of the endemic Chacoan clade differs from the previous studies and may have influenced our lack of an inferred rate shift in the Chacoan clade, including nine described taxa out of 14 in the Chacoan clade, versus seven in Price et al. (2016). Nevertheless, we must also note that date calibration in the present study inferred an older crown age and clade ages than did previous studies, which may also contribute to the lack of an inferred recent burst of diversification in our analysis. Notably, the substantially older age of the Chacoan clade inferred in the present study (approximately 18 Mya vs. approximately 4 Mya) largely disassociates the endemic Chacoan clade of the turtle ants from the estimated age of the Chacoan biomes and especially the estimated 4-million-year-old Cerrado biome. The Cerrado biome also has exceptionally high turtle ant endemism across other clades (de Andrade and Baroni Urbani 1999, Price et al. 2014, collection records of the present study), which is also not captured in the 4 Mya-present range of the present phylogeny. These discrepancies between phylogeny dating, rate shift analyses, and the high turtle ant endemism in the Chacoan region then raises the important biological questions of when and where this endemic present-day diversity of the Chacoan region was generated. Furthermore, it is also possible that BAMM underestimated rate shifts across our phylogeny, as has been noted as a potential general issue with the method (Meyer et al. 2018). While outside the biogeographical focus of the present study, these will be important future questions for unraveling turtle ant diversification.
Overall, our results suggest that complex phylogenetic and biogeographic processes have shaped the evolution of Cephalotes turtle ants. The increased sampling of molecular data, incorporating orders of magnitude more DNA data than previous studies as well as expanded species representation, provided considerable value in resolving the backbone of the phylogeny. In addition to solidifying the previously unresolved relationships among a number of clades, our analyses also reaffirmed the need for new taxonomy on some of the previously defined morphological species groups. A couple additional groups were rendered polyphyletic or paraphyletic by our analyses, expanding the number of such issues identified by previous analyses (Price et al. 2014). Partly addressing this need for updated taxonomy, recent taxonomic work on the Brazilian turtle ants synonymized bruchi and the laminatus species groups under fiebrigi and pusillus groups, respectively (Oliveira et al. 2021). Both changes are consistent with our phylogenetic analyses here. Additionally, manni was proposed as a new species group based entirely on morphological characters, providing corroborating support for our phylogenetic placement of manni as a relatively distant relative of the basalis group in which it was previously considered a part.
Our dating and biogeographical analyses further suggest that the uplift of Panama appears to have been a significant event in the history of turtle ant diversification. Yet further research is necessary to enhance our understanding of several apparent long-distance dispersal events among South America, Central America, and the Antilles, including an early origin of a large Central American clade. We found no evidence of shifts in diversification rates associated with the uplift of the Panamanian land bridge or any other historical event. This lack of any rate-shift signal within the group may represent an example of how increased data and taxonomic sampling can impact macroevolutionary analyses (Chang et al. 2020). Nevertheless, these analyses also disassociated significant patterns of endemism within the group from the proposed origin dates of the biomes in which this diversity is found. This raises a number of exciting future questions about the origin of turtle ant diversity at finer biogeographical and temporal scales than were tackled here. Increased molecular sampling of the genus, using the phylogenomic approaches used here, will therefore likely be necessary to resolve remaining phylogenetic uncertainty and a suite of additional biogeographical questions in the turtle ants.

Supplementary Data
Supplementary data are available at Insect Systematics and Diversity online.