Conflicting evolutionary histories of the mitochondrial and nuclear genomes in New World Myotis 1

15 The rapid diversification of Myotis bats into more than 100 species is one of the most extensive 16 mammalian radiations available for study. Efforts to understand relationships within Myotis have 17 primarily utilized mitochondrial markers and trees inferred from nuclear markers lacked resolution. Our 18 current understanding of relationships within Myotis is therefore biased towards a set of phylogenetic 19 markers that may not reflect the history of the nuclear genome. To resolve this, we sequenced the full 20 mitochondrial genomes of 37 representative Myotis, primarily from the New World, in conjunction with 21 targeted sequencing of 3,648 ultraconserved elements (UCEs). We inferred the phylogeny and explored 22 the effects of concatenation and summary phylogenetic methods, as well as combinations of markers 23 based on informativeness or levels of missing data, on our results. Of the 294 phylogenies generated 24 from the nuclear UCE data, all are significantly different from phylogenies inferred using mitochondrial 25 genomes. Even within the nuclear data, quartet frequencies indicate that around half of all UCE loci 26 conflict with the estimated species tree. Several factors can drive such conflict, including incomplete 27 lineage sorting, introgressive hybridization, or even phylogenetic error. Despite the degree of 28 discordance between nuclear UCE loci and the mitochondrial genome and among UCE loci themselves,

The bat genus Myotis (Order Chiroptera, Family Vespertilionidae) comprises more than 100 species that originated during the last 10-15 million years (Stadelmann et al. 2007), making it one of the most successful, extant, mammalian radiations.Myotis are distributed worldwide, excluding polar regions, and generally share a similar ecological niche: aerial insectivory.Myotis species often exhibit little morphological differentiation and, as a result, the rate of cryptic speciation within the genus is thought to be high.For example, specimens identified as M. nigricans and M. albescens form multiple paraphyletic lineages distributed throughout the phylogeny of Neotropical Myotis (Larsen et al. 2012).
Confounding matters, the morphological variation that exists is often a poor indicator of species-level relationships.Early classifications of Myotis identified three major morphotypes (Findley 1972).
Subsequent phylogenetic analyses of the mitochondrial cytochrome-b (cytb) gene demonstrated paraphyletic origins of each morphotype, suggesting frequent convergent evolution (Ruedi et al. 2001).
These same analyses demonstrated that geography was a better predictor of phylogenetic relationship than morphology (Ruedi et al. 2001, Stadelmann et al. 2007).
The ability of mitochondrial markers to resolve a well-supported topology does not guarantee that the mitochondrial tree represents the species tree (for examples see Willis et al. 2014, Li, Gang et al. 2016, Leavitt et al. 2017).The lack of recombination and uniparental inheritance of the mitochondrion means that it is transmitted as a single genetic unit that is susceptible to evolutionary processes that may cause its history to diverge from the history of the species (Edwards et al. 2009).The most widely accepted phylogenies of Myotis rely heavily on mitochondrial data and even phylogenies containing nuclear data demonstrate an over reliance on mitochondrial markers for resolution, likely swamping out potentially conflicting signals from the nuclear genome.For example alignments of the nuclear RAG2 and mitochondrial cytb contained 162 and 560 variable characters, respectively (Stadelmann et al. 2007).Despite these concerns, we find ourselves in a situation where our understanding of evolutionary relationships, our basis for conservation strategies, and our biogeographic hypotheses are all founded, almost entirely, on mitochondrial phylogenies.
Targeted sequencing of ultraconserved elements (UCEs; Faircloth et al. 2012) to collect sequence data from thousands of loci across the nuclear genome has resolved a number of difficult phylogenetic problems (for examples see Faircloth et al. 2012, McCormack et al. 2013, Green et al. 2014, Faircloth et al. 2015, McGee et al. 2016).Here we used UCE baits to collect ~1.4 Mbp from ≥3,600 nuclear loci in addition to random shotgun sequencing to collect full mitochondrial genomes in 37 taxa, primarily representing New World (NW) Myotis.Rather than analyzing each sequence data set once, or a handful of times, we chose to use a range of sampling, partitioning, and inference methods to fully protein coding genes (Figure 1A).Despite being the most common topology, bipartition frequencies and clade probability values were lower than nucleotide based datasets, as would be expected with more slowly evolving amino acid sequences.Analysis of RNA coding genes recovered two topologies depending on the method of phylogenetic inference.The remainder of the analyses recovered various trees without any pattern relating to partitioning scheme or inference method.A majority rule consensus tree for all 28 analyses is shown in Figure 1C.The mitochondrial consensus tree reflects previously reported mitochondrial phylogenies for Myotis.The New World Myotis are generally monophyletic splitting into Nearctic and Neotropical clades with the Old World taxon, M. brandtii, sister to the Neotropical clade.Ambiguity, represented by polytomies, is found only in terminal relationships.

UCE assembly and alignment, and phylogenetic inference
We averaged 3.29 million reads per sample after demultiplexing reads from the UCE-enriched, sequencing pool.These reads were assembled into an average of 5,778 contigs per sample (min = 1,562 M. nyctor, max = 11,784 M. nigricans 3 ).Recovery rates for UCE loci varied across taxa.Of the 5,500 loci in the Amniote probe set, we successfully recovered 3,898 UCE loci, 3,648 loci from five or more samples and 212 loci in all 37 samples (Table 2).On average, 3,332 UCE loci were recovered per sample, ranging from 1,106 (M.nyctor) to 4,008 (M.keaysi).Repetitive sequences, identified via RepeatMasker searches, were minimal, occupying less than 0.02% of sites across all UCE alignments.
In all, 294 different combinations of alignment, partitioning, and inference method were used to identify 175 unique nuclear topologies.The most common tree was recovered 45 times (Figure 1B).All nodes were highly-supported based on clade probability values and all but three nodes were highly supported with maximum likelihood bootstrap replicates.A consensus tree (Figure 1D) of the 294 nuclear topologies resolves New World Myotis (and M. brandtii) as a single polytomy that excludes M.
volans and represents a departure from previous mitochondrial phylogenies, including the one recovered herein.Comparison of the mitochondrial and nuclear consensus trees (Figures 1C and D) reveals substantial differences in the relationships within the genus.
As noted above, we were concerned that phylogenetic error, sampling bias or other factors may be driving the conflict between the mitochondrial and nuclear trees.We started by investigating the effects of matrix composition (or completeness) on phylogenetic inference using UCE loci by generating 10 alignments having varying levels of matrix completeness.Nine alignments were composed of loci with 15 to 95% (at 10% intervals) of taxa.The tenth alignment was composed of all loci which were recovered in 100% of specimens examined.For example, in the '15%' matrix, all loci that were represented in at least 15% of species (n taxa ≥ 5) were included in the alignment (n loci = 3,648).In the '100%' matrix, only loci which were identified in all species (n taxa = 37) were included (n loci = 212).
Alignments were partitioned using three schemes: unpartitioned, individually partitioned by locus, or combined partitions.Alignment lengths, numbers of informative characters and number of partitions identified by PartitionFinder are available in Table 2.The result was 10 alignments that were each partitioned three ways.These were analyzed with Bayesian, maximum likelihood, and coalescent methods using RaxML, ExaBayes, ASTRID, ASTRAL-II and SVDquartets as described in the Methods.
Bootstrap topologies stabilized in each alignment and partition combination within 150 replicates and all Bayesian runs converged in less than ten thousand generations.Unfortunately, computational limits forced us to abandon the individually partitioned, Bayesian analyses.In general, the same alignment produced the same topology regardless of inference method or partitioning scheme with the only exception being the terminal relationships of the M. levis/M.albescens clade in the combined vs.
unpartitioned Bayesian analysis of the 100% complete data matrix.
Trees were also generated from data matrices incorporating UCE loci of differing lengths (Hosner et al. 2016).All 3,648 loci were grouped into ten bins based on locus length so that the first bin contained the 365 shortest loci, the second bin contained the 366 th to the 731 st , and so on.The number of informative characters per bin ranged from 1,115 to 6,995 and the number of informative characters was correlated with average locus length (Supplemental Figure 1).On average, only 2.6% of characters in each bin were parsimony-informative (Supplemental Figure 1).Each of the ten length-based alignments recovered slightly different topologies.Terminal relationships were generally stable across analyses with the majority of differences between topologies found in the early bifurcations of the ingroup (Myotis).
From the above results, we observed that, in general, longer alignments produced well resolved topologies with significant nodal support regardless of the phylogenetic method or partitioning scheme used.Analyses of smaller data sets were more likely to recover unique topologies.
Given the goal of fully exploring nuclear tree space and generating as many reasonable nuclear UCE based topologies for comparison with the mitochondrial tree, we decided to randomly sample a limited number of nuclear UCE loci to create small alignments that would be more likely to result in unique topologies.We therefore randomly subsampled UCE loci to create 100 unique data sets of 365 loci.Loci were concatenated in each replicate data set and analyzed using maximum likelihood in RAxML.Of the 100 alignments analyzed, 80 unique topologies were generated (mean Robinson-Foulds distance = 4.3).
In addition to concatenation methods we explored phylogenetic tree space of the UCE data set using species tree methods.Normalized quartet scores from ASTRAL-II (Mirarab et al. 2015) analyses were consistent among all analyses, with scores ranging from 0.540 to 0.553, and the number of induced quartet gene trees ranging from 7,745,739 (100% complete 212 gene trees) and 63,042,410 (15% 3648 loci).SVDquartets (Chifman et al. 2014) sampled all 66,045 quartets.On average, the total weight of incompatible quartets was 2.84%.Similar to the concatenated analysis, we inferred coalescent-based species from the same 100 subsamples of 365 loci described above.Despite being generated from the same underlying data, summary and concatenation methods only recovered the same tree in 1 of 100 attempts.
Finally, we used weighted and unweighted statistical binning to combine individual gene trees into supergenes, estimate the supergene phylogeny, and then infer the species tree from the supergene trees.The 3,648 loci were combined into 528 binned loci with 480 bins containing seven loci each and 48 bins containing six loci each.Binning methods increases the normalized quartet scores from an average of 0.547 with gene trees to 0.672 for the binned-unweighted and 0.673 for the binned-weighted supergene trees.Given the relatively even distribution of loci into bins the negligible difference in quartet/species tree discordance between the unweighted and weighted supergene trees is expected.
Binning methods have been shown to recover incorrect species trees under coalescent methods by altering gene tree frequencies.In addition, it is possible that support for incorrect trees can actually increase with the number of input gene trees (Liu et al. 2015).With these concerns in mind, both binning methods recovered the same topology which was the most common nuclear topology observed across all analyses (Figure 1C).

Topological comparisons
All mitochondrial and nuclear trees are available in Supplemental File 1.When visualizing all topologies in tree space, nuclear trees co-localized and were distinct from mitochondrial topologies (Figure 2A).Comparisons of Robinson-Foulds symmetrical differences show 26 symmetrical differences between the most similar nuclear and mitochondrial trees (Figure 2B).For comparison only 9 of 43,070 and 0 of 378 pairwise nuclear vs. nuclear and mitochondrial vs. mitochondrial tree comparisons, Figures 2C and D, respectively, exhibited more than 26 symmetrical differences.
Of the 45 analyses that recovered the most frequently observed topology (Figure 1D), 38 were Bayesian and RAxML searches that varied by matrix completeness and partitioning scheme.The fact that these analyses recovered the same topology is expected given that they are not independent.For example, a RAxML analysis of the 25% complete data set uses 1.26 Mb of the 1.38 Mb of data present in the 15% complete matrix, sharing 91% of the data.Analyses that directly varied the alignments and/or sampled less data (e.g.randomly sampling loci) were more likely to generate unique topologies than the nested analyses described above.Of the 200 analyses that randomly sampled UCE loci 164 unique topologies were observed.This implies that when analyses of large data sets produce well-resolved trees with significant nodal support, sampling smaller portions of the data, may provide a mechanism for creating phylogenetic uncertainty not represented by typical tree scoring metrics.

DISCUSSION
We analyzed 3,648 UCE loci and mitochondrial genomes of 37 Myotis bats and outgroups.The mitochondrial and nuclear UCE phylogenies recovered distinct topologies (Figure 1), whether comparing the most commonly recovered or consensus topologies.Previous work with UCE loci demonstrated that support for deep divergences varied based on the number of loci examined (McCormack et al. 2013).
Further, bootstrap replicates and clade probability values can be inaccurate metrics of nodal support (Douady et al. 2003, Hedtke et al. 2006).We used varied the input data and phylogenetic parameters to produce a range of reasonable nuclear and mitochondrial topologies that may be more useful than overreliance on a tree resulting from a one or two analyses.Rather than rejecting concordance between the two data types from a single analysis we took steps to analyze both data sets to generate as many viable topologies as possible, possibly revealing hidden overlap between marker types that are masked by phylogenetic error.
We recovered 175 and 14 unique nuclear and mitochondrial topologies from 294 and 28 analyses using multiple methodologies, sampling strategies, and phylogenetic parameters.By varying our phylogenetic methods we show that recovering a consistent topology is difficult even when using the same marker type, thus making it difficult to determine the "best" topology.Rather than choosing single trees to represent the nuclear and mitochondrial data, we use consensus trees to represent the ambiguity that is present through all analyses along with example phylogenies conforming to the most common topology (Figure 1).Despite our efforts to identify as many plausible, distinct nuclear and mitochondrial trees as possible we were unable recover overlapping topologies between marker types suggesting that nuclear UCE loci and the mitochondrial genomes of Myotis have distinct evolutionary histories.
A comparison of topologies in tree space illustrates the differenced in trees generated for each data set (Figure 2A).Intra-marker discordance, as measured by differences in nuclear vs. nuclear or mitochondrial vs. mitochondrial topologies is low.Pairwise comparisons of nuclear topologies contain an average of 8.3 symmetrical differences, mitochondrial comparisons contain an average of 10.8 differences.By contrast, pairwise differences between nuclear and mitochondrial topologies contain an average of 33.3 symmetrical differences.The two most similar nuclear and mitochondrial topologies contain 26 symmetrical differences.Of the 43,072 pairwise comparisons of nuclear topologies, only 27 contain more than 26 symmetrical differences.Likewise of the 378 pairwise comparisons of mitochondrial topologies only 1 contains more than 26 symmetrical differences.These observations indicate that discordance between marker types (Figure 2B) is much greater than within marker type (Figure 2C and 2D).All but the most divergent nuclear topologies are more similar to each other than any nuclear vs. mitochondrial comparison.
Conflict between mitochondrial and nuclear data may be driven by error in phylogenetic estimation or may reflect genuine discordance between the two marker types (Degnan et al. 2009, Huang, Huateng et al. 2010).We relied on multiple tree-inference methods (e.g.summary vs. concatenation), manipulated phylogenetic parameters (e.g.partitioning strategy), and sampling criteria (e.g.loci sampled in all taxa) to minimize the impacts of phylogenetic error on the data sets.In most cases, varying parameters or methodologies generated unique topologies with minor differences.In most cases unique topologies resulted from rearrangements of a few terminal taxa.For example, placement of M. volans and M. brandtii was often either sister to the remaining New World Myotis or as an early bifurcation between the Nearctic and Neotropical clades.M. vivesi was at times found as sister to the clade containing M. lucifugus, occultus and fortidens or as sister to the clade containing the neotropical Myotis.
Summary methods failed to recover the most common nuclear topology (Figure 1D) except when loci were binned prior to gene tree estimation.Summary methods also tended to recover more unique topologies than concatenation methods when analyzing data from the same gene(s).For example, the random sample analyses recovered 80 unique topologies using concatenation (RAxML) and 89 unique topologies with summary methods (ASTRAL-II).This likely has to do with the limited number of informative characters per locus and by extension limited phylogenetic signal per gene tree (Supplemental Figure 1).In these instances, limited phylogenetic signal per gene would likely lead to increased opportunity for phylogenetic error in gene tree estimation.Further supporting this idea, binning of compatible UCE loci may have indirectly increased phylogenetic signal resulting in the same topology that many of the concatenation analyses recovered.No other summary/coalescent method recovered this topology.
Previous studies that used primarily mitochondrial data recovered effectively the same relationships among Myotis as our mitochondrial analyses (Ruedi et al. 2001, Stadelmann et al. 2007, Roehrs et al. 2010, Larsen et al. 2012, Ruedi et al. 2013, Haynie et al. 2016).Thus, we are confident that the mitochondrial phylogeny we recovered here, and by others, reflects the true mitochondrial tree.
However, the mitochondrial topology may not adequately reflect the species history, particularly when considering the factors that cause incongruence between nuclear and mitochondrial gene trees.Causes of conflicting gene trees include horizontal transfer, gene duplication, introgressive hybridization, and incomplete lineage sorting.The rapid radiation of this clade as well as other biological factors suggests that some of these phenomena are more likely to have influenced the Myotis radiation than others.
Horizontal transfer of genes is thought to be rare in eukaryotes, but, vespertilionids in general (Thomas et al. 2011, Platt et al. 2014, Platt et al. 2016), and Myotis in particular (Pritham et al. 2007, Ray et al. 2007, Ray et al. 2008, Pagan et al. 2010), have experienced horizontal transfer of DNA transposons.
These events would not be reflected in our phylogeny because repetitive sequences were removed prior to phylogenetic analyses.More generally, gene duplications could create conflicting signal among individual UCE markers (ex. comparing non-orthologous UCE loci), but the number of gene duplication events would have to be very high to impact enough of the 3,648 UCE loci to confound the mitochondrial and nuclear phylogenies.Further ruling out gene duplication events as the dominant cause of conflicting phylogenetic signal is the fact that such events are likely depressed in Myotis as evidenced by their smaller genome size (~2.2Gb) and trend towards DNA loss (Kapusta et al. 2017) combined with low rates of paralogy in UCEs general (Derti et al. 2006).
Introgressive hybridization and reticulation could significantly influence the phylogenies of Myotis in a way that leads to conflicting signal between the nuclear and mitochondrial genomes (Sota 2002, Good et al. 2015).Hybridization in bats may be relatively common given their propensity to swarm at cave entrances for breeding purposes.In European Myotis, swarming has allowed for high degrees of hybridization between M. brandtii, M. mystacinus, and M. alcathoe (Bogdanowicz et al. 2012).Further, M. evotis, thysanodes, and keeni all experienced historical gene flow during their divergence (Carstens et al. 2010, Morales et al. 2016).We could also explain the differences between the mitochondrial and nuclear UCE phylogenies if Myotis experienced extensive incomplete lineage sorting during their radiation.Two factors influence the rate of lineage sorting, the fixation rate and the speciation rate (Hudson et al. 2002).Increasing the time to fixation and/or decreasing the amount of time between cladogenic events will increase the likelihood of incomplete lineage sorting.Myotis are generally long lived species (Dzeverin 2008) and underwent a rapid radiation between 5-10 MYA (Lack et al. 2010), suggesting that Myotis species are likely to experience higher levels of lineage sorting.The importance of these events -introgressive hybridization and incomplete lineage sorting -in driving the differences between the mitochondrial and nuclear phylogenies should be further explored.

Evolutionary history of Myotis
Our understanding of relationships within Myotis is heavily biased by mitochondrial data because nuclear markers were harder to collect and produced fewer informative sites (Ruedi et al. 2001, Stadelmann et al. 2007, Lack et al. 2010, Roehrs et al. 2010, Larsen et al. 2012, Ruedi et al. 2013, Haynie et al. 2016) or nuclear phylogenies with large numbers of makers contained limited numbers of taxa (Platt et al. 2015).Our UCE-based results indicate that nuclear trees vary substantially from the mitochondrial tree.Given that the nuclear and mitochondrial trees are different, we find it necessary to re-evaluate Myotis in the context of the nuclear data.
Paraphyly of M. nigricans and M. albescens was inferred from previous mitochondrial phylogenies.Larsen et al. (2012) identified a minimum of four and potentially twelve lineages in M.
albescens and M. nigricans.Our sampling included three M. albescens and two M. nigricans, compared to Larsen's 17 and 29 samples.Despite different mitochondrial and nuclear topologies overall, our mitochondrial and nuclear phylogeny recovered the same paraphyletic clade of three M. albescens samples and M. levis.Close relationships between these taxa was found in previous work and is expected.More importantly we did not find that M. albescens was paraphyletic across much of neotropical Myotis.We also found that M. nigricans is monphlyletic in the nuclear tree, but paraphyletic in the mitochondrial tree.These results from M. nigricans and M. albescens are interesting but further inference is limited due to low sample sizes for these taxa.
The original subgeneric taxonomy of Myotis was based on three morphotypes that were later shown to be the result of convergent evolution (Ruedi et al. 2001).If lineage-sorting affected the mitochondrial phylogeny, it is possible that the morphotypes truly are monophyletic.However, superimposing the previous subgeneric/morphological classification onto the species tree shows an interspersed distribution of morphotypes throughout the most common and consensus nuclear topologies (Figures 1A and B).Many strongly supported terminal relationships link species with different morphotypes.Based on these results, it appears that the three major morphotypes in Myotis are indeed a result of convergent evolution, as suggested by previous work (Ruedi et al. 2001, Stadelmann et al. 2007) despite the conflicting mitochondrial and nuclear phylogenies recovered Among the more dramatic differences between the nuclear and mitochondrial topologies is the placement of M. volans and M. brandtii as sister to all New World taxa by the nuclear data.Our mitochondrial analyses place M. volans within a Nearctic clade and M. brandtii directly in-between the Nearctic and Neotropical bifurcations (Figure 1A and C) as has been previously reported (Stadelmann et al. 2007).By contrast the consensus UCE tree (Figure 1D) places M. brandtii, a species distributed throughout the Old World, within a polytomy at the base of the New World Myotis radiation and M.
volans sister to all New World Myotis (including M. brandtii).The placement of M. brandtii in the nuclear UCE consensus tree does not necessarily contradict the mitochondrial phylogeny since it is an unresolved polytomy.However, it is worth noting that the most commonly recovered nuclear topology (Figure 1B) places M. brandtii sister to all New World Myotis (excluding M. volans).This relationship would more closely affiliate M. brandtii with other Old World taxa.The placement of M. volans as sister to all New World taxa (including M. brandtii) in the most common nuclear tree is a significant departure from previous work and, at first glance, does not make as much sense in a biogeographic framework.M.
volans is distributed across western and northwestern North America as far as far north as Alaska.M.
brandtii is distributed across much of Northern Europe and into the extreme eastern regions of Siberia.
The key may lie in understanding the biogeography and phylogenetics of a third species, M. gracilis, a species that we were unfortunately unable to include.
M. gracilis, along with M. brandtii, are the only two Myotis geographically distributed in the Old World, but phylogenetically affiliated with the New World Myotis (Stadelmann et al. 2007).If future work verifies the sister relationship between M. brandtii and M. gracilis, then we can envision a scenario where M. gracilis, M. brandtii, and M. volans are the result of cladegenic events that occurred during the transition of Myotis from the Old World to the New World.It is important to remember that this interpretation relies on a fairly dramatic departure from the currently accepted mitochondrial relationships of M. volans (represented here by a single sample) to other Myotis species and abandons the nuclear UCE consensus tree for the most frequently recovered UCE topology.In addition, our taxonomic sampling excludes Myotis species from the OW, Ethiopian clade, the sister clade to New World Myotis (Ruedi et al. 2001, Stadelmann et al. 2007, Lack et al. 2010, Ruedi et al. 2013).Taxa from the Ethiopian clade are needed to properly root the New World Myotis clade and understand its biogeographic origins.With these caveats in mind, the hypothesis presented here should be viewed as highly speculative.Increasing the number of Myotis lineages sampled will shed additional light on this hypothesis.
Other taxa with conflicting positions between marker types include M. lucifugus + M. occultus, M. fortidens, and M. vivesi.In general, these relationships are characterized by very short branches (Figure 1D) and are the most likely to be affected by incomplete lineage sorting or limited phylogenetic information.This could explain the strong support with the mitochondrial tree compared to the nuclear species tree, while allowing for a number of nuclear loci to disagree with the species tree, as well.
There are a number of monophyletic groups identified with nuclear data (Figure 1A) that share general morphologies.For example, all of the long-eared bats (septentrionalis, auriculus, evotis, thysanodes and keenii) represent a monophyletic group of higher elevation, forest-dwelling species that glean insects off of surfaces (Fitch et al. 1979, O'Farrell et al. 1980, Warner 1982, Manning et al. 1989, Caceres et al. 2000).The group represented by fortidens, lucifugus and occultus represent a relatively long-haired form of Myotis.While having a distinct dental formula, fortidens was historically described as a subspecies of M. lucifugus (Miller Jr et al. 1928) and occultus has alternately represented its own species or been considered a subspecies of lucifugus (Hollister 1909, Valdez et al. 1999, Piaggio et al. 2002).The clade consisting of keaysi, oxyotus, ruber, riparius, and diminutus represents a neotropical group of primarily woolly-haired bats (LaVal 1973).None of these relationships are monophyletic in the consensus or most common mitochondrial topologies.If the mitochondrial genome has been subjected to phenomena that obscure the true species tree then these species groups, along with their synapomorphic morphological features, can be reevaluated.

Conclusions
Relationships within Myotis, which until now have relied heavily on mitochondrial data, served as the basis for species identification (Puechmaille et al. 2012)

MATERIALS AND METHODS:
Taxon Selection Taxa were selected to span the major phylogenetic break points with emphasis on the Nearctic and Neotropical bifurcation as recovered in previous mitochondrial phylogenies (Stadelmann et al. 2007, Ruedi et al. 2013).In addition, multiple individuals morphologically identified as M. nigricans and M.
albescens were included to test for paraphyly as suggested by Larsen et al. (2012).Three Old World species of Myotis and the outgroup, E. fuscus, were included to root phylogenetic analyses.All field identifications were confirmed from voucher museum specimens.Information for all specimens examined is available in Table 1.
Library preparation, sequencing, and processing Genomic DNA was extracted from 33 samples using either a Qiagen DNEasy extraction kit or a phenol-chloroform/ethanol precipitation.DNA was fragmented using the Bioruptor UCD-300 sonication device (Diagenode, Denville, NJ, USA).Libraries were prepared using the Kapa Library Preparation Kit KR0453-v2.13(Kapa Biosystems, Wilmington, MA, USA) following the manufacturer's instructions with five minor modifications.First, we used half volume reactions.Second, subsequent to end repair, we added Sera-Mag Speedbeads (Thermo-Scientific, Waltham, MA, USA; prepared according to (Glenn et al. 2016)) at a ratio of 2.86:1 for end repair cleanup.Third, we ligated universal iTru y-yoke adapters (Glenn et al. 2016) onto the genomic DNA.Fourth, following adapter ligation, we performed one post-ligation cleanup followed by Dual-SPRI size selection using 55 µL of speedbead buffer (22.5mMPEG, 1M NaCl) and 25 µL of Speedbeads.Finally, we performed a PCR at 95 °C for 45 sec, then 14 cycles of 98 °C for 30 sec, 60 °C for 30 sec, 72 °C for 30sec, then 72 °C for a 5 minute final extension and a 12 °C hold using iTru5 and iTru7 primers to produce Illumina TruSeqHT compatible libraries (Glenn et al. 2016).
Libraries were quantified on a Qubit 2.0 (Life Technologies) and 83 ng from each library was added to create 5 pools of 6 or 7 libraries each.We then split the pools in two.One subsample was enriched for UCE loci, the other was not.UCE loci in the enriched library pools were captured using Tetrapods 5K version 1 baits from MYcroarray (Ann Arbor, MI, USA) following their MYbaits protocol v.
2.3.1 with overnight incubations (Faircloth et al. 2012).Enriched libraries were quantified with a Qubit and pooled with other unrelated samples prior to sequencing on an Illumina HiSeq 3000 to produce paired-end reads of ≤ 151 bases.The unenriched samples were sequenced on a separate run using a single lane of Illumina HiSeq 2500.All samples were demultiplexed with Illumina's fastq2bcl software.
Reads were quality filtered by removing any potential adapter sequence and trimming read ends once the average Phred quality over a four base window score dropped below 20 using the Fastx toolkit (Gordon et al. 2010).
Assembly, annotation, and phylogenetic analysis of the mitochondrial genome Raw reads from the unenriched libraries were used to generate mitochondrial genomes via MitoBim (Hahn et al. 2013).This program used MIRA (Chevreux et al. 1999)  Once assembled, each mitogenome was annotated via MITOS (Bernt et al. 2013).Annotated genes were manually validated via BLAST to confirm sequence identity and length.Protein coding genes were checked for stop codons using EMBOSS's transeq program (Rice et al. 2000).When a stop codon was found, we used the raw reads to verify the sequence.We used BWA v0.7.12 (Li, Heng et al. 2009) to align the reads to the Mitobim assembled mitogenome to verify base calls from Mitobim.The protein coding rRNA and tRNA genes from each assembly were aligned using MUSCLE and concatenated into three different alignments containing only protein coding genes, only rRNA and tRNA genes, or all protein coding and RNA genes.A fourth alignment of all protein coding genes was translated to amino acids and concatenated into a single alignment.
Several different partitioning schemes were examined for each of the mitochondrial alignments.
Alignments were either partitioned by gene, by codon, or by gene type (rRNA and tRNA vs. protein coding).Genes were partitioned individually except in the instances where two genes overlapped.These regions were partitioned separately from the individual genes resulting in three partitions for the two genes: a partition for gene A, a partition for gene B, and a partition for the overlapping nucleotides of gene A and B.

Assembly, and phylogenetic analysis of nuclear UCEs
Quality filtered raw sequence reads from the UCE-enriched libraries were assembled into contigs using the Trinity assembler (Grabherr et al. 2011) with a minimum kmer coverage of 2. We used Phyluce to identify those assembled contigs that were UCE loci.We also harvested UCE loci from Eptesicus fuscus (GCA_000308155.1),Myotis brandtii (GCA_000412655.1),M. davidii (GCA_000327345.1),and M. lucifugus (GCF_000147115.1)genome assemblies using the Phyluce package (Faircloth 2016).Once extracted from Trinity and genome assemblies, we aligned all UCE loci MAFFT (Katoh et al. 2002) and trimmed the aligned data with gBlocks (Castresana 2000).Repetitive sequences (i.e. transposable elements) in each alignment were identified with RepeatMasker and trimmed where found.
Each alignment was analyzed using three different partitioning schemes.Unpartitioned alignments were simply concatenated UCE loci treated as a single unit.These alignments are referred to herein as "unpartitioned".Fully partitioned alignments were concatenated alignments mitochondrial genes that were partitioned by locus.These alignments are referred to herein as "locus partitioned".
Finally, PartitionFinder v1.1.1 (Lanfear et al. 2012) was used to combine individual loci into an optimal partitioning scheme.The combined partitioning schemes for each alignment were identified with PartitionFinder v1.1.1 (Lanfear et al. 2012).Rather than searching for best-fit substitution models for each locus or partition, the GTR+Γ model of sequence evolution was assigned to all loci (Darriba et al. 2015) except in the case of amino acid alignments where the MtMam model was used.Initial trees for PartitionFinder were generated using RAxML v7.4.1 (Stamatakis 2006) with linked branch lengths.
Partitioning schemes were heuristically searched using the hcluster algorithm.Partitioning schemes were chosen using the Bayeisan information criterion.
Finally, each alignment and partitioning scheme was analyzed using Bayesian inference and maximum likelihood phylogenetic methods.Bayesian phylogenies were generated with the MPI version of ExaBayes (v1.4.1) using 4 independent runs of 4 chains each.ExaBayes runs were terminated after 10 million generations only if the average standard deviation of split frequencies was less than 0.01.The first 25% of samples were discarded after which every 1,000 th generation was sampled.Proper sampling, post burn-in was inspected via Tracer v1.6.(Rambaut et al. 2014) and effective sample sizes greater than 200 were considered acceptable.Posterior probability values greater than 95% were considered to be significant.RaxML (v8.1.3)was used to estimate and score the maximum likelihood phylogeny with the rapid bootstrapping option and 10,000 bootstrap replicates.We define strongly supported bipartitions as those present in 95-100% of bootstrap replicates and moderately supported bipartitions are present in 85-95% of bipartitions (Wiens et al. 2008).All mitochondrial analyses are described in Table 3.
Alignment types.-Different combinations of UCE loci were used to create unique matrices based on 1) locus distribution, 2) locus length, or 3) random locus sampling.The first set of alignments (locus distribution) was determined by the number of taxa represented in the UCE alignments (phyluce_align_get_only_loci_with_min_taxa; Faircloth 2016), or degree of completeness.Matrices were constructed using loci which were present in 100% (number of specimens = 37), 95% (n = 35), 85% (n = 31), 75% (n = 27), 65% (n = 24), 55% (n = 20), 45% (n = 16), 35% (n = 12), 25% (n = 9), and 15% (n = 5) of specimens examined.These 10 groups were non-exclusive, so a locus that was assembled in all specimens (100% complete) would also be included with loci present in only 55% of specimens.On the other hand, a locus found in only 55% of specimens would not be included in the 100% complete data set.Each set of UCE alignments was concatenated using phyluce_align_format_nexus_files_for_raxml and a nexus character block was created using the phyluce_align_format_nexus_files_for_raxmlcharsets option.These datasets then served as the basis for downstream phylogenetic analyses.For example, when a partitioning methodology (discussed below) was tested, it was performed for each of the 100%, 95%, 85%, etc. alignments.In addition to partitioning schemes, the effect of missing data was examined using Bayesian and maximum likelihood methods.
The second alignment criteria combined UCE loci of similar sizes.Pervious coalescent analyses of UCE data showed that sub-sampling the most informative loci can result in different topologies (Meiklejohn et al. 2016).Rather than using coalescent based analyses we used concatenation of UCE loci to identify different topologies based on length.Under these assumptions, UCE loci were sorted into ten groups based on their length and the predicted correlation between length and number of informative characters was confirmed (Supplemental Figure 1).UCE loci in the same size cohort were combined into a single alignment, partitioned by locus, and analyzed with RAxML using the methods described below.
The final set of alignments was generated by random sampling of UCE loci.In large phylogenetic analyses, systematic error can result in highly supported, but incorrect topologies due to compounding of non-phylogenetic signal (Rodríguez-Ezpeleta et al. 2007).By randomly reducing the dataset and replicating the maximum likelihood analyses, we can reduce the potential effects of compounding error.
Roughly 10% of the dataset, 365 loci, were randomly sampled, concatenated, and partitioned by locus to create 100 new alignments, which were then analyzed with RAxML using the methods described below.
Partitioning strategies.-Alignments were analyzed using three different partitioning schemessingle, locus, and combined-similar to the mitochondrial partitioning schemes described above.
Unpartitioned alignments were simply concatenated UCE loci treated as a single genetic unit.Rather than searching for best-fit substitution models for each UCE locus or partition, the GTR+Γ model of sequence evolution was assigned to all loci (Darriba et al. 2015).Initial trees for PartitionFinder were generated using RAxML v7.4.1 (Stamatakis 2006) with linked branch lengths.Partitioning schemes were heuristically searched using the hcluster algorithm and the best scheme was chosen using the Bayesian information criterion.
Inference methods.-Phylogenetic trees were generated with three different phylogenetic inference methods across five different inference implementations including concatenation and summary tree methods.
Maximum likelihood trees were inferred for each alignment and partitioning combination using were drawn onto the best scoring tree from the initial RAxML searches for each of the respective data sets.Alignments based on UCE length or randomly sampled were analyzed using slightly different methods.In both cases UCE loci were partitioned individually and nodal support was calculated using 100 bootstrap replicates using the RAxML fast-bootstrapping option.For all maximum likelihood analyses, we define strongly supported bipartitions as those present in 95-100% of bootstrap replicates and moderately supported bipartitions are present in 85-95% of bipartitions (Wiens et al. 2008).
Bayesian analyses were conducted using ExaBayes v1.4.1 (Aberer et al. 2014).For all Bayesian analyses four independent runs of four chains each were run in parallel for a minimum of one million generations sampling every thousandth generation and applying a GTR+Γ substitution model for each partition.After one million generations, analyses continued until the standard deviation of the split frequency between chains was less than 0.01.The "-M 3" option was used to reduce the memory footprint of all ExaBayes runs.Proper sampling, post burn-in, was inspected via Tracer v1.6.(Rambaut et al. 2014).Effective sample sizes greater than 200 were considered acceptable.Posterior probability values greater than 95% were considered significant.An extended majority rule consensus tree was created from all trees after the first 25% of trees were discarded using TreeAnnotator v1.7.0 (Rambaut et al. 2013) and parameter estimates across all runs were calculated with Tracer v1.6 (Rambaut et al. 2014).
Species trees were calculated from gene trees for individual UCE loci recovered in five or more taxa using the GTR+Γ substitution model and fast bootstrapping (-f a) option in RAxML and 1,000 bootstrap replicates.In general, gene trees were classified based on the degree of completeness (i.e.number of taxa represented) similar to the way we treated individuals as described above.
Species trees were estimated and bootstrapped using three different programs.ASTRAL-II v4.10 Mirarab et al. 2015) was used to build a summary tree.Support values for bipartitions in the tree were generated from 100 bootstrap replicates using site as well as site and locus resampling (Seo 2008).
UCE loci are relatively short markers with few informative characters from which to build gene trees.Errors in gene tree estimation may reduce the accuracy of summary methods and phylogenetic inference (Liu et al. 2009, Leaché et al. 2011, DeGiorgio et al. 2014, Mirarab et al. 2016).We used weighted and unweighted statistical binning to combine gene into compatible supergenes, increasing the number of informative informative characters per "locus" and reducing the phylogenetic error (Mirarab et al. 2014, Bayzid et al. 2015).The gene trees used for the summary tree methods described above were used rather than re-estimating trees.Bifurcations supported by more than 50% of the bootstrap replicates were retained for each gene tree.Alignments from compatible trees were concatenated into a single supergene alignment.Trees for supergenes were estimated using RAxML.The best trees for each supergene, as defined by log likelihood score, were retained from 500 searches.
Bipartition support was estimated from 500 bootstrap replicates.For all analyses, the GTR+Γ model of substitution was used and each gene in the supergene alignment was partitioned separately.The resulting supertrees were then used for species tree estimation using ASTRAL-II.For the unweighted analysis, all supertrees were included in the pool of trees.For the weighted analysis, supertrees were weighted according to the number of genes combined in the supergene alignment.For example, if a supergene was a composite of six genes, the supertree was present 6 times compared to a composite of five genes which would be represented only five times.Support for the weighted and unweighted species trees was estimated by site and site and locus re-sampling (Seo 2008) for 100 bootstrap replicates in ASTRAL-II.

Topological comparisons-
. CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.It is made available under The copyright holder for this preprint (which was not this version posted June 16, 2017.; https://doi.org/10.1101/112581doi: bioRxiv preprint Trees recovered from all analyses were compared to each other in order to quantify the differences between topologies.Branch lengths have different meanings based on the type of analysis.
For example, ASTRAL-II branch lengths are representative of coalescent units.ASTRID doesn't even calculate branch lengths.For accurate tree comparisons, branch lengths were stripped from all trees.

Acknowledgments:
We would like to thank the following museums, collection managers, and collaborators for the    Findley (1972).Species with more than one sample are designated with a superscript that is referenced in Table 1.Specimens derived from whole genome alignments are designated with a superscript "G".Table 2. General alignment information.For a subset of analyses a series of alignments were generated based on the number of taxa per locus.Thirty-seven taxa were examined so an alignment with all 37 taxa was considered 100% complete.Parsimony-informative characters make up a small portion of the total alignment.The number of partitions in the combined partitioning scheme was calculated with PartitionFinder.
. CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.It is made available under The copyright holder for this preprint (which was not this version posted June 16, 2017.Sup File 1 -supFile-all_trees_2017-06-07.nexus.txt-Trees generated from all analyses in nexus format.General alignment information.For a subset of analyses a series of alignments were generated based on the number of taxa per locus.Thirty-seven taxa were examined so an alignment with all 37 taxa was considered 100% complete.Parsimony-informative characters make up a small portion of the total alignment.The optimum partitioning scheme was calculated with PartitionFinder. RAxML v8.1.3(Aberer et al. 2014).The best scoring (lowest -lnL) tree from each dataset was identified from 100 random starting trees and bootstrapped 100 times using the GTR+Γ in both cases.The autoMRE function in RAxML v8.1.3was used to determine the need for additional bootstrap replicates beyond the initial 100(Pattengale et al. 2009).A stopping criterion was set a priori if the weighted Robinson-Foulds distance was less than 5% in 95% of random permutations of computed bootstrap replicates(Pattengale et al. 2009).If necessary, an additional 100 bootstrap replicates were computed until the convergence stopping criteria were met.Finally, bipartition frequencies of bootstrap replicates tissue loans necessary to complete this work: Joseph Cook (Museum of Southwestern Biology), Museum of Vertebrate Zoology, Manuel Ruedi (Natural History Museum of Geneva), Santiago Burneo (Pontificia Universidad Catolica del Ecuador Museo de Zoologia), Heath Garner, Robert Bradley and Caleb Phillips (Texas Tech University Natural Science Research Laboratory), Link Olson (University of Alaska Museum of the North), Cody Thompson and Priscilla Tucker (University of Michigan Museum of Zoology).In addition, we would like to thank the Texas Tech HPCC (http://www.depts.ttu.edu/hpcc/) for providing the computational resources necessary to complete this project.This work was supported by the National Science Foundation, DEB-1355176.Additional support was provided by College of Arts and Sciences at Texas Tech University.All sequence data is available at NCBI's Short Read Archive (SRP095250).UCE contig assemblies are available at (PENDING ACCESSION NUMBER).Supplemental files include: a nexus file containing all estimated species trees (supFile-all_trees_2017-06-07.nexus.txt),and tables listing each of the phylogenetic analyses.An archived file of individual UCE alignments in fasta format (uceLociAlignments.tgz),and an archived file containing all individual UCE gene trees in newick format (UceGeneTrees.tgz) are available through DRYAD (PENDING ACCESSION NUMBER).

Figure
Figure and Table Captions

Figure 1 .
Figure 1.Comparison of nuclear and mitochondrial phylogenetic trees of Myotis.The most common mitochondrial (A) and nuclear (B) topologies.The mitochondrial topology arises from Bayesian and

Figure 2 .
Figure 2. Comparison of mitochondrial and nuclear topologies.(A) All trees recovered from mitochondrial and nuclear analyses were visualized in tree space using multidimensional scaling of ; https://doi.org/10.1101/112581doi: bioRxiv preprint Sup Fig 1. Phylogenetic content of UCE loci based on length.The length of a UCE locus is correlated with the number of phylogenetically informative characters.UCE loci were sorted by length and ten bins of alignments were created so that the shortest loci were combined into one alignment, the next shortest set of loci were combined … etc. Parsimony informative characters made up a minor part of each alignment.
If true, then the mitochondrial genome is as likely to reflect the true species tree as any single UCE locus chosen at random.Phenomena such as lineage sorting, reticulation, and introgression have likely influenced the genomes of Myotis and should be accounted for in subsequent work.It is (Boyles et al. 2007)heses(Simões et al. 2007), and even conservation recommendations(Boyles et al. 2007).Previous studies using nuclear data have largely been uninformative or utilized too few samples to draw definitive conclusions.Trees estimated from ~3,650 nuclear loci and 295 phylogenetic analyses recovered 175 topologies, none of which are congruent with the mitochondrial phylogeny of Myotis.Conflict between the mitochondrial and nuclear trees as well as among individual nuclear loci suggest that the Myotis radiation may have been accompanied by high levels of incomplete lineage sorting and possible hybridization.Rather than placing emphasis on the mitochondrial tree, it may be more appropriate to consider it for what it really is: a single gene on par with a single UCE locus, albeit one with many more phylogenetically informative characters.possible that the Myotis radiation is more accurately reflected as a hard polytomy or a phylogenetic network rather than a strictly bifurcating phylogeny.

Table 1 .
Specimens examined.Collection abbreviations: Museum of Southwestern Biology (MSB), Museum of Vertebrate Zoology (MVZ), Natural History Museum of Geneva (MHNG), Pontificia Universidad Catolica del Ecuador Museo de Zoologia (QCAZ), Texas Tech University Natural Science Research Laboratory (TK), University of Alaska Museum of the North (UAM), University of Michigan Museum of Zoology (UMMZ).

Table 1 -Specimens Examined Species Name herein Museum identification num.
Museum of Southwestern Biology (MSB), Museum of Vertebrate Zoology (MVZ), Natural History Museum of Geneva (MHNG), Pontificia Universidad Catolica del Ecuador Museo de Zoologia (QCAZ), Texas Tech University Natural Science Research Laboratory (TK), University of Alaska Museum of the North (UAM), University of Michigan Museum of Zoology (UMMZ).Samples beginning with "GCA_" represent genome assemblies available through NCBI. .CC-BY-NC 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.It is made available under The copyright holder for this preprint (which was not this version posted June 16, 2017.; https://doi.org/10.1101/112581doi: bioRxiv preprint