Abstract

Species complexes undergoing rapid radiation present a challenge in molecular systematics because of the possibility that ancestral polymorphism is retained in component gene trees. Coalescent theory has demonstrated that gene trees often fail to match lineage trees when taxon divergence times are less than the ancestral effective population sizes. Suggestions to increase the number of loci and the number of individuals per taxon have been proposed; however, phylogenetic methods to adequately analyze these data in a coalescent framework are scarce. We compare two approaches to estimating lineage (species) trees using multiple individuals and multiple loci: the commonly used partitioned Bayesian analysis of concatenated sequences and a modification of a newly developed hierarchical Bayesian method (BEST) that simultaneously estimates gene trees and species trees from multilocus data. We test these approaches on a phylogeny of rapidly radiating species wherein divergence times are likely to be smaller than effective population sizes, and incomplete lineage sorting is known, in the rodent genus, Thomomys. We use seven independent noncoding nuclear sequence loci (total ∼ 4300 bp) and between 1 and 12 individuals per taxon to construct a phylogenetic hypothesis for eight Thomomys species. The majority-rule consensus tree from the partitioned concatenated analysis included 14 strongly supported bipartitions, corroborating monophyletic species status of five of the eight named species. The BEST tree strongly supported only the split between the two subgenera and showed very low support for any other clade. Comparison of both lineage trees to individual gene trees revealed that the concatenation method appears to ignore conflicting signals among gene trees, whereas the BEST tree considers conflicting signals and downweights support for those nodes. Bayes factor analysis of posterior tree distributions from both analyses strongly favor the model underlying the BEST analysis. This comparison underscores the risks of overreliance on results from concatenation, and ignoring the properties of coalescence, especially in cases of recent, rapid radiations.

Species complexes that are undergoing rapid radiation represent a major challenge in molecular systematics because relationships among lineages can be obscured by ancestral polymorphism retained in the component gene trees (e.g., Avise and Wollenberg, 1997; Maddison, 1997). Coalescent theory suggests that individual gene trees will often fail to match the lineage tree when divergence times (t = number of generations; tree branch lengths are in “coalescent units” of t/(2N)) are very short relative to the effective population size of the respective ancestral populations (Kubatko and Degnan, 2007; Degnan and Rosenberg, 2006; and references therein). This phenomenon is evident in the many empirical studies where organelle or nuclear gene sequences are non-monophyletic across reproductively isolated species (e.g., Dolman and Moritz, 2006). One solution to the problem is to increase the number of loci sampled, with the expectation that the multilocus signal will overwhelm the noise attributable to stochastic lineage sorting (Rokas et al., 2003). However, even with a large number of loci, the lineage trees produced from concatenated data sets can converge on incorrect topologies when interior branch lengths are short relative to ancestral effective sizes (Degnan and Rosenberg, 2006; Kubatko and Degnan, 2007). The problem is further compounded in cases where a new species is derived recently from a small area within the larger range of a geographically subdivided species, resulting in persistent paraphyly of the latter (Neigel and Avise, 1986; Patton and Smith, 1994). A second approach to overcoming the “noise” from lineage sorting is to increase the number of gene copies (alleles) sampled per taxon with the expectation that the larger number of coalescence events in common ancestors will provide information on relative divergence times (Edwards and Beerli, 2000) and, thus, on the topology of the lineage tree (Maddison and Knowles, 2006; Rosenberg, 2002; Takahata, 1989).

These issues have come into sharper focus because of the growing capacity to generate multilocus data sets for phylogeographic and phylogenetic analyses (Philippe et al., 2005). Population-level coalescent analyses are becoming increasingly sophisticated for phylogeographic studies, allowing estimation of key parameters such as mutation rate–scaled population sizes, migration rates, and divergence times (reviewed in Hey and Machado, 2003). Remarkably, here the inferred gene trees can be regarded as a nuisance parameter, rather than the aim of the exercise (Rosenberg and Nordborg, 2002). In phylogenetics, much progress has been made in estimation of gene trees, including model-based estimation of sequence parameters (Ronquist and Huelsenbeck, 2003). However, methods to estimate lineage trees, the object of the phylogenetic exercise, from one or more gene trees using coalescent methods remain underdeveloped. Common approaches include quantifying consensus across individual gene trees (Slowinski and Page, 1999) and estimation of a single tree from concatenated sequences (e.g., Rokas et al., 2003; Kubatko and Degnan, 2007). However, as has long been recognized, a better solution is to incorporate models of stochastic mutation along with gene coalescence directly into estimation of lineage trees (Felsenstein, 2004; Maddison, 1997; Takahata, 1989). Initial explorations of this approach suggest that it will increase both efficiency and accuracy, such that with a reasonable number of loci and individuals it will be possible to infer lineage relationships even in cases of rapid radiations (Edwards et al., 2007; Maddison and Knowles, 2006).

Here we compare two approaches to estimating species trees: partitioned Bayesian analysis of concatenated sequences (Ronquist and Huelsenbeck, 2003) and a newly developed Bayesian method that uses a coalescent framework to simultaneously estimate gene trees and species trees from multilocus data (Edwards et al., 2007; Liu and Pearl, 2007). This latter approach has been found to be more efficient than concatenation for estimating lineage trees, and it also avoids potential convergence on incorrect topologies (Edwards et al., 2007; Kubatko and Degnan, 2007). Specifically, we evaluate an important extension of this coalescent approach, particularly useful for recent radiations, which is to estimate species trees when multiple individuals are sequenced per taxon. In the past, relationships were estimated among alleles via molecular approaches (restriction mapping or DNA sequencing), by inferring trees of haplotypes and then, indirectly, the lineage relationships. Where loci are congruent and also monophyletic within species, this second step is relatively straightforward. Otherwise, a different approach is needed to avoid wrongly assuming that all genes have the same history (e.g., Degnan and Salter, 2005; Pamilo and Nei, 1988). The coalescent-based method described to date estimates a species tree from a single sampled allele per taxon (Liu and Pearl, 2007). The new method, tested here, applies a coalescent-based approach that allows for divergent histories of independent genes and directly infers the species tree, given samples of multiple alleles per gene per species.

The subjects of this study, pocket gophers of the genus Thomomys, exemplify both rapid diversification and the challenges of inferring species borders and relationships (Patton and Smith, 1994; Steinberg and Patton, 2000). Within the genus, differing demographic and historical scenarios have been described as responsible for incomplete lineage sorting among various pairs of species. Some species have likely diverged recently through vicariant events, as judged by their partially overlapping niches but separate specialties, such that lineage sorting is incomplete but will likely become complete with time (e.g., possibly T. mazama and T. monticola; Thaeler, 1968a). Other species have arisen rapidly by peripheral isolation from a “parent” species, so that incomplete lineage sorting results from the capture by the “daughter” species of a subset of genetic variation of the “parent.” Because a portion of the genome of the new species is identical to that of the original species, lineage sorting only proceeds stochastically. With no additional diversifying force, these lineages will not become completely sorted unless one or the other species goes extinct (e.g., T. bottae and T. townsendii; Rogers, 1991a, 1991b). In a third known scenario, species have diverged and become effectively reproductively isolated, but neither drift nor selection has been strong enough to eliminate shared gene histories, and infrequent hybridization may occur, which would serve to prevent complete stochastic separation (Patton et al., 1972; Patton, 1990).

In this genus, extant species currently range from the Pacific coast eastward to Texas, the Dakotas and Manitoba, from Baja California and much of Mexico northward through the southern edge of Canada (Fig. 1). Species of Thomomys have been recognized following a biological species concept based on interactions at contact zones as well as genetic, chromosomal, and morphological evidence (Patton, 1993). They are not generally sympatric, but rather replace each other along elevational, soil, or other ecological gradients. On the basis of variation in morphology and chromosome number, two subgenera, Thomomys and Megascapheus, have been recognized (Patton and Smith, 1981; Thaeler, 1980), and this primary division has been supported by phylogenetic analysis of mtDNA and nuclear genes (Smith, 1998; Spradling et al., 2004). Although the number of species described is modest (subgenus Thomomys: T. clusius, T. idahoensis, T. mazama, T. monticola, T. talpoides; subgenus Megascapheus: T. bottae, T. bulbivorus, T. townsendii, T. umbrinus; Fig. 1), the phenotypic and genetic diversity within some species is immense (Hafner et al., 1987; Patton and Smith, 1990; Thaeler, 1980). Most prominently, T. bottae exhibits among the highest levels of diversity known within any single species of mammal (summarized in Patton and Smith, 1990); allozyme genetic distances among populations are commonly in the range of DNei = 0.2–0.3 (Patton and Smith, 1990); mtDNA divergences among populations average 12% (Smith, 1998); it contains the largest amount of karyotype variation known (e.g., 1 to 16 pairs of telocentrics within a subspecies; Patton, 1972); and greater than 200 subspecies have been recognized in the past (subspecies within California have been revised and greatly reduced; Patton, 2005).

Figure 1

Map of North America with ranges of each Thomomys species shown. (a) Subgenus Megascapheus. (b) Subgenus Thomomys. Ranges depicted on this map are based on those in Reid (2006) except south of the United States, where they are taken from Patterson et al. 2005.

Figure 1

Map of North America with ranges of each Thomomys species shown. (a) Subgenus Megascapheus. (b) Subgenus Thomomys. Ranges depicted on this map are based on those in Reid (2006) except south of the United States, where they are taken from Patterson et al. 2005.

Within T. bottae, Patton and Smith (1990) recognized six major genetic groups in the western portion of its range (DNei > 0.3) from a combined morphometric and allozyme analysis, but mtDNA is incompletely sorted among these (Smith, 1998). The distinction of one of these groups, from the Baja peninsula, was recently corroborated using 500 bp of mitochondrial cytochrome b sequence (Alvarez-Castaneda and Patton, 2004). Overall, the extensive geographic diversification within this complex has been attributed to a combination of patchily distributed habitats; chromosome change; genetic drift within geographically limited, dynamic metapopulations; and divergent selection on morphology in relation to variation in soils (Patton 1985; Patton and Smith, 1990). The boundaries among these genetic groups differ across data types, and a multilocus nuclear genealogical analysis of among-group relationships may provide further understanding of their relationships.

Similar processes likely underlie extensive geographic and genetic diversification in other species of Thomomys (e.g., Davis, 1938; Hafner et al., 1987; Thaeler, 1980). Notably, within the subgenus Thomomys, species and populations vary in diploid number (40 to 60), with three of the five species showing extensive chromosomal variation within them. Taxa at all levels (subgenus, genus, species, subspecies) in both subgenera vary dramatically in color (e.g., Getz, 1957), chromosomal number (Thaeler, 1968b, 1980), karyotype (Patton and Sherwood, 1982; Thaeler, 1974), C-value (Sherwood and Patton, 1982), craniomandibular dimensions (Thaeler, 1980), allozyme allele frequency (Patton and Smith, 1981), and mitochondrial lineage history (Patton and Smith, 1994).

Fossils of the genus Thomomys have been found in the Blancan North American Land Mammal Age (NALMA), dating to 4 Ma (Lindsay et al., 2002), indicating that the lineage leading to the genus predates 4 Ma. Korth provides the only comprehensive cladistic analysis of fossils of the Geomyidae (Korth, 1994; Korth and Chaney, 1999). The oldest fossil specimen of either extant lineage in the family (Progeomys, ancestor to the Geomyini, or Parapliosaccomys, ancestor to the Thomomyini) is a Coffee Ranch Progeomys, which dates to 6.5 to 7 Ma (Dalquest, 1983; Korth, 1994; Tedford et al. 2004; Woodburne and Swisher, 1995). Thus, the split between Geomyini and Thomomyini had to have occurred prior to that date.

The primary questions addressed by the present study are (1) How do inferences of lineage history derived from partitioned Bayesian analysis of concatenated sequences and analyses that incorporate the coalescent compare when applied to multilocus data for a rapidly radiating group? (2) Does the inclusion of multiple individuals within each taxon provide any further resolution of, or change the interpretation of, lineage history? (3) Do multilocus genealogical data change the resolution or the nature of lineage relationships described in previous phylogenetic analyses of members of this group? If so, at what hierarchical level?

We compare phylogenetic resolution along the following hierarchy: between subgenera (Megascapheus versus Thomomys), among species within subgenera, and among major genetic groups of T. bottae. We predict that our data will improve the consistency of lineage relationships that differed across data types (Smith, 1998), and that this improvement should be most evident when species are represented by multiple individuals and the coalescent-based approach is used (Edwards et al., 2007; Maddison and Knowles, 2006). Further, where divergence time is short relative to ancestral Ne (i.e., within T. bottae, and between T. bottae and T. townsendii, and, potentially, within T. talpoides and between T. talpoides and T. idahoensis), it is possible that concatenation could yield an incorrectly resolved tree (Degnan and Rosenberg, 2006; Kubatko and Degnan, 2007), whereas the coalescent approach could yield a more accurate representation, either an unresolved or differently resolved topology (Edwards et al., 2007).

Materials and Methods

Taxon Sampling

We included 28 individuals, 26 from the genus Thomomys. We included all species within Thomomys (Patton, 2005) other than T. clusius, the Wyoming pocket gopher, for which tissue was not available. For all species other than the geographically restricted T. bulbivorus, we included multiple individuals, where possible from populations or subspecies determined by other genetic methods to be substantially divergent or geographically distant (Appendix): T. bottae (Tbo; n = 12), T. bulbivorus (Tbu; n = 1), T. idahoensis (Tid; n = 2), T. mazama (Tma; n = 2), T. monticola (Tmo; n = 2), T. talpoides (Tta; n = 3), T. townsendii (Tto; n = 2), and T. umbrinus (Tum; n = 2). As outgroups, we include two phylogenetically divergent species from the sister tribe within the Geomyidae, Orthogeomys heterodus and Geomys breviceps (see Spradling et al., 2004), though sequences for the latter could be generated for only four of the seven loci. Twelve individuals of T. bottae were included specifically to test the cohesiveness of five “genetic groups” (Alvarez-Castaneda and Patton, 2004; Patton and Smith, 1990) within California and Baja California, Mexico, and the phylogenetic position of T. bottae awahnee, a subspecies restricted to the Yosemite Valley, which is highly divergent genetically from the remaining populations (Patton and Smith 1990; Smith 1998). Specific localities as well as the basis for selection of each individual used in the phylogeny are listed in the Appendix.

Data Collection

Genomic DNA was extracted using a Sigma GenElute Blood extraction column (Sigma-Genosys). Seven noncoding nuclear sequence loci were optimized for use across all members of the genus and the outgroups. These loci range from 471 to 820 base pairs (bp) in length (mean = 627 bp) and were developed from clone sequences screened and selected from a genomic library created from T. bottae. Sequences have been deposited in GenBank (clone sequences: accession numbers 116062, 116064, 116067, 116070, 116073, 110076, 116079; phylogenetic sequences: accession numbers: 116080 to 116272).

Each locus for which primers were designed was amplified once at each of three temperatures in three species of subgenus Megascapheus and two species of subgenus Thomomys to select loci suitable for phylogenetic-level comparison of the genus. Optimal thermal-cycling conditions for each locus are reported in Table 1. A locus was excluded if it sequenced well in most taxa but failed in one or more taxa because of amplification of duplicated regions, which were detected by the presence of heterozygotes in every individual sequenced. In some cases, primers were modified to avoid amplification of duplicated loci. These primers are named to indicate their derivation from a clone sequence. Each clone sequence was compared to multiple genome databases in order to assess its homology to an annotated genomic region; all loci used here have a low probability of being homologous to a protein-coding region of the genome based on these comparisons.

Table 1

Genetic locus information.

Locus name Primer sequence Ta (°C) (Mega) Ta (°C) (Thom) Aligned length Substitution model PIa with outgroup(s) PI ingroup only Gap characters 
TBO26 F:TTC ATA TGG AAC AAG AAA AGA CC 60/TD*** 56–58 614 GTR+Γ 40 27 
 R:TAG TTC CCT TGC CCA TTT AGT G        
TBO29 F:CCA CCA CCA TTG TTT TCC TCT C 60–62 56/TD 601 HKY 34 23 
 R:CAA ACA AAC CTC TCC AAA CTC TGA C        
TBO47 F:TGT GGA GGA TTT TTC CCA CTT ACT A 62 56-60 819 GTR 52 41 
 R:CAA CAC AAT AGT AGA AAC CAT GCA GTC        
TBO53** F:CCA GGA GTA TAG CCT AAT GGT AGA GTT C 62 56 621 HKY+Γ 35 35 
 R:TTT TTG TGC CAC AGT TTC ACA TTC C        
TBO53-2 F:TCT AWA TGK TAG AGT TCA AGG GGG TAG A 62 56      
 R:CCA CAG TTT CAC ATT CCC ATT TTT        
TBO53-3 F:AGG GTT TCT AAG AGA ACT CAG ACT CAA G 62 56      
 R:TTC TGC TGT CTG TAT AGA GAT GAG AGT C        
TBO59 F:AGA TTT CGC CTT ATA CAA GCT ACT GAC 60–62 58–60 553 HKY+Γ 29 17 
 R:CTC CTT CTT CCT CTT CTT CAC TCA GG        
TBO64-2** F:CTG GCT CCC GTC AGC TCT A 56–58/TD 52–58/TD 471 HKY+Γ 36 36 
 R:AAG TTC AAG GCC CAT GAC TCA C        
TBO64-3 F:ACA ART TCA AGC CCC AGG ACT RAT ATA C        
 R:CAT GAC TCA CAC AAA MCA GAA AAG AAA T        
TBO72** F:CTT CCT GAA TCC ACC AGG TCA CTC 56 56 706 GTR 30 29 
 R:GGA GGA GCT GCG AAA ATC CTT GAG        
TBO72-2** F:CYG CAA CCT TCT CCT TYC AT 65 62      
 R:TCA GCA GGA CAG TGG AGG GC        
TBO72-3 F:GCA ACC TTC TCC TTT CAT TGC TC n/a n/a      
 R:GAA AAT CCT TGA GGC TGC TCT CG        
Maximum likelihood distances, HKY, 95% confidence interval 
Locusτ name Nucleotide diversity Π* Geomyini versus Thomomyini Between Megascapheus and Thomomys Between Megascapheus species Between Thomomys species Between T. bottae genetic groups, w/awahnee 
TBO26 0.029 0.0112–0.0697 0.0170–0.0601 0–0.0277 0–0.0054% 0–0.0329 
 SD = 0.0146      
TBO29 0.019 0.0390–0.0544 0.0143–0.0317 0–0.0226 0.0024–0.0259 0–0.0070 
 SD = 0.0094      
TBO47 0.031 0.0364–0.0597 0.0330–0.0472 0–0.0111 0.0115–0.0419 0–0.0070 
 SD = 0.0150      
TBO53** 0.028 0.0449–0.0630 0.0323–0.055 0.0031–0.0249 0.0006–0.0179 0.0026–0.0215 
 SD = 0.0141      
TBO59 0.022 0.0333–0.0767 0.0122–0.0357 0–0.0151 0–0.0340 0–0.0092 
 SD = 0.0112      
TBO64–2** 0.064 0.0015–0.0544 0.0315–0.0661 0.0124–0.0513 0–0.0436 0–0.0451 
 SD = 0.0311      
TBO72** 0.023 0.2511–0.3002 0.0266–0.0409 0–0.0262 0–0.0113 0–0.0037 
 SD = 0.0115      
Locus name Primer sequence Ta (°C) (Mega) Ta (°C) (Thom) Aligned length Substitution model PIa with outgroup(s) PI ingroup only Gap characters 
TBO26 F:TTC ATA TGG AAC AAG AAA AGA CC 60/TD*** 56–58 614 GTR+Γ 40 27 
 R:TAG TTC CCT TGC CCA TTT AGT G        
TBO29 F:CCA CCA CCA TTG TTT TCC TCT C 60–62 56/TD 601 HKY 34 23 
 R:CAA ACA AAC CTC TCC AAA CTC TGA C        
TBO47 F:TGT GGA GGA TTT TTC CCA CTT ACT A 62 56-60 819 GTR 52 41 
 R:CAA CAC AAT AGT AGA AAC CAT GCA GTC        
TBO53** F:CCA GGA GTA TAG CCT AAT GGT AGA GTT C 62 56 621 HKY+Γ 35 35 
 R:TTT TTG TGC CAC AGT TTC ACA TTC C        
TBO53-2 F:TCT AWA TGK TAG AGT TCA AGG GGG TAG A 62 56      
 R:CCA CAG TTT CAC ATT CCC ATT TTT        
TBO53-3 F:AGG GTT TCT AAG AGA ACT CAG ACT CAA G 62 56      
 R:TTC TGC TGT CTG TAT AGA GAT GAG AGT C        
TBO59 F:AGA TTT CGC CTT ATA CAA GCT ACT GAC 60–62 58–60 553 HKY+Γ 29 17 
 R:CTC CTT CTT CCT CTT CTT CAC TCA GG        
TBO64-2** F:CTG GCT CCC GTC AGC TCT A 56–58/TD 52–58/TD 471 HKY+Γ 36 36 
 R:AAG TTC AAG GCC CAT GAC TCA C        
TBO64-3 F:ACA ART TCA AGC CCC AGG ACT RAT ATA C        
 R:CAT GAC TCA CAC AAA MCA GAA AAG AAA T        
TBO72** F:CTT CCT GAA TCC ACC AGG TCA CTC 56 56 706 GTR 30 29 
 R:GGA GGA GCT GCG AAA ATC CTT GAG        
TBO72-2** F:CYG CAA CCT TCT CCT TYC AT 65 62      
 R:TCA GCA GGA CAG TGG AGG GC        
TBO72-3 F:GCA ACC TTC TCC TTT CAT TGC TC n/a n/a      
 R:GAA AAT CCT TGA GGC TGC TCT CG        
Maximum likelihood distances, HKY, 95% confidence interval 
Locusτ name Nucleotide diversity Π* Geomyini versus Thomomyini Between Megascapheus and Thomomys Between Megascapheus species Between Thomomys species Between T. bottae genetic groups, w/awahnee 
TBO26 0.029 0.0112–0.0697 0.0170–0.0601 0–0.0277 0–0.0054% 0–0.0329 
 SD = 0.0146      
TBO29 0.019 0.0390–0.0544 0.0143–0.0317 0–0.0226 0.0024–0.0259 0–0.0070 
 SD = 0.0094      
TBO47 0.031 0.0364–0.0597 0.0330–0.0472 0–0.0111 0.0115–0.0419 0–0.0070 
 SD = 0.0150      
TBO53** 0.028 0.0449–0.0630 0.0323–0.055 0.0031–0.0249 0.0006–0.0179 0.0026–0.0215 
 SD = 0.0141      
TBO59 0.022 0.0333–0.0767 0.0122–0.0357 0–0.0151 0–0.0340 0–0.0092 
 SD = 0.0112      
TBO64–2** 0.064 0.0015–0.0544 0.0315–0.0661 0.0124–0.0513 0–0.0436 0–0.0451 
 SD = 0.0311      
TBO72** 0.023 0.2511–0.3002 0.0266–0.0409 0–0.0262 0–0.0113 0–0.0037 
 SD = 0.0115      
a

PI = phylogenetically informative.

**

The locus sequenced is that which was reliably obtained by sequencing with either the −2 primer pair or the −3 or −4 primers. These primers were used to cycle sequence from template generated either by the original primer pair or the −2 primer pair.

***

TD = touchdown PCR annealing temperature: 65°C to 51°C, reducing by 1°C per cycle for the first 15 cycles, then 50°C for 10 more cycles.

τ

= gene trees are archived in TreeBASE, Study no. S1997.

Species Subspecies Prior basis for genetic differentiation MVZ no. County, State Abbreviation 
Thomomys bottae awahnee Genetically distinct in mtDNA studies 158708 Mariposa Co. CA Tbaw a 
 awahnee Genetically distinct in mtDNA studies 201577 Mariposa Co. CA Tbaw b 
 xerophilus Baja California allozyme group 153684 Baja Norte, BC Tbbj a 
 cactophilus Baja California allozyme group 153694 Baja del Sur, BC Tbbj b 
 albatus Basin and Range allozyme group 156116 Imperial Co. CA Tbbr a 
 ruidosae Basin and Range allozyme group 147023 Lincoln Co. NM Tbbr b 
 bottae Central California allozyme group 166821 Monterey Co. CA Tbcc a 
 alpinus Central California allozyme group 164670 Tulare Co. CA Tbcc b 
 riparius Great Basin allozyme group 148289 Riverside Co. CA Tbgb a 
 mewa Great Basin allozyme group 162920 Fresno Co. CA Tbgb b 
 saxatilis Northern California allozyme group 160762 Lassen Co. CA Tbnc a 
 laticeps Northern California allozyme group 160618 Humboldt Co. CA Tbnc b 
Thomomys townsendii townsendii Snake River Plain group 163705 Payette Co. ID Tto a 
 relictus Humboldt River and NE California group 175669 Lassen Co. CA Tto b 
Thomomys umbrinus chihuahuae Sierra Madre clade 150425 Durango Tum a 
 atroavarius Coastal clade (basal) 153745 Sinaloa Tum b 
Thomomys bulbivorus   218819 Lane Co. OR Tbu 
Thomomys talpoides yakimensis 2n = 40, > 1000 mi from Uinta Co. 176467 Yakima Co. WA Tta a 
 bridgeri 2n = 40, Fn = 70, > 1000 mi from Yakima Co. 179622 Uinta Co. CO Tta b 
 ocius 2n = 56, Fn = 78 179659 Moffat Co. CO Tta c 
Thomomys idahoensis pygmaeus 2n = 58, Fn = 76 179647 Uinta Co. WY Tid a 
 pygmaeus 2n = 58, Fn = 76 179648 Uinta Co. WY Tid b 
Thomomys mazama mazama 2n = 56 171042 Siskiyou Co. CA Tma a 
 nasicus 2n = 58 176438 Deschutes Co. OR Tma b 
Thomomys monticola  Tuolumne drainage mtDNA clade 201631 Tuolumne Co. CA Tmo a 
  Merced drainage mtDNA clade 208607 Sierra Co. CA Tmo b 
Species Subspecies Prior basis for genetic differentiation MVZ no. County, State Abbreviation 
Thomomys bottae awahnee Genetically distinct in mtDNA studies 158708 Mariposa Co. CA Tbaw a 
 awahnee Genetically distinct in mtDNA studies 201577 Mariposa Co. CA Tbaw b 
 xerophilus Baja California allozyme group 153684 Baja Norte, BC Tbbj a 
 cactophilus Baja California allozyme group 153694 Baja del Sur, BC Tbbj b 
 albatus Basin and Range allozyme group 156116 Imperial Co. CA Tbbr a 
 ruidosae Basin and Range allozyme group 147023 Lincoln Co. NM Tbbr b 
 bottae Central California allozyme group 166821 Monterey Co. CA Tbcc a 
 alpinus Central California allozyme group 164670 Tulare Co. CA Tbcc b 
 riparius Great Basin allozyme group 148289 Riverside Co. CA Tbgb a 
 mewa Great Basin allozyme group 162920 Fresno Co. CA Tbgb b 
 saxatilis Northern California allozyme group 160762 Lassen Co. CA Tbnc a 
 laticeps Northern California allozyme group 160618 Humboldt Co. CA Tbnc b 
Thomomys townsendii townsendii Snake River Plain group 163705 Payette Co. ID Tto a 
 relictus Humboldt River and NE California group 175669 Lassen Co. CA Tto b 
Thomomys umbrinus chihuahuae Sierra Madre clade 150425 Durango Tum a 
 atroavarius Coastal clade (basal) 153745 Sinaloa Tum b 
Thomomys bulbivorus   218819 Lane Co. OR Tbu 
Thomomys talpoides yakimensis 2n = 40, > 1000 mi from Uinta Co. 176467 Yakima Co. WA Tta a 
 bridgeri 2n = 40, Fn = 70, > 1000 mi from Yakima Co. 179622 Uinta Co. CO Tta b 
 ocius 2n = 56, Fn = 78 179659 Moffat Co. CO Tta c 
Thomomys idahoensis pygmaeus 2n = 58, Fn = 76 179647 Uinta Co. WY Tid a 
 pygmaeus 2n = 58, Fn = 76 179648 Uinta Co. WY Tid b 
Thomomys mazama mazama 2n = 56 171042 Siskiyou Co. CA Tma a 
 nasicus 2n = 58 176438 Deschutes Co. OR Tma b 
Thomomys monticola  Tuolumne drainage mtDNA clade 201631 Tuolumne Co. CA Tmo a 
  Merced drainage mtDNA clade 208607 Sierra Co. CA Tmo b 

In general, polymerase chain reaction (PCR) was performed as follows: 10 ng of genomic DNA were mixed in a 12-μ L reaction containing 1.5 mM MgSO4, 1 × Thermopol II buffer (New England Biolabs, Inc.; NEB), 0.1 U NEB Taq polymerase, 0.2 mM dNTPs. The thermal profile consisted of 95°C for 5 min, 30 cycles of 94°C for 30 s, primer-specific annealing temperature (Ta) for 1 min, 72°C for 2 min, followed by 72°C for 5 s. Reaction products were visualized on a 1% sodium borate agarose gel stained with GelStar (Lonza Group, Switzerland) 1 μ L/100 mL. Products were treated with a mixture of exonuclease and shrimp alkaline phosphatase (ExoSAP-IT; USB) as follows: 1 μ L of diluted (1:9) ExoSAP-IT per PCR reaction product incubated for 30 min at 37°C, then 15 s at 80°C.

Reaction products were subjected to cycle sequencing using Big Dye 3.1 (Applied Biosystems) with the following reaction mixture: 0.5 to 4 μ L of PCR template (depending on band brightness) in an 8-μ L reaction containing 0.125× Big Dye, 1.4 μ L of 5× sequencing buffer (Applied Biosystems) and 0.25 mM sequencing primer. Cycle sequencing thermal conditions consisted of 95°C for 1 min, then 25 cycles of 98°C for 15 s, 50°C for 5 s, 60°C for 4 min. Cycle sequencing products were cleaned using EtOH + NaOAc precipitation. Products were separated and visualized on an Applied Biosystems 3730XL capillary sequencer and base-called using the KB basecaller and relaxed clear-range criteria. Base calls were checked and sequences further edited in Sequencher 4.6 (Gene Codes).

Data Analysis

Sequences were aligned in Clustal X (Thompson et al., 1997). Alignments were evaluated by eye, and gap initiation and extension penalties were adjusted before realignment in Clustal X, until the alignment was nearly optimal; final adjustments, where necessary, were made by hand. Individual sequences contained gap characters (coded as −), as well as heterozygous sites (coded with IUPAC standard ambiguity codes) and minimal missing sites (0% to 16%, average ∼ 5%) (coded as ?).

Data were evaluated for fit to 24 and 56 evolutionary models, using MrModelTest (Nylander, 2004) and ModelTest (Posada and Crandall, 1998), respectively. We chose the most parameterized model that best fit the data at each locus, evaluated by either the likelihood-ratio test or Akaike information criterion, because higher parameterization has been shown to increase performance (Lemmon and Moriarty, 2004; Huelsenbeck and Rannala, 2004). ModelTest includes 32 models not available in MrBayes, so to ensure we used the most parameterized model possible, we compared the best fitting model among those tested in ModelTest to that in MrModelTest. If the best fitting model determined in ModelTest was more parameterized, we selected the next most parameterized model available in MrBayes (Table 1).

Phylogenetic analyses were performed using Bayesian reconstruction methods in the program MrBayes 3.1 (Ronquist and Huelsenbeck, 2003). For single gene analysis, the Markov chain Monte Carlo (MCMC) analysis was run for 5 × 107 generations, sampling trees every 1000 generations, using four Markov chains (default heating values). Six of the seven loci contained phylogenetically informative indels in the alignment. For each of these genes, an additional analysis was performed exactly the same way except that an additional partition was added to represent the phylogenetically informative indels. Gaps were coded with a binary model, set to equal substitution rates between states. For each such alignment, phylogenetically informative gaps were coded with a 0 or 1 indicating presence or absence of a gap; a gap was only coded once, regardless of its length. Stationarity of the MCMC was evaluated using the “Are We There Yet” (AWTY) software (Wilgenbush et al., 2004), which plots the cumulative posterior probabilities for each tree. Burn-in trees, those generated before the point at which these values stabilized, were discarded; we discarded from 500 to 2000 sampled trees (500,000 to 2,000,000 generations) for each individual locus run. The majority-rule consensus tree for the estimated posterior distribution of trees (with burn-in trees truncated) from single genes was assembled using MrBayes (Ronquist and Huelsenbeck, 2003).

For the concatenated analysis, each locus was considered a partition and assigned its own substitution model; ratepr was set to variable to ensure that branch lengths were estimated separately for each partition, and the MCMC analysis was run for 108 generations, sampling trees every 1000 generations, using four Markov chains. In the concatenated analysis, a single partition was added, for a total of 8 partitions, into which gap states were combined from all six loci (n = 19 gaps), coded as described for single gene analyses, above. Based on plots viewed in AWTY, as described above, we discarded 3000 trees (3,000,000 generations) for each concatenated run. The majority-rule consensus tree for the estimated posterior distribution of trees (with burn-in trees truncated) from the concatenated loci was assembled in MrBayes (Ronquist and Huelsenbeck, 2003).

Phylogenetic analyses were also performed using a modified version of a new method (Bayesian Estimation of Species Trees or BEST) that uses a Bayesian hierarchical model to estimate a distribution of species trees from vectors of estimated gene trees, across multiple loci. The basic method is described in Liu and Pearl (2007) and further evaluated in Edwards et al. (2007). In brief, BEST uses MrBayes to generate a posterior distribution of gene trees across loci using a prior based on an approximate species tree, thus fitting the gene tree distributions to several biologically realistic constraints. It then estimates a species tree from the joint posterior distribution of gene trees. Importance sampling is used to calculate the weights for species trees in order to produce a correct posterior distribution of the species tree. The result is a posterior distribution of species trees that is based on the distribution of a weighted joint distribution of gene trees. The modification used in the present analysis is to incorporate multiple alleles from each taxon into the probability density function of gene trees, given the species tree (Liu et al., 2008). For this data set, the same substitution models described above were used for each locus. Therefore, in every aspect but the model underlying the generation of a species tree, the concatenated and hierarchical analyses were identical. Chains were run for 2 × 108 generations; one out of every thousand gene trees was sampled. Convergence of the chains was assessed by examining the log likelihood values; all trees generated before the log likelihood reached stationarity were discarded as burn-in. The first 15,000 trees were discarded as burn-in. The majority-rule consensus tree for the estimated posterior distribution of species trees was constructed in PAUP* v. 4.0610 (Swofford, 2003).

The method assumes no reticulation among taxa and to fit this assumption, the membership of each individual sample to a taxon should be provided as a prior in the analysis. To compare outcomes of fitting this assumption with overriding it, the method was applied in two ways, with each genetic group of T. bottae considered as a “taxon,” although we know that intraspecific genetic groups are not reproductively isolated, and with all T. bottae considered as one taxon. To evaluate the outcome of increasing the number of alleles per taxon, the single-allele method (Liu and Pearl, 2007) was also applied to one individual per species using the same substitution models and run parameters as previously described.

In order to compare the trees generated by both the Bayesian analyses of the concatenated sequences and the hierarchical method, we calculated the log likelihoods of all post–burn-in trees. We calculated the Bayes factor from the minimum log likelihood of the posterior distribution of the BEST trees and the maximum log likelihood of the posterior distribution of the concatenated trees (this was a conservative calculation rather than taking the average of both distributions) and can thereby assess the strength of support for either method (Goodman, 1999). Gene trees in the BEST analysis are compiled using each individual as a component taxon, before combining in the species tree analysis, and thus have the same number of terminal taxa as do the concatenated trees.

Estimation of Divergence Times

Divergence times for the splitting of the Thomomyine tribe from the rest of the Geomyid taxa, as well as for the splitting of each major clade, were estimated using BEAST v. 1.4 (Drummond and Rambaut, 2003). Because the molecular clock model was rejected for four of the seven loci (tested using the likelihood-ratio test of the clock hypothesis in PAUP* under the substitution models selected [above]: TBO26: df = 26, χ2 = 51.30, P = 0.00; TBO29: df = 26, χ2 = 50.13, P = 0.00; TBO47: df = 26, χ2 = 22.95, P = 0.64; TBO53: df = 25, χ2 = 38.41, P = 0.04; TBO59: df = 26, χ2 = 23.27, P = 0.62; TBO64: df = 25, χ2 = 41.37, P = 0.02; TBO72: df = 25, χ2 = 20.00, P = 0.75), a relaxed-clock model (Drummond et al., 2006) was used in the BEAST analysis. The concatenated alignment was analyzed under the most parameterized evolutionary model (GTR+Γ) available. The split of Thomomys from the outgroup was set to fit a lognormal shape prior with a mean of 2.08, standard deviation of 0.075, zero offset of 6.5, initial value of 6.5, and UPGMA starting tree height of 10; the chain was run for 6.5 × 106 generations and all resulting effective sample sizes exceeded 250. The dating errors for the fossil used to calibrate the divergence time estimates place the fossil age between 6.5 and 6.8 Ma (Tedford et al. 2006). To ensure fossil dating error did not substantially alter the divergence time estimates, the analysis was also run the same way, but with zero offset and initial values of 7, to bracket these dates. Results were nearly identical (data not shown).

Results

All seven loci were highly variable across all taxa (Table 1). Overall, uncorrected sequence divergence (Nei, 1987) across all ingroup taxa was very high and variable among loci (mean = 0.031, range = 0.02–0.06). Large indels characterized some of the clades within the genus; in particular, in locus TBO47, a 209-bp indel separated the two subgenera. In loci TBO47, TBO53, and TBO59, indels of 4 to 9 bp were also phylogenetically informative. Gene tree topologies are marked at nodes supported with greater than 90% posterior probability (Fig. 2). Phylogenetic resolution obtained for single gene trees varied widely, with between two (TBO26, TBO59, and TBO72) and nine (TBO47 and TBO53) significantly supported (95% posterior probability, Huelsenbeck and Rannala, 2004) clades. Gene tree topologies did not vary substantially with the inclusion of gap partitions, and posterior probability values varied only slightly.

Figure 2

Distinct gene tree topologies. Nodes with greater than 90% posterior probability support from analysis of seven individual loci (TBOXX) in MrBayes are marked with a box (□). Branch lengths are in substitutions/site. (a) TBO26; (b) TBO29; (c) TBO47; (d) TBO53; (e) TBO59; (f) TBO64; (g) TBO72. (Continued)

Figure 2

Distinct gene tree topologies. Nodes with greater than 90% posterior probability support from analysis of seven individual loci (TBOXX) in MrBayes are marked with a box (□). Branch lengths are in substitutions/site. (a) TBO26; (b) TBO29; (c) TBO47; (d) TBO53; (e) TBO59; (f) TBO64; (g) TBO72. (Continued)

Concatenated tree topologies also did not vary with the inclusion of the gap partition; posterior probabilities and branch lengths (substitutions/site) were altered very slightly (Fig. 3). Fourteen of the 21 bipartitions were supported with 90% posterior probability or greater. The subgenera Thomomys and Megascapheus were each monophyletic. Three of the four named species within the subgenus Thomomys were monophyletic; T. idahoensis was weakly nested within T. talpoides, making T. talpoides paraphyletic with respect to T. idahoensis. There was strong support for the basal split between T. mazama and the clade containing T. talpoides, T. monticola, and T. idahoensis. Within Megascapheus, T. bulbivorus was well separated from the rest of the taxa and appeared as the sister group to the remaining species. Otherwise, relationships within and among taxa of Megascapheus were less well defined relative to those in the Thomomys subgenus. The monophyly of each of T. bottae awahnee, the Yosemite Valley form, T. bottae members of the Basin and Range genetic group, and T. bottae members of the Baja California group was well supported. Thomomys umbrinus was strongly supported as monophyletic and was weakly nested within T. bottae. Thomomys townsendii and three of the six forms/genetic groups examined within T. bottae were not monophyletic. Except for the relative divergence of the Baja California group, relationships among genetic groups of T. bottae and T. townsendii were poorly resolved. Overall, branch lengths between speciation events were very short, both as viewed in the component gene trees (Fig. 2) and the concatenated tree (Fig. 3), supporting the contention based on fossil evidence and paleoecological theory (Korth, 1994; Webb and Opdyke, 1995) that both subgenera experienced rapid radiations.

Figure 3

Majority-rule consensus tree generated by MrBayes, from seven concatenated loci, plus a partition that included coding for all gaps. Nodes are labeled with the posterior probability derived from the analysis including gaps (left). Posterior probabilities excluding gaps are shown only if they differ from those for the analysis including gaps (right, in square brackets above the branch). Labels in parentheses above the branch show the number of individual gene trees (out of seven) containing this node with > 90% posterior probability. Divergence time means, with 95% highest posterior density, for each node as estimated using BEAST are indicated below each branch. Branch lengths are in substitutions/site. Tree deposited in TreeBASE (Study no. S1997).

Figure 3

Majority-rule consensus tree generated by MrBayes, from seven concatenated loci, plus a partition that included coding for all gaps. Nodes are labeled with the posterior probability derived from the analysis including gaps (left). Posterior probabilities excluding gaps are shown only if they differ from those for the analysis including gaps (right, in square brackets above the branch). Labels in parentheses above the branch show the number of individual gene trees (out of seven) containing this node with > 90% posterior probability. Divergence time means, with 95% highest posterior density, for each node as estimated using BEAST are indicated below each branch. Branch lengths are in substitutions/site. Tree deposited in TreeBASE (Study no. S1997).

Close inspection of individual gene trees reveals both support and conflict relative to the concatenated tree. Few clades other than the two subgenera of Thomomys were strongly supported in more than one gene tree. Of the 12 significantly supported clades in the concatenated Bayes tree (Fig. 3; the number of gene trees that portray each clade with significant support is indicated above the branches in parentheses), only 6 were strongly supported by more than one gene tree. The monophyly of each subgenus was supported by four and the T. idahoensis clade was significantly supported in five of the seven gene trees; T. mazama, T. monticola, the split of T. bulbivorus from the rest of the Megascapheus subgenus, and the split of the two subgenera (reciprocal monophyly) were supported by two of the seven gene trees. Every other well-supported clade depicted in the concatenated Bayes tree received > 95% posterior support in one or none of the seven gene trees. Fully 28% of the strongly supported clades in the gene trees conflict with those strongly supported in the concatenated tree. Thus, the information contained in the gene trees was largely not reflected in the concatenated tree; conversely, the concatenated tree did not provide any indication of the level of conflict among gene trees.

In contrast to the Bayesian analysis of concatenated sequences, the hierarchical Bayesian method (BEST) directly estimates relationships among taxa, rather than individuals (Fig. 4). In the first analysis, in which we considered each form/genetic group of T. bottae a taxon (Fig. 4a), support for nodes was much weaker than that resulting from the concatenated analysis, although the BEST results provide strong support for the monophyly of each of the two subgenera. As in the concatenated analysis, within the subgenus Megascapheus, T. bulbivorus appears to be the sister lineage to other taxa, but support for internal branches is exceptionally low and relationships are generally unresolved. To explore the effect of increasing sample size per taxon, we repeated the hierarchical Bayes analysis grouping all 12 individuals of T. bottae into one taxon (Fig. 4b). This second analysis increased support for branches within the Megascapheus group considerably (0.06–0.33 to 0.55 and 0.65) relative to those in Thomomys (0.31 and 0.36), although support remained very low. The topology of the two trees did not vary. However, with low support, the topology of the species within the subgenus Thomomys deviated from that in the concatenated analysis, in that T. talpoides was sister to T. monticola rather than to T. idahoensis. We note that by a priori allocating individual membership to a taxon, or the case of the nine-taxon tree, all T. bottae samples to a single taxon, we exclude the possibility of detecting paraphyly of this taxon with T. umbrinus or T. townsendii (Patton and Smith, 1994). The third analysis, using only one allele per taxon, demonstrated the increased power obtained by increasing the number of alleles per taxon. The tree topology did not vary (Fig. 4b) but the support for every node, including the strongly supported split between subgenera, diminished. Posterior probability values from this analysis are shown in parentheses above the branches of each node (Fig. 4b).

Figure 4

Majority-rule consensus tree generated in PAUP* from BEST output. Nodes are labeled with their posterior probabilities. Branch lengths are in substitutions/site. Trees deposited in TreeBASE (Study no. S1997). (a) Tree estimated for 14 named taxa (genetic group or species; see text for details). (b) Tree estimated for 9 named taxa (species). Posterior probability obtained for the same nodes from a single-allele analysis in BEST are shown in parentheses after the multiallele posterior probabilities.

Figure 4

Majority-rule consensus tree generated in PAUP* from BEST output. Nodes are labeled with their posterior probabilities. Branch lengths are in substitutions/site. Trees deposited in TreeBASE (Study no. S1997). (a) Tree estimated for 14 named taxa (genetic group or species; see text for details). (b) Tree estimated for 9 named taxa (species). Posterior probability obtained for the same nodes from a single-allele analysis in BEST are shown in parentheses after the multiallele posterior probabilities.

A comparison of the log likelihoods for all post–burn-in trees for both the concatenated analysis and the BEST analysis clearly shows that log likelihoods for trees of each type are tightly grouped (Fig. 5). The Bayes factor calculated from this analysis, log(BF), is greater than 186.529, which strongly supports the model that generated the BEST trees over that which generated the concatenated analysis.

Figure 5

Plot of the log likelihoods for 2000 posterior trees (after burn-in trees are removed) from the concatenated analysis and the individual gene genealogy analysis performed by BEST. The Bayes factor calculated from the minimum log likelihood among the BEST trees, minus the maximum log likelihood among the concatenated trees, is 186.529.

Figure 5

Plot of the log likelihoods for 2000 posterior trees (after burn-in trees are removed) from the concatenated analysis and the individual gene genealogy analysis performed by BEST. The Bayes factor calculated from the minimum log likelihood among the BEST trees, minus the maximum log likelihood among the concatenated trees, is 186.529.

Ninety-five percent confidence intervals for divergence time estimates (millions of years) for each node, rounded to 2 places, are shown on Figure 3 below the branches. These should be regarded as provisional estimates, as only one calibration point was available from the fossil literature (see details in methods). The dates suggest Pliocene divergence of the subgenera and mid-late Pleistocene radiation of species within them.

Discussion

Gene Trees versus Concatenated Tree versus Species Trees

In spite of considerable nucleotide variation, there are few strongly supported relationships that are consistent among gene trees. A majority-rule or consensus method of generating a species tree from the gene trees would yield little information. In addition, support for specific clades within either subgenus varies with locus, and some gene trees support nodes that conflict with those in the concatenated tree. These conflicts are likely to reflect true differences in histories across genes, which can result from a large number of possible factors (Gatesy and Baker, 2005). This observation reinforces the importance of using a coalescent model, which explicitly incorporates different histories among genes. By contrast, the species tree generated by a partitioned analysis of concatenated loci, which assumes topologically consistent gene trees, yields several well-supported clades. Most nominal species (4/7; T. bulbivorus is only represented by one individual, so lack of monophyly could not be tested) are monophyletic. However, it is striking how strongly some clades are supported, despite the lack of support in most of the gene trees. All but one of the supported clades are supported by two or fewer gene trees. Gatesy and Baker (2005) noted that the combination of multiple genes, which separately do not support a clade, often reveals emergent support for or emergent conflict with that clade. Similarly, individual genes trees may conflict with a clade, but when combined, support it. Based on these results, and our observations, it is apparent that it is not possible to predict the outcome of a concatenated analysis from component gene trees. Furthermore, there is no way to assess whether clades strongly supported by the emergent properties that manifest upon combining genes in a concatenated analysis reflect the correct species relationships.

The previously identified paraphyly of T. bottae with respect to T. townsendii was evident but not strongly supported by this analysis. Some genetic groups within T. bottae are monophyletic with strong support in the concatenated analysis; yet, the monophyly of these groups is strongly supported by only one gene tree. The genetic groups were originally described from allozyme trees that included a much broader sampling of individuals and were constructed using phenetic methods. The results presented here were generated using model-based Bayesian methods, have more limited sampling and, thus, they are not directly comparable to the previous allozyme trees. Nonetheless, it is notable that two of the five allozyme-generated genetic groupings within T. bottae are supported by the concatenated analysis, namely, the Basin and Range, and Baja California genetic groups (T. b. awahnee is not a genetic group, sensu Patton and Smith, 1990, but a subspecies).

The results of the hierarchical Bayesian model show a similar, but not identical, topology to that of the concatenated tree, but with virtually no support for most of the relationships, either in the analysis that treats each T. bottae genetic group as a taxon or the analysis that groups all of T. bottae as one taxon. The only significantly supported clades by this method are the subgenera, Megascapheus and Thomomys. In some respects, this is the tree one intuitively expects when looking at the gene trees individually (Fig. 2). Most relationships are loosely upheld with poor support.

The disagreement between the BEST trees and the concatenated tree is explained by understanding what concatenation does. The concatenation method assumes that all loci have the same evolutionary history. Because of this, the species tree estimation is identical to estimating an optimal gene tree using molecular sequences. In general, Bayesian approaches are equivalent to maximum likelihood methods if there is a large amount of data. It follows from the consistency of maximum likelihood approaches (Chang, 1996; Rogers, 2001) that the Bayesian concatenation method may overestimate the posterior probability of the tree if the data include long sequences, even if the signal of the species tree in the data is low. Here, the signal is the topology of each gene tree; if there are few well-supported relationships within gene trees, or if the signal across gene trees is highly conflicting, then little true signal exists. Thus, as long as the sequence is long enough, this method can produce a highly resolved tree that may be incorrect. Similarly, it has been shown (Steel and Matsen, 2007) that increasing the length of sequences increases the possibility of obtaining strong support for an incorrect, resolved tree when the true tree is unresolved.

The assumptions underlying the BEST model employed here are more biologically realistic and capture some basic principles inherent in lineage sorting. This is more likely to be an accurate reflection of history, and not “positively misleading” (sensu Degnan and Rosenberg, 2006). This contention is statistically supported by the Bayes factor calculated for the comparison of the concatenated trees to the BEST trees; log(BF) > 186.529 indicates strong support for the model underlying the BEST trees over the model underlying the concatenated trees. Finally, phylogenetic reconstruction methods, as they have been conceived of to date, assume that species relationships are “tree-like” (Rosenberg, 2002) and that there is no reticulation. Given that multiple biogeographic and demographic scenarios are known to be implicated in incomplete lineage sorting, in theory (Rosenberg, 2003) and in various Thomomys taxa (Patton and Smith, 1989), especially within Megascapheus, it is not surprising that many relationships within the genus are not well resolved, even by the concatenation method.

Multiple Individuals Per Taxon

The use of one individual per taxon in a phylogenetic analysis is common but also potentially misleading, as it ensures that each taxon will be monophyletic. Patton and Smith (1981) and Smith (1998) used multiple individuals from the T. bottae group, as done here, to reveal paraphyly and polyphyly of species in the subgenus Megascapheus. Allozyme and morphometric analyses were the basis for defining the “genetic groups” within T. bottae (Patton and Smith, 1990), although these were not completely concordant with the subsequent mitochondrial analysis (Smith, 1998). Similarly, in our Bayesian phylogenetic analyses, when we treat each individual separately in either the gene tree or in concatenated analyses, the paraphyly of various species is apparent. When individuals are grouped into their respective genetic groups or species as a prior in the hierarchical analysis, the species and groups receive very low posterior support. Although the prior grouping of alleles into species/genetic groups precludes us from observing paraphyly, if that is the actual relationship among the species/groups, the lack of posterior support may well be a reflection of the incomplete lineage sorting we knew to be present among some species.

The outcomes for the subgenus Thomomys are somewhat more surprising. Inclusion of multiple individuals per taxon revealed no significant support for monophyly of species within the subgenus using the BEST method. It is likely that the subgenus Thomomys is subject to similar lack of lineage sorting as it radiated at the same time as did Megascapheus and was doubtless subject to similar environmental and demographic conditions. In particular, the widespread species, T. talpoides, has nested within its range the highly geographically restricted species T. idahoensis and T. clusius (not included in this phylogenetic analysis), in addition to chromosomal and karyotypic variation throughout the rest of its range. It is very likely that T. talpoides is polytypic, as suggested by Nevo et al. (1974), and further investigation is warranted. It is also important to note that the two specimens of T. idahoensis used in this study were sampled from the same population, rather than from different subspecies, or distant genetic types, as were all other pairs of samples within species/genetic groups in the study, so the consistent monophyly of this species relative to T. talpoides is not very informative.

In general, this lack of complete sorting would not have been revealed without the use of multiple individuals per taxon, a result strongly in line with predictions of the simulations by Maddison and Knowles (2006). Maddison and Knowles simulated species tree reconstruction for taxa with incompletely sorted genes under two methods, using 1, 3, 9, and 27 individuals and genes in various combinations; the most accurate results were seen with 3 individuals and nine loci, or the reverse. In most cases we included two individuals and seven genes, close to the optimal category suggested by Maddison and Knowles. The inclusion of 12 individuals of T. bottae, a species thought to have radiated rapidly, did not further resolve species-level relationships with other taxa. The difficulty in resolving Thomomys species is fully consistent with the general arguments of Maddison and Knowles and the earlier conclusions of Neigel and Avise (1986) from their simulations of mitochondrial lineage sorting, namely, that rapid and recent speciation under a multitude of demographic scenarios often results in polyphyly, followed by paraphyly, and only much later, monophyly. By contrast, the result of the BEST analysis performed on one exemplar per species provides compelling support for the proposal that increasing the number of individuals of each taxon will greatly increase resolution if incomplete lineage sorting is present. Although support for species in the multiple allele analysis was not strong (0.31 to 0.65 for species), the support disappears when only one individual is used (0 for species).

A potential source of concern with using BEST is the assignment of alleles to a taxon (species or genetic group) a priori, which may obscure paraphyletic and polyphyletic relationships. Although an obvious advantage of using multiple individuals in a phylogenetic reconstruction is to elucidate nonmonophyletic relationships, treating each individual as a discrete taxon violates the assumptions of divergent evolution implicit in phylogenetic analysis and should be used as a heuristic only. All phylogenetic reconstruction methods assume that taxa (species) are reciprocally monophyletic, which is largely assumed to result from extended reproductive isolation. This assumption leads to the supposition that given enough data (taxa, genetic loci, other characters) or computing power, relationships are unambiguously recoverable using phylogenetic methods. However, regardless of one's preference in species concept, recently diverged or rapidly diverged species may not meet any or all of the criteria to qualify as species or, rather, may not have done so at every locus, every population, or in all cases (Beltran et al., 2002; Smith, 1998). Thus, reconstructing historical relationships among taxa whose degree of separation is variable with respect to these criteria automatically violates the primary assumption of phylogenetic methods (Patton and Smith, 1994). The alternative, coalescent reconstruction, accommodates the possibility of incomplete gene segregation at species boundaries, although methods have not yet been developed for phylogenetic analysis (Maddison and Knowles, 2006). The BEST method, tested here, is optimized for phylogenetic analysis, but as with all other phylogenetic analysis methods to date, does not allow for reticulation of any kind among taxa in its current form.

Does Concatenation of Multiple Loci Generate an Overresolved Tree?

Typically, the advantage of using multiple loci in phylogenetic reconstruction is to provide additional, independent support for lineage relationships. Although lineage relationships as evaluated using concatenated genes in a Bayesian partitioned analysis did result in strong support for each of the subgenera and multiple individual species, no single species-level relationship was supported by the BEST analysis. By examining well-supported clades in the gene trees in comparison to those in the concatenated tree and to those in the BEST trees, we observe that conflict among gene trees does not necessarily result in reduced support for a clade in a concatenated analysis. By contrast, coincidence of clade support among genes results in high support by the BEST method, whereas conflict among gene trees reduces support. The use of multiple loci, therefore, appears to result in a more realistic depiction of lineage history than the use of one or few loci, particularly when analyzed in a coalescent context. It might be said that adding more loci improves resolution of the concatenated, partitioned analysis but improves realism of the coalescent analysis.

Lineage sorting necessarily follows reproductive isolation. Phylogenetic methods that are designed to estimate a bifurcating tree assume that there is no reticulation. Thus, regardless of species status based on predominant reproductive behavior, ecology, morphology, and geography, phylogenetic methods may not provide much discerning power for recently radiated groups. The advantage of using the hierarchical model is that it does not have a tendency to over-support poorly resolved clades, and the use of a species prior permits alleles from multiple individuals to be jointly considered when assessing relationships. There is clearly room for new methods that incorporate coalescent theory and the occurrence of intermittent or low-level genetic introgression into analysis of lineage clusters.

Thomomys Phylogenetics: Insights from More Taxa and More Loci

Over the late Miocene (5 to 10 Ma), numerous Geomyoid lineages in the fossil fauna began to evolve similar tooth characteristics in parallel, specifically, increased hypsodonty with a gradual reduction in buccal and lingual enamel (Wood, 1937). Many of these early lineages had gone extinct by the end of the Pliocene (Wood, 1937), coinciding with the fossil evidence for a great diversity of remaining Geomyoids towards the end of the Pliocene (Korth, 1994). Fossil Thomomyines date to the middle Hemphillian NALMA (∼ 7 Ma; Shotwell, 1967; Tedford et al., 2004), and are presumed to have been restricted to small patches of habitable terrain within a savannah-dominated landscape (Webb and Opdyke, 1995). According to the divergence times generated here, the subgenus Megascapheus originated approximately 5 Ma, and the subgenus Thomomys originated approximately 4 Ma, both shortly after the start of the Pliocene, during the Blancan NALMA. Paleoecologists have described the mid-Pliocene as the time when grasses began to dominate in the increasingly semiarid regions of the continent, including the Great Basin and the high plains states (Webb and Opdyke, 1995). Clade ages within the subgenus Thomomys are estimated to be approximately 1 to 2 Ma; clade ages within the subgenus Megascapheus are estimated to be younger, from 0.8 to 1 Ma. These dates corroborate fossil descriptions of these newly emerged Geomyinae, with full hypsodonty (high crowned teeth) in their cheek teeth, proposed to be a key innovation ideally suited for sustaining a lifetime of grinding grasses (Korth, 1994). It is likely the arid and semiarid grasslands supported subterranean rodents especially well and permitted expansion of Geomyine taxa.

It is known that some species form hybrids in narrow zones of sympatry, at least within some members of Megascapheus (Patton et al., 1972, 1984). Within the extraordinary geographic diversity of T. bottae, one, possibly two, distinct species have arisen. Thomomys townsendii is ecomorphologically distinct, monophyletic for allozymes in one analysis (Patton and Smith, 1994), and forms a narrow contact zone with T. bottae, resulting in limited hybridization. However, it is nested well within T. bottae for allozymes and mtDNA, rendering the latter paraphyletic (Patton and Smith, 1994). A second species, T. umbrinus, contains some populations that are chromosomally distinct from T. bottae, is likewise morphologically distinct, and forms few hybrids in secondary contact with T. bottae (Patton, 1993). Depending on the data set and method of analysis, it either forms a sister group to T. bottae or is nested within it (Patton and Smith, 1981; Smith, 1998).

Much has been said in the past about the nature of genetic structuring in subterranean rodents, and pocket gophers in particular, in relation to geography and lifestyle (summarized in Steinberg and Patton, 2000). The low levels of dispersal, generally low-density populations, and high philopatry permit rapid genetic drift of local populations sustaining huge genetic variance across the range of the species. However, episodic gene flow via juvenile dispersal of moderately long distance, known to occur into vacant niches (Daly and Patton, 1990; Hafner et al., 1998), may have maintained the cohesiveness of the gopher taxa. This type of fluctuating subdivided population structure will also result in high global effective population size, which, when combined with rapid speciation events, will further increase the challenge of determining lineage history from molecular data. This has been evident in genetic patterns as well as chromosomal variation and phenotypic variation (Patton and Smith, 1991). It has been suggested that low levels of hybridization at the boundaries of ranges persists because of the lack of alternative mates for individuals in those contact zones.

In general, because gopher species are rarely sympatric, or even contiguously parapatric, it is not easy to assess the ease with which taxa interbreed. Selection pressure is not strong enough to evolve reinforcement mechanisms to prevent the loss of fitness presumed to be endured by hybrids by this model (Steinberg and Patton, 2000). Whether reticulation remains possible among some of the genus Thomomys because of the short time since species divergence, or because of a lack of pressure to enhance isolating mechanisms, reconstructing the histories of and relationships among these rapidly radiating taxa is difficult with phylogenetic techniques. Two-lineage models that incorporate divergence and gene flow (e.g., Hey and Nielsen, 2004, 2007) are available, but extending coalescent models to multilineage analyses remains a challenge. Many recent discussions in the literature have called out a need for coalescent methods that can be applied at the interface of phylogenetic and population processes that will assist in discerning among lineages of different ages and different histories. A coalescent method that can test between hypotheses of recent reticulation versus a relatively recent rapid speciation event (resulting in incomplete lineage sorting) would be a powerful tool for these kinds of questions. The use of more realistic methods of phylogenetic reconstruction, such as the hierarchical Bayesian model here, that do not positively mislead is an important step toward identifying where current analytical methods may be inadequate to understand the histories and boundaries of recently diverged species.

Bottae's pocket gopher, San Mateo County, California; photo credit: Pauline Kamath.

Bottae's pocket gopher, San Mateo County, California; photo credit: Pauline Kamath.

Acknowledgment

This work was supported by NSF DEB 0306091, the Museum of Vertebrate Zoology (MVZ) Kellogg Fund, and the MVZ Martens Fund to N.M.B. and by S. Edwards to L.L. N.M.B. is grateful to J. L. Patton and the laboratory of J. McGuire for helpful discussions and the laboratory assistance of Allison Tam. We wish to thank Laura Kubatko, Jack Sullivan, and two anonymous reviewers for helpful comments and suggestions.

References

Alvarez-Castaneda
S. T.
Patton
J. L.
Geographic genetic architecture of pocket gopher (Thomomys bottae) populations in Baja California, Mexico
Mol. Ecol.
 , 
2004
, vol. 
13
 (pg. 
2287
-
2301
)
Avise
J. C.
Wollenberg
K.
Phylogenetics and the origin of species
Proc. Nat. Acad. Sci. USA
 , 
1997
, vol. 
94
 (pg. 
7748
-
7755
)
Ballard
J. W. O.
Rand
D. M.
The population biology of mitochondrial DNA and its phylogenetic implications
Annu. Rev. Ecol. Evol. Syst.
 , 
2005
, vol. 
36
 (pg. 
621
-
642
)
Beltrán
M.
Jiggins
C. D.
Bull
V.
Linares
M.
Mallet
J.
McMillan
W. O.
Bermingham
E.
Phylogenetic discordance at the species boundary: Comparative gene genealogies among rapidly radiating Heliconius butterflies
Mol. Biol. Evol.
 , 
2002
, vol. 
19
 (pg. 
2176
-
2190
)
Bush
G. L.
Case
S. M.
Wilson
A. C.
Patton
J. L.
Rapid speciation and chromosomal evolution in mammals
Proc. Natl. Acad. Sci. USA
 , 
1977
, vol. 
74
 (pg. 
3942
-
3946
)
Chang
J. T.
Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency
Math. Biosci.
 , 
1996
, vol. 
137
 (pg. 
51
-
73
)
Dalquest
W. W.
Mammals of the Coffee Ranch local fauna Hemphillian of Texas
 , 
1983
Texas Memorial Museum, University of Texas at Austin
Daly
J. C.
Patton
J. L.
Dispersal, gene flow, and allelic diversity between local populations of Thomomys bottae pocket gophers in the Coastal Ranges of California
Evolution
 , 
1990
, vol. 
44
 (pg. 
1283
-
1294
)
Davis
W. B.
Variations in Townsend pocket gophers
J. Mammal.
 , 
1937
, vol. 
18
 (pg. 
145
-
158
)
Davis
W. B.
Relation of size of pocket gophers to soil and altitude
J. Mammal.
 , 
1938
, vol. 
19
 (pg. 
338
-
342
)
Degnan
J. H.
Rosenberg
N. A.
Discordance of species trees with their most likely gene trees
PLoS Genet.
 , 
2006
, vol. 
2
 (pg. 
0762
-
0768
)
Degnan
J. H.
Salter
L. A.
Gene tree distributions under the coalescent process
Evolution
 , 
2005
, vol. 
59
 (pg. 
24
-
37
)
Dolman
G.
Moritz
C.
A multilocus perspective on refugial isolation and divergence in rainforest skinks (Carlia)
Evolution
 , 
2006
, vol. 
60
 (pg. 
573
-
582
)
Drummond
A. J.
Rambaut
A.
BEAST v1.0
 , 
2003
 
Drummond
A. J.
Ho
S. Y. W.
Phillips
M. J.
Rambaut
A.
Relaxed phylogenetics and dating with confidence
PLoS Biol.
 , 
2006
, vol. 
4
 pg. 
e88
 
Edwards
S. V.
Beerli
P.
Perspective: Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies
Evolution
 , 
2000
, vol. 
54
 (pg. 
1839
-
1854
)
Edwards
S. V.
Liu
L.
Pearl
D. K.
High resolution species trees without concatenation
Proc. Natl. Acad. Sci. USA
 , 
2007
, vol. 
104
 (pg. 
5936
-
5941
)
Felsenstein
J.
Coalescents and species trees
Inferring phylogenies
 , 
2004
Sunderland, Massachusetts
Sinauer Associates
(pg. 
488
-
495
)
Gatesy
J.
Baker
R. H.
Hidden likelihood support in genomic data: Can forty–five wrongs make a right?
Syst. Biol.
 , 
2005
, vol. 
54
 (pg. 
483
-
492
)
Getz
L. L.
Color variation in pocket gophers, Thomomys
J. Mammal.
 , 
1957
, vol. 
38
 (pg. 
523
-
526
)
Goodman
S. N.
Toward evidence-based medical statistics. 2. The Bayes Factor
Ann. Intern. Med.
 , 
1999
, vol. 
130
 (pg. 
1005
-
1013
)
Hafner
M. S.
Demastes
J. W.
Hafner
D. J.
Spradling
T. A.
Sudman
P. D.
Nadler
S. A.
Age and movement of a hybrid zone: Implications for dispersal distance in pocket gophers and their chewing lice
Evolution
 , 
1998
, vol. 
52
 (pg. 
278
-
282
)
Hafner
M. S.
Hafner
J. C.
Patton
J. L.
Smith
M. F.
Macrogeographic patterns of the genetic differentiation in the pocket gopher Thomomys umbrinus
Syst. Zool.
 , 
1987
, vol. 
36
 (pg. 
18
-
34
)
Hasegawa
M.
Kishino
H.
Yano
T.
Dating of the human ape splitting by a molecular clock of mitochondrial-DNA
J. Mol. Evol.
 , 
1985
, vol. 
22
 (pg. 
160
-
174
)
Hey
J.
Machado
C. A.
The study of structured populations—New hope for a difficult and divided science
Nat. Rev. Genet.
 , 
2003
, vol. 
4
 (pg. 
535
-
543
)
Hey
J.
Nielsen
R.
Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura, D. persimilis
Genetics
 , 
2004
, vol. 
167
 (pg. 
747
-
760
)
Hey
J.
Nielsen
R.
Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics
Proc. Nat. Acad. Sci. USA
 , 
2007
, vol. 
104
 (pg. 
2785
-
2790
)
Huelsenbeck
J. P.
Rannala
B.
Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models
Syst. Biol.
 , 
2004
, vol. 
53
 (pg. 
904
-
913
)
Korth
W. W.
Stehli
F. G.
Jones
D. S.
Geomyidae
The Tertiary record of rodents in North America. Topics in geobiology, volume 12
 , 
1994
New York
Springer
(pg. 
199
-
212
)
Korth
W. W.
Chaney
D. S.
A new subfamily of Geomyoid rodents (Mammalia) and a possible origin of Geomyidae
J. Paleo.
 , 
1999
, vol. 
73
 (pg. 
1191
-
1200
)
Kubatko
L. S.
Degnan
J. H.
Inconsistency of phylogenetic estimates from concatenated data under coalescence
Syst. Biol.
 , 
2007
, vol. 
56
 (pg. 
17
-
24
)
Lanave
C.
Preparata
G.
Sacone
C.
Serio
G.
A new method for calculating evolutionary substitution rates
J. Mol. Evol.
 , 
1984
, vol. 
20
 (pg. 
86
-
93
)
Lee
M. S. Y.
Hugall
A. F.
Model type, implicit data weighting, and model averaging in phylogenetics
Mol. Phylogenet. Evol.
 , 
2006
, vol. 
38
 (pg. 
846
-
857
)
Lemmon
A. R.
Moriarty
E. C.
The importance of proper model assumption in Bayesian phylogenetics
Syst. Biol.
 , 
2004
, vol. 
53
 (pg. 
265
-
277
)
Lindsay
E.
Mou
Y.
Downs
W.
Pederson
J.
Kelly
T. S.
Henry
C.
Trexler
J.
Recognition of the Hemphillian/Blancan boundary in Nevada
J. Vert. Paleo.
 , 
2002
, vol. 
22
 (pg. 
429
-
442
)
Liu
L.
Edwards
S. V.
Belfiore
N. M.
Brumfield
R. T.
Pearl
D. K.
Estimating species trees using multiple-allele data
Evolution
 , 
2008
 
(in press)
Liu
L.
Pearl
D. K.
Species trees from gene trees: Reconstuction Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions
Syst. Biol.
 , 
2007
, vol. 
56
 (pg. 
504
-
514
)
Maddison
W. P.
Gene trees in species trees
Syst. Biol.
 , 
1997
, vol. 
46
 (pg. 
523
-
536
)
Maddison
W. P.
Knowles
L. L.
Inferring phylogeny despite incomplete lineage sorting
Syst. Biol.
 , 
2006
, vol. 
55
 (pg. 
21
-
30
)
Nei
M.
Molecular evolutionary genetics
 , 
1987
New York
Columbia University Press
Neigel
J. E.
Avise
J. C.
Nevo
E.
Karlin
S.
Phylogenetic relationships of mitochondrial DNA under various demographic models of speciation
Evolutionary processes and theory
 , 
1986
New York
Academic Press
(pg. 
515
-
534
)
Nevo
E.
Kim
Y. J.
Shaw
C. R.
Thaeler
C. S.
Genetic variation, selection and speciation in Thomomys talpoides pocket gophers
Evolution
 , 
1974
, vol. 
28
 (pg. 
1
-
23
)
Nylander
J. A. A.
MrModelTest v2. Program distributed by the editor
 , 
2004
Evolutionary Biology Centre, Uppsala University
 
[24 instead of 56 models of nucleotide substitution.]
Pamilo
P.
Nei
M.
Relationships between gene trees and species trees
Mol. Biol. Evol.
 , 
1988
, vol. 
5
 (pg. 
568
-
583
)
Patterson
B. D.
Ceballos
G.
Sechrest
W.
Tognelli
M. F.
Brooks
T.
Luna
L.
Ortega
P.
Salazar
I.
Young
B. E.
Digital distribution. Maps of the mammals of the Western Hemisphere, version 2.0
 , 
2005
Arlington, Virginia
NatureServe
Patton
J. L.
Patterns of geographic variation in karyotype in the pocket gopher. Thomomys bottae (Eydoux and Gervais)
Evolution
 , 
1972
, vol. 
26
 (pg. 
574
-
586
)
Patton
J. L.
Population structure and the genetics of speciation in pocket gophers, genus Thomomys
Acta Zool. Fennica
 , 
1985
, vol. 
170
 (pg. 
109
-
114
)
Patton
J. L.
Nevo
E.
Reig
O. A.
Geomyid evolution: The historical, selective, and random basis for divergence patterns within and among species
Evolution of subterranean mammals at the organismal and molecular levels
 , 
1990
New York
Wiley-Liss
(pg. 
49
-
69
)
Patton
J. L.
Harrison
R. G.
Hybridization and hybrid zones in pocket gophers (Rodentia, Geomyidae)
Hybrid zones and the evolutionary process
 , 
1993
New York
Oxford University Press
(pg. 
290
-
308
)
Patton
J. L.
Wilson
D. E.
Reeder
D. M.
Family Geomyidae
Mammal species of the world: A taxonomic and geographic reference
 , 
2005
3rd edition
Baltimore, Maryland
The Johns Hopkins University Press
(pg. 
859
-
870
)
Patton
J. L.
Selander
R. K.
Smith
M. H.
Genic variation in hybridizing populations of gophers (genus Thomomys)
Syst. Zool.
 , 
1972
, vol. 
21
 (pg. 
263
-
270
)
Patton
J. L.
Sherwood
S. W.
Genome evolution in pocket gophers (genus Thomomys) I
Heterochromatin variation and speciation potential. Chromosoma
 , 
1982
, vol. 
85
 (pg. 
149
-
162
)
Patton
J. L.
Smith
M. F.
Molecular evolution in Thomomys: Phyletic systematics, paraphyly, and rates of evolution
J. Mammal.
 , 
1981
, vol. 
62
 (pg. 
493
-
500
)
Patton
J. L.
Smith
M. F.
Otte
D.
Endler
J. A.
Population structure and the genetic and morphologic divergence among pocket gopher species (genus Thomomys)
Speciation and its consequences
 , 
1989
Sunderland, Massachusetts
Sinauer Associates
(pg. 
284
-
304
)
Patton
J. L.
Smith
M. F.
The evolutionary dynamics of the pocket gopher Thomomys bottae, with emphasis on California populations
Univ. Calif. Publ. Zool.
 , 
1990
, vol. 
123
 (pg. 
1
-
161
)
Patton
J. L.
Smith
M. F.
Paraphyly, polyphyly, and the nature of species boundaries in pocket gophers (genus Thomomys)
Syst. Biol.
 , 
1994
, vol. 
43
 (pg. 
11
-
26
)
Patton
J. L.
Smith
M. F.
Price
R. D.
Hellenthal
R. A.
Genetics of hybridization between the pocket gophers Thomomys bottae Thomomys townsendii in northeastern California
Great Basin Naturalist
 , 
1984
, vol. 
44
 (pg. 
431
-
440
)
Philippe
H.
Delsuc
F.
Brinkmann
H.
Lartillot
N.
Phylogenomics
Annu. Rev. Ecol. Evol. Syst.
 , 
2005
, vol. 
36
 (pg. 
541
-
562
)
Posada
D.
Crandall
K. A.
ModelTest: Testing the model of DNA substitution
Bioinformatics
 , 
1998
, vol. 
14
 (pg. 
817
-
818
)
Reid
F. A.
Peterson field guide to mammals of North America
 , 
2006
4th edition
Houghton Mifflin
Rogers
J. S.
Maximum likelihood estimation of phylogenetic trees is consistent when substitution rates vary according to the invariable sites plus gamma distribution
Syst. Biol.
 , 
2001
, vol. 
50
 (pg. 
713
-
722
)
Rogers
M. A.
Evolutionary differentiation within the Northern Great Basin pocket gopher Thomomys townsendii, I. Morphological variation
Great Basin Naturalist
 , 
1991
, vol. 
51
 (pg. 
109
-
126
)
Rogers
M. A.
Evolutionary differentiation within the Northern Great Basin pocket gopher Thomomys townsendii, II. Genetic variation and biogeographic consequences
Great Basin Naturalist
 , 
1991
, vol. 
51
 (pg. 
127
-
152
)
Rokas
A.
Williams
B. L.
King
N.
Carroll
S. B.
Genome-scale approaches to resolving incongruence in molecular phylogenies
Nature
 , 
2003
, vol. 
425
 (pg. 
798
-
804
)
Ronquist
F.
Huelsenbeck
J. P.
MrBayes 3: Bayesian phylogenetic inference under mixed models
Bioinformatics
 , 
2003
, vol. 
19
 (pg. 
1572
-
1574
)
Rosenberg
N. A.
The probability of topological concordance of gene trees and species trees
Theor. Popul. Biol.
 , 
2002
, vol. 
61
 (pg. 
225
-
247
)
Rosenberg
N. A.
The shapes of neutral gene genealogies in two species: Probabilities of monophyly, paraphyly, and polyphyly in a coalescent model
Evolution
 , 
2003
, vol. 
57
 (pg. 
1465
-
1477
)
Rosenberg
N. A.
Nordberg
M.
Genealogical trees, coalescent theory and the analysis of genetic polymorphisms
Nat. Rev. Genet.
 , 
2002
, vol. 
3
 (pg. 
380
-
390
)
Sherwood
S. W.
Patton
J. L.
Genome evolution in pocket gophers genus Thomomys 2
Variation in cellular DNA content. Chromosoma
 , 
1982
, vol. 
85
 (pg. 
163
-
180
)
Shotwell
J. A.
Late Tertiary Geomyoid rodents of Oregon
 , 
1967
 
Mus. Nat. Hist. Bull. No. 9. Univ. Oregon
Slowinski
J. B.
Page
R. D. M.
How should species phylogenies be inferred from sequence data?
Syst. Biol.
 , 
1999
, vol. 
48
 (pg. 
814
-
825
)
Smith
M. F.
Phylogenetic relationships and geographic structure in pocket gophers in the genus Thomomys
Mol. Phylogenet. Evol.
 , 
1998
, vol. 
9
 (pg. 
1
-
14
)
Spradling
T. A.
Brant
S. V.
Hafner
M. S.
Dickerson
C. J.
DNA data support a rapid radiation of pocket gopher genera (Rodentia: Geomyidae), J
Mammal. Evolution
 , 
2004
, vol. 
11
 (pg. 
105
-
125
)
Steel
M.
Matsen
F. A.
The Bayesian “star paradox” persists for long finite sequences
Mol. Biol. Evol.
 , 
2007
, vol. 
24
 (pg. 
1075
-
1079
)
Steinberg
E. K.
Patton
J. L.
Lacey
E. A.
Patton
J. L.
Cameron
G. N.
Genetic structure and the geography of speciation in subterranean rodents: Opportunities and constraints for evolutionary diversification
Life underground: The biology of subterranean rodents
 , 
2000
Chicago, Illinois
The University of Chicago Press
(pg. 
301
-
331
)
Swofford
D. L.
PAUP*: Phylogenetic analysis using parsimony (*and other methods). Version 4
 , 
2003
Sunderland, MA
Sinauer Assoc.
Takahata
N.
Gene genealogy in three related populations: Consistency probability between gene and population trees
Genetics
 , 
1989
, vol. 
122
 (pg. 
957
-
966
)
Tedford
R. H.
Albright
L. B.
Barnosky
A. D.
Ferrusquia-Villafranca
I.
Hunt
R. M.
Storer
J. E.
Swisher
C. C.
Voorhies
M. R.
Webb
S. D.
Whistler
D. P.
Woodburne
M. O.
Mammalian biochronology of the Arikareean through Hemphillian interval (late Oligocene through early Pliocene epochs)
Late Cretaceous and Cenozoic mammals of North America: Biostratigraphy and geochronology
 , 
2004
New York
Columbia University Press
(pg. 
169
-
231
)
Thaeler
C. S.
An analysis of the distribution of pocket gopher species in northeastern California (genus Thomomys), volume 86
 , 
1968
Berkeley, California
University of California Publications in Zoology. University of California Press
Thaeler
C. S.
Karyotypes of sixteen populations of the Thomomys talpoides complex of pocket gophers (Rodentia-Geomyidae)
Chromosoma
 , 
1968
, vol. 
25
 (pg. 
172
-
183
)
Thaeler
C. S.
Four contacts between ranges of different chromosome forms of the Thomomys talpoides complex (Rodentia: Geomyidae)
Syst. Biol.
 , 
1974
, vol. 
23
 (pg. 
343
-
354
)
Thaeler
C. S.
Chromosome numbers and systematic relations in the genus Thomomys (Rodentia: Geomyidae)
J. Mammal.
 , 
1980
, vol. 
61
 (pg. 
414
-
422
)
Thompson
J. D.
Gibson
T. J.
Plewniak
F.
Jeanmougin
F.
Higgins
D. G.
The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools
Nucleic Acids Res.
 , 
1997
, vol. 
24
 (pg. 
4876
-
4882
)
Webb
S. D.
Opdyke
N. D.
Global climatic influence on Cenozoic land mammal faunas
Effects of past global change on life
 , 
1995
Washington, DC
Board on Earth Science Research, Commission on Geosciences. Environmental Research Nature Research Council, National Academy Press
(pg. 
184
-
208
)
Wilgenbusch
J. C.
Warren
D. L.
Swofford
D. L.
AWTY: A system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference
 , 
2004
 
Wood
A. E.
Parallel radiation among the Geomyoid rodents
J. Mammal.
 , 
1937
, vol. 
18
 (pg. 
171
-
176
)
Woodburne
M. O.
Swisher
C. C.
III
Berggren
W. A.
Kent
D. V.
Hardenbol
J.
Land mammal high-resolution geochronology, intercontinental overland dispersals, sea level, climate, and vicariance
Geochronology, time-scales and global stratigraphic correlation: Framework for an historical geology
 , 
1995
, vol. 
volume 54
 
Tulsa, OK
Society of Stratigraphic Geology Special Publication
(pg. 
335
-
364
)
Zwickl
D. J.
Holder
M. T.
Model parameterization, prior distributions, and the general time-reversible model in Bayesian phylogenetics
Syst. Biol.
 , 
2004
, vol. 
53
 (pg. 
877
-
888
)

Identity, locality, and abbreviations for all Thomomys samples included in the analysis. MVZ = Museum of Vertebrate Zoology, University of California, Berkeley; mtDNA = mitochondrial DNA; 2n = diploid number; Fn = fundamental number.

Species Subspecies Prior basis for genetic differentiation MVZ no. County, State Abbreviation 
Thomomys bottae awahnee Genetically distinct in mtDNA studies 158708 Mariposa Co. CA Tbaw a 
 awahnee Genetically distinct in mtDNA studies 201577 Mariposa Co. CA Tbaw b 
 xerophilus Baja California allozyme group 153684 Baja Norte, BC Tbbj a 
 cactophilus Baja California allozyme group 153694 Baja del Sur, BC Tbbj b 
 albatus Basin and Range allozyme group 156116 Imperial Co. CA Tbbr a 
 ruidosae Basin and Range allozyme group 147023 Lincoln Co. NM Tbbr b 
 bottae Central California allozyme group 166821 Monterey Co. CA Tbcc a 
 alpinus Central California allozyme group 164670 Tulare Co. CA Tbcc b 
 riparius Great Basin allozyme group 148289 Riverside Co. CA Tbgb a 
 mewa Great Basin allozyme group 162920 Fresno Co. CA Tbgb b 
 saxatilis Northern California allozyme group 160762 Lassen Co. CA Tbnc a 
 laticeps Northern California allozyme group 160618 Humboldt Co. CA Tbnc b 
Thomomys townsendii townsendii Snake River Plain group 163705 Payette Co. ID Tto a 
 relictus Humboldt River and NE California group 175669 Lassen Co. CA Tto b 
Thomomys umbrinus chihuahuae Sierra Madre clade 150425 Durango Tum a 
 atroavarius Coastal clade (basal) 153745 Sinaloa Tum b 
Thomomys bulbivorus   218819 Lane Co. OR Tbu 
Thomomys talpoides yakimensis 2n = 40, > 1000 mi from Uinta Co. 176467 Yakima Co. WA Tta a 
 bridgeri 2n = 40, Fn = 70, > 1000 mi from Yakima Co. 179622 Uinta Co. CO Tta b 
 ocius 2n = 56, Fn = 78 179659 Moffat Co. CO Tta c 
Thomomys idahoensis pygmaeus 2n = 58, Fn = 76 179647 Uinta Co. WY Tid a 
 pygmaeus 2n = 58, Fn = 76 179648 Uinta Co. WY Tid b 
Thomomys mazama mazama 2n = 56 171042 Siskiyou Co. CA Tma a 
 nasicus 2n = 58 176438 Deschutes Co. OR Tma b 
Thomomys monticola  Tuolumne drainage mtDNA clade 201631 Tuolumne Co. CA Tmo a 
  Merced drainage mtDNA clade 208607 Sierra Co. CA Tmo b 
Species Subspecies Prior basis for genetic differentiation MVZ no. County, State Abbreviation 
Thomomys bottae awahnee Genetically distinct in mtDNA studies 158708 Mariposa Co. CA Tbaw a 
 awahnee Genetically distinct in mtDNA studies 201577 Mariposa Co. CA Tbaw b 
 xerophilus Baja California allozyme group 153684 Baja Norte, BC Tbbj a 
 cactophilus Baja California allozyme group 153694 Baja del Sur, BC Tbbj b 
 albatus Basin and Range allozyme group 156116 Imperial Co. CA Tbbr a 
 ruidosae Basin and Range allozyme group 147023 Lincoln Co. NM Tbbr b 
 bottae Central California allozyme group 166821 Monterey Co. CA Tbcc a 
 alpinus Central California allozyme group 164670 Tulare Co. CA Tbcc b 
 riparius Great Basin allozyme group 148289 Riverside Co. CA Tbgb a 
 mewa Great Basin allozyme group 162920 Fresno Co. CA Tbgb b 
 saxatilis Northern California allozyme group 160762 Lassen Co. CA Tbnc a 
 laticeps Northern California allozyme group 160618 Humboldt Co. CA Tbnc b 
Thomomys townsendii townsendii Snake River Plain group 163705 Payette Co. ID Tto a 
 relictus Humboldt River and NE California group 175669 Lassen Co. CA Tto b 
Thomomys umbrinus chihuahuae Sierra Madre clade 150425 Durango Tum a 
 atroavarius Coastal clade (basal) 153745 Sinaloa Tum b 
Thomomys bulbivorus   218819 Lane Co. OR Tbu 
Thomomys talpoides yakimensis 2n = 40, > 1000 mi from Uinta Co. 176467 Yakima Co. WA Tta a 
 bridgeri 2n = 40, Fn = 70, > 1000 mi from Yakima Co. 179622 Uinta Co. CO Tta b 
 ocius 2n = 56, Fn = 78 179659 Moffat Co. CO Tta c 
Thomomys idahoensis pygmaeus 2n = 58, Fn = 76 179647 Uinta Co. WY Tid a 
 pygmaeus 2n = 58, Fn = 76 179648 Uinta Co. WY Tid b 
Thomomys mazama mazama 2n = 56 171042 Siskiyou Co. CA Tma a 
 nasicus 2n = 58 176438 Deschutes Co. OR Tma b 
Thomomys monticola  Tuolumne drainage mtDNA clade 201631 Tuolumne Co. CA Tmo a 
  Merced drainage mtDNA clade 208607 Sierra Co. CA Tmo b