Abstract
A phylogeny of tetrapods is inferred from nearly complete sequences of the nuclear RAG-1 gene sampled across 88 taxa encompassing all major clades, analyzed via parsimony and Bayesian methods. The phylogeny provides support for Lissamphibia, Theria, Lepidosauria, a turtle-archosaur clade, as well as most traditionally accepted groupings. This tree allows simultaneous molecular clock dating for all tetrapod groups using a set of well-corroborated calibrations. Relaxed clock (PLRS) methods, using the amniote = 315 Mya (million years ago) calibration or a set of consistent calibrations, recover reasonable divergence dates for most groups. However, the analysis systematically underestimates divergence dates within archosaurs. The bird-crocodile split, robustly documented in the fossil record as being ∼ 245 Mya, is estimated at only ∼ 190 Mya, and dates for other divergences within archosaurs are similarly underestimated. Archosaurs, and particularly turtles, have slow apparent rates, possibly confounding rate modeling, and inclusion of calibrations within archosaurs (despite their high deviances) improves divergence estimates not only within archosaurs but also across other groups. Notably, the monotreme-therian split (∼ 210 Mya) matches the fossil record; the squamate radiation (∼ 190 Mya) is younger than suggested by some recent molecular studies and inconsistent with identification of ∼ 220 and ∼ 165 Myo (million-year-old) fossils as acrodont iguanians and ∼ 95 Myo fossils as colubroid snakes; the bird-lizard (reptile) split is considerably older than fossil estimates (≤ 285 Mya); and Sphenodon is a remarkable phylogenetic relic, being the sole survivor of a lineage more than a quarter of a billion years old. Comparison with other molecular clock studies of tetrapod divergences suggests that the common practice of enforcing most calibrations as minima, with a single liberal maximal constraint, will systematically overestimate divergence dates. Similarly, saturation of mitochondrial DNA sequences, and the resultant greater compression of basal branches, means that using only external deep calibrations will also lead to inflated age estimates within the focal ingroup.
Despite being the most heavily studied organisms, relationships between the major groups of land vertebrates (tetrapods) remain incompletely known. The long history of morphological work has failed to resolve the affinities of many highly divergent groups; recently molecular data have resolved many uncertainties but conversely challenged some previously well-accepted groups. Currently, there is strong morphological and molecular evidence for the monophyly of the following higher-level groups: lissamphibians, amniotes, mammals, sauropsids (“reptiles” and birds), and lepidosaurs. However, there remains uncertainty in several other areas (see Meyer and Zardoya, 2003, and references therein). The interrelationships of the three lineages of living amphibians remain uncertain, with both morphological and molecular data favoring either the urodele-caecilian or urodele-anuran hypothesis; the most recent studies (e.g., San Mauro et al., 2005; Zhang et al., 2005) favor the latter but support is not strong. Similarly, long-accepted relationships among monotreme, marsupial, and placental mammals were challenged when some nucleotide and DNA hybridization studies favored a heterodox monotreme and marsupial clade; more recent studies have suggested that this might be due to base composition bias (Phillips and Penny, 2003) and nuclear data corroborate the traditional hypothesis (e.g., Killian et al., 2001; Baker et al., 2004; van Rheede et al., 2006). The monophyly of diapsids and of archosaurs was questioned when multiple molecular studies suggested that turtles might be nested within diapsids, and possibly even within archosaurs. Nevertheless, these results are problematic because of the almost total lack of supporting morphological evidence uniting archosaurs and turtles (see Rieppel, 2000); furthermore, the molecular datasets disagree on the exact position of turtles within archosauromorphs, and “further molecular clarification is needed” (Meyer and Zardoya, 2003).
Estimates for the divergence times of the major groups of tetrapods are also contentious, with molecular clock estimates often at odds with fossil-based estimates. A common pattern is that the molecular dates imply much earlier divergences than those suggested by the fossil record, e.g., within birds (Cooper and Penny, 1997; Pereira and Baker, 2006), within mammals (Kumar and Hedges, 1998; Penny et al., 1999; Springer et al., 2003), and within amphibians (San Mauro et al., 2005; Zhang et al., 2005). Within mammals, the molecular data suggest that divergences between monotremes, marsupials, and placentals were very closely spaced in time, at odds with morphological and palaeontological evidence that places monotremes quite distant from the other two living groups (Phillips and Penny, 2003). The divergence dates inferred from molecular clock studies, if robust, can also be used to refute phylogenetic hypotheses about fossil taxa; for instance, claims of very early advanced (colubroid) snakes. However, rapid advances in molecular clock methodology (e.g., Welch and Bromham, 2005) suggest that the earlier studies need to be revisited. Furthermore, only two clock analyses (Vidal and Hedges, 2005; Wiens et al., 2006) have so far been applied to date divergences among the most diverse clade of amniotes, the squamates (lizards and snakes). Vidal and Hedges inferred divergences much deeper than those suggested by both Wiens et al. and the fossil record (e.g., Evans, 2003).
Although molecular data promise to resolve many of the above uncertainties about phylogenetic relationships and divergence dates, many of the previous data sets are inconclusive for several reasons. First, until recently the only well-sampled data sets with long sequences used mitochondrial DNA (e.g., Janke et al., 2001; Rest et al., 2003; Pereira and Baker, 2006), which has evolutionary dynamics (fast rate, saturation, nonstationarity) that can create misleading topologies and/or branch lengths at the divergences considered here (e.g., Springer et al., 2001; Phillips and Penny, 2003). Second, many of the studies that used more appropriate data such as multiple nuclear genes (e.g., Kumar and Hedges, 1998) had insufficient taxon sampling, with amphibians and reptiles often being poorly represented and many key groups (e.g., sphenodontids) not sampled at all. This sparse and unbalanced sampling can lead to errors in reconstructing both topology and branch lengths, and thus the pattern and timing of divergences (e.g., Zwickl and Hillis, 2002). Surprisingly, there has yet to be a long nuclear gene sequence sampled extensively across every major group of tetrapods and their nearest sarcopterygian outgroups. This is because nuclear genes are poorly sampled in lower tetrapods in general, and because the most commonly used gene (c-mos) is often represented by very short (375 bp) fragments. Thus, the extensive nuclear sequence data sets required for testing hypotheses about phylogeny and diversification times in tetrapods do not yet exist.
RAG-1 (recombination-activating gene 1) is an ideal locus for this purpose. It is a long (∼ 3 kb) gene found across vertebrates that exists as a single copy and is uninterrupted by introns (Groth and Barrowclough, 1999, and references therein). It has an overall evolutionary rate appropriate to the divergence scales of interest, and furthermore contains slightly faster and slightly slower regions that could resolve problems at different time scales. In separate studies, large regions of this gene have been sampled across several tetrapod groups, but as these studies proceeded independently, taxon sampling has been haphazard. Most of the gene (∼ 2.8 kb) has now been sequenced in studies dealing with lungfishes and coelacanths (Brinkmann et al., 2004), birds (Groth and Barrowclough, 1999), turtles (Krenz et al., 2005), and lepidosaurs (Townsend et al., 2004); in addition, general genomic studies have resulted in sequences for a urodele, a marsupial, and several placental mammals being available on GenBank. However, major gaps remain within monotremes, marsupials, anurans, urodeles, and caecilians, though smaller (1 to 1.5 kb) fragments of RAG-1 have been sequenced for these taxa (e.g., San Mauro et al., 2004; Baker et al., 2004).
The current study aims to use nearly complete sequences of RAG-1 across a dense sample of tetrapods to elucidate broad evolutionary patterns. Complete RAG-1 is well sampled across birds and reptiles, but relatively poorly sampled across other land vertebrates. Additional sequences were obtained from lungfishes, amphibians, monotremes, and marsupials; this denser taxon sampling allows a wider choice of calibration points and more accurate estimates of model parameters, and thus of phylogenetic trees (Zwickl and Hillis, 2002). Phylogenetic analyses were then performed using parsimony and Bayesian methods. The optimal trees were tested and corrected for rate variation, and multiple calibration points were used to infer divergence dates between major tetrapod groups, especially within squamates. This dated phylogeny sheds new light on the timing and pattern of the radiation of land vertebrates.
Sequencing and Alignment
Taxon Sampling and Sequencing
Twelve additional mammals and amphibians were sequenced (Table 1) to improve taxon sampling for the complete RAG-1 gene. Some of these species have already been sequenced for smaller portions of RAG-1, and in these instances only the remaining regions were targeted. Trees were rooted with the coelacanth (Latimeria); outgroups more divergent than sarcopterygian fish were not used because inclusion of these deeply divergent groups led to much larger regions of alignment ambiguity. In addition, for the clades that were densely sampled in previous studies (squamates, crocodylians, turtles, and birds), certain species were omitted in order to keep the taxon set tractable for detailed phylogenetic analyses and to maintain relatively balanced numbers across major lineages to aid phylogenetic reconstruction. All major groupings in these clades were represented in the retained exemplar set, and this pruning had no topological effect, as reconstructed relationships were congruent with those obtained using the original full data sets. All new sequences were determined for both strands using direct automated sequencing from PCR products. Detailed information on specimens, primers, PCR, and sequencing is available in the supplementary information (available online at http://systematicbiology.org).
Taxa and GenBank accessions. Taxa with accession numbers for both this study and GenBank are composites of our 5′ end sequence and a GenBank 3′ end sequence. Names are from current (October, 2005) GenBank accessions.
| Higher Taxon | Genus | Species | This study | GenBank | Higher Taxon | Genus | Species | This study | GenBank |
|---|---|---|---|---|---|---|---|---|---|
| Squamates | Xenosaurus | grandis | | AY662607 | | Geochelone | pardalis | | AY687912 |
| | Lialis | jicari | | AY662628 | | Dermatemys | mawii | | AY687910 |
| | Leposoma | parietale | | AY662621 | | Platysternon | megacephalum | | AY687905 |
| | Rhineura | floridana | | AY662618 | Mammals | Monodelphis | domestica | | U51897 |
| | Shinisaurus | crocodylurus | | AY662610 | | Notoryctes | typhlops | EF551555 | AY125040 |
| | Elgaria | panamintina | | AY662603 | | Sarcophilus | harrisii | EF551556 | AY125037 |
| | Heloderma | suspectum | | AY662606 | | Cercartetus | concinnus | EF551557 | AY125036 |
| | Eremias | sp. | | AY662615 | | Tachyglossus | aculeatus | EF551558 | AF303971 |
| | Xantusia | vigilis | | AY662642 | | Ornithorhynchus | anatinus | EF551559 | AF303974 |
| | Varanus | griseus | | AY662608 | | Homo | sapiens | | M29474 |
| | Lanthanotus | borneensis | | AY662609 | | Oryctolagus | cuniculus | | M77666 |
| | Cylindrophis | ruffus | | AY662613 | | Sus | scrofa | | AB091392 |
| | Dinodon | sp. | | AY662611 | | Lama | glama | | AF305953 |
| | Ramphotyphlops | braminus | | AY662612 | | Rattus | norvegicus | | XM230375 |
| | Basiliscus | plumifrons | | AY662599 | | Apomys | hylocoetes | | AY294942 |
| | Anolis | paternus | | AY662589 | | Mus | musculus | | NM009019 |
| | Leiocephalus | carinatus | | AY662598 | | Elephas | maximus | EF551560 | AY125021 |
| | Brookesia | thieli | | AY662577 | Amphibians | Pleurodeles | waltl | | AJ010258 |
| | Leiolepis | belliana | | AY662587 | | Ambystoma | mexicanum | EF551561 | AY323752 |
| | Japalura | tricarinata | | AY662585 | | Xenopus | laevis | | L19324 |
| | Calotes | calotes | | AY662584 | | Litoria | ewingii | EF551562 | |
| | Physignathus | cocincinus | | AY662582 | | Typhlonectes | natans | EF551566 | |
| | Ctenophorus | salinarum | | AY662580 | | Hypogeophis | rostratus | EF551565 | |
| | Chamaeleo | rudis | | AY662578 | | Ichthyophis | glutinosus | EF551563 | AY456256 |
| | Aspidoscelis | tigris | | AY662620 | | Rhinatrema | bivittatum | EF551564 | AY456257 |
| | Bipes | biporus | | AY662616 | Crocodiles | Alligator | mississipiensis | | AF143724 |
| | Dibamus | sp. | | AY662645 | | Gavialis | gangeticus | | AF143725 |
| | Ctenotus | robustus | | AY662630 | | Caiman | latirostris | | AY239167 |
| | Euprepis | auratus | | AY662629 | | Tomistoma | schlegelii | | AY239176 |
| | Typhlosaurus | lomii | | AY662641 | | Crocodylus | cataphractus | | AY239174 |
| | Eumeces | anthracinus | | AY662634 | Birds | Gallus | gallus | | M58530 |
| | Asymblepharus | sikimmensis | | AY662631 | | Megapodius | freycinet | | AF143731 |
| | Zonosaurus | sp. | | AY662644 | | Anas | strepera | | AF143729 |
| | Pseudothecadactylus | lindneri | | AY662626 | | Gavia | immer | | AF143733 |
| | Gekko | gecko | | AY662625 | | Spheniscus | humboldti | | AF143734 |
| | Crenadactylus | ocellatus | | AY662627 | | Charadrius | vociferus | | AF143736 |
| | Sphenodon | punctatus | | AY662576 | | Grus | canadensis | | AF143732 |
| Turtles | Carettochelys | insculpta | | AY687904 | | Coracias | caudata | | AF143737 |
| | Lissemys | punctata | | AY687902 | | Passer | montanus | | AF143738 |
| | Apalone | spinifera | | AY687901 | | Tinamus | guttatus | | AF143726 |
| | Chelonia | mydas | | AY687907 | | Struthio | camelus | | AF143727 |
| | Podocnemis | expansa | | AY687924 | Outgroups | Protopterus | dolloi | | AY442928 |
| | Pelusios | williamsi | | AY687923 | | Lepidosiren | paradoxa | | AY442926 |
| | Elseya | latisternum | | AY687920 | | Latimeria | menadoensis | | AY442925 |
Alignment and Sequence Characteristics
The final aligned data set comprises 88 taxa and 3297 sites and is available via TreeBASE accession S1781. The Townsend et al. (2004) lepidosaur alignment was used as a fixed alignment block, with other taxa assembled and added to this alignment using ClustalX (Thompson et al., 1997) and amino acid sequences. Because of ambiguous alignment at both amino acid and nucleotide levels, the first (5′ end) 474 alignment positions were excluded from analysis (for most taxa this amounts to around the first 360 sequenced sites). The last (3′ end) 210 sites were also excluded to minimize missing data, leaving 2613 sites (871 codons). The entire human RAG-1 coding region (GenBank accession M29474) is 3135 nucleotides, and our set of included sites covers 2592 of these (83%). Nearly all taxa have complete data for this region: three of the five crocodylians (Gatesy et al., 2004) are missing 213 sites at the 5′ end, Lepidosiren is missing 70 sites at the 5′ end, Hypogeophis is missing 137 sites at the 3′ end, and some taxa (Sarcophilus harrisii, Ichthyophis glutinosus, Ornithorhynchus anatinus) assembled by concatenating partial GenBank and newly generated sequences have 70 to 90 sites of missing data in the middle region. The 88-taxon data matrix comprises 2613 sites, of which 1729 are variable (first codon position: 502 variable sites; second codon position: 370 variable sites; third codon position: 857 variable sites). This translates to 871 amino acids with 543 variable sites.
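The site counts quoted above can be cross-checked with a few lines of arithmetic. The following is only a bookkeeping sketch: the numbers are those given in the text, while the variable names are illustrative and not part of the original analysis.

```python
# Bookkeeping check of the site counts quoted in the text.
# All variable names are illustrative; only the numbers come from the paper.
ALIGNED_SITES = 3297      # full alignment length
EXCLUDED_5PRIME = 474     # ambiguously aligned positions at the 5' end
EXCLUDED_3PRIME = 210     # positions dropped at the 3' end

included_sites = ALIGNED_SITES - EXCLUDED_5PRIME - EXCLUDED_3PRIME
codons = included_sites // 3

# Variable sites reported per codon position.
variable_by_codon_position = {1: 502, 2: 370, 3: 857}
variable_total = sum(variable_by_codon_position.values())

# Coverage of the human RAG-1 coding region (GenBank M29474).
HUMAN_CDS_LENGTH = 3135
HUMAN_SITES_COVERED = 2592
coverage_percent = round(100 * HUMAN_SITES_COVERED / HUMAN_CDS_LENGTH)

print(included_sites, codons, variable_total, coverage_percent)
```

Each computed value agrees with the figure stated in the paragraph (2613 sites, 871 codons, 1729 variable sites, 83% coverage).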
Phylogenetic Analyses: Methods and Results
Parsimony
Parsimony analyses (all changes weighted equally) and nonparametric bootstrapping were performed using PAUP* (Swofford, 2000), employing heuristic searches using 200 random additions with TBR branch swapping. Branch support values (Bremer, 1988) were calculated in PAUP using batch commands generated by TreeRot v.2 (Sorenson, 2000), modified to use the above search settings.
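For readers unfamiliar with nonparametric bootstrapping, the resampling step can be sketched as follows. This is a minimal illustration of the general technique (sampling alignment columns with replacement), not the PAUP* implementation; the function name and the toy alignment are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudoreplicate of an alignment.

    `alignment` maps taxon name -> aligned sequence (equal lengths).
    Columns are drawn with replacement until the original length is
    reached, so some sites appear several times and others not at all.
    """
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(alignment[t][c] for c in columns) for t in taxa}

# Hypothetical toy alignment, for illustration only.
toy = {"taxon_A": "ACGTAC", "taxon_B": "ACGTTC", "taxon_C": "ATGTAC"}
replicate = bootstrap_alignment(toy, random.Random(1))
```

In a full analysis, a tree search would be run on each pseudoreplicate and the resulting trees summarized as a majority-rule consensus; that step is left to the phylogenetics package.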
The nucleotide data yielded two equally parsimonious trees (length 14,480 steps; consistency index 0.23, retention index 0.64). The two trees are very similar (differing only in some relationships within eutherians) and similar to the 50% majority-rule bootstrap consensus (shown in Fig. 1). The amino acid data yielded 168 parsimonious trees of length 3995. All were very similar to one another and the two nucleotide trees, differing most notably in grouping Sphenodon with the turtle-archosaur lineage.
Parsimony tree, with bootstrap and Bremer support values. Tree is a 50% majority-rule consensus of 1000 nonparametric bootstrap replicates; each of the two MPTs was very similar to this tree.
The parsimony analyses of the nucleotide data retrieved with strong support (> 90% bootstrap) many traditionally accepted higher groupings within tetrapods, such as Mammalia, Archosauria, Lepidosauria, Squamata, Anguimorpha, Iguania, and Serpentes. In addition, the following groupings of major clades within tetrapods, which have been less certain or recently questioned (see introductory section), receive strong support in all the above analyses (bootstrap > 90%): Lissamphibia, Batrachia (Urodela + Anura), Theria (Placentalia + Marsupialia), and Testudines plus Archosauria. The branches leading to Lepidosauria, Testudines + Archosauria, Testudines, and Batrachia are relatively short despite the strong support for these clades. Relationships within the following clades are largely congruent with previous analyses cited: Squamata (Townsend et al., 2004), Aves (Groth and Barrowclough, 1999), Crocodylia (Gatesy et al., 2003), and Testudines (Krenz et al., 2005; their parsimony tree).
Bayesian Analyses
Model-based analyses require proper model choice. This involved two stages: determining the appropriate number of data partitions and then the best model for each partition. Bayesian and corrected Akaike information criteria (BIC and AICc; Posada and Buckley, 2004; Lee and Hugall, 2006) were used to evaluate partition strategies, using the MCMC equilibrium average lnL, running MrBayes v3.04b and 3.1.1 (Ronquist and Huelsenbeck, 2003). For the nucleotide analyses, there was significant gain (P ≪ 0.01) by partitioning the data into codon positions (1st, 2nd, 3rd), but further subdivisions are ill-defined and do not result in large or significant gains. Unlinking branch lengths across partitions did not significantly improve model fit. For each codon position (partition), hierarchical likelihood-ratio tests (HLT) and the Akaike information criterion (AIC; implemented in ModelTest; Posada and Crandall, 2001) indicated the following models: codon 1: HLT = GTRig, AIC = TVMig; codon 2: HLT = TRNig, AIC = GTRig; codon 3: HLT and AIC = GTRig. The closest appropriate model available in MrBayes was thus GTRig in all cases (see Lemmon and Moriarty, 2004). Addition of covarion to the model did not increase likelihoods significantly, and did not change topology or alter posterior branch length estimates (the included gamma parameter presumably already approximates it; see Penny et al., 2001), and the results reported here do not use this additional parameter. Thus, the final nucleotide analysis employed a separate GTRig model for each codon position, with branch lengths linked.
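The information criteria used to compare partition strategies have standard textbook forms, sketched below. The functions take a log-likelihood, a free-parameter count k, and a sample size n (here taken as the number of sites, which is an assumption). The lnL values and parameter counts in the example are placeholders invented purely for illustration; they are not results from this study, which used the MCMC equilibrium average lnL.

```python
import math

def aic(lnL, k):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * lnL + 2.0 * k

def aicc(lnL, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic(lnL, k) + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(lnL, k, n):
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * lnL + k * math.log(n)

def best_strategy(candidates, n, criterion):
    """Return the candidate name with the lowest criterion score.

    `candidates` maps name -> (lnL, k); `criterion` is aicc or bic.
    """
    return min(candidates, key=lambda name: criterion(*candidates[name], n))

# Placeholder values only (NOT from the paper): two partition strategies.
toy_candidates = {
    "unpartitioned": (-1000.0, 10),
    "by_codon_position": (-800.0, 30),
}
N_SITES = 2613  # included sites, as stated in the text
print(best_strategy(toy_candidates, N_SITES, aicc))
print(best_strategy(toy_candidates, N_SITES, bic))
```

Lower scores indicate a better trade-off between fit and parameter count; BIC penalizes extra parameters more heavily than AICc when n is large.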
For the amino acid data, MCMC-based BIC and AIC analyses indicated the optimal model was Jones (as implemented in MrBayes) with rates = invgamma. Runs allowing alternative models (mixed model option) indicated that the marginal probability for the Jones model was ≥ 0.99. Addition of covarion to the model did not result in significant improvement or affect branch length estimates, and the results here are for analyses without this extra parameter. Codon models were not considered as they are computationally intractable for datasets of this size (Shapiro et al., 2006). Preliminary Bayesian MCMC analyses were conducted to ascertain the best run conditions: these used 4 × 1 million step chains (with standard heating T = 0.2), 1/100 sampling, and a 50% burn-in (leaving 5000 sampled trees). These indicated acceptable chain swapping rates, and that MCMC lnL equilibrium and posterior probability (PP) convergence (for both nucleotides and amino acids) were attained by 1 million steps (MCMC diagnostics bipartition frequency standard deviation < 0.02). Therefore, the full analysis employed runs of 4 × 10 million step chains with 1/100 sampling, with the first 20,000 sampled trees discarded as burn-in (leaving 80,000 samples for analysis). These settings allowed accurate posterior estimation of all parameters (in particular topology and branch lengths). The resultant nucleotide data tree with posterior probabilities is shown in Figure 2. Two independent runs returned PP within 0.04 across all nodes where PP > 0.50, with no PP differing in significance at the 0.95 level. The same chain settings were also employed for the amino acid data, and the resultant majority-rule consensus tree (with posteriors) is shown in Figure 3. The MP and Bayesian trees are very similar, and support levels are high (> 90% BS, > 0.95 PP) for all major nodes discussed below.
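The retained sample counts quoted for the preliminary and full MCMC runs follow directly from chain length, sampling interval, and burn-in. A small sketch of that arithmetic is given below; the helper function and its parameter names are hypothetical and are not MrBayes options.

```python
def retained_samples(chain_steps, sample_every,
                     burnin_samples=None, burnin_fraction=None):
    """Number of sampled trees left after discarding burn-in.

    Exactly one of `burnin_samples` (an absolute count of sampled trees)
    or `burnin_fraction` (a proportion of all sampled trees) is required.
    """
    if (burnin_samples is None) == (burnin_fraction is None):
        raise ValueError("give exactly one of burnin_samples or burnin_fraction")
    total = chain_steps // sample_every
    if burnin_samples is None:
        burnin_samples = int(total * burnin_fraction)
    return total - burnin_samples

# Preliminary runs: 1 million steps, 1/100 sampling, 50% burn-in.
preliminary = retained_samples(1_000_000, 100, burnin_fraction=0.5)

# Full runs: 10 million steps, 1/100 sampling, first 20,000 samples discarded.
full = retained_samples(10_000_000, 100, burnin_samples=20_000)

print(preliminary, full)
```

Both values match the counts stated in the text (5000 and 80,000 sampled trees, respectively).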
Comparison of relative node depths across different data types, determined using Bayesian MCMC trees made ultrametric using PLRS. Branch lengths for amino acids and each of the three codon positions are plotted against branch lengths for linked nucleotides.
Bayesian MCMC consensus trees: (a) nucleotide data, (b) amino acid data, with posterior probabilities for nodes.
The nucleotide and amino acid Bayesian trees were almost identical and contained all the major clades retrieved in the parsimony analyses. In particular, there was strong (PP = 1.00) support for Lissamphibia, Batrachia, Theria, and Testudines + Archosauria. The only major difference from the MP tree involved the grouping of Carettochelys, Lissemys, and Apalone with other cryptodiran turtles—an arrangement more in line with traditional views (see Gaffney et al., 1991; Near et al., 2005). Paraphyly of the two amphisbaenian taxa (Bipes and Rhineura) appears to be the result of lack of signal in this part of the tree (PP 0.53; see Townsend et al., 2004). The position of the elephant is probably resolved correctly in the amino acid tree but misplaced in the nucleotide tree (Afrotheria as first split in extant eutherians: Madsen et al., 2001; Amrine-Madsen et al., 2003b). There was strong agreement across parsimony and Bayesian analyses and nucleotide and amino acid data sets. The long sequences meant most clades were strongly supported in both analyses. Most are well accepted and uncontroversial, and the discussion below focuses on the more contentious clades, which were strongly supported (bootstrap > 70%, posterior > 0.95) in all analyses. Most retrieved nodes are consistent with accepted ideas, and where there have been differences among recent molecular studies, our tree generally recovers the topology seen in the more rigorous analyses.
The monophyly of lissamphibians, and the sister-group relationship of anurans and urodeles (Batrachia), are supported. A recent mtDNA study (Zhang et al., 2005) produced the same results, but previous nuclear studies (San Mauro et al., 2004, 2005) did not include both amniotes and sarcopterygian fish and thus could not test monophyly of lissamphibians. In the mtDNA study, the branch leading to lissamphibians was relatively short; the additional section of RAG-1 sequenced here adds much more support for these clades, and suggests the shortened branch leading to lissamphibians in previous studies might be the result of saturation. However, the basal divergence within lissamphibians (between caecilians and batrachians) is still quite deep, as discussed in the next section.
Within mammals, monotremes are sister to therians (a marsupial-placental clade), as almost universally recognized in recent times. The previous mtDNA and DNA hybridization studies that favored a heterodox monotreme-marsupial clade might have suffered from base composition bias (Phillips and Penny, 2003). There have been few extensive molecular studies aimed at addressing this question; though these have reaffirmed the traditional hypothesis (e.g., Killian et al., 2001; Baker et al., 2004), alternatives are still being entertained (e.g., Musser, 2003). Only very recently has a study of multiple nuclear genes (van Rheede et al., 2006) retrieved Theria. That study used numerous loci but consequently was restricted to sparser sampling of nonmammalian taxa. Our study therefore complements it with more extensive taxon sampling across nonmammalian outgroups. These recent nuclear analyses, together with morphology and the documented biases in the conflicting mtDNA analyses, are enough to strongly reaffirm monophyly of therians. A notable feature of this RAG-1 result is the considerable branch length (= time) encompassed by the stem lineages leading to monotremes, to marsupials, and to placentals (see molecular dating analysis below).
The grouping of turtles as a sister clade to archosaurs has very little morphological support (Rieppel, 2000) but is in line with many mitochondrial studies (Zardoya and Meyer, 1998; Janke et al., 2001; Rest et al., 2003). Some nuclear studies (based on protein sequences and sparser taxon sampling) have also found this grouping (e.g., Iwabe et al., 2005). Other nuclear studies, while suggesting archosaur affinities of turtles, have embedded turtles within archosaurs as sister-group to crocodylians (e.g., Hedges and Poling, 1999; Cao et al., 2000). The current study appears to be the first well-sampled nuclear study to retrieve turtles as sister to a monophyletic Archosauria. Turtle-archosaur affinities, as noted by others (e.g., Rieppel, 2000), mean that diapsid reptiles (archosaurs and lepidosaurs) are not monophyletic: either temporal fenestration (the diapsid skull) evolved convergently, or turtles have secondarily lost their temporal fenestrae. Considering only extant taxa, both possibilities are equally parsimonious; addition of fossil taxa to this tree is required to investigate this issue. Similarly, the status of proposed fossil ancestors of turtles (e.g., procolophonids, pareiasaurs, sauropterygians) again cannot be resolved without morphological analyses, perhaps in a combined analysis or using the molecular tree as a backbone constraint (see Lee, 2005, and references therein). Finally, however, we add the caveat that this molecular turtle-archosaur clade still needs to be further evaluated for two reasons. First, morphological support for this arrangement continues to be elusive (Rieppel, 2000). Second, there are rate differences between turtles and diapsid reptiles, due to a severe slowdown in turtles (see below). This raises the following possibility: even if the true reptile tree contains a basal dichotomy between turtles and other reptiles (diapsids) as traditionally inferred based on morphology, the slow molecular evolutionary rates in turtles might predispose the reptile RAG-1 tree to be rooted within diapsids. However, turtles do not have such comparatively slow rates in mitogenome studies (Rest et al., 2003; Janke et al., 2001), yet those studies also place them with archosaurs. Hence, in the following discussion, a turtle-archosaur clade will be provisionally accepted. Within archosaurs, relationships are consistent with other nuclear studies (e.g., Gatesy et al., 2003; Groth and Barrowclough, 1999); notably, the nuclear data support the traditional view that paleognaths are the basal bird lineage, and the conflicting signal previously found in mtDNA data has been shown to be ambiguous (Braun and Kimball, 2002).
A surprising result is the relatively great distance between Sphenodon and squamates, and the resultant short length and low support for the branch leading to Lepidosauria. Lepidosaurs are almost universally accepted as strongly corroborated, yet the results here suggest that Sphenodon diverged from squamates only relatively shortly after archosaurs did; i.e., squamates, Sphenodon, and archosaurs almost form a trichotomy. This pattern could explain why an earlier study (Hedges and Poling, 1999) that used sparser taxon sampling (but numerous loci) failed to corroborate Lepidosauria and inferred squamates as the most basal reptiles. However, better-sampled analyses of mtDNA (Rest et al., 2003) corroborate a monophyletic Lepidosauria. The RAG-1 study of Townsend et al. (2004) assumed a monophyletic Lepidosauria and rooted them with taxa belonging to a single putative outgroup clade (archosaurs plus turtles). The present study is thus the first nuclear study based on sufficient taxon sampling across tetrapods to demonstrate lepidosaurian monophyly. The short branch leading to Lepidosauria is also consistent with the scarcity of stem lepidosaurs in the fossil record; the most recent review only recognizes kuehneosaurs and Marmoretta (Evans, 2003); this contrasts, for instance, with the long branches and consequent wealth of stem fossil taxa leading to archosaurs and to mammals.
Relationships within turtles, crocodylians, birds, and squamates are largely congruent with previous analyses mentioned above and will not be discussed in detail here. However, the dating of divergences within those clades is discussed below.
Molecular Divergence Dating: Methods and Results
The dense taxon sampling and long nuclear sequences in this analysis allow, for the first time, nuclear estimates of divergences in many amniote clades simultaneously, using multiple calibration points dispersed across the tree. Previous studies across amniotes employed mtDNA (e.g., Janke et al., 2001; Rest et al., 2003; Pereira and Baker, 2006), which may well suffer saturation at the timescales of interest (see below). Nuclear studies have focused on individual clades (e.g., mammals: Springer et al., 2003; turtles: Near et al., 2004; birds: Ericson et al., 2006; squamates: Vidal and Hedges, 2005) using only internal calibrations—other amniote groups were not surveyed sufficiently to enable retrieved dates to be compared with those obtained using external calibrations. Such comparisons would be worthwhile as the poor fossil record in many groups (e.g., birds and squamates) renders many internal calibrations contentious (e.g., van Tuinen and Dyke, 2004), and there remain large discrepancies in estimated dates across different studies.
Molecular clock dating is more reliable when there is little rate heterogeneity across lineages (e.g., Sanderson, 2002; Ho et al., 2005). However, molecular phylogenies typically exhibit uneven root-to-tip path lengths, implying significant apparent rate variation. As can be seen in Figures 2 and 3, the Bayesian trees show considerable variation in path length. Each data partition (1st, 2nd, 3rd, combined, and amino acid) showed significant apparent rate variation (P < 0.01) across lineages according to the likelihood ratio test (Felsenstein, 1981). For both nucleotides and amino acids (Fig. 2 and Fig. 3), turtle and crocodile paths are relatively short and squamates long. For nucleotides, mammals and batrachians are also relatively long, but this is less apparent for amino acids. This heterogeneity must be adequately accommodated to reconstruct the underlying ultrametric chronogram. We therefore employed penalized likelihood rate smoothing (PLRS), as implemented in r8s (versions 1.6 and 1.7; Sanderson, 2002), with the TN algorithm and optimal smoothing factors determined by cross-validation. Smoothing used the additive function but the log function produced similar results (not shown). Trees from both nucleotide and amino acid MCMC analyses were used along with a range of potential fossil-based calibration points spread across the major amniote lineages (see next section). In estimating divergence times, major sources of uncertainty are (1) sampling and stochastic error in estimation of branch lengths, (2) saturation, (3) calibration error, and (4) variation across lineages in rates of molecular evolution. We discuss each in turn, focusing especially on issues with recent methods aimed at addressing 3 and 4.
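For orientation, the quantity that penalized likelihood smoothing maximizes can be written schematically as follows. This is a simplified sketch of the general form described by Sanderson (2002), not the exact r8s objective: the symbols (Ψ, Φ, λ, r_k, anc(k)) are notation assumed here for illustration, and the special treatment of branches adjacent to the root is omitted.

```latex
\Psi(\mathbf{r}, \mathbf{t}) = \log L(\mathbf{r}, \mathbf{t} \mid X) - \lambda \, \Phi(\mathbf{r}),
\qquad
\Phi_{\mathrm{additive}}(\mathbf{r}) = \sum_{k} \bigl( r_k - r_{\mathrm{anc}(k)} \bigr)^{2},
\qquad
\Phi_{\log}(\mathbf{r}) = \sum_{k} \bigl( \log r_k - \log r_{\mathrm{anc}(k)} \bigr)^{2}
```

Here r_k is the rate on branch k, anc(k) is its parent branch, t are the node times, X is the sequence data, and λ is the smoothing factor chosen by cross-validation. A large λ pushes the solution toward a strict clock, whereas a small λ lets rates vary almost freely among branches.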
Confidence Intervals
MCMC variance in branch lengths for each data partition is small, indicating that error due to sampling and model parameterization is minor for the long sequences employed here. A measure of this was obtained from the variation in estimated ages across 200 trees sampled (one every 20,000 steps) from the post-burn-in MCMC analyses; results are included in Table 3. These analyses employed PLRS and only a single (amniote = 315 Mya) calibration; variation with additional calibrations enforced (constraining more nodes) is necessarily lower. For nucleotides, the 95% confidence interval (CI) of age was on average 13% of the mean divergence age estimate; for all divergences it was < 20%, except for caiman-alligator (31%). As there are fewer variable sites, variation in the amino acid MCMC analysis is a little higher, with the 95% CI averaging 21% of the mean age.
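The interval summary described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the reported percentage is the half-width of the central 95% interval relative to the mean (which is how the ± values in Table 3 read), and the function name and input ages are hypothetical.

```python
from statistics import mean

def ci_halfwidth_percent(ages):
    """Return (lower, upper, half-width as % of mean) for a central 95% interval.

    `ages` holds one age estimate per sampled tree (hypothetical input).
    """
    s = sorted(ages)
    n = len(s)
    drop = (n * 25) // 1000  # samples trimmed from each tail (2.5% of n)
    lower, upper = s[drop], s[n - 1 - drop]
    half_width = (upper - lower) / 2
    return lower, upper, 100 * half_width / mean(s)

# Made-up ages (Mya) for one node across 200 sampled trees, not real data
example_ages = [100 + i for i in range(200)]
lower, upper, pct = ci_halfwidth_percent(example_ages)
print(lower, upper, round(pct, 1))
```

With 200 sampled trees, trimming 2.5% from each tail discards the five youngest and five oldest estimates for each node.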
Sampling across MCMC trees includes variation due to uncertainties in model parameter values, topology, and branch lengths but does not incorporate uncertainty due to the smoothing function process itself: even if we know without error the true tree and number of substitutions on each branch, there would be uncertainties in how to stretch the branches to make it ultrametric. Sensitivity to smoothing factor was assessed by measuring the range of ages seen across the 95% CI of smoothing factors, which was based on the r8s cross-validation chi-squared error with d.f. = (number of taxa − 2). For both amino acid and nucleotides, the range of ages obtained across all plausible smoothing factors is < 10% of the mean except within crocodylians (15%). Ideally, it would be desirable to incorporate uncertainty in just how to model rate variation into final estimates of error bars. Data sets that are highly rate-variable generally would be more sensitive to the rate smoothing method adopted, resulting in greater uncertainty in dating. However, because variation here due to smoothing factor is much lower than the across MCMC sample variation, and the two sources of errors are not simply multiplicative, we use the latter as it is common practice (Sanderson, 2002).
Saturation
To explore effects due to saturation across codon positions and amino acids, we compared the divergence age estimates from MCMC analyses of amino acids, linked nucleotide codon positions, and each codon position separately. Figure 3 plots PLRS divergence ages for all compatible nodes against estimates from the linked nucleotide analysis. This used the single, most basal available calibration (amniote = 315 Mya), thus showing the result for the RAG-1 data with rate smoothing, unconfounded by age constraints imposed on multiple nodes. The linear relationship of node age across data types suggests saturation in 3rd positions (and amino acids) is not compressing the deeper divergences relative to shallower ones (Fig. 3). For this reason, all codon positions linked were used to estimate branch lengths from the nucleotide data. In contrast, mtDNA divergences typically saturate at these timescales (e.g., Gatesy et al., 2003; Penny and Phillips, 2003). This (and the topology results below) indicates that for these divergence times, nuclear gene sequences are more appropriate than mtDNA sequences because of fewer multiple hits.
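A linearity check of the kind described above can be sketched as follows. This is an illustration only: it uses a handful of single-calibration median ages copied from Table 3 (amino acids against linked nucleotides) rather than the per-codon-position estimates plotted in Figure 3, and the function and variable names are hypothetical.

```python
from math import sqrt

# (nucleotide age, amino acid age) in Mya, from the single-calibration
# ("1cal") columns of Table 3; the amniote node is omitted because it is fixed.
NODE_AGES = {
    "Tetrapods": (353, 354),
    "Lissamphibians": (323, 292),
    "Batrachians": (274, 267),
    "Mammals": (201, 227),
    "Therians": (175, 197),
    "Squamates": (171, 184),
    "Birds": (91, 104),
    "Crocodylians": (33, 55),
}

def linear_fit(pairs):
    """Ordinary least-squares slope and intercept, plus Pearson r."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

slope, intercept, r = linear_fit(list(NODE_AGES.values()))
print(f"slope={slope:.2f} intercept={intercept:.1f} r={r:.3f}")
```

If one data type were saturating, its deeper nodes would be expected to fall progressively away from a straight line rather than scatter evenly around it.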
Rate Variation
Apparent substitution rates across various branches were first calculated applying the basal amniote = 315 Mya calibration. These rates were then reestimated with the addition of the archosaur and caiman calibrations, which are the calibrations with the greatest effect on tree proportions and, thus, inferred rates (see next section). We assessed how inferred rates changed for different data partitions: 1st, 2nd, and 3rd positions, linked 1st plus 2nd positions, all positions linked (Fig. 2a), and amino acids (Fig. 2b). Estimated rate varies among these groups by about a factor of 2.5 for 1st positions, 3.2 for 2nd, 3.5 for 3rd, 3.2 for all nucleotides combined (linked), and 2.9 for the amino acids. Plotting amino acid versus nucleotide evolutionary rates (Fig. 4) reveals that turtles and crocodiles appear slow and, by comparison, snakes fast. However, it also indicates two trends, with mammals and batrachians (urodeles and anurans) having a slower amino acid rate relative to nucleotide rate, compared to all other taxa. Base composition analysis indicates considerable heterogeneity at 3rd positions. Across all terminal taxa, the chi-squared test (in PAUP) is nonsignificant (P > 0.05) for 1st and 2nd codon positions but highly significant (P ≪ 0.01) for 3rd positions. Across the major clades plotted (Fig. 4), there is 5 to 15 times more base composition variation in 3rd positions than in 1st or 2nd positions, most notably due to mammals and batrachians being less AT rich (40% versus > 50%). Therefore, some of the apparent rate variation might be due to base composition nonstationarity causing model fit error, rather than actual substitution rate differences. For example, root-to-tip path lengths for mammals and batrachians are relatively long for nucleotides (especially 3rd positions), but not so for amino acids; this higher apparent nucleotide rate might in part be driven by asymmetrical (rather than increased) substitution rates. By comparison, short paths within turtles for both amino acids and nucleotides suggest genuinely slow substitution rates.
Comparison of nucleotide and amino acid substitution rates across selected phylogenetic groups, as estimated by PLRS. Diamonds are from using the single amniote = 315 Mya calibration. Effect of additional calibrations on inferred substitution rates: squares indicate effects of adding bird-croc = 245 Mya; triangles of further adding caiman-alligator = 68 Mya. The last additional calibration has little effect on rate estimates for birds or turtles.
Calibration Consistency versus Rate Smoothing Error
Calibration choice is one of the most important factors influencing divergence date estimates but is often barely discussed in molecular clock studies. The following 12 proposed calibration points were considered (described as a split between taxa; also labeled in Tables 2 and 3 and Figure 5): (1) bird-mammal = 315 Mya (amniote; see Reisz and Muller, 2004); (2) elephant-pig (extant eutherians) = 98 Mya (see Benton and Donoghue, 2007); (3) Lissemys-Apalone = 100 Mya (trionychoid turtles: Near et al., 2005); (4) scincomorphs-anguimorphs = 168 Mya (Evans, 2004; see below); (5) Heloderma-Elgaria = 99 Mya (Wiens et al., 2006); (6) pleurodire-cryptodire turtles = 210 Mya (turtles: Near et al., 2005); (7) penguin-crane = 62 Mya (Slack et al., 2006); (8) lepidosaur-archosaur = 255 Mya (reptile: Reisz and Muller, 2004); (9) bird-crocodile = 245 Mya (archosaur: Muller and Reisz, 2005); (10) alligator-caiman = 68 Mya (alligatorines: Muller and Reisz, 2005); (11) Sus-Llama = 65 Mya (cetartiodactylans: Springer et al., 2003); and (12) Varanus-Shinisaurus = ∼ 85 Mya. Most of these calibrations have been widely used before. The amniote calibration is perhaps the single most commonly used calibration point for tetrapods (Graur and Martin, 2004), allowing direct comparison with those previous studies, and is the best positioned in our tree for methodological reasons, as it is closest to the root (Sanderson, 2002). Increasing the age of this calibration point from 315 to 330 Mya (Reisz and Muller, 2004) has very little effect, with all nodes changing by less than 5% (results not shown). The split between the scincomorph lineage and anguimorph lineage occurred at least ∼ 168 Mya; this is based on unequivocal (paramacellodids) and likely (Saurillodon) scincomorphs of that age (the anguimorph nature of the contemporaneous Parviraptor has recently been questioned: Evans and Wang, 2005). The likely inclusion of lacertoids, amphisbaenians, and iguanians on the anguimorph lineage (e.g., Townsend et al., 2004) does not change this calibration age, as none of these occur earlier in the fossil record (see below for discussion of proposed early acrodont iguanians). The Sus-Llama calibration (Springer et al., 2003) is investigated but not implemented, as it is an extrapolated rather than direct fossil age (Gatesy and O'Leary, 2001). Similarly, the Varanus-Shinisaurus split may be dated at ∼ 85 Mya, as terrestrial lizards undoubtedly closely related to living varanids appear at that time (Molnar, 2000). The aquatic mosasauroids are older but their affinities with varanids are debated (e.g., Caldwell, 1999); thus, this calibration point is also investigated but not implemented.
Chronogram for tetrapods based on nucleotide data. This tree was based on the consensus tree in Figure 2a, rate-smoothed using PLRS with five calibrations (indicated by circled ages on nodes). Node labels refer to splits in Table 3. Time scale below is in millions of years before present, above in geological eras. Confidence intervals are indicated in Table 3 to simplify the figure.
Cross-validation deviance of candidate calibrations, based on an ultrametric tree derived by PLRS using the amniote = 315 Mya calibration. This tree was rescaled to each candidate calibration in turn. Σ deviance is sum of absolute value of differences between estimated and proposed fossil (left) dates for the other candidate calibrations. Σ %dev is from deviance expressed as a proportion of proposed date. Node labels refer to Figure 5 and Table 3.
| Age | Node | Nucleotides ∑deviance | Nucleotides ∑%dev | Amino acids ∑deviance | Amino acids ∑%dev |
|---|---|---|---|---|---|
| 168 | Scincomorph-Anguimorph | 218 | 1.3 | 202 | 1.2 |
| 65 | Cetartiodactylans | 218 | 3.4 | 198 | 3.1 |
| 98 | Eutherians | 222 | 2.3 | 196 | 2.0 |
| 85 | Varanus-Shinisaurus | 230 | 2.7 | 257 | 3.0 |
| 100 | Trionychoids | 242 | 2.4 | 269 | 2.7 |
| 315 | Amniotes | 243 | 0.8 | 197 | 0.6 |
| 210 | Turtles | 264 | 1.3 | 199 | 0.9 |
| 255 | Reptiles | 291 | 1.1 | 253 | 1.0 |
| 245 | Archosaurs | 316 | 1.3 | 432 | 1.8 |
| 62 | Penguin-Crane | 419 | 6.8 | 463 | 7.5 |
| 99 | Heloderma-Elgaria | 630 | 6.4 | 556 | 5.6 |
| 68 | Alligatorines | 4642 | 68.3 | 2023 | 29.8 |
All candidate calibrations, when employed, were used as absolute constraints for two reasons. In principle, if all calibrations are treated as minimum ages, the analysis tends to stretch all branches to be consistent with the calibration that implies greatest tree depth, pushing all the other calibrations earlier as a result. In effect, the analysis can become driven by a single, erroneously early calibration, causing a systematic overestimation of ages (see discussion). Further, even if all calibrations are reasonably accurate, a methodological artefact (“model overfitting”) of PLRS means shallow calibrations often overestimate ages of deep nodes: constraining deeper nodes to be only minimum ages allows this error, whereas fixing deeper nodes would prevent it (Sanderson, 2002; Welch and Bromham, 2005; Welch et al., 2005; Ho et al., 2005; Porter et al., 2005; Yang and Rannala, 2006).
Near and Sanderson (2004) proposed a rational approach to choosing reliable calibrations. Each calibration point is employed in turn and the tree rate-smoothed; the ages of other putative calibration nodes are estimated using the resultant ultrametric tree and then compared with the actual dates. The calibration points that best reconstruct all others are retained, resulting in an internally consistent set. Using this approach with typical (rate-variable) data assumes that the rate-smoothing methods adequately reconstruct the underlying ultrametric tree. Because both rate smoothing and calibration contain errors, where a calibration point fails to reconstruct (is inconsistent with) all others, this could either reflect a genuine problem with that calibration point or a problem with the data and/or rate smoothing method. These issues are exemplified in the current data set.
One problem with the cross-validation approach is the “model overfitting” bias in PLRS, which tends to artificially inflate ages for nodes below the deepest calibration point employed. Therefore, although using only deep calibrations may accurately reconstruct shallower candidate calibrations, using only shallower calibrations often greatly overestimates the dates of deeper candidate calibrations. This artefact will artificially increase the deviance (defined below) of shallow calibrations. This was the case here, under both available rate-smoothing algorithms in r8s (the default additive function, and the alternative log rate function intended to ameliorate this artefact). The deepest (amniote = 315 Mya) calibration provided a stable solution and plausible ages for shallower candidate calibrations, but the shallower calibrations gave unstable solutions and implausible ages for the deeper calibrations. This problem can be partly rectified by using the same ultrametric tree (generated by smoothing from a basal node) when evaluating each calibration point. Although this approach removes the consistent bias inflating the deviance of shallow calibrations, the adopted ultrametric tree could still contain erroneous branch lengths in regions where rate smoothing has performed poorly, artificially inflating the deviance of accurate calibrations in those regions (as discussed below).
We evaluated the deviance of each candidate calibration using a common ultrametric tree, created by smoothing across the amniote node (the most basal candidate calibration). This tree was rescaled to each candidate calibration and used to estimate the ages of the other candidate calibration nodes. Deviance was measured as the absolute value of the difference between the estimated and proposed fossil dates. For any one calibration, the sum of the deviances across all the other calibration nodes indicates how consistent that calibration is with the remainder: the smaller the value, the more consistent (see Table 2; deviance was also calculated as a proportion of the proposed date; cf. cross-validation procedure of Near and Sanderson, 2004). These analyses were conducted on all three codon positions (linked and unlinked), 1st + 2nd positions only (linked), and amino acids. Results for the linked nucleotides, and for amino acids, are shown; other nucleotide analyses gave similar results to the former.
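The rescaling procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the node ages are the rounded nucleotide medians from the single-calibration column of Table 3 rather than the consensus-tree values behind Table 2, so the totals come out close to, but not identical with, the tabulated ones (for example 228 versus 243 for amniotes, and 4510 versus 4642 for alligatorines). The function name is hypothetical, and the second returned value follows the apparent convention of Table 2, where ∑%dev matches ∑deviance divided by the candidate's own proposed age (e.g., 4642/68 ≈ 68.3).

```python
# (proposed fossil age, PLRS estimate under amniote = 315 Mya) in Mya,
# nucleotide data, copied from the "1cal" column of Table 3.
CANDIDATES = {
    "Amniotes": (315, 315),
    "Reptiles": (255, 268),
    "Archosaurs": (245, 196),
    "Turtles": (210, 178),
    "Scincomorph-Anguimorph": (168, 156),
    "Trionychoids": (100, 91),
    "Heloderma-Elgaria": (99, 70),
    "Eutherians": (98, 87),
    "Varanus-Shinisaurus": (85, 86),
    "Alligatorines": (68, 17),
    "Cetartiodactylans": (65, 58),
    "Penguin-Crane": (62, 48),
}

def summed_deviance(calibration, candidates=CANDIDATES):
    """Rescale the common ultrametric tree so `calibration` hits its fossil
    age, then sum |estimated - fossil| over all other candidate nodes."""
    fossil, estimate = candidates[calibration]
    scale = fossil / estimate  # uniform stretch applied to every node age
    total = sum(
        abs(scale * est - fos)
        for name, (fos, est) in candidates.items()
        if name != calibration
    )
    return total, total / fossil

# Rank candidates from most to least consistent with the others
for name in sorted(CANDIDATES, key=lambda n: summed_deviance(n)[0]):
    total, relative = summed_deviance(name)
    print(f"{name:24s} {total:7.0f} {relative:6.1f}")
```

Because every candidate is evaluated on the same tree, differences in summed deviance reflect only how far each node's estimated age departs proportionally from its fossil age, not differences in how rate smoothing behaves from one calibration to the next.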
Of the 12 candidate calibrations (see Tables 2 and 3) the most consistent (for nucleotide data) are scincomorph-anguimorph, eutherians, trionychoids, and amniotes (Table 2). The cetartiodactylan and Varanus-Shinisaurus calibrations also performed well, suggesting they should be investigated further. However, for amino acid data, the trionychoid and Varanus-Shinisaurus calibrations were less consistent. All three archosaur calibrations (penguin-crane, bird-crocodile, and alligator-caiman) are aberrant, especially the last. The conflict between the molecular and the fossil dates for these calibration points suggests that at least one line of evidence is misleading. Importantly, the three “inconsistent” archosaur calibrations are not readily dismissed. All three are in agreement with each other in suggesting deeper divergences within archosaurs than predicted by the “consistent” calibrations (see Table 3), and are based on strong fossil evidence. The bird-croc divergence is documented by numerous well-preserved stem-crocodylians (phytosaurs, rauisuchians, sphenosuchians, aetosaurs) and stem-birds (ornithischian and saurischian dinosaurs) around 230 to 225 Mya (e.g., Brochu, 2001). However, earlier examples are scarce, consisting of only the stem-bird Marasuchus (= Lagosuchus; Ladinian, 230–235 Mya; Sereno and Arcucci, 1994). Other earlier crown archosaurs (Lewisuchus, Turfanosaurus, Arizonasaurus: 241–235 Mya) have been questioned (Wu and Russell, 2001; Gower and Nesbitt, 2006). A reasonable interpretation of the stratigraphic record would be that the bird-crocodile divergence occurred around 245 to 240 Mya, with an increase in diversity in both subclades occurring around 230 Mya. The split is unlikely to be much older than 245 million years given that the fossil record of archosaurs is good before this time, but all known examples lie outside the crocodile-bird clade. 
Given the wealth of well-supported crown archosaurs around 230 Mya, it is almost impossible that the bird-crocodile split could have occurred any later than this. Similarly, crocodylians have been the focus of extensive phylogenetic studies, which have identified well-preserved stem-alligatorines that are 68 My old (Reisz and Muller, 2004). The recent molecular evidence that gavials are nested deeply within crocodylines does not challenge the identity of these early fossils as stem-alligatorines (see Gatesy et al., 2003: fig. 3). Similarly, the minimum age of the penguin-crane divergence is well corroborated by complete fossil penguins with a suite of synapomorphies indicating relationship to that morphologically distinctive clade (Slack et al., 2006).
Median divergence dates within tetrapods based on penalized likelihood rate smoothing of 200 MCMC samples of the nucleotide and the amino acid Bayesian analyses. These dates are essentially the same (all within 5%) as dates obtained using the consensus trees (see Fig. 5). Results are shown for four different sets of calibrations, each with a lungfish-tetrapod root maximum constraint of 450 Mya (see text). The 95% confidence intervals were obtained with only the amniote = 315 Mya calibration enforced. Calibration age (if applicable to node) is shown at extreme left, with enforced calibrations indicated by shading. Node labels at left refer to Figure 5.
| Calibration age | Node | Nucleotides 1cal | Nucleotides 4cal | Nucleotides 2cal | Nucleotides 5cal | Nucleotides 95% CI | Amino acids 1cal | Amino acids 4cal | Amino acids 2cal | Amino acids 5cal | Amino acids 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Tetrapods | 353 | 353 | 353 | 353 | ±12 | 354 | 354 | 354 | 354 | ±16 |
| | Lissamphibians | 323 | 323 | 322 | 322 | ±19 | 292 | 292 | 292 | 292 | ±28 |
| | Batrachians | 274 | 274 | 274 | 274 | ±21 | 267 | 267 | 266 | 266 | ±29 |
| | Caecilians | 115 | 115 | 115 | 115 | ±16 | 138 | 138 | 138 | 138 | ±32 |
| | Anurans | 167 | 167 | 167 | 166 | ±16 | 150 | 150 | 150 | 150 | ±26 |
| | Urodeles | 129 | 129 | 129 | 129 | ±15 | 105 | 105 | 105 | 105 | ±20 |
| | Therians | 175 | 182 | 175 | 182 | ±13 | 197 | 195 | 197 | 195 | ±22 |
| | Mammals | 201 | 207 | 201 | 207 | ±14 | 227 | 227 | 228 | 227 | ±24 |
| | Rodents | 25 | 27 | 25 | 27 | ±4 | 21 | 21 | 21 | 21 | ±6 |
| | Monotremes | 36 | 37 | 36 | 37 | ±6 | 48 | 48 | 48 | 48 | ±15 |
| | Marsupials | 64 | 66 | 64 | 66 | ±7 | 73 | 73 | 73 | 73 | ±14 |
| | Australidelphia | 56 | 58 | 56 | 58 | ±7 | 68 | 68 | 68 | 68 | ±16 |
| | Lepidosaurs | 250 | 257 | 265 | 268 | ±12 | 261 | 264 | 274 | 275 | ±17 |
| | Squamates | 171 | 182 | 181 | 190 | ±14 | 184 | 194 | 194 | 201 | ±19 |
| | Iguanians | 124 | 138 | 132 | 142 | ±13 | 134 | 144 | 141 | 148 | ±21 |
| | Iguanidae | 69 | 76 | 73 | 79 | ±10 | 80 | 86 | 84 | 89 | ±16 |
| | Acrodonts | 71 | 78 | 75 | 80 | ±10 | 86 | 92 | 91 | 94 | ±17 |
| | Austral agamids | 22 | 24 | 23 | 25 | ±5 | 27 | 29 | 28 | 30 | ±10 |
| | Chamaeleons | 30 | 34 | 32 | 35 | ±5 | 38 | 40 | 40 | 42 | ±9 |
| | Anguimorphs | 96 | 120 | 103 | 122 | ±15 | 102 | 118 | 109 | 120 | ±17 |
| | Toxicofera | 139 | 154 | 148 | 158 | ±13 | 146 | 155 | 153 | 160 | ±18 |
| | Snakes | 96 | 106 | 102 | 109 | ±10 | 98 | 105 | 102 | 108 | ±15 |
| | Alethinophidia | 44 | 49 | 47 | 50 | ±6 | 56 | 60 | 59 | 62 | ±13 |
| | Teiiods | 64 | 70 | 68 | 72 | ±8 | 64 | 68 | 67 | 70 | ±14 |
| | Scincomorpha | 144 | 157 | 153 | 162 | ±13 | 153 | 160 | 160 | 166 | ±20 |
| | Skinks | 97 | 105 | 102 | 108 | ±10 | 99 | 106 | 105 | 109 | ±18 |
| | Lygosomines | 72 | 78 | 77 | 81 | ±9 | 75 | 80 | 79 | 82 | ±17 |
| | Gekkotans | 86 | 91 | 91 | 94 | ±10 | 102 | 107 | 108 | 111 | ±18 |
| | Diplodactylids | 49 | 52 | 52 | 54 | ±7 | 63 | 66 | 67 | 68 | ±15 |
| | Turtle-Archosaur | 232 | 237 | 265 | 265 | ±13 | 247 | 250 | 273 | 273 | ±18 |
| | Pleurodires | 114 | 121 | 136 | 137 | ±22 | 160 | 163 | 183 | 182 | ±38 |
| | Birds | 91 | 93 | 115 | 115 | ±9 | 104 | 105 | 135 | 136 | ±19 |
| | Neoaves | 53 | 54 | 67 | 67 | ±9 | 62 | 63 | 81 | 81 | ±17 |
| | Neognaths | 70 | 71 | 87 | 87 | ±9 | 70 | 71 | 92 | 92 | ±17 |
| | Galloanseriformes | 52 | 53 | 65 | 65 | ±7 | 53 | 54 | 68 | 68 | ±15 |
| | Crocodylians | 33 | 34 | 42 | 42 | ±6 | 55 | 57 | 78 | 78 | ±21 |
| 68 | Alligatorines | 17 | 17 | 21 | 21 | ±5 | 32 | 32 | 46 | 46 | ±17 |
| 255 | Reptiles | 268 | 273 | 284 | 285 | ±11 | 275 | 278 | 288 | 289 | ±13 |
| 210 | Turtles | 178 | 185 | 208 | 207 | ±14 | 203 | 208 | 233 | 231 | ±28 |
| 168 | Scincomorph-Anguimorph | 156 | 170 | 166 | 176 | ±14 | 166 | 176 | 175 | 182 | ±19 |
| 62 | penguin-crane | 48 | 49 | 61 | 61 | ±10 | 50 | 51 | 64 | 64 | ±17 |
| 65 | Cetartiodactylans | 58 | 65 | 58 | 65 | ±11 | 65 | 64 | 65 | 64 | ±17 |
| 85 | Varanus-Shinisaurus | 86 | 107 | 91 | 109 | ±14 | 97 | 111 | 103 | 113 | ±20 |
| 245 | Archosaurs | 196 | 201 | 245 | 245 | ±15 | 192 | 194 | 245 | 245 | ±24 |
| 98 | Eutherians | 87 | 98 | 87 | 98 | ±9 | 100 | 98 | 100 | 98 | ±20 |
| 99 | Heloderma-Elgaria | 70 | 99 | 74 | 99 | ±12 | 77 | 99 | 81 | 99 | ±19 |
| 100 | Trionychoids | 91 | 100 | 109 | 100 | ±17 | 94 | 100 | 108 | 100 | ±29 |
| 315 | Amniotes | 315 | 315 | 315 | 315 | na | 315 | 315 | 315 | 315 | na |
Three fossil calibrations suggesting anomalously early divergences involving archosaurs therefore are both robust and concordant, suggesting that molecular divergence dates for the group are in error. Given the long sequences and robust tree and branch length estimates, the most likely cause for such an anomaly might be the PLRS transformation that generates the ultrametric, rate-smoothed tree. It might fail to recognize the full extent of the rate slowdown in the archosaur lineage, because (1) the methodology will intrinsically have difficulty with slow lineages as they yield fewer changes (in absolute terms) on which to evaluate rate smoothing models, and (2) there are long unbroken branches and therefore few nodes to help reveal where to distribute the rate change along different portions of these branches (see Welch and Bromham, 2005; Welch et al., 2005; Drummond et al., 2006, for discussion of effects of assuming autocorrelated rates).
If we provisionally assume that the fossil dates for archosaurs are correct and that the molecular dates are overly recent, it follows that at least some (“inconsistent”) fossil calibrations within archosaurs need to be employed. If we add the bird-croc = 245 Mya calibration to the amniote = 315 Mya calibration, this lengthens the branches around the archosaur region of the tree. This in turn drives down the inferred molecular rates for crocodiles and birds (considerably) and turtles (slightly), both for nucleotides and amino acids (see Fig. 4). Adding the alligator-caiman = 68 Mya calibration drives crocodile rates even lower, to be the slowest of all groups (Fig. 4). If we commence with the amniote = 315 Mya calibration alone and successively add nonarchosaur calibrations, the dates for all archosaur divergences are pushed deeper (though not as far as the fossil dates). However, divergences across the rest of the tree do not increase as much (Table 3). This pattern suggests that there is some systematic bias (e.g., a major slowdown) in archosaurs that is being increasingly retrieved as more branch lengths outside archosaurs are correctly specified (by adding calibrations), though it takes calibrations actually within archosaurs to fully correct for this. It is also notable that adding the archosaur = 245 Mya calibration alone to the initial amniote = 315 Mya calibration substantially improves the fit across all the other candidate calibration nodes (especially those outside the core set of consistent nodes); indeed, this is the single best addition. Adding only already consistent calibrations (by definition) cannot substantially alter branch length proportions and thus cannot correct for errors in PLRS, and also cannot greatly improve fit across other candidate calibrations. Finally, although adding more calibrations can potentially improve accuracy, it also increasingly constrains more regions in the tree to closely mirror fossil dates. There is a trade-off between multiple calibrations and letting the molecular data speak for itself.
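The effect of adding the archosaur calibration on the other candidate nodes can be sketched with a crude summed absolute deviation from the fossil dates. This is not the deviance statistic used in the analysis itself. The sketch assumes that the first and third nucleotide estimate columns of Table 3 correspond to the amniote-only and amniote + archosaur calibration sets respectively (inferred from the Archosaurs row reading 245 only in the third column); `CANDIDATES` and `total_abs_deviation` are hypothetical names, and only the rows shown in this part of the table are used.

```python
# (fossil date, amniote-only estimate, amniote + archosaur estimate), all in Mya.
# ASSUMPTION: the two estimates are the first and third nucleotide columns of
# Table 3; the Archosaurs row is omitted because it is the calibration being added.
CANDIDATES = {
    "Alligatorines": (68, 17, 21),
    "Reptiles": (255, 268, 284),
    "Turtles": (210, 178, 208),
    "Scincomorph-Anguimorph": (168, 156, 166),
    "penguin-crane": (62, 48, 61),
    "Cetartiodactylans": (65, 58, 58),
    "Varanus-Shinisaurus": (85, 86, 91),
    "Eutherians": (98, 87, 87),
    "Heloderma-Elgaria": (99, 70, 74),
    "Trionychoids": (100, 91, 109),
}


def total_abs_deviation(rows, column):
    """Sum of |estimate - fossil date| over all nodes for one estimate column."""
    return sum(abs(row[column] - row[0]) for row in rows.values())


dev_amniote_only = total_abs_deviation(CANDIDATES, 1)
dev_with_archosaur = total_abs_deviation(CANDIDATES, 2)

print("amniote only:        ", dev_amniote_only, "My")
print("amniote + archosaur: ", dev_with_archosaur, "My")
```

On this simple measure the total mismatch across these rows falls from 179 to 139 My when the archosaur calibration is added, in line with the improvement in fit described above.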
The Heloderma-Elgaria calibration is older than dates implied by most other calibrations, but without numerous adjacent calibrations it cannot be ascertained whether the calibration is an overestimate, or the reconstructed ages underestimates. We retain it here for two reasons: (1) it is commonly used, and (2) it makes the present results (young squamate divergences) conservative as excluding this calibration would reduce dates even further.
For these reasons, we focus on analyses using five calibrations spread across the major amniote lineages: bird-mammal (amniote) = 315 Mya; extant eutherians (elephant-pig) = 98 Mya; Heloderma-Elgaria = 99 Mya; bird-croc (archosaur) = 245 Mya; Apalone-Lissemys (trionychoid) = 100 Mya. All calibrations were treated as fixed rather than minima for reasons discussed above. We also enforce a maximum age constraint of 450 Mya for the root (lungfish-tetrapod divergence), to ameliorate the “model overfitting” artefact discussed above. The chronogram discussed here (Figure 5) is based on the nucleotide rate-smoothed tree. For major nodes, the median and 95% confidence intervals are provided in Table 3, including several sets of calibrations for both the nucleotide and amino acid data. For the basis of discussion we present ages as bracketed by nucleotide and amino acid estimates. It also gives reasonable estimates for the rest of the tree, including most of the other candidate calibration nodes (compare calibration date with single amniote calibration columns in Table 3).
General Discussion
The above analyses provide the first robust nuclear molecular clock divergence date estimates for all major tetrapod groups simultaneously. They thus complement previous studies of particular amniote clades, which often only included calibration points within those clades. In general, where this study disagrees with previous molecular work, the current dates are more consistent with the stratigraphic evidence.
Major Amniote Divergences
Although five calibrations were enforced, all other divergences within tetrapods were estimated. The split between amphibians and amniotes is estimated at ca. 354 Mya, matching the fossil record (transitional tetrapod fossils 365 to 355 Mya: Daeschler et al., 2006), but we do not place much weight on this result as this node is below our most basal calibration. The divergence between archosaurs (plus turtles) and lepidosaurs was estimated at 285 to 289 Mya and indicates substantial fossil gaps: the first stem archosaurs appear ∼ 255 Mya (e.g., Brochu, 2001) and the first stem lepidosaurs occur ∼ 240 Mya (Evans, 2003). If the turtle-archosaur clade is accepted, the inferred divergence date between turtles and their nearest living relatives is 265 to 273 Mya. This relatively late date refutes the suggestion that captorhinids (Gaffney and Meylan, 1988) could be the nearest relatives of turtles, because captorhinids appear too early in the fossil record (∼ 300 Mya) to lie on the turtle stem. However, the other proposed turtle relatives (procolophonoids, pareiasaurs, rhynchosaurs, and sauropterygians: see Lee, 2001) all appear sufficiently late in the fossil record (< 250 Mya) to be consistent with the molecular evidence. The slow rate within turtles is not fully apparent until a calibration point is added within the group; adding a single calibration makes all reconstructed divergences within turtles much more consistent with the fossil record and broadly similar to those of a more comprehensive recent study (Near et al., 2005). Within lepidosaurs, the very early implied divergence between squamates and rhynchocephalians (at 268 to 275 Mya) makes the living Sphenodon even more distinctive than currently generally assumed. This split also implies a ∼ 40-My gap in the fossil record, the earliest fossil record of this split being the earliest rhynchocephalians at ∼ 225 Mya (e.g., Fraser and Benton, 1989). The first squamates appear much later (∼ 168 Mya), but these appear to belong to derived clades such as anguimorphs and scincomorphs and are consistent with an earlier date for squamate origins (Evans, 2003).
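The fossil gaps implied by these dates follow from simple subtraction, sketched below using only figures quoted in this paragraph. The dictionary `SPLITS` and the function `implied_gap` are hypothetical names introduced for illustration.

```python
# (molecular estimate range in Mya, earliest relevant fossil in Mya),
# all values as quoted in the paragraph above.
SPLITS = {
    "archosaur-lepidosaur vs. first stem archosaurs": ((285, 289), 255),
    "archosaur-lepidosaur vs. first stem lepidosaurs": ((285, 289), 240),
    "squamate-rhynchocephalian vs. first rhynchocephalians": ((268, 275), 225),
}


def implied_gap(estimate_range, first_fossil):
    """Range (in My) by which the molecular date pre-dates the first fossil."""
    low, high = estimate_range
    return (low - first_fossil, high - first_fossil)


gaps = {
    name: implied_gap(estimate, fossil)
    for name, (estimate, fossil) in SPLITS.items()
}

for name, (low, high) in gaps.items():
    print(f"{name}: {low} to {high} My")
```

This gives gaps of 30 to 34 My and 45 to 49 My for the archosaur-lepidosaur split, and 43 to 50 My for the squamate-rhynchocephalian split, the last being the gap of roughly 40 My noted above.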
In principle, these dates could be compared to recent mtDNA studies (e.g., Janke et al., 2001; Rest et al., 2003; Pereira and Baker, 2006), but these studies employed similar calibration points bracketing these divergences (often amniote = 315 Mya and croc-bird = 245 Mya), thus constraining them to be similar in age to the inferred dates here. One consistent result of all analyses is the long branch and implied time interval (> 40 My) between the archosaur-lepidosaur and the bird-croc divergences. This contradicts the suggestion (Muller and Reisz, 2005) that these two divergences are closely spaced (∼ 255 and ∼ 245 Mya, respectively) and can both be used as calibration points (Sanders and Lee, 2007).
Amphibians
Although the amphibian divergence dates retrieved here need to be treated with caution, due to sparse taxon sampling and no internal calibration, the consistently young results are worth mentioning. Given that all the lissamphibian nodes are outside the most basal calibration employed, they may be prone to being over- (rather than under-) estimated, so the young dates reported here are unlikely to be due to this artefact. The basal (caecilian-batrachian) divergence within lissamphibians is dated at 292 to 322 Mya, less than other estimates (mtDNA of Zhang et al., 2005, 337 Mya; partial RAG-1 of San Mauro et al., 2005, 367 Mya). The basal divergence within Batrachia (between urodeles and anurans) is dated at 266 to 274 Mya, also younger than mtDNA (308 Mya) and RAG-1 (367 Mya) estimates; this younger date is more consistent with the relatively late appearance (245 Mya) in the fossil record of unequivocal batrachians (Heatwole and Carroll, 2001). Our estimates of major divergences in caecilians, anurans, and urodeles are all substantially younger than both previous molecular estimates. Although this makes our results more consistent with the known fossil record, further taxon sampling would be required to provide the internal calibrations needed to test the rate smoothing in this part of the tree.
Mammals
RAG-1 provides a robust and clear picture of both topology and divergence among the major mammal groups. The basal divergence between monotremes and therians is estimated at 207 to 227 Mya, comparable to Pereira and Baker (2006; 207 Mya) and van Rheede et al. (2006; 217 to 231 Mya). Within Theria, the marsupial-eutherian divergence of 182 to 195 Mya is broadly comparable with recent molecular analyses (Penny et al., 1999, 176 Mya; van Rheede et al., 2006, 186 to 193 Mya; Drummond et al., 2005, 170 Mya; Pereira and Baker, 2006, 191 Mya; Woodburne et al., 2003, 182 to 190 Mya; Hasegawa et al., 2003, 162 Mya) and much older than most palaeontological estimates (see Woodburne et al., 2003 for review). This fossil record probably needs to be reinterpreted. Thus, the divergences between the monotremes, marsupials, and eutherians are more closely spaced (∼ 28 My apart) than previous suggestions based on the fossil record. This short branch coupled with high divergences among all three groups, along with base composition biases in mtDNA, could explain lack of support in previous studies for the widely accepted marsupial-placental clade (see Phillips and Penny, 2003; Nilsson et al., 2004). What is particularly striking is the length of the stem lineage leading to each extant mammal radiation: a stem of ∼ 90 My for eutherians versus an extant (crown) age of ∼ 100 My; ∼ 110 My versus ∼ 70 My for marsupials; ∼ 170 My versus ∼ 40 My for monotremes. This suggests high levels of extinction, and therefore potentially a large unknown diversity.
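The stem lengths quoted above can be checked approximately by subtracting each crown age from the age of the split with the sister lineage. The sketch below uses the nucleotide to amino acid ranges quoted in this section (with the eutherian crown fixed at its 98 Mya calibration) and takes the widest possible range for each stem; the names `SPLIT_FROM_SISTER`, `CROWN_AGE`, and `stem_range` are hypothetical.

```python
# Age ranges in Mya, as quoted in this section.
# Eutherians and marsupials both stem from the marsupial-eutherian split;
# monotremes stem from the monotreme-therian split.
SPLIT_FROM_SISTER = {
    "eutherians": (182, 195),
    "marsupials": (182, 195),
    "monotremes": (207, 227),
}
# ASSUMPTION: the eutherian crown is taken as the fixed 98 Mya calibration.
CROWN_AGE = {
    "eutherians": (98, 98),
    "marsupials": (66, 73),
    "monotremes": (37, 48),
}


def stem_range(split, crown):
    """Widest stem duration (My): youngest split minus oldest crown, and vice versa."""
    return (split[0] - crown[1], split[1] - crown[0])


stems = {
    group: stem_range(SPLIT_FROM_SISTER[group], CROWN_AGE[group])
    for group in SPLIT_FROM_SISTER
}

for group, (low, high) in stems.items():
    print(f"{group}: stem of {low} to {high} My")
```

The resulting ranges (84 to 97 My, 109 to 129 My, and 159 to 190 My) bracket the rounded stem lengths given in the text.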
The basal divergence in living monotremes (platypus-echidna) is estimated at around 37 to 48 Mya, earlier than most other studies (e.g., Kirsch and Mayer, 1998; Belov and Hellman, 2003). Those latter studies formed the basis of the hypothesis that platypuses are ancestral to echidnas (implying secondary terrestriality and associated phenotypic reversals; see Musser, 2003; Dawkins, 2004). The oldest well-known fossil platypus (Obdurodon) has the morphotype of living forms, and at 25 My old existed before the inferred platypus-echidna split (estimated to be as recent as 21 Mya); it was thus presumably a stem (“ancestral”) taxon to the platypus-echidna clade. However, the current analysis indicates that the platypus-echidna divergence (∼ 40 Mya) is sufficiently ancient to pre-date the earliest well-known platypus fossils (the Paleocene Monotrematum is very poorly known and its overall appearance is highly uncertain). Accordingly, there would be no need to invoke a platypus-like ancestry for living monotremes.
The dates for the marsupial radiation are similar to those obtained recently by Nilsson et al. (2004) and Drummond et al. (2006). The crown radiation is 66 to 73 My old, whereas the Australasian radiation (58 to 68 Mya) may slightly pre-date the separation of Australasia from South America, allowing the possibility that multiple lineages of Gondwanan marsupials were originally present in Australasia. This is consistent with the likely inclusion of the South American Dromiciops within the Australasian clade (Amrine-Madsen et al., 2003). The age of the extant eutherians has been estimated at 102 to 107 Mya consistently in recent multigene molecular clock studies (e.g., Hasegawa et al., 2003; Springer et al., 2003; Woodburne et al., 2003). This vindicates the fossil record (∼ 98 Mya calibration used here) and refutes hypotheses of much older cryptic lineages (e.g., Hedges et al., 1996; Penny et al., 1999). Similarly, the age of extant Cetartiodactyla is consistently estimated at around 65 Mya (e.g., Hasegawa et al., 2003). Of note is the young date for rodents of 21 to 27 Mya, in accord with nuclear analyses (Springer et al., 2003; Jansa et al., 2006) but again younger than mtDNA studies (Penny et al., 1999; Pereira and Baker, 2006). The RAG-1 analyses suggest that the latter dates are actually an artefact of rate and composition effects in mtDNA.
Archosaurs
Divergences within archosaurs are more consistent with the fossil record once the bird-croc = 245 Mya calibration is employed (see above). This additional calibration deepens crocodylian and basal bird divergences but has less effect on dates within Neoaves (Table 3). Using nucleotides, the extant crocodylian radiation is dated at 42 Mya, and the basal alligatorine (alligator-caiman) divergence is dated at 21 Mya; both these dates are inconsistent with the old fossil ages of crown crocodylians (85 Mya: Brochu, 2003) and crown alligators (66 to 71 Mya; Muller and Reisz, 2005). The amino acid estimates for these divergences are deeper, but even so, the 95% confidence intervals do not overlap substantially. This discrepancy has no obvious explanation and needs further investigation. Janke et al. (2005) used mitogenome data with the bird-croc = 245 Mya calibration to infer much older dates for crocodylians not supported by the fossil record: 137 to 164 Mya for extant crocodylians and 98 to 118 Mya for alligator-caiman. However, saturation causes mtDNA analyses to compress the basal branch leading to crocodylians relative to branches within the group (see Gatesy et al., 2003). This compression of the basal branch coupled with Janke et al.'s use of a deep external calibration would drag all inferred divergences within crocodylians deeper in time.
The basal bird split (paleognath-neognath) at 115 to 136 Mya is deeper than proposed by previous studies (Harrison et al., 2004; Slack et al., 2006; Ericson et al., 2006). Basal splits among Neoaves (67 to 81 Mya), occurring around or slightly before the K-T boundary, are similar to most recent studies (e.g., Harrison et al., 2004; van Tuinen and Dyke, 2004; Slack et al., 2006; Ericson et al., 2006). Most studies of avian divergences use internal calibrations provided by bird fossils that may not be phylogenetically placed with great certainty (e.g., Cooper and Penny, 1997; Haddrath and Baker, 2001; Harrison et al., 2004; see also Slack et al., 2006, and Ericson et al., 2006). Our study (with calibrations throughout amniotes) provides additional support for the dates obtained in many recent studies. Studies employing such external (nonavian) calibrations have usually been based on very sparse taxon sampling (Hedges et al., 1996), whereas others were based on fast evolving mtDNA that would saturate on the timescale of the deeper calibrations used, such as the mammal-bird divergence (Rest et al., 2003; van Tuinen and Hadly, 2004; Pereira and Baker, 2006). Thus, the present study supports other recent studies indicating that the diversification of Neoaves occurred closer to the K-T boundary than previously suggested (e.g., Cooper and Penny, 1997; Van Tuinen and Hedges, 2001).
Squamates
Squamates are a major focus of this study and our results can be compared to the other three detailed analyses. Vidal and Hedges (2005) used nine nuclear genes; as half their concatenated sequence was RAG-1, their results might be expected to be similar to ours. However, two factors resulted in deeper estimated divergences. First, they interpreted Parviraptor (∼ 165 Mya) as a basal anguimorph (see Evans and Wang, 2005) and Ptilotodon (∼ 110 Mya) as a teiid; these problematic taxa are discussed below. Second, all internal calibration points were treated only as minimum constraints, with the sole maximum age constraint being the squamate root node (≤ 251 Mya, as opposed to 190 to 201 Mya estimated here). Relaxed clock procedures would predispose such an analysis to stretch all basal branches until they hit the root constraint. As this root constraint is a liberal maximum possible age, this pattern would cause all basal node dates to be overestimated; this trend is evident in the results discussed below.
Wiens et al. (2006) used RAG-1 and also employed internal calibrations on a “backbone” tree to provide dates for primary divergences within squamates. However, inflation of basal dates was avoided as they imposed a tight maximum bound on the basal Sphenodon-squamate node, constraining this to be ≤ 227 Mya. Accordingly, their dates for basal squamate divergences are more recent (and often comparable to the dates obtained here). Wiens et al. (2006) also inserted mitochondrial trees for individual squamate groups onto the dated RAG-1 “backbone” tree. Kumazawa (2007) used complete mitochondrial genomes together with external calibration points. As discussed above, the combination of external calibrations and saturation-driven compression of basal nodes may contribute to the generally older dates.
The divergence dates between major squamate clades in this study are comparable to those in Wiens et al. (2006) but more recent than those proposed by Vidal and Hedges (2005) and Kumazawa (2007). For instance, the crown squamate radiation is 190 to 201 My old (∼ 179 Mya in Wiens et al., 2006; ∼ 240 Mya in Vidal and Hedges, 2005, and Kumazawa, 2007); divergences between snakes and anguimorphs + iguanians (Toxicofera) are 158 to 160 Mya (cf. ∼ 164, ∼ 179, and ∼ 210 Mya), and between scincids and xantusiids 162 to 166 Mya (cf. ∼ 158, ∼ 192, and ∼ 205 Mya). Thus, this study supports the shallower proposed time frame for squamate diversification. The branches at the base of the squamate tree (leading to gekkotans, dibamids, scincomorphs, scincids, cordyliforms + xantusiids, lacertoids + amphisbaenians, snakes, anguimorphs, and iguanians) are all relatively short, indicating that these major lineages all diverged within ∼ 50 My (see Townsend et al., 2004).
The identification of gekkotans as a basal (and thus early) squamate lineage implies a long stratigraphic gap between gekkotan origins (∼ 180 Mya) and first unequivocal examples (∼ 110 Mya; Evans, 2003). The early implied divergence of gekkotans from other squamates is thus consistent with the tentative identification of much older taxa (∼ 165 to 145 Mya) as gekkotan relatives (see Evans, 2003) but does not constitute proof. The relatively recent crown gecko radiation, dated at 94 to 111 Mya here, is comparable to other studies (Vidal and Hedges, 2005, 111 Mya; Wiens et al., 2006, 87 Mya; contra Kumazawa, 2007, 180 Mya). Splits within diplodactylids are sufficiently young (54 to 68 Mya) to exclude Gondwanan vicariant tectonic scenarios (e.g., King, 1990; Han et al., 2004). However, Wiens et al.'s (2006) date for extant pygopods of ∼ 20 Mya is much younger than the 37 Mya estimated by Jennings et al. (2003) using an internal pygopod fossil calibration. As geckos appear to have slower rates than other squamates (see Fig. 2), and a long stem lineage, they may be systematically further underestimated and require more taxon sampling and internal calibrations (analogous to archosaurs). The age of crown acrodonts (80 to 94 Mya) is similar to Hugall and Lee (2004; 58 to 106 Mya) and Wiens et al. (2006; ∼ 79 Mya).
The extant snake radiation is estimated at ∼ 109 Mya, with cylindrophids and colubroids diverging 50 to 62 Mya. Wiens et al. (2006) obtained an older date for the snake radiation (∼ 131 Mya), probably as they employed a deep minimum age constraint for the cylindrophid-colubroid split (∼ 94 Mya). This contentious constraint (see Lee and Scanlon, 2002) was not employed in the current analysis, leading to shallower dates. The time frames for snake divergences in the current study and Wiens et al. (2006) are both younger than the dates presented by Noonan et al. (2006). The latter study inferred a date of 110 Mya for the cylindrophid-colubrid split. However, as all of their internal calibrations were minimal constraints, the analysis was predisposed towards stretching out the tree until the sole maximal constraint (on the root) is reached (see discussion above). Consistent with this, the posterior estimate for the root age (95% CI of 127 Mya) approaches the 130 Mya maximum allowed. As their tree is stretched towards a liberal maximum, the dates in Noonan et al. (2006) are most likely (substantial) overestimates.
Several palaeontological claims are seriously challenged by the relative recency of certain splits within squamates. The iguanid-acrodont split is dated at around 142 to 148 Mya (Wiens et al., 2006; ∼ 146 Mya); this is sufficiently recent to refute the identification of the ∼ 220 Myo (million-year-old) Tikiguania as an acrodont (Datta and Ray, 2006) and challenges the assignment of the fragmentary 165 Myo Bharatagama to the same group (Evans et al., 2002). Both taxa more plausibly represent a convergent early development of acrodonty. The latter interpretation is consistent with the observation that apart from this anomalously early putative acrodont, there are no other acrodont or iguanid fossils until about 110 Mya (Gao and Nessov, 1998). Both Vidal and Hedges (2005) and Wiens et al. (2006) used the teiid-gymnophthalmid split as a calibration; therefore, our date of 70 to 72 Mya is the only unconstrained molecular estimate. The retrieved date is inconsistent with both calibrations used. Vidal and Hedges accepted the ∼ 110 Myo Ptilotodon as a teiid, but this is questionable given that it is a tiny jaw fragment that exhibits no unique teiid synapomorphies (Nydam and Cifelli, 2002). The use of fossil Bicuspidon (Wiens et al., 2006) is also problematic. This very incomplete fossil (a jaw fragment) has affinities with polyglyphanodontids, which were formerly assumed to be teiid relatives (e.g., Nydam and Cifelli, 2002) but might be very remotely related, lying outside Scincomorpha altogether (Lee, 2005). The identity of the very early (∼ 165 Mya) Parviraptor as a basal varanoid is based on lower jaw characters and also needs to be reassessed (Evans and Wang, 2005). Varanoids (varanids and Heloderma) emerge as polyphyletic based on molecular data (Townsend et al., 2004; Vidal and Hedges, 2004), suggesting that similar jaw morphologies have evolved at least three times within squamates (varanids, Heloderma and snakes). 
Similarly, the split between anilioids and advanced snakes (caenophidians), at 50 to 62 Mya, is recent enough to refute the referral of ∼ 95 Myo vertebrae to derived caenophidians (colubroids); the earliest unequivocal colubroids are less than half that age (see Head et al., 2005).
Concluding Methodological Remarks
Our intention here is to let the RAG-1 data provide relatively independent dating estimates simultaneously across all the major tetrapod groups, free of potentially problematic calibrations within those groups, thus complementing previous within-group studies. Comparison with these studies reveals consistent patterns of similarity and difference, and raises the following important issues.
Choice of calibrations. Filtering calibrations merely on the basis of consistency with other calibrations may well exclude important and accurate calibrations. For RAG-1, it appears that PLRS reconstructs overly shallow divergences within archosaurs, with the result that well-corroborated fossil calibrations within archosaurs appeared inconsistent (too old) relative to other calibrations. Rather than reject these, however, in this case it appears prudent to retain these calibrations as local correctors, warping the tree in this region. Here, the good fossil record allowed one to conclude that the calibration was probably correct, and the molecular dates misleading. However, in cases where the fossil record is not so good, the situation is more ambiguous, raising a dilemma in choosing a subset of good calibrations: consistent ones are somewhat redundant, while inconsistent ones may be needed to correct weaknesses in the data and/or method used to adjust for rate variation.
Type of calibrations. Calibrations may be enforced as single dates, bounded ranges, or as minima. A current trend of applying internal fossil calibrations as minima with a maximum root constraint appears logical but actually leads to a consistent bias inflating dating estimates (as noted by Yang and Rannala, 2006). In such analyses, the single (possibly anomalous) internal calibration suggesting greatest overall tree depth can largely scale the tree (and all other calibrations will have little effect), whereas model fitting artefacts can further increase basal branch lengths until the maximum age constraint is reached. The retrieved dates for each node will be estimates of maximum possible age.
Biases in branch length estimation across different types of molecular data. Here, saturation of mtDNA appears to explain a consistent pattern of deep mtDNA ages in amphibians, mammals, and archosaurs compared to nuclear data estimates (as seen in Penny et al., 1999; Zhang et al., 2005; Janke et al., 2005; Pereira and Baker, 2006). Greater compression of deeper branches coupled with use of a deep calibration external to the group of interest will drag estimates of within-group divergences deeper in time.
Finally, there is a trade-off between enforcing multiple calibrations to improve the overall estimate, and letting the molecular data speak for itself (e.g., Near et al., 2005). If correct, multiple calibrations can improve accuracy in regions of the tree by overriding poorly estimated divergences and rate changes. However, imposing too many calibrations (including speculative and uncorroborated ones) can constrain the result to say nothing more than those a priori dating assumptions, obscuring information (and weaknesses) in the molecular data.
Acknowledgements
We thank the Australian Research Council for financial support, and J. Gatesy, M. Hedin, R. Page, and an anonymous reviewer for comments.





