Using More than the Oldest Fossils: Dating Osmundaceae with Three Bayesian Clock Approaches

—A major concern in molecular clock dating is how to use information from the fossil record to calibrate genetic distances from DNA sequences. Here we apply three Bayesian dating methods that differ in how calibration is achieved— " node dating " (ND) in BEAST, " total evidence " (TE) dating in MrBayes, and the " fossilized birth–death " (FBD) in FDPPDiv— to infer divergence times in the royal ferns. Osmundaceae have 16–17 species in four genera, two mainly in the Northern Hemisphere and two in South Africa and Australasia; they are the sister clade to the remaining leptosporangiate ferns. Their fossil record consists of at least 150 species in ∼17 genera. For ND, we used the five oldest fossils, whereas for TE and FBD dating, which do not require forcing fossils to nodes and thus can use more fossils, we included up to 36 rhizomes and frond compression/impression fossils, which for TE dating were scored for 33 morphological characters. We also subsampled 10%, 25%, and 50% of the 36 fossils to assess model sensitivity. FBD-derived divergence ages were generally greater than those inferred from ND; two of seven TE-derived ages agreed with FBD-obtained ages, the others were much younger or much older than ND or FBD ages. We prefer the FBD-derived ages because they best fit the Osmundales fossil record (including Triassic fossils not used in our study). Under the preferred model, the clade encompassing extant Osmundaceae (and many fossils) dates to the latest Paleozoic to Early Triassic; divergences of the extant species occurred during the Neogene. Under the assumption of constant speciation and extinction rates, the FBD approach yielded speciation and extinction rates that overlapped those obtained from just neontological data. However, FBD estimates of speciation and extinction are sensitive to violations in the assumption of continuous fossil sampling; therefore, these estimates should be treated with caution. Calibration is the single largest problem in molecular clock dating, influencing not only estimates of divergence times but also evaluation of the heterogeneity in substitution accumulation, since rates always derive from calibrated trees. There are many ways to calibrate genetic distances. They include fossils providing minimum divergence times (Sarich and Wilson 1967; Christin et al. 2014), oceanic islands with endemic radiations providing maximum ages of cladogenesis (Schaefer et al. 2009), ancient DNA of one's focal group (Korber et al. 2000), host ages as maximal ages for obligate parasites (Rector et al. 2007; …

Calibration is the single largest problem in molecular clock dating, influencing not only estimates of divergence times but also evaluation of the heterogeneity in substitution accumulation, since rates always derive from calibrated trees. There are many ways to calibrate genetic distances. They include fossils providing minimum divergence times (Sarich and Wilson 1967;Christin et al. 2014), oceanic islands with endemic radiations providing maximum ages of cladogenesis (Schaefer et al. 2009), ancient DNA of one's focal group (Korber et al. 2000), host ages as maximal ages for obligate parasites (Rector et al. 2007;Bellot and Renner 2014), ratios of substitution rates between hosts and parasites (Ricklefs and Outlaw 2010), published rates from other studies (e.g., Villarreal and Renner 2014), and node ages obtained in other studies, the so-called secondary calibration approach. The most widely used of these approaches is calibration with fossils. Since the introduction of Bayesian relaxed clock approaches that implement different strict and relaxed clock models, it is possible to accommodate prior notions about how tightly a fossil may fit a particular node with different prior distributions. In this framework, fossil ages can be used as point calibrations, hard minimum bounds, hard maximum bounds, soft maximum bounds, or to center a normal, lognormal, exponential, or uniform distribution. These distributions have large effects on the obtained ages (Ho and Phillips 2009;Warnock et al. 2012), and no amount of sequence data can correct the influence of incorrect prior constraints (Yang and Rannala 2006).
The use of several fossils to calibrate nested nodes in a tree has been suggested as a possible solution (Near et al. 2005), although this does not circumvent the problem of oldest fossils having a disproportionate effect on the outcomes (Parham and Irmis 2008). A case in point is the crown age of the flowering plants (angiosperms). Numerous molecular clock studies have constrained the relevant node to maximally 135 Ma based on a few pollen grains from Israel (dated to 132.9 Ma) that are the oldest widely accepted record of flowering plants (Brenner 1996). When left unconstrained, the angiosperm crown age is usually much older, for example, 228 (193-270) Ma (Smith et al. 2010), an estimate only slightly younger than angiosperm-like pollen from the Middle Triassic, dated to 247.2-242 Ma Feist-Burkhardt 2004, 2013). This example dramatically illustrates the problems stemming from the current need to assign oldest fossils to particular nodes, a problem worsened, not alleviated, by competing "oldest" fossils. Another problem with multiple fossils is that the effective calibrations may not resemble the specified calibrations because the various priors interact with each other, with the tree prior, and with other priors, such as monophyly constraints (Inoue et al. 2010;Heled and Drummond 2012). Two Bayesian clock methods exist that do not rely on node dating (ND) with one or more "oldest" fossils. They are total evidence (TE) dating (Ronquist et al. 2012a) and fossilized birth-death (FBD) dating (Heath et al. 2014). TE dating combines morphological data from extant and extinct species with DNA data to infer node ages. Unlike ND, total evidence dating can be applied to a set of fossils without fixing them to specific nodes in the tree. It relies on the morphological similarity between a fossil and the reconstructed ancestors in the extant tree to infer the lengths of extinct side branches on which a fossil sits (Ronquist et al. 2012a). Total evidence dating uses a uniform prior on the clock trees, even though trees will include terminals of different ages because of extinct side branches. The FBD approach also allows the use of multiple fossils per lineage/node, both old and young, but does not require a morphological data matrix as does TE nor prior age densities on fossils as does ND. Thus far, the FBD approach has been applied to bears (Ursidae), a small clade with a fossil record from the mid-Eocene to the mid-Pliocene (Heath et al. 2014). The method uses a complex prior on the clock trees to model taxon sampling completeness and speciation, extinction, and fossilization rates.
Here we compare ND, TE dating, and FBD in a fossil-rich and phylogenetically pivotal group of plants, the royal ferns (Osmundaceae). The royal ferns are monophyletic and are the sister clade to all remaining leptosporangiate ferns (Pryer et al. 2004;Schuettpelz and Pryer 2007). They comprise approximately 16-17 species in four genera, Osmunda, Osmundastrum, Leptopteris, and Todea (Metzgar et al. 2008). The first two occur mostly in the Northern Hemisphere extending into the humid tropics, the latter two in South Africa and Australasia (Kubitzki 1990). Royal ferns have an exceptional fossil record (Miller 1971;Tidwell and Ash 1994;Wang et al. 2014;Bomfleur et al. 2014b), with 150 species and 17 genera, ranging from the Permian to Neogene. Most of the fossils are foliar remains, but many are anatomically preserved (permineralized) axes. A molecular clock model for ferns that included two Osmundaceae and that was calibrated using numerous fossils, inferred a stem age for the Osmundaceae of 320 Ma (Late Carboniferous; Pryer et al. 2004). This result, however, was obtained with the crown age of (the two) Osmundaceae constrained to 206 Ma (Late Triassic) based on a point calibration derived from the rhizome fossil Ashicaulis herbstii from the Upper Triassic of Argentina (Tidwell and Ash 1994). This fossil is a doubtful member of modern (crown) Osmundaceae (Miller 1971;Tidwell and Ash 1994;Wang et al. 2014). Within-Osmundaceae divergence events have not been the focus of a previous molecular clock study. The discovery of an unusually well-preserved 182-190 myrold Osmunda fossil from Sweden (Bomfleur et al. 2014a) recently prompted a critical reevaluation of the rhizome fossils of modern Osmundaceae (Bomfleur et al. 2014b). This reappraisal (involving morphological data matrices) provided the opportunity to estimate the timing of major evolutionary events within the family, using both TE and FBD molecular clock dating. These two methods are the first to fully integrate fossils and molecular data, modeled as representing a single macroevolutionary process. They may result in older divergence times than traditional ND (Ronquist et al. 2012a: Hymenoptera, Heath et al. 2014: Ursidae) and can identify likely erroneous calibration fossils, as was the case for four of seven Hymenoptera calibration points (Ronquist et al. 2012a). To explore this possibility, we also applied traditional ND, using five oldest fossils alone or in combination.
Our phylogenies are based on ∼8.6 kb of plastid DNA data for 13 extant species, 33 morphological characters scored for 19 permineralized rhizomes (data from Bomfleur et al. 2014b), and 17 frond compression/impression fossils, assessed on the basis of key autapomorphic features for the present study. Incorporating fossils into tree models as undertaken here in principle allows inference of speciation and extinction rates with reasonable confidence as shown with simulated data by Heath et al. (2014), and is probably an improvement over inferring these rates just from neontological data. To test how rates inferred using the FBD process would differ from those obtained with just the 13-species tree of living Osmundaceae, we carried out a TreePar analysis (Stadler 2011).

DNA Sampling, Alignment, and Phylogenetic Analyses
We relied on the plastid DNA matrix of Metzgar et al. (2008), which consists of the protein coding rbcL, atpA, and rps4 genes and the spacers rbcL-accD, atpB-rbcL, rps4-trnS, trnG-trnR, and trnL-trnF. A few regions were excluded due to ambiguities in the alignment (29 bp in rbcL-accD, 129 bp in rbcL-atpB, 51 bp in rps4-trnS, 39 bp in trnG-trnR, and 146 bp in trnL-trnF) or missing data (rps4), resulting in a matrix of 13 species and 8616 nucleotide positions. We excluded the outgroup, Gleicheniales, because of long-branch attraction (Bomfleur et al. 2014b) and follow these authors in rooting Osmundaceae between Todea/Leptopteris and Osmunda/Osmundastrum instead of between Osmundastrum and the remaining three genera (Yatabe et al. 2005;Metzgar et al. 2008). Sampling covers the seven species of Osmunda from the three subgenera Claytosmunda, Osmunda, and Plenasium; the single species of Osmundastrum, three of the approximately six species of Leptopteris, and both species of Todea (the second species lacked morphological data and was therefore excluded from some analyses). Osmundaceae generic and subgeneric classification schemes proposed by Miller (1971) and Yatabe et al. (2005) are summarized in Fig. S1 (available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). Species names and authors, herbarium vouchers, deposition in herbaria, and geographic origin of material are provided by Metzgar et al. (2008).
Phylogenetic analyses relied on maximum likelihood (ML) as implemented in RAxML 8.0.22 (Stamatakis 398 SYSTEMATIC BIOLOGY VOL. 64 FIGURE 1. Fossil record of the modern Osmundaceae mapped on a maximum likelihood tree from a plastid DNA matrix of 8616 aligned positions analyzed under the GTR + model with two data partitions. Rhizomes in red, fronds in green, with fossils either assigned to a branch (stippled lines) or a tree portion (shaded areas). Nodes (A-E) for which minimum age constraints were used in node datin runs are shown in the inset. Stratigraphic ranges of fossils abbreviated as follows: LT = Late Triassic; EJ, MJ, LJ = Early, Middle, and Late Jurassic, respectively; LC = Late Cretaceous; PG = Paleogene; NG = Neogene. 2014), using the GTR + substitution model. We ran both unpartitioned and partitioned analyses in which RAxML found separate substitution rates for the two genes (rbcL and atpA) and the five spacers. Support for the ML topology was assessed with the rapid bootstrap/automatic bootstop implementation in RAxML (Stamatakis et al. 2008;Pattengale et al. 2009).

ND with Oldest Fossils, TE Dating, and FBD Dating
The 36 fossils included in our study (listed in Table S1 with supporting references, localities, ages, and other details available on Dryad at http://dx.doi.org/ 10.5061/dryad.m6t1m) are unambiguous members of modern Osmundaceae. The 19 rhizomes of Jurassic to Neogene age were included in a morphological matrix by Bomfleur et al. (2014b). The 17 Triassic to Neogene fronds, including 14 Osmunda spp., two Osmundopsis spp., and Todea amissa, are placed within modern Osmundaceae based on lineage-diagnostic fertile or sterile features (references in Tables S1 and S2, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). The species-rich fossil genera Cladophlebis (including more than 80 described fossil species) and Todites (including more than 35 fossil species), usually identified as osmundaceous foliage fossils (Tidwell and Ash 1994), possess insufficient diagnostic characters for unambiguous assignment; some may represent one of the numerous extinct lineages of Osmundales.
The oldest fronds (O. claytoniites) with features diagnostic of modern members of Osmunda/ Osmundastrum are of Late Triassic age and were associated with the root node (Node A in Fig. 1). Jurassic fronds were also associated with the root because they exhibit only symplesiomorphies of the Osmunda/Osmundastrum clade or parallelisms limited to its subclades. Fronds with apomorphic characters diagnostic of Osmunda or its subgenera were associated with Nodes C and D ( Fig. 1). They are of Early Cretaceous and younger ages. Todea amissa (early Eocene, Argentina) has fronds characteristic of Todea and was thus linked to Node B (Carvalho et al. 2013).
For ND in BEAST v1.8 ), we used the five oldest fossils as constraints. We ran 2015 GRIMM ET AL.-DATING OSMUNDACEAE WITH THREE BAYESIAN CLOCK APPROACHES 399 BEAST with the tree topology fixed to the ML topology, a partitioning scheme chosen by jModeltest (Darriba et al. 2012) comparing 24 models under the Bayesian Information Criterion: atpB-rbcL HKY+I, atpA 1st codon HKY+I, atpA 2nd codon HKY+I, atpA 3rd codon HKY, rbcL-accD HKY+I, rbcL1st codon HKY, rbcL 2nd codon HKY, rbcL 3rd codon HKY + I, rps4-trnS HKY+ , trnG-trnR HKY+ , trnL-trnF HKY+ , an uncorrelated lognormal clock model, and an exponential prior age distribution on each constraint, with the offset value set to the minimum age of the stratum comprising the respective fossil and the mean so that the maximum age of the stratum was included in the 97.5% quantile. One frond fossil was assigned a point age of 52 Ma based on Carvalho et al. (2013; Table S1, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). Each analysis was run twice for 2*10 7 generations, sampling trees every 1000th generation. The effect of the priors on the posterior values was evaluated by running an additional analysis without the data (the DNA alignment). The parameters of all runs were evaluated using Tracer v1.5 (Rambaut and Drummond 2009) to confirm that (i) each Markov chain reached stationarity, (ii) the Effective Sample Sizes (ESS) were >200 for all optimized parameters, and (iii) independent runs produced convergent results. In each Markov chain Monte Carlo (MCMC) run, the first 10% of the samples were discarded as burn-in; the remaining samples were summarized in TreeAnnotator (part of the BEAST package) and visualized using FigTree (Rambaut 2014).
For total evidence dating in MrBayes 3.2 (Ronquist et al. 2012b) we used the same DNA matrix as before (except that Todea papuana was excluded because many rhizome characters were unavailable) plus a morphological matrix with 33 characters for the 19 rhizome fossils (Bomfleur et al. 2014b). We used two data partitions, with GTR+ for the DNA matrix and Mk model for the morphological matrix (Lewis 2001), as implemented in MrBayes. No topological constraints were employed, and the tree prior was uniform (Ronquist et al. 2012a). Two rhizomes were assigned a point age of 16 Ma based on Pigg and Rothwell (2001;details shown in Table S1, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m); all other fossils were assigned uniform distributions based on min/max ages of the fossil occurrences. Three independent MCMC runs, with eight heated chains each (temperature parameter set to 0.1), were run for 2*10 7 generations, sampling trees every 2000th generation. Convergence was checked as for the BEAST run, and in each MCMC run, the first 2 million generations of the samples were discarded as burn-in and the remainder summarized and visualized.
For FBD dating we relied on FDPPDiv (available at https://github.com/trayc7/FDPPDIV, commit v.3f18ed7db29f985f12785b948998f42dfa8323af), with the ML topology as input tree (Fig. S1, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). FDPPDiv treats the topology and branch lengths of the extant species tree as given and does not permit assignment of separate substitution rates to specific data partitions. As required for FDPPDiv, the age of each fossil was drawn before analysis from a uniform distribution of its age range (ranges are given in Table S1, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). We performed two MCMC runs of 2*10 7 generations, sampling every 1000th steps. Log files were then compared in Tracer to check that convergence had been reached.

Effects of Using Rhizomes versus Fronds and Different Fossil
Percentages To be scored for the morphological matrix required for TE dating, fossils need sufficient characters. The frond and rhizome fossils used here differ greatly in this respect, with rhizome fossils having many more codable characters. This meant that for TE dating, only rhizome fossils could be used. To be able to compare TE results with FBD results, we carried out additional FBD runs that used only the 19 rhizomes or only the 17 fronds. We also ran FBD analyses for which we randomly drew 10% (4 fossils), 25% (9 fossils), or 50% (18 fossils) of the 36 fossils, with each drawing repeated 10×. Otherwise, these runs relied on the same settings as used for the full data sets.

Inferring Speciation and Extinction with TreePar versus the FBD Approach
To quantify diversification through time we estimated tree-wide speciation and extinction rates using the R package TreePar (Stadler 2011) on just the extant species tree and the FBD process on the tree with all 36 fossils. FBD assumes a constant rate model, so we fitted only this model of the many available in TreePar.

Documentation
Input and output files (used matrices, trees, and calibration results) and OSM files are included in an archive hosted at www.dryad.org (doi:10.5061/dryad.m75t0).  Divergence ages a (median values and 95% highest posterior density intervals) and optimized substitution rates obtained with FBD dating, TE dating, or ND with just the five oldest fossils as minimum age constraints (nodes A-E in Fig. 1 the ND approach (in BEAST). All branches received 100% bootstrap support (Fig. S1, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). Results from ND with oldest fossils (using partitioned or unpartitioned DNA data) are shown in Table 1. The average median substitution rate (over all branches) was 1.79*10 −4 in the unpartitioned analysis and 1.48*10 −4 with two partitions, close to the 1.33*10 −4 optimized under the FBD model (Table 1). Regardless of data partitioning, the ages of nodes inferred from ND are younger than those inferred using FBD (Table 1; Table S3, available on Dryad at http:// dx.doi.org/10.5061/dryad.m6t1m), with differences of up to 82 myr (in the case of the Osmunda crown age) or 63 myr (for the Todea/Leptopteris divergence).
The TE runs with the 19 rhizome fossils added to the DNA matrix yielded a majority rule consensus tree with seven nodes also found in the DNA tree and the ages of which could thus be compared between methods (Fig. 1, Table 1; Figs. S1 and S2, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m). Two of the seven TE-derived ages agreed with the FBD-obtained ages, the others were much younger or much older than ND or FBD ages. Some deep nodes came out younger because the oldest rhizome fossil is 40 myr younger than the oldest frond fossil (Fig. S2, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m, shows both the TE-derived chronogram and an FBDderived chronogram with just rhizome fossils). Jurassic fossils, including the recently published Korsaröd fern fossil (Bomfleur et al. 2014a), which has characters intermediate between Osmundastrum and Osmunda (Bomfleur et al. 2014b), were placed basal to clades comprising the extant species of these genera, matching the fossils' plesiomorphic or ambiguous morphological characters, and this resulted in a Jurassic root age (Table 1). Fossils with more apomorphic features than their modern relatives, such as the Paleogene fossils of subgenus Plenasium, resulted in very old crown ages for this subgenus. The Osmundastrum lineage was resolved as sister to the Todea/Leptopteris lineage, whereas in the DNA-only tree it is sister to Osmunda.
The geographic history of modern Osmundaceae through time is visualized in Figure 2 next  Notes: Shown are the absolute (FBD abs )-and relative differences (FBD rel )-to the divergence ages as inferred using all 36 fossils and the FBD approach (Fig. 2). FBD time tree. The Osmunda/Osmundastrum lineage had established in the Southern Hemisphere (Antarctica) by the Late Triassic, but the small genera (lineages) surviving today apparently date to the late Mesozoic and Paleogene of the Northern Hemisphere. The Todea/Leptopteris lineage, today confined to southern Africa (Todea) and Australasia (both genera), is represented in the fossil record by one Northern Hemisphere Early Cretaceous rhizome and one frond from the Paleogene of South America.

Speciation and Extinction Rates from Neontological Data versus with Fossils Included
With the FBD method, the speciation rate () was estimated as 0.0299 (0.0099-0.0549), the extinction rate () as 0.0240 (0.0039-0.0495), indicating a high turnover and relatively slow diversification rate (0.006 per myr). The fossil recovery rate, which models how many lineages (extinct and extant) are covered by the fossil sample was = 0.01531, meaning that there is a 31% probability that a species will be represented in the fossil record. Inference (using TreePar) of speciation and extinction rates from just neontological data gave slightly higher speciation ( = 0.0314) and extinction rates ( = 0.0339; Table 3), but both values fell within the confidence intervals of the FBD-inferred rates.

Comparison of the Three Bayesian Calibration Approaches
The first method to implement a full integration between fossil and molecular data was the TE approach of Ronquist et al. (2012a). In their original study, Ronquist et al. included even poorly preserved fossils (14 of their 45 fossils could be coded for only 4-10% of the 343 morphological characters), yet recovered topologies did not change and the median node ages were similar to those obtained with subsets excluding these poorly preserved fossils. In our study, we had 19 rhizome fossils coded for up to 33 morphological traits (Bomfleur et al. 2014b), but no morphological trait matrix for the frond fossils, so they could not be used in TE dating. The strengths of TE dating, the parallel optimization of node heights, and phylogenetic placement of fossils are also its major weaknesses: The method forces workers to carefully analyze and score morphological traits, but where comprehensive morphological data are unavailable, it cannot be used.
Traditional Bayesian node-dating-using-oldestfossils (Drummond et al. 2006) does not require a morphological matrix and has proven its power in hundreds of studies. Where possible, workers have used several fossils to calibrate nested nodes in a tree, attempting to reduce disproportionate influences of single fossils (Parham and Irmis 2008). Recent work, however, shows that the ad hoc priors placed on the ages of fossils in traditional Bayesian calibration (e.g., in the software BEAST) can be problematic because the probability distribution for the age of each calibrated node comes from both the node-specific calibration prior and the tree-wide prior on node ages. This leads to incoherence in the model of branching times on the tree (Heled and Drummond 2012;Warnock et al. 2012;Heath et al. 2014;Silvestro et al. 2014). This problem is avoided by applying a birth-death process to uncalibrated nodes conditioned on the calibrated nodes (Yang and Rannala 2006), which seems a more realistic representation of the lineage-diversification process and is achieved in the FBD approach (Heath et al. 2014).
The FBD approach does not require a morphological matrix and can, in principle, use the entire fossil record of a focal group, which greatly reduces the impact of unrepresentative (or misinterpreted) oldest fossils. Not having to compile a morphological matrix will be welcome to phylogeneticists interested in  divergence times, yet without expertise and resources for morphological work on living and fossil taxa. For many groups, perhaps especially plants, building morphological matrices including fossil and extant taxa may not be feasible (as was the case here for Osmundaceae fronds). Weaknesses of the FBD approach are that it does not directly incorporate uncertainties around the tree topology and the fossil ages; cannot use morphological data even if available; and does not permit assignment of separate substitution rates to separate data partitions. Partitioning of substitution models, however, can cause statistical problems in clock dating (Dos Reis et al. 2014). In our data set, there were no strong differences between ages inferred from unpartitioned BEAST clock dating runs and partitioned ones ( Table 1).

Implications of the Inferred Divergence Scenario for the
Evolution of Osmundaceae Initial radiations in the ferns took place in the mid-Paleozoic (Taylor et al. 2009). Monilophytes, the group incorporating whisk ferns (Psilotales), adder's tongue ferns (Ophioglossales), horsetails (Equisetales), marattioid ferns (Marattiales), and leptosporangiate ferns (Polypodiales), have a fossil record extending back to at least the Late Devonian (Taylor et al. 2009). Osmundaceae in the broad sense (including the primitive Thamnopteroideae) were established in the Permian, with many well-preserved (permineralized) rhizomes from the Middle to Upper Permian of Australia and Russia (Gould 1970;McLoughlin 1992) and members of their sister family Guaireaceae recorded from coeval strata in South America and China (Herbst 1981;Wang et al. 2014). The Guaireaceae and Osmundaceae emerged during a phase of stepwise global warming in the wake of the Late Paleozoic Ice Age, with Thamnopteroideae then becoming extinct during the end-Permian biotic crisis. The core Osmundaceae persisted in moist temperate to tropical climates to the present and extended into high latitudes during phases of greenhouse climates in the Mesozoic and Paleogene (Collinson 2002).
Given this phylogenetic and fossil background, we prefer the older Osmundaceae divergence times inferred with the FBD method (and also partly the TE approach; Table 1) over the mostly younger ages inferred from node-dating-using-oldest-fossils. Osmundaceous fronds (of poorly understood affinity; Table S2, available on Dryad at http://dx.doi.org/10.5061/dryad.m6t1m: Todites, Rooitodites, Birtodites, and Elantodites) are common from the Late Triassic onwards. The FBDinferred Middle to Late Triassic stem ages of the lineages Osmunda/Osmundastrum and Leptopteris/Todea are consistent with these paleontological dates; these lineages apparently beginning to diverge in the aftermath of the end-Permian mass extinction (Tidwell andAsh 1994, Taylor et al. 2009; Fig. 2). The preferred chronogram (Fig. 2) further indicates segregation of Todea and Leptopteris and of the three subgenera of Osmunda during the mid-Cretaceous and radiation and establishment of extant species in the Neogene. Modern Osmundaceae appear to have originated in the humid temperate belt of southern Gondwana (see maps in Fig. 2). The modern distribution of the single species of Osmundastrum (= Osmunda cinnamomea) includes humid climate tracts of South America, eastern North America, and East Asia. Fossil evidence places Osmundastrum in Canada by the Late Cretaceous (Serbet and Rothwell 1999), so range expansion to both hemispheres may have occurred during the more humid phases of the mid-Mesozoic.
An implication of the inferred Late Triassic crown age of Osmunda/Osmundastrum is that Early to Middle Jurassic rhizomes, which are intermediate between Osmundastrum and Osmunda, represent stem group taxa of either Osmundastrum or Osmunda. For the recently described 182-190 myr-old Korsaröd fern rhizome from Sweden (Bomfleur et al. 2014a), the divergence times obtained here suggest that it represents an early precursor of the Osmundastrum lineage. The "Osmundastrum precursor" hypothesis is one of three alternatives that can be inferred from the set of analyses conducted by Bomfleur et al. (2014b). Our molecular (FBD) dating also provides a time frame for the morphological innovations in the Osmundastrum lineage, 404 SYSTEMATIC BIOLOGY VOL. 64 which occupies similar humid habitats as Osmunda, through a broad range of latitudes (previous paragraph). The post-Cretaceous increase in shaded, moist niches in broad-leaved angiosperm-dominated forests may have provided improved ecological conditions for Osmundaceae, particularly for a lineage such as Leptopteris, which cannot survive without permanent moisture (Brownsey and Perrie 2012).

Inferring Speciation and Extinction Rates with the FBD
Approach Incorporating many fossils allows confident inference of speciation and extinction rates as shown with simulated data by Heath et al. (2014). In our Osmundaceae data, speciation and extinction rates from the FBD approach and the neontological data (with TreePar) were similar (Table 3), with the confidence intervals around the FBD rate bracketing the TreePar rates. With very few tip species, TreePar sometimes infers higher extinction than speciation rates, which seems to have been the case here (13 tip species). A parameter that may influence (and distort) the inferred speciation and extinction rates is the sampling rate of the living species, which is set to "1" in FDPPDiv. Future within-species sampling may reveal that some Osmunda forms include multiple species. For instance, the extremely widespread O. (Osmundastrum) cinnamomea (approximately 5 common synonyms) and O. regalis (approximately 10 synonyms) show intraspecific morphological variation (not covered in our molecular data) that is comparable with inter-specific diversity in other Osmundaceae. This would mean that species sampling in this study might not have been complete. Estimates of speciation and extinction are also sensitive to violations in the assumption of continuous fossil sampling (T. Heath, personal communication, September 2014), and the fossil sampling rate that the FBD method inferred from our data, = 0.01531, which implies that a species has a 31% probability of being represented in the fossil record, seems extremely high. All this cautions against attaching too much weight to our estimates of diversification and turnover.

CONCLUSIONS
It is now feasible to analyze molecular and fossil data together, to jointly estimate speciation and extinction dates of fossil species and branching times of the phylogeny, assuming a common underlying birth-death process (Ronquist et al. 2012a;Slater et al. 2013;Silvestro et al. 2014;Heath et al. 2014). As shown by our experiments with just 10%, 25%, and 50% of the total 36 fossils, the FBD approach is relatively insensitive to fossil sampling density (as found by Heath et al. 2014), although the reconstructed tree ages are fairly sensitive to the oldest fossil. It is, however, extremely memory intensive; our data required 8 CPU processors and MCMC lengths of 20 million generations, taking about 7 h/run. The TE approach, at least for Osmundaceae, yielded an unexpected mix of young and old divergence times, and also an odd tree topology, whereas the FBD approach resulted in much older divergence times than did traditional ND.