## Abstract

Here we introduce a general class of multiple calibration birth–death tree priors for use in Bayesian phylogenetic inference. All tree priors in this class separate ancestral node heights into a set of “calibrated nodes” and “uncalibrated nodes” such that the marginal distribution of the calibrated nodes is user-specified whereas the density ratio of the birth–death prior is retained for trees with equal values for the calibrated nodes. We describe two formulations, one in which the calibration information informs the prior on ranked tree topologies, through the (conditional) prior, and the other which factorizes the prior on divergence times and ranked topologies, thus allowing uniform, or any arbitrary prior distribution on ranked topologies. Although the first of these formulations has some attractive properties, the algorithm we present for computing its prior density is computationally intensive. However, the second formulation is always faster and computationally efficient for up to six calibrations. We demonstrate the utility of the new class of multiple-calibration tree priors using both small simulations and a real-world analysis and compare the results to existing schemes. The two new calibrated tree priors described in this article offer greater flexibility and control of prior specification in calibrated time-tree inference and divergence time dating, and will remove the need for indirect approaches to the assessment of the combined effect of calibration densities and tree priors in Bayesian phylogenetic inference.

Divergence time dating and phylogenetic inference are related concerns. Recent advances in Bayesian phylogenetic inference (Rannala and Yang 1996; Yang and Rannala 1997; Huelsenbeck and Ronquist 2001; Drummond and Rambaut 2007) have culminated in the field of relaxed phylogenetic inference, in which both divergence times and phylogenetic relationships are simultaneously estimated (Drummond et al. 2006). This estimation is aided by relaxed molecular clocks (Thorne et al. 1998; Kishino et al. 2001; Thorne and Kishino 2002; Drummond et al. 2006; Rannala and Yang 2007) which reconcile nonclock-like evolution with an underlying time-tree in which common ancestors are placed on an axis of time. To produce results on an absolute time scale it is necessary to either provide information on the rate of molecular evolution or alternatively calibrate a subset of internal nodes with a calibration density (Thorne et al. 1998; Drummond et al. 2006; Yang and Rannala 2006). Either way, in a Bayesian setting, a *tree prior* must also be placed on all the uncalibrated divergence times. The tree prior is a function that assigns a prior probability density to every possible tree. Arguably the simplest tree priors are the one-parameter Yule model (Yule 1924) and the two-parameter birth–death model (Nee et al. 1994; Gernhard 2008). The latter has been suggested as an appropriate null model for species diversification Nee et al. (1994a) and has been extended to include additional parameters to model various types of incomplete sampling (Yang and Rannala 1997; Stadler 2009b; Höhna et al. 2011). The other commonly used tree prior, the coalescent (Kingman 1982), is typically deployed when all the samples are from the same species. The coalescent is not handled here but calibration information for a specific group of individuals usually does not exist. However, the calibrated prior can be used to calibrate a species tree, within which the gene trees follow the “multispecies coalescent” prior in a species-tree/gene-trees analysis (Heled and Drummond 2010).

In a Bayesian setting, combining a calibration density (on one or more divergences) with a tree prior into a single calibrated tree prior for divergence time estimation possesses a number of subtleties worthy of note, which we cover under the following headings.

## Fossil Bounds on a Single Divergence

Consider the simplest type of calibration to admit uncertainty: The placement of an upper and a lower limit on the age of a single divergence ($hC$) in the tree:

A node calibration of this type is associated with a specific subset of the taxa $C$. Throughout this article our analytical results and implementations require that these taxa are monophyletic in the phylogeny, and their most recent common ancestor is the divergence that is calibrated. The topology within the calibrated clade can be subject to uncertainty, as can the topology outside the calibrated clade, but the existence of the clade is a condition of the calibration.

This simple approach to calibration already has two quite distinct interpretations in a Bayesian setting when considered within the context of an overall tree prior on all divergence times. One interpretation is that the resulting marginal prior distribution on the calibrated divergence should obey the tree prior (e.g., Yule or birth–death) but be constrained to be within the upper and lower bounds, so that the full calibrated tree prior, $\rho G(\xb7)$, is:

There is a difference between these two prior formulations regardless of whether the tree topology is known or estimated. In a previous work Heled and Drummond (2012) described how to efficiently compute the latter formulation in the face of uncertainty in tree topology for arbitrary single-divergence calibration densities under the Yule tree prior.

## Nested Calibrations

It has been routine in almost all treatments of phylogenetic calibration so far to specify independent univariate priors for each calibrated divergence time. However, calibrated divergence times that are nested in the tree are necessarily interdependent, such that the more recent calibrated divergence of a nested pair must be younger than the older calibrated divergence. If the specified calibration densities overlap then the resulting marginals of the joint prior will necessarily differ from the specified calibration densities. We do not address this issue here. However, the correct solution to this problem is simply to specify a joint prior on the calibrated nodes that obeys the necessary condition that nested nodes are *order statistics* and, therefore, not free to vary independently.

## The Influence of Calibrations on the Tree Topology Prior

Heled and Drummond (2012) demonstrated that a natural interpretation of the “conditional-on-calibrated-age” construction of a calibrated tree prior produces a distribution that is non-uniform on ranked topologies. However, we show in this article that the tree prior can be decomposed into a prior on the node ages (both calibrated and uncalibrated) and a prior on the set of possible ranked histories. We show that this provides a means to compute a tree prior rapidly if a uniform prior on ranked trees is chosen. We compare this approach to a computational intensive alternative that weighs each ranked tree topology by its probability conditional on the divergence times of the calibrated nodes. The latter is a natural extension to our previous work to the case of multiple calibrations and a birth–death process prior. However, this extension turns out to be computationally expensive except for some special cases where a closed-form formula exists. We therefore advocate the former approach (that always applies a uniform prior to ranked trees) as a practical alternative.

## Methods

Consider the following notation:

$n$ Number of taxa.

$\Psi $ The set of all ranked binary topologies on $n$ taxa. We keep $n$ implicit to simplify the notation.

$\psi $ A ranked tree ($\psi \u2208\Psi $). See Gavryushkina et al. (2013) for a formal definition of a ranked tree.

$h={h1,h2,\cdots ,hn\u22121:hi\u2265hi+1\u22650}$, an ordered set of divergence ages.

$g=\u3008h,\psi \u3009$, a time tree on $n$ taxa.

$G$ the space of all time trees.

$\Lambda $ the parameters of the tree prior process. For the pure birth (Yule) prior $\Lambda ={\lambda}$, where $\lambda $ is the birth rate, while $\Lambda ={\lambda ,\mu ,\rho}$ for the birth–death prior, where $\mu $ is the death rate and $\rho $ is the sampling rate.

$\theta =\u3008\Omega ,R\u3009$, a pair of parameter vectors, one for the substitution process $\Omega $ and one for the rates of the molecular clock, $R$.

### Posterior Probability for Bayesian Inference

Without calibration, the posterior probability density of $(g,\Lambda ,\theta )$ given a sequence alignment, $D$ can be written:

The term $Pr{D|g,\theta}$ is the phylogenetic likelihood (Felsenstein 1981). The rates $R$ and divergence times $h$ combine to provide branch lengths in units of substitutions per site on the edges of $\psi $. The term $fG(g|\Lambda )$ is the uncalibrated tree prior and it can be readily factored in the following way:

$f(h|\Lambda )$ is easy to compute for the pure birth (Yule) prior, birth–death prior or any prior whose equivalence classes are defined entirely by the divergence time order statistics. Under the Yule or birth–death prior *without calibrations*, $Pr(\psi |\Lambda )=|\Psi |\u22121$, is a uniform prior on all ranked topologies. However, this factorization is no longer simple when calibrations are introduced (Heled and Drummond 2012), and so we must develop an alternative approach to describing the calibrated tree prior in the following sections, which we will call $\rho G(\xb7)$ to distinguish from the uncalibrated tree prior $fG(\xb7)$. Note that throughout the remaining sections the tree priors are always conditional on $\Lambda $, but we suppress the conditioning in the notation for the sake of clarity.

### Calibrated Birth–Death Density

We introduce some extra notation for calibrations:

$K$ Number of calibration points.

$\phi $ Set of conditions on $\Psi $, typically clade monophyly constraints. $\phi $ plays a part in the terms defined below, but because it is fixed in each case we mostly keep it implicit to make the equations easier to read.

$\Psi \phi $ The subset of all ranked topologies for which $\phi $ holds.

$i(\psi )=(i1,i2,\cdots ,iK)$, mapping a ranked tree to the ranks of the calibrated nodes. Typically those are the ranks of the clades in $\phi $, but $i$ may, for example, pick the rank of a clade's parent instead.

We use two additional mappings which are a function of $i$. $i\xaf(\psi )=(i\xaf1,i\xaf2,\cdots ,i\xafK)$ is the mapping of calibration ranks into their sort order. For example, if $i=(3,1,4)$ then $i\xaf=(2,1,3)$ and if $i=(7,4,2)$ then $i\xaf=(3,2,1)$.

Also, $i\u02c6(\psi )=(i\u02c61,i\u02c62,\cdots ,i\u02c6K)=(ii\xaf1,ii\xaf2,\cdots ,ii\xafK)$ are the ranks of the calibrated nodes sorted by age. For the two examples above we have, respectively, $i\u02c6=(1,3,4)$ and (2,4,7).

$\Psi \phi ,x$ The subset of $\psi \u2208\Psi \phi $ for which $i\xaf(\psi )$ is equal to the sorting order of the heights vector $x$. That is, all the ranked topologies which are compatible with the heights $x$.

$h\psi =(hi1,hi2,\cdots ,hiK)$, the heights of the calibration points on a given ranked tree $\psi $. For convenience $g\psi $ is the same as $h\psi $ when $g=\u3008h,\psi \u3009$.

$\rho h(h\psi )$ A $K$-dimensional calibration density.

Figure 1 illustrates the main elements of our notation on an example tree with seven taxa and three calibrated subclades.

In BEAST, the calibrated tree prior has been defined as,

*not equal*to $\rho h$.

In Heled and Drummond (2012) we showed that it is easy in principle to preserve the calibration marginal by scaling the prior with the conditional marginal value, that is, the total density of all trees whose calibration times are identical to the calibration times of $g$:

The same general principle works for multiple calibrations:

The notation for describing the calibrated prior is challenging because calibrated clade ages are not a simple subset of all ages. It may seem natural to define the joint density by defining the tree prior density as the product of conditional and calibration priors as done in Equation 3 of (Yang and Rannala 2006) and mirrored in our own Equation 3. But then Yang and Rannala deal only with trees whose ranked topology is known. For example, with this formulation one can easily forget that the space of possible values for the uncalibrated nodes depends on the tree topology and the calibrated nodes, and although this notational omission may be fine when the topology is fixed, it should be explicit when dealing with the whole of tree space. We think our notation is better suited for describing the properties of the prior in the context of the full tree space.

The conditional prior in equation (8) preserves the marginal by construction. This is easy to see by writing down the marginal density for $x$, a fixed vector of calibration values:

However, the usefulness of this prior depends upon the computational cost of evaluating $fh(x)$ as part of the full posterior. In a few cases we can obtain a simple formula and the cost is negligible, and for the rest we offer either (i) a general algorithm for computing the marginal by iteration or (ii) the *restricted conditional*, a faster alternate correction to be used when (i) is too slow. The iterative approach is based upon the *clade level partition*, which divides $\Psi \phi $ into disjoint subgroups whose marginal has a closed form, and we shall discuss the details later.

The restricted conditional prior is defined as follows

Here the correction is defined as the marginal of the tree prior density when keeping *both* the topology and calibrated ages fixed. This is equivalent to extending the approach taken by Yang and Rannala (2006) to the case of an unknown tree topology. Again, the marginal over tree space is preserved by construction,

### The Marginal Yule for Multiple Calibrations

We start by showing how to decompose the Yule density of genealogy $g=\u3008\psi ,h\u3009$ conditional on $\phi $. The decomposition is based on separating the heights into $K+1$*levels*, where each level spans the range between two consecutive calibration points.

- (14)$\u222babdx1\u222bax1dx2\u222bax2dx3\cdots \u222baxk\u22121dxk\u2009\u2009\lambda e\u2212\lambda x1\lambda e\u2212\lambda x2\cdots \lambda e\u2212\lambda xk=1k![\u222bab\lambda e\u2212\lambda x1dx1][\u222bab\lambda e\u2212\lambda x2dx2]\cdots [\u222bab\lambda e\u2212\lambda xkdxk]=1k!(e\u2212\lambda b\u2212e\u2212\lambda a)k$
- (15)$\u222ba\u221edx0\u222bax0dx1\u222bax1dx2\cdots \u222baxk\u22121dxk\u2009\u2009\lambda e\u22122\lambda x0\lambda e\u2212\lambda x1\cdots \lambda e\u2212\lambda xkbypropositionI=\u222ba\u221e\lambda e\u22122\lambda x01k!(e\u2212\lambda a\u2212e\u2212\lambda x0)kdx0by\u2009Equation\u2009(A.10)=1(k+2)!e\u2212(k+2)\lambda a.$

Proposition I gives the contribution of $k$ internal nodes located between two consecutive calibration points with ages $a$ and $b$. Proposition II gives the contribution of $k+1$ nodes older than the last calibration point.

When the calibration values are fixed to $x=(x1,x2,\cdots ,xK)$, the contribution of the ranked topology $\psi $ is

The above uses $x\xaf=(xi\xaf1,xi\xaf2,\cdots ,xi\xafK)$, the calibration height sorted by age. Now, let $cj$ be the number of internal nodes in each level, $cj=i\u02c6j+1\u2212i\u02c6j\u22121\u2009\u2009(0\u2264j\u2264K)$, and for convenience let $i\u02c60=0$ and $i\u02c6K+1=n$. Using Propositions I and II we get

The marginal density is the sum over all *valid* topologies

To be valid, the order of calibration points by age has to be compatible with $x$.

While explicitly summing over all topologies is not feasible, evaluating the sum is possible by partitioning $\Psi \phi $ into ${\Psi \phi 1,\Psi \phi 2,\cdots}$, where $\psi 1,\psi 2\u2208\Psi \phi k\u21d2i(\psi 1)=i(\psi 2)$. That is, topologies in the same partition have the same number of internal nodes in each *level*. Because equation (17) depends only on those counts ($ci$) and not on the exact ranking, we have $fT(x,\psi 1)=fT(x,\psi 2)$ for two topologies in the same partition. Finally,

### The Marginal Birth–Death Prior for Multiple Calibrations

The birth–death process starts with a single species, and evolves over time through existing species giving birth (splitting) to new species at constant rate $\lambda $ and dying (unobserved) at constant rate $\mu $ (Kendall 1948). Although this characterization is unique, there are several versions of the prior which differ in their start and end conditions. BEAST uses the birth–death-sampling$\rho $ process, which assumes a uniform distribution $[0,\u221e)$ on the time of the tree origin, and that the tips of the tree are sampled with probability $\rho $ to obtain exactly $n$ taxa. The density for this prior is given in equation (5) of (Stadler 2009b):

We obtain the marginal for the birth–death process using exactly the same procedure as described for the Yule, but using the birth–death analogous for Propositions I and II. We use the following definitions for convenience:

$p1(t)$ is the probability that a lineage leaves one descendant after time $t$, which is easy to integrate

and gives us the birth–death equivalent of Proposition I:

Which is proved in the Appendix. For the critical case $\lambda =\mu $ we take the limit and use $q1(t)=11+\lambda \u2032t$ in the formulas above.

### Partitioning and Counting

To evaluate the marginal (equations (19) and (10)) we need to establish a valid partitioning and count the number of ranked topologies in each partition. Ideally, the partition would be the smallest possible, that is $\psi 1,\psi 2\u2208\Psi \phi k\u21d4i(\psi 1)=i(\psi 2)$. Unfortunately, we were unable to derive a counting formula under this constraint and instead use the clade level partition, a refinement based on the number of lineages per level inside each calibrated clade. Formally, let $r(\psi )={r1,r2,\cdots ,rK}$ where $rj=(rj0,rj1,\cdots ,rjK)$ and $rjk$ is the number of subclades (not already counted) of the $k$-th calibration point whose rank is smaller than $ij$. Because $1+\u2211krjk=ij$, the equivalence classes induced by $r$ are a refinement of the ones induced by $i(\xb7)$. Furthermore, we can count the number of topologies in each class by using two generic combinatorial principles: First, the number of ways for lineages to coalesce in each level is independent of other levels, so the product of counts of all levels gives the total number of topologies. Second, when $n=n1+n2+\cdots +nj$ lineages enter a level and are reduced to $k=1+k2+\cdots +kj$, where lineages can coalesce only within their group ($ni\u2192ki$) and the root of the first group is calibrated ($k1=1$), the total number of ranked ways is $(n\u2212k\u22121n1\u22122,n2\u2212k2,\cdots ,nj\u2212kj)\u220fi=1jRniki$.

$Rnk$ is the number of ranked ways $n$ lineages can coalesce to $k$ (equation (A.1)), and for convenience $Rn=Rn1$.

Figure 2 illustrates the counting procedure on a small example tree.

The use of the clade level partition has an interesting consequence, which relates to the second property of the conditional prior, namely that trees are “Yule-like” (or “birth–death-like”) conditional on the calibrated ages. This means that the density ratio of trees with equal calibrated ages is the same as their density ratio under the uncalibrated tree prior alone (equation (3) in Heled and Drummond (2012)). This condition is relaxed for the restricted conditional prior, where by construction this ratio equality is true only for trees with the same ranked topology. However, because the marginal (equation (17)) depends only on the number of lineages between levels and not on the exact ranked topology, the space in which each tree is “birth–death-like” is in fact larger, containing all trees in the same partition.

### Enumerating Ranked Topologies Classes

Here we explain the procedure for explicitly enumerating all the elements of the clade level partition. The enumeration is based upon combining several iterators, one for every calibrated clade, which return the number of lineages in each level of that clade. Those counts are used to compute the marginal as explained in the previous section. The calibrated nodes and the root of the tree, which define the levels, are not included in the counts. We show the working via an example; the interested reader should consult the source code for the very low-level details. The iterator is built from the product of $K+1$ per-clade iterators, one for each calibration and one for the “free” lineages outside any calibrated clade. In fact, each calibrated clade is potentially composed of several calibrated subclades and some free lineages, and the iterator for the clade handles the free lineages and the surviving lineage from the root of each calibrated subclade. Figure 3 gives an example with three calibrated clades, $N$ with $n+2$ lineages, nested inside $M$ with $n+m+3$ lineages, and $L$ with $l+2$ lineages. The uppercase letters are the clade name, and the lowercase letter gives the number of additional lineages.

In addition there are $r\u22650$ free lineages, for a total of $n+m+l+5+r$ lineages in the tree. The $m+1$ lineages of $M$ not in $N$ coalesce on the way to the clade root with each other and with the roots of the nested clades, in this case $N$. The $r$ free lineages coalesce with the roots of L and M on the way up, and uncalibrated internal nodes can be in any of the four levels.

Because there are three calibrations there are four levels, separated by the dashed lines, and each per-clade iterator returns four numbers. The iterator of $N$ is trivial, always returning $(n,0,0,0)$, because its root defines the first level. The iterator for $L$ is simple too, because the lineages can coalesce only in the first two levels and there are no free lineages. The iterator returns $(l,0,0,0),(l\u22121,1,0,0),\u2026(0,l,0,0)$.

The iterator for $M$ takes care of $m+1$ free lineages which can coalesce in the first three levels. The iterator returns $(m,0,0,0),(m\u22121,1,0),(m\u22121,0,1),(m\u22122,2,0),(m\u22122,1,1),(m\u22122,0,2)\u2026,(0,0,m)$. Basically, the iterator first returns all the cases with $m$ internal nodes in the first level, then all cases with $m\u22121$ internal nodes in the first level, and so on. The same pattern holds (recursively) for the rest of the levels.

The last iterator takes care of the $r$ free lineages and the surviving lineages of any subclade, here the roots of M and L. In this example this iterator is only necessary if $r>0$, as otherwise there are only two lineages left to deal with. While the internal nodes can be in any of the four levels, there are some restrictions. In general, these restrictions can be quite involved. In this example, the restrictions arise because the enclosing clade (here the root of the tree) has more than one subclade. As a result we always have at least three lineages above $h2$, and because only two lineages coalesce at the root, the excess has to coalesce in the top two levels. So, the iterator returns $(r\u22121,0,1,0),(r\u22121,0,0,1),(r\u22122,1,1,0)\u2026,(0,0,0,r)$, filling up lower levels first as before, while keeping at least one event in the top two.

## Results

### Calibrating the Parent of One Clade

Sometimes the calibration information is about the time a particular clade (say a genus, or a species that is divided into subspecies) separated from other lineages in the tree. For a single lineage, the density is given in Heled and Drummond (2012)

Note that the parent age is equal to the (pendant) branch length, and in fact $fh(x)$ is the distribution of the branch length when conditioning on the number of leaves. Furthermore, because this holds for any branch, we can derive a mean of $1/2\lambda $, which reproduces a result discussed by Steel and Mooers (2010).

The result can be generalized to any clade $C$ of size $n$. In that case, let $\Psi \phi $ be the set of all genealogies of $n+l$ taxa with a clade on $n$ taxa ($n>1$ and $l>0$) (Fig. 4a).

We partition $\Psi \phi $ so that $\Psi \phi k$ contains all genealogies containing $k+1$ surviving lineages at $h$, the age of the calibrated parent. By the second counting principle, there are $RnRli(n\u22122+l\u2212in\u22122)$ ranked ways for lineages to coalesce in the first level, $Rik+1$ ways for $i$ lineages to reduce to $k+1$ in the second level, and then one of the $k+1$ coalesce with the parent of $C$. Then $k+1$ lineages coalesce to the root, giving

The total number of ranked trees in $\Psi \phi $ is

Note that the marginal does not depend the size of the tree, just on the size of the calibrated clade.

### Calibrating Two Nested Clades

Here we give the marginal density for two nested clades. When the enclosing clade is the root (Fig. 4b), the marginal is

And when the enclosing clade is proper (Fig. 4c), it is

See the Appendix for additional details on the derivation of those formulas.### Placing Additional Monophyly Constraints

It is important to keep in mind that placing additional constraints can invalidate the closed form equations for the marginal. However, it may still be possible to obtain a formula for the full set of constraints. For example, the marginal density for a clade of size $n$ in a $n+m+1$ taxa tree with an outgroup can be obtained by integrating out $h2$ in equation (34) and is equal to

which is not equal to the marginal for the same-sized tree where the monophyly on the $n+m$ clade is not enforced.However, we can derive the marginal in some cases that are not covered by the standard construction (root ages of clades and no extra constraints). For example, take the *BEAST analysis performed as part of the investigation of determining the Pipid root (Bewick et al. 2012). This analysis involves the genera *Xenopus*, *Silurana*, *Hymenochirus*, *Pipa*, and an outgroup. Five species in total with a four-taxon clade and a calibration on the age of the parent of *Pipa*. There are $6\xd73$ valid ranked topologies: nine of those have three internal nodes above the calibrated parent, six has two above and one below, and the remaining three has two below and one (the root) above.

The total density for $a$ internal nodes above and $b$ below by equation (17) is

### AFour-Taxon Tree with One Calibration

Following Heled and Drummond (2012) we consider the following four-taxon tree in which taxa a,B are constrained to be monophyletic and their most recent common ancestor is calibrated with density $fAB$.

There are four ranked topologies in this case, and the 2012 article gives the marginal density for each. Here we wish to contrast the three priors using concrete values: A birth rate of $\lambda =1/2$ and a uniform calibration prior ($fAB=U[4,6]$). Table 1 summarizes the results.

Multiplicative | Conditional | Restricted conditional | ||
---|---|---|---|---|

Prior “correction term” | ((a,b),(c,d)) $TCD<TAB$ | – | $3\lambda e\u22123\lambda h2$ | $12e\u22123\lambda h2(1\u2212e\u2212\lambda h2)$ |

((a,b),(c,d)) $TCD\u2265TAB$ | ||||

(((ab),c),d) | – | $3\lambda e\u22123\lambda h3$ | $4\lambda e\u22124\lambda h3$ | |

(((a,b),d),c) | ||||

Marginal Topology probability | ((a,b),(c,d)) | 93% | 94.2% | 50% |

(((a,b),c),d) | 3.5% | 2.9% | 25% | |

(((a,b),d),c) | 3.5% | 2.9% | 25% | |

Marginal calibration prior | $3\lambda e\u22123\lambda xe\u221212\lambda \u2212e\u221218\lambda $ | $16\u22124$ | $16\u22124$ |

Multiplicative | Conditional | Restricted conditional | ||
---|---|---|---|---|

Prior “correction term” | ((a,b),(c,d)) $TCD<TAB$ | – | $3\lambda e\u22123\lambda h2$ | $12e\u22123\lambda h2(1\u2212e\u2212\lambda h2)$ |

((a,b),(c,d)) $TCD\u2265TAB$ | ||||

(((ab),c),d) | – | $3\lambda e\u22123\lambda h3$ | $4\lambda e\u22124\lambda h3$ | |

(((a,b),d),c) | ||||

Marginal Topology probability | ((a,b),(c,d)) | 93% | 94.2% | 50% |

(((a,b),c),d) | 3.5% | 2.9% | 25% | |

(((a,b),d),c) | 3.5% | 2.9% | 25% | |

Marginal calibration prior | $3\lambda e\u22123\lambda xe\u221212\lambda \u2212e\u221218\lambda $ | $16\u22124$ | $16\u22124$ |

The prior is a pure birth process with a birth rate of $\lambda =12$ and a uniform calibration density between four and six is applied to the clade (a,b). The uncorrected (multiplicative) prior is $16\u221244!4\lambda 3e\u2212\lambda (2h1+h2+h3)$, and the table gives the conditional prior “correction terms” for each ranked topology, together with the induced prior probability of each unranked topology and the marginal density for the calibrated clade.

The table lists the “correction term” for each ranked topology, the marginal probability for each unranked topology, and the calibration marginal. As expected, the full and restricted conditional preserve the calibration density, whereas the marginal for the multiplicative prior is equal to the conditional marginal ($3\lambda e\u22123\lambda x$), bounded between 4 and 6. The marginal topology probability illustrates the difference between the full and restricted priors. The former is similar to the multiplicative prior, with a high probability on the balanced tree. In the space of Yule trees with birth rate $1/2$ and one internal node age between 4 and 6, the other age is far more likely to be smaller than the first. The latter, with equal weight for the two classes, matches the probabilities of the Yule prior without calibration.

*Three Calibrations for* Bombina

A recent study using 13 complete genomes investigated the phylogenetic relationships of the fire-bellied toads of the genus *Bombina* (Pabijan et al. 2013). The study contains several types of analysis, and Table 2 of the article lists the sources of the fossil dating used to calibrate the major mitochondrial lineages here. The authors kindly provided us one of the BEAST analyses which uses three nested calibration points, on five taxa, seven taxa, and the root. The results of running the Markov Chain Monte Carlo (MCMC) chain on the multiplicative prior by itself are shown in Figure 5a.

While the marginal for the two clades deviates only slightly from the specified calibration, the marginal for the root, with mean around 50, is much lower than the normal density calibration $N(\mu =125,\sigma =38)$. The marginals for the analysis using the conditional prior match the calibration densities as expected (Fig. 5b). We also reran the original analysis and a modified version with the conditional prior instead of the multiplicative prior, and the summary trees for the two runs are plotted side by side in Figure 6. Excluding the root, the two analyses produce almost identical divergence times. Becuase the root age dates the divergence of the Discoglossus outgroup, which is incidental in this study, the prior mismatch had no significant effect in this case.

Usually the reason for such a close match is hard to see, as the interaction between the prior and the posterior is too complex, but here it seems pretty clear. The analyses used the uncorrelated relaxed clock model (Drummond et al. 2006), and a younger or older root can be “accommodated” by decreasing or increasing the rates on the branches diverging from the root, while maintaining a similar genetic distance. Indeed, the average rate for the longer outgroup branch in the original analysis is 0.044, while the value for the conditional-prior analysis was 0.024. Because the two non-root calibration densities were almost identical, the other divergence times are practically identical.

*Two Fossil Calibrations for* Sparagnium

Another recent study used two nuclear genes and two chloroplast genes to investigate the systematics, biogeography, and character evolution of *Sparganium*, a group of aquatic monocots (Sulman et al. 2013). For the divergence dating analysis the authors used a concatenated (supermatrix) approach with two fossil calibrations as detailed in the *Calibration points for DNA* subsection of their paper. We sample both the multiplicative and conditional calibration priors for this data set. The calibration densities and the induced marginal priors are shown in Figure 7. Under the multiplicative prior, the internal calibration on 27 taxa has a slight preference for older ages whereas the root has a preference for younger ones, compared with the calibration densities. Note that the densities for the two nested clades overlap in the range 90–105 Mya, and so even the conditional prior cannot guarantee the exact marginal for both calibration densities, but the match seems good by visual inspection (Fig. 7).

We then performed both the original analysis with the multiplicative prior and a modified version with the conditional prior. The summary trees for the two runs are plotted side by side in Figure 8. In this case, the two calibrated (top) nodes have the same estimated divergence time in both analyses, but the non-calibrated nodes in the conditional prior analysis are around 23% younger than their counterparts in the multiplicative prior tree. We can verify that individual divergence times from the two analyses are significantly different by computing the standard error for each estimate, obtained by dividing the standard deviation of the posterior samples by the square root of the effective sample size. For example, the divergence time of the *Puya–Brocchinia* in the conditional prior analysis is $25.9\xb10.11$ versus $33.8\xb10.083$ in the multiplicative prior analysis. Similarly, the *Typha* root is $5.40\xb10.036$ versus $7.06\xb10.037$, and the *Sparganium* root is $9.75\xb10.051$ versus $12.62\xb10.060$.

## Conclusions

We have presented a general approach to specifying a birth–death process tree prior conditional on the heights of a set of calibrated nodes, in the context of the joint inference of topology and divergence times. We have described a few special cases where this prior density has a closed form solution and we have described a general, though computationally intensive, approach to numerical calculation of this conditional density for any number of calibrated nodes. As a result, an arbitrary marginal prior distribution can be precisely specified on the calibrated nodes.

We have also described how the conditional birth–death tree prior naturally induces a nonuniform distribution over ranked topologies. If this effect is unwanted, our approach can be modified to produce a uniform prior on ranked topologies (therefore permitting any arbitrary distribution on ranked tree topologies to be composed with the conditional birth–death prior on divergence times). This modification also renders a computationally efficient algorithm for calculation of the prior density.

To compute the conditional birth–death prior, it is necessary to compute the marginal density of the calibrated node heights averaged over all consistent time-trees. Although we have described some special cases where this marginal prior density of the calibrated nodes can be efficiently computed, it remains to be determined whether other cases have analytical closed-form solutions.

Our implementation is available in BEAST2 (Bouckaert et al. 2014). We regard the full conditional formulation as the correct approach, if one assumes that the birth–death process prior is the appropriate prior for the phylogenetic time-tree under estimation. We therefore recommend the full conditional formulation when computationally feasible (e.g., 2–3 calibrations and/or small numbers of taxa). The restricted formulation effectively removes influence of the birth–death prior on the estimation of the ranked topology and is a good alternative for analyses with larger number of calibrations or taxa for which computational considerations will preclude application of the full conditional. Both of these approaches relieve the practitioner from running their calibrated analysis in the absence of data to determine the resulting marginal distributions (e.g., compare Fig. 5a,b).

We examined and reran two recent BEAST analyses which used a calibrated prior. In both cases the induced prior calibration densities did not match the intended calibrations under the multiplicative prior, but did match as expected with the conditional prior. However, the relevant posterior estimates were not affected in the first case (*Bombina*), but were significantly different for the second (*Sparganium*). Hence, while sometimes the estimates are not affected by the change of prior, the two priors can give significantly different results for the same analysis settings and data.

It is clear that development of calibrated tree priors for Bayesian phylogenetic inference remains an area ripe for future development. Obvious next steps would include taking explicit account of the different sources of uncertainty in fossil ages and collection (uncertainty in geological dates, variation in fossil preservation rates, and paleontological discovery effort) and more sophisticated means of dealing with the phylogenetic placement of fossil information (uncertain placement of fossils based on morphological characters of fossils and/or tree prior assumptions). All of these factors are currently subsumed into whatever marginal distribution is specified on the set of calibrated nodes. In the mean time, the work presented here derives new results for multiple-calibration tree priors and in doing so illustrates some of the subtle choices open to the practitioner when calibrating birth–death tree priors.

## Funding

J.H. and A.J.D. were supported by a Rutherford Discovery Fellowship from the Royal Society of New Zealand awarded [to A.J.D].

## Acknowledgments

The authors wish to thank Mike Steel for helpful discussion and comments.

### Appendix

#### Two Nested Clades for the Yule Prior

$Rnk$ is the number of ranked ways $n$ lineages can coalesce to $k$

##### Root and Clade

For the marginal of a clade of $n$ taxa and the root in a $n+m$ taxa tree we partition $\Psi \phi $ so that $\Psi \phi k$ contain all topologies with $k+1$ surviving lineages at time $h$ (Fig. 4b). The size of each subset is

Plugging those counts into equation (19) we get

which simplifies to equation (33), because without the $(k+12)$, the sum is the binomial expansion of $(u+v)l\u22121$, and with the combinatorial identity$(x)n$ is the Pochhammer symbol, the falling factorial. Here $(k+12)=12(k)2+(k)1$.

##### Two nested clades

When the top clade is not the root we need to handle three levels. Let the number of surviving lineages at $h2$ be $m1$ and $l1$, and $l2$ at $h1$ (Fig. 4c). We partition $\Psi \phi $ according to $m1$, $l1$ and $l2$. that is topologies with the equal values are in the same class.

The number of internal nodes at the three levels is

This can be obtained either from summing over all $\Psi \phi m1,l1,l2$ terms or more simply by applying Equation (A.3) twice, because the internal clade $N$ does not interact with the free global lineages. Again pluggin those counts into equation (17) we get

The rest is tedious manipulations similar to those in the root and clade case above.

#### Integral Identity used in Obtaining the Yule Marginal

The last step used the well-known combinatorial identity (e.g., Sprugnoli (2006), p. 74)

#### The Birth–Death Prior Marginal

For convenience, let $z=xi1$ be the age of the last calibration point, and $c\u02c6=c1\u22121$, the number of lineages between the root and the last calibration point (excluding the root).

The following solution for the integral can be verified by taking the derivative of the right-hand side

Substituting in equation (A.12) and simplifying gives

The last step uses equation (A.5) for the second sum. Now, after canceling terms and simplifying we are left with

## References

*bombina*(anura: Bombinatoridae)