The Impact of the Rate Prior on Bayesian Estimation of Divergence Times with Multiple Loci

Bayesian methods provide a powerful way to estimate species divergence times by combining information from molecular sequences with information from the fossil record. With the explosive increase of genomic data, divergence time estimation increasingly uses data of multiple loci (genes or site partitions). Widely used computer programs to estimate divergence times use independent and identically distributed (i.i.d.) priors on the substitution rates for different loci. The i.i.d. prior is problematic. As the number of loci (L) increases, the prior variance of the average rate across all loci goes to zero at the rate 1/L. As a consequence, the rate prior dominates posterior time estimates when many loci are analyzed, and if the rate prior is misspecified, the estimated divergence times will converge to wrong values with very narrow credibility intervals. Here we develop a new prior on the locus rates based on the Dirichlet distribution that corrects the problematic behavior of the i.i.d. prior. We use computer simulation and real data analysis to highlight the differences between the old and new priors. For a dataset for six primate species, we show that with the old i.i.d. prior, if the prior rate is too high (or too low), the estimated divergence times are too young (or too old), outside the bounds imposed by the fossil calibrations. In contrast, with the new Dirichlet prior, posterior time estimates are insensitive to the rate prior and are compatible with the fossil calibrations. We re-analyzed a phylogenomic data set of 36 mammal species and show that using many fossil calibrations can alleviate the adverse impact of a misspecified rate prior to some extent. We recommend the use of the new Dirichlet prior in Bayesian divergence time estimation. [Bayesian inference, divergence time, relaxed clock, rate prior, partition analysis.]

Bayesian estimation of species divergence times from molecular sequence data is an unconventional statistical estimation problem. Molecular sequences provide information about the distances between species in a phylogeny, but not about the ages of clades or the molecular evolutionary rate, so that the model is not fully identifiable. Usually information from the fossil record is used to calibrate molecular trees and estimate clade ages (Thorne et al. 1998;Drummond et al. 2006;Yang and Rannala 2006;Lepage et al. 2007). Yang and Rannala (2006) and Rannala and Yang (2007) have shown that as the number of loci and the number of sites in molecular data increase, the uncertainty in posterior time estimates (measured by, e.g., the posterior variance) does not go to zero, but converges to a limiting value imposed by the uncertainty in the fossil calibrations. Therefore, although the uncertainty in time estimates cannot be eliminated, using many loci is desirable to reduce the posterior variance of time estimates as much as possible. With the growth of molecular sequence data, divergence time estimation will increasingly be conducted using multiple loci (or site partitions). Current Bayesian divergence time estimation programs such as MCMCtree (Yang 2007), BEAST , MrBayes (Ronquist et al. 2012), etc., allow different loci (site partitions) to have different overall rates, but use i.i.d. priors for the locus rates (the substitution rate per site valid for the locus). That is, the locus rates are assumed to be independent and identically distributed random variables from a common distribution such as the gamma or log-normal. However, this prior suffers from two major problems. First, with this prior, the prior uncertainty about the average rate over loci disappears when the number of loci increases. Suppose the locus rate, that is, the average substitution rate among the branches of the phylogeny at locus i, is i , for i = 1, ... ,L, where L is the number of loci. If the i are i.i.d. with mean m and variance v, the mean rate across all loci, = L i=1 i /L, will have mean m and variance v/L, so that as the number of loci goes to infinity (L →∞), the prior variance of goes to zero (v/L → 0). Thus the prior makes increasingly strong and possibly implausible statement about the average rate (). Second, the rate prior may exert an unexpectedly strong influence on the posterior time estimates. From the infinite-sites theory of Yang and Rannala (2006) and Rannala and Yang (2007), forcing the prior variance of to zero will cause the posterior estimates of divergence times to approach point values (with zero variance) with the increase in the amount of sequence data (the number of sites at each locus and the number of loci). If the prior on locus rates is misspecified, posterior time estimates will be affected as well. For example if the prior rate is too high (or too low), the estimated times will be too young (or too old).
Those problems are general and affect posterior time estimation under all three commonly used clock models: the strict clock, the independent-rates model 556 SYSTEMATIC BIOLOGY VOL. 63 FIGURE 1. Posterior estimates of the human-chimpanzee divergence time under the i.i.d. prior for locus rates as the number of loci (L) increases and when the rate prior is misspecified. Genes were sampled randomly (without replacement) from six primate genomes, and then analyzed with the program MCMCtree. Fossil-based calibrations are placed on all five nodes in the tree, including the constraint of between 5.7 and 10 Ma for the human-chimpanzee divergence (see Fig. 4). Three priors on locus rates were used: (1) A fast rate, i ∼ G(2,2) (diamonds); (2) A medium rate i ∼ G(2,20) (empty circles); and (3) A slow rate, i ∼ G(2,200) (black circles). These priors have means 1, 0.1, and 0.01 respectively, in substitutions per site per 10 8 years. When the prior rate is too fast, the estimated time becomes younger as L is increased. On the other hand, when the prior rate is too slow, the time becomes older with increased L. In both cases, the posterior times are outside the fossil bounds (dashed lines) when L = 300 loci are used. For the medium rate, the time also becomes younger. The data and fossil calibrations are from dos Reis and Yang (2013). The data set is analyzed later in this article, where full details of the analysis are given. Estimates for other node ages are given in Table 3. (Drummond et al. 2006;Rannala and Yang 2007), and the correlated-rates model (Thorne et al. 1998;Rannala and Yang 2007), as long as we use multiple loci and independent priors on locus rates (see also Huelsenbeck et al. 2000;Lepage et al. 2007;Heath et al. 2012 for more clock models). Figure 1 illustrates the problems, using the divergence between the human and chimpanzee as an example. The fossil record indicates that the human and chimpanzee diverged around 5.7 to 10 Ma (Benton et al. 2009) and molecular studies indicate a mean rate at the 3rd codon positions of protein-coding genes of around 10 −9 site −1 year −1 (e.g., dos Reis and Yang 2013). When we use 300 loci (3rd codon positions in 300 genes) and a prior mean rate that is too high (10 −8 site −1 year −1 ), the estimated divergence time is too young, at 3.3 Ma (with 95% credibility interval: 2.9-3.6), whereas for a prior mean rate that is too low (10 −10 site −1 year −1 ) the estimated time is too old, at 26.4 Ma (24.7-28.2). In both cases the estimated times are outside the fossil calibration bounds. In contrast to a typical Bayesian analysis, in which the impact of the prior becomes less important when more data are available, here the prior becomes more influential when more data (more loci) are analyzed.
In this article we implement a new prior on rates for loci that is robust to rate prior misspecification and that does not produce overly precise time estimates with many loci. We use computer simulation and real data analysis to study the different effects of the old and new rate priors on divergence time estimation.

THEORY
The i.i.d. Prior on Rates for Loci Here we review the i.i.d. prior on rates for loci implemented in current Bayesian molecular clock dating programs, to introduce the notation and to illustrate the problems of the i.i.d. prior. Let the mean rate for locus i be i , with i = 1,2,...,L. Under the global clock model, i is the rate for all branches at the locus. Under the independent-rates model (Drummond et al. 2006;Rannala and Yang 2007), i is the mean of the common distribution for all branch rates; for example, the rates for branches in the tree at locus i may be i.i.d. variables from the log-normal or gamma distribution with mean i . Under the correlated-rates model (Thorne et al. 1998;Kishino et al. 2001;Rannala and Yang 2007), i is the rate at the root of the tree at the locus, from which rates for other nodes or branches evolve according to a stochastic process such as the geometric Brownian motion.
The posterior distribution of times (t), branch rates (r), and locus rates ={ 1 ,..., L }, given the molecular data (D), is where f (t) is the prior on times, f () is the prior on the L locus rates, f (r|t,) is the prior on branch rates, and f (D|t,r,) is the likelihood. The branch rates, r, are among loci and branches of the phylogeny. For example, if we analyze a phylogeny with 10 branches using 20 loci, we estimate 20 locus rates () and 200 branch rates. If we assume that the rates among loci are independent, the prior on the locus rates is Multiplication of the independent densities together as in equation (2) leads to the problem that the prior (and thus the posterior) variance of the mean rate across loci goes to zero as L goes to infinity. Because times and rates are confounded (i.e., the likelihood function depends only on the product of times and rates), the informative rate prior has an undue and undesirable impact on the posterior distribution. Therefore, we propose a new prior on rates for loci.

2014
DOS REIS ET AL.-PRIOR IMPACT ON TIME ESTIMATION

557
A New Prior on Rates for Loci We implement a new prior on substitution rates for loci based on the Dirichlet distribution. This is similar to the Dirichlet prior of mutation rates among loci of Burgess and Yang (2008) for estimation of ancestral population sizes and to the compound-Dirichlet prior for branch lengths of Rannala et al. (2012, see also Zhang et al. 2012 for Bayesian phylogenetics. Our new prior is for the mean rates for the L loci, ={ 1 ,..., L }, and affects the implementations under all three clock models: the global clock, the independent-rates model, and the correlated-rates model.
We first assign a gamma prior on the mean rate = This has mean / and variance / 2 . A small , such as 1 or 2, means that the prior will be fairly diffuse about the mean rate over loci (). Next we partition the total rate L among the L loci using a Dirichlet distribution. In other words, the proportions y i = i /(L), i = 1,2,...,L-1, have a symmetrical Dirichlet distribution with concentration parameter , with density f (y 1 ,y 2 ,...,y L- where y L = 1-y 1 -y 2 -... -y L-1 . A smaller means greater variation in rates among loci. If = 1, the distribution is called uniform Dirichlet, which is a multivariate generalization to the U(0, 1) distribution. By applying a variable transform (y 1 ,y 2 ,...,y L-1 ,) ↔ ( 1 ,..., L−1 , L ), we obtain the joint distribution of the rates for the L loci as Note that for the special case = /L, equation 5 reduces to the joint density of L independent gamma variables: i ∼ G( /L, /L). The marginal mean and marginal variance of i are The marginal prior density f ( i ) implied by the new Dirichlet prior for locus rates. Given, y i = i /(L) ∼ Beta(,(L−1)). We simulated 10 7 values of and y i , calculated i = Ly i and plotted the kernel density estimate.
and the correlation between any pair i and j is .
(8) Figure 2 shows the marginal density f ( i ) implied by the new Dirichlet prior. If the parameters ( , ,) are fixed and the number of loci (L) increases, the marginal density of i becomes more diffuse, with a longer tail (and a larger variance).
In the relaxed-clock models, parameter 2 i specify how variable the rate is among branches or how seriously violated the molecular clock is at locus i (e.g., Rannala and Yang 2007). For example, 2 i may be the variance of the log-rate in the log-normal distribution in the independent-rates model. In current Bayesian-dating programs, i.i.d. priors have been assigned to the variance parameters among loci, for example, 2 i ∼ G(,). We also implement the Dirichlet prior for the locus-specific 2 i . Our preliminary tests suggest that the prior on 2 i does not have such a dramatic impact on posterior time estimates as the prior on locus rates ( i ). The new prior for i and 2 i has been implemented in the program MCMCtree in the PAML package (Yang 2007) version 4.8. Our modification here affects only the calculation of the priors for i and 2 i and the proposal steps to modify those parameters in the MCMC algorithm remain largely unchanged.

The Case of Finite Number of Loci and Infinite Number of Sites
We analyze the simple case of estimating the divergence time between two species under the strict clock, to examine the effects of the old i.i.d. prior and the new Dirichlet prior. First we consider data in which the number of sites at each locus is N =∞ but the number of loci is finite. Because each locus is infinitely long, the molecular distances, d i = 2t i , are known without error. This case is analyzed using the infinite-sites theory of Yang and Rannala (2006), which examines the asymptotic dynamics of posterior time estimates when the amount of sequence data goes to infinity (this should not be confused with the infinitesites model in population genetics; see Kimura 1969Kimura , 1983. The posterior distribution of the time given the distances, d = (d i ), under the i.i.d. rate prior is given by the infinite-sites theory (Yang and Rannala 2006, eq. 21) extended to L loci: Consider the true time to be t = 1. If one time unit is 100 Myr, the true age will be 100 Ma. Suppose we sample 100 locus rates from a gamma distribution i ∼ G(2,4), with mean E( i ) = 0.5, corresponding to 0.5 substitutions per site per 100 Myr, and set d i = 2 i t. The infinite-sites data at the 100 loci are then represented by the 100 d i variables. These are analyzed using three locus rate priors: G(2,40), with mean 0.05 (slow); G(2,4), with mean 0.5 (good); and G(2,0.4), with mean 5 (fast). The time prior is t ∼ G(100,100), with mean 1, corresponding to a fossil calibration of 81-121 Ma (95% interval) for a true age of 100 Ma. The posterior distributions of time t (given by equation 9) under the three locus-rate priors are shown in Figure 3a. First, when the prior rate is good, the posterior density for time t is narrower than the prior density and located around the true time t = 1. Second, when the prior rate is ten times too fast, the posterior time density is very narrow and the posterior mean time is too young at 0.17. Finally, when the prior rate is ten times too slow, the posterior time density is wide and the posterior mean time is too old, at 3.97. Note that the posterior standard deviation of the time is proportional to the posterior mean time, or in other words the coefficient of variation (the standard deviation over the mean) is constant in the three cases, as predicted by the infinite-sites theory (Yang and Rannala 2006).
With the new Dirichlet prior, the posterior of time t given the distances is (10) where f is now given by equation (5). Consider the example above, except that this time the infinite data is analyzed with the new Dirichlet prior, and with three priors on : G(2,40), G(2,4) and G(2,0.4). The posterior distribution of t under the three locus-rate priors are shown in Figure 3b. In this case, the prior distribution of t, and the posterior distribution of t for the good and fast priors are all nearly identical, and centered around the true time t = 1. For the slow prior, the posterior distribution of t is shifted to the right (old ages) and centered around 1.16. Overall the posterior is much more robust to prior misspecification when the new Dirichlet prior is used than when the old i.i.d. prior is used.

The Case of Finite Number of Loci and
Finite Number of Sites Next, we consider the case of finite data, with N = 1000 sites at each locus. Data are simulated and analyzed using the JC69 substitution model (Jukes and Cantor 1969). The true time is t = 1, and the true mean rate across loci is = 0.5. Suppose one time unit is 100 Myr. Then the true age of divergence is 100 Ma, and the true rate is 5×10 −9 site −1 year −1 . We simulate L = 1,2,10, and 100 loci, with 100 replicate data sets for each L. For locus i, we sample two rates for the two branches of the tree (r i,1 and r i,2 ) from the log-normal distribution: r ∼ LN(log i − 2 /2, 2 ) with 2 = 0.1. The molecular distance between the two species for locus i is d i = (r i,1 +r i,2 )t. The program EVOLVER is then used to simulate the sequence alignments for each locus according to the value of d i .
The program MCMCtree is used to estimate the divergence time using the simulated alignments. The priors are the same as in the analysis of the inifinite-sites data, with t ∼ G(100,100) (with 95% interval 0.81-1.21). Using the old i.i.d. prior, we analyze each simulated data set three times, using three different locus rate priors: (2,40). With the new Dirichlet prior, we analyze each data set three times, using three priors on the mean locus rate: ∼ G(2,0.4), ∼ G(2,4), and ∼ G(2,40), with = 1. For each one of the six analyses, we calculate, the average of i over the L loci using the MCMC sample. We construct the posterior means and the 95% credibility interval (CI) of and t. Tables 1 and 2 show the results of the simulation experiment. With the old i.i.d. prior on locus rates, the posterior estimate of time t is sensitive to the rate prior and the number of loci. As L becomes larger, t becomes too young if the prior rate is too fast, or too old if the prior rate is too slow (Table 1). Also, as L increases, the posterior of the mean rate () converges to wrong values.  posterior mean for t is too old, at 3.765 (3.502, 4.041), and the posterior is too slow, at 0.119 (0.111, 0.128). In conclusion, for large L the posterior estimates of t and are too sensitive to rate prior misspecification. The situation is quite different for the new Dirichlet prior. The posterior estimate of t is rather insensitive to L and to the rate prior (Table 2). Furthermore, the posterior of the mean rate is close to the true mean rate (0.5), even when the prior rate is either too fast or too slow (Table 2). For example, with L = 100, the 560 SYSTEMATIC BIOLOGY VOL. 63 The phylogeny of six primate species with five fossil calibrations. The fossil bounds are soft, with 1% and 5% probabilities that minimum and maximum bounds are violated, respectively (Yang and Rannala 2006). The rationale for these fossil calibrations is given in Benton et al. (2009)

ANALYSIS OF A SIX-SPECIES PRIMATE PHYLOGENY
We use both the old i.i.d. prior and the new Dirichlet prior for locus rates to estimate the divergence times on the six-species primate phylogeny studied by dos Reis and Yang (2013). The phylogeny with fossil calibrations is given in Figure 4. We use soft-bound fossil calibrations constructed following Benton et al. (2009) and dos Reis et al. (2012). The data are a subset of the large alignment analyzed by dos Reis et al. (2012), with 9992 protein-coding genes after ambiguous codons or alignment gaps were removed. We used the third codon positions only, and sampled loci with N ≥ 200 codons (7947 genes) randomly without replacement, to generate data sets of L = 1,2,10, and 300 loci. We generated 100 replicates for each L. Divergence times were then estimated using MCMCtree. The birth-death process parameters are = BD = 1, = 0. The time unit is 100 Myr. We use the independent-rates model and calculate the likelihood exactly under the HKY+G 5 substitution model (Hasegawa et al. 1985;Yang 1994). For the old i.i.d. rate prior, three different priors for the locus rate are used: i ∼ G(2,2), i ∼ G (2,20), and i ∼ G(2,200), which have means 1, 0.1, and 0.01 corresponding to 10 −8 , 10 −9 , and 10 −10 substitutions per site per year. The first prior rate is too fast and the last too slow. For the new Dirichlet prior implementation, three priors on the mean locus rate are used, ∼ G(2,2), ∼ G(2,20), and ∼ G(2,200), with = 1.
The estimated divergence times using the old i.i.d. prior are shown in Table 3. The posterior time estimates are sensitive to the rate prior, in particular for the large number of loci, L = 300. Furthermore, the posterior rate estimates vary for the three rate priors. For example, with L = 300 and the fast rate prior, i ∼ G (2,2), the posterior mean of t 7 (the age of crown Anthropoids) is 32.9 Ma (29.5, 35.5 Ma). These ages are too young and the posterior mean is very close to the 33.7 Ma minimum fossil bound applied to this node (Fig. 4). In fact, all posterior time estimates are too young and the minimum fossil bounds are violated for all nodes except node 7 ( Fig. 4 and Table 3). Furthermore, the posterior mean rate is = 1.99×10 −9 site −1 year −1 (1.84, 2.22), about twice the ∼ 10 −9 site −1 year −1 rate generally accepted for third codon positions in Primates. In contrast, with L = 300 and the slow-rate prior, i ∼ G (2,200), the posterior mean of t 7 is 308.8 Ma (292.4, 325.8 Ma), much older than the 90 Ma maximum bound applied to this node (Fig. 4) and much older than the oldest mammal fossil ever found. Similarly, the ages for all other nodes are much older than their corresponding maximum fossil bounds. The mean rate is = 0.2×10 −9 , about five times less than the accepted rate of ∼ 10 −9 .
The estimated times using the new Dirichlet prior are shown in Table 4. In this case the posterior time estimates are rather insensitive to the rate prior, and the posterior of the average rate () for all cases are very similar. For L = 300 loci, the posterior of the mean rate is 0.96×10 −9 substitutions per site per year for all three rate priors. The new rate prior is clearly much better than the old one.

DIVERGENCE TIME OF MAMMALS
dos Reis et al. (2012) (see also dos Reis et al. 2014) estimated the divergence times of mammals using a data set of 36 mammal genomes (see Fig. 5 for the phylogeny and fossil calibrations). The data consists of a large alignment (~21 million base pairs) of the 1st and 2nd codon positions sampled from~14,000 genes, and divided into 20 partitions. They used 26 fossil calibrations and a diffuse prior on the mean rate per partition, i ∼ G(1,1), with a time unit of 100 Myr. This rate prior has a mean of 1, meaning 10 −8 substitutions per site per year. The parameters in the prior were chosen to give a large variance (and thus a diffuse prior), but the mean rate was too high: note that the estimated rate for even the third codon positions of those primate genes was only 0.96×10 −9 sites −1 year −1 (Table 4). dos Reis et al. (2012) obtained very precise divergence time estimates (Table 5) that pointed to a diversification of modern placental mammals after the Cretaceous-Paleogene extinction event 66 Ma. Concerned that the i.i.d. rate prior with a high mean on 20 loci (partitions) may have had an undue influence on posterior time estimates, we repeat the analysis here using the new Dirichlet prior. Other aspects of the analysis are identical to those of dos Reis et al. (2012). The likelihood is calculated approximately (dos Reis and Yang 2011). We use two priors, ∼ G(1,1) and ∼ G(2,40), both with = 1. The first prior has a mean rate that is too high (10 −8 site −1 year −1 ) and the second a mean rate that is  Notes: Three priors for locus rates are used. L is the number of loci sampled from genomic data of protein-coding genes (with only 3rd codon positions used). The results for L = 0 correspond to the prior. The mean rate () is calculated by averaging locus rates over loci from the MCMC samples. The results are averages of 100 replicates.   Table 3. 20 times smaller (0.05×10 −8 ), and appears to be more reasonable. Table 5 shows the results. Time estimates are relatively insensitive to the rate prior, and the results using the new Dirichlet prior are very similar to those using the old i.i.d. prior (Fig. 6a). Surprisingly, time estimates under the new prior tend to be more precise than those under the old i.i.d. prior (Fig. 6b). This trend is opposite to what we expected. We speculate that the large number of fossil calibrations on the mammal tree may have alleviated the impact of the misspecified rate prior on the posterior distribution of times. The timetree is shown in Figure 5. In accordance with previous results (dos Reis et al. 2012Reis et al. , 2014 we find that Placentalia originated in the Cretaceous before the K-Pg extinction 66 Ma, but the majority of crown placental orders originated in the Paleogene after the extinction (see also Meredith et al. 2011). 562 SYSTEMATIC BIOLOGY VOL. 63 FIGURE 5. Timetree of mammals. Posterior divergence times were estimated using the new Dirichlet prior with ∼ G(2,40) and = 1. This prior has mean rate 0.05×10 −8 substitutions/site/year. Gray bars are 95% CIs. Nodes are numbered as in Table 5. The fossil calibrations are explained in detail in dos Reis et al. (2012). We summarize them here (in Ma, clade names refer to the crown groups): (37) Mammalia,min. 162.9,max. 191.1. (38) Theria,min. 124,max. 171.2. (39) min. 11.2,max. 33.7. (70) Homininae,min. 7.25. (71) Hominini,min. 5.7,max. 10.0. All bounds are soft with 0.1% and 2.5% probabilities for left (min.) and right (max.) tails, respectively (Yang and Rannala 2006).

DISCUSSION
In Bayesian dating analysis, specification of the prior on divergence times is well recognized to be a complicated process, especially as the time prior incorporates fossil calibrations. As a result, much attention has been paid to the construction of the time prior (Kishino et al. 2001;Yang and Rannala 2006;Inoue et al. 2010;Heled and Drummond 2012). In contrast, less attention has been paid to the rate prior, perhaps because specification of the rate prior seems straightforward and the i.i.d. prior used in current computer programs appears to be quite innocent. However, times and rates are confounded parameters in the likelihood function, and as a result of the lack of identifiability, the priors for both sets of parameters will remain important even if an infinite amount of sequence data is available. This is quite unlike conventional Baysian inference, where priors become unimportant as more and more data is  174.5, 191.8) 185.7 (174.8, 191.9) 185.8 (174.4, 192. Reis et al. (2012). Note: Node numbers refer to those of Figure 5.
analyzed. Seemingly diffuse priors on the locus rates such as an exponential distribution with a large variance can have an unexpectedly strong effect on posterior time estimates. In this regard improper priors on the rates, available in some dating programs, may be the worst and should not be used. The i.i.d. prior on locus rates makes an increasingly strong statement about the average locus rate with the increase of the number of loci, leading to very precise and over-confident posterior estimates when a large number of loci is included in the data. If the rate prior is unreasonable, the time estimates will be wrong with very narrow intervals. Although large uncertainty in posterior time estimates may not be desirable, the reduced uncertainty in time estimates caused by the i.i.d. prior is misleading, as the uncertainties associated with the fossil calibrations should remain even with many loci. Furthermore, in real data analysis it is impossible to predict the true rate, and the rate prior will always be misspecified to some extent, so that default priors that do not have an undue influence on the posterior may have a merit. Our study of the infinite-sites case as well as analysis of simulated and real data suggest that the new Dirichlet prior may circumvent both problems of false precision and undue prior influence associated with the current i.i.d. prior.
The extremely strong negative correlation between times and rates suggests that ideally one should specify the prior for times and rates jointly. However, specification of such a joint prior appears extremely difficult. Indeed our knowledge of the absolute rate appears to depend critically on our assumptions about the absolute times or interpretations of the fossil record. For example, despite the fact that the sequencing of the human and chimpanzee genomes has led to extremely precise estimates of the sequence divergence between the two species (1.3%; see, e.g., Burgess and Yang 2008), resolving this distance into absolute time and rate remains elusive . In this article, we have the less ambitious goal of constructing a rate prior that does not have a great impact on the posterior time estimates. Users of dating programs other than MCMCtree v4.8 (which now implements the new Dirichlet rate prior) can use the following approach to construct an i.i.d. locus-rate prior that appears robust to rate prior misspecification and that avoids a decrease of prior uncertainty with the increase of the number of loci or site partitions (L). First, note that when = /L, equation (5) reduces to the density of L independent gamma variables, with i ∼ G( /L, /L). Then the i have mean m = / and variance v = L / 2 . Therefore, one may specify a gamma prior on i with shape parameter /L and rate parameter /L (or scale parameter s = L/ ). Even though this i.i.d. prior does not have the flexibility of the Dirichlet prior implemented in this article (for example, in equation 5 is always fixed at /L in the i.i.d. prior) and its specification depends on the number of loci in the data set being analyzed, our preliminary test suggests that it may produce similar time estimates to the Dirichlet prior when the number of loci is not very large.
Finally, we note that there are other sources of errors or uncertainties involved in divergence time estimation that are not dealt with in this study. Foremost is the difficulties with the interpretation of the fossil record to formulate calibrations in a molecular clock dating analysis. For example, one never really knows the difference between the age of a fossil and the age of the node that is being calibrated by the fossil, and even the placement of the fossil on the phylogeny may also be uncertain. In this article, we do not explicitly deal with such factors that affect the quality of fossil calibrations but assume that the fossil bounds or calibration densities adequately summarize the information and uncertainties in the fossil record. Furthermore, gene genealogies at individual loci or genomic regions may differ from the species tree due to polymorphism in ancestral species, and the coalescent times of lineages within a locus may be older than the time of divergence between species (Burgess and Yang 2008). This source of uncertainty may not be important when ancient divergences are studied, but it can be considerable when divergence times between closely related species such as human and the chimpanzee are estimated. For example, at about 70% of loci, human and chimpanzee are more closely related to each other than each is to gorilla (Burgess and Yang 2008;), but at the remaining loci the (true) gene tree differs from the species tree. This source of uncertainty is ignored in our study here.