Relative efficiencies of simple and complex substitution models in estimating divergence times in phylogenomics.

The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared to those from simple models is yet to be quantified for contemporary datasets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa using the same tree topologies and calibrations, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the datasets analyzed. We find three fundamental reasons for the observed robustness of time estimates to model complexity in many practical datasets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied, especially for datasets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.


Introduction
Models of nucleotide and amino acid substitution are of fundamental importance in molecular phylogenetic analyses (Nei and Kumar 2000;Yang 2006;Arenas 2015). Many sophisticated substitution models have been developed, and the complexity of models developed for use in phylogenomic studies continues to increase (Arenas 2015;Abadi et al. 2019). Indeed, complex models can provide a more complete description of nucleotide and amino acid substitution processes that involve transition/transversion rate differences, biased base compositions, inequality of evolutionary rates among sites, and substitution pattern heterogeneity among genomic regions and sequence partitions (Sumner et al. 2012;Arenas 2015). Because the difference between the estimated and actual numbers of substitutions grows quickly and nonlinearly over time (Nei and Kumar 2000;Yang 2006), researchers often select the most complex model available to improve the accuracy of divergence time estimates (Arbogast et al. 2002;Sumner et al. 2012;Arenas 2015;Abadi et al. 2019).
Do the time estimates from the most complex models differ significantly from those produced using a relatively simple model when analyzing large data sets that contain many species, genes, and calibrations? The answer to this question is of high practical significance. If it is affirmative, time estimates may be vulnerable to both incorrect model specification and the overall limitations of current substitution models. On the other hand, if time estimates are generally similar for simple and complex substitution models, then currently available models could be deemed sufficient for molecular dating, because even the most complex model is a simplification of actual evolutionary processes that are much more complex due to differences in regional mutation patterns and spatial and temporal selective pressures (Arenas 2015;Abadi et al. 2019).
Although no studies have directly examined the impact of model complexity on time estimation in phylogenomic investigations, there have been reports that simple substitution models often perform similarly to or only slightly worse than complex substitution models in some types of phylogenetic inferences (Tamura et al. 2004;Yoshida and Nei 2016;Spielman and Kosakovsky Pond 2018;Abadi et al. 2019;Dornburg et al. 2019;Spielman 2019). However, in molecular dating, it is intuitively assumed that underestimation of sequence divergences caused by the use of simple models will result in significantly distorted time estimates. Here, we used diverse large data sets to test the conventional wisdom that the use of simple models will result in poor estimates of divergence times (table 1). To detect potential benefits offered by the use of complex models, we compared Bayesian estimates of divergence times obtained when using extremely simple models with those obtained when using complex models that contained many biological attributes and parameters (table 1).
We focus on models of the general time-reversible (GTR) class because they are employed in all current empirical dating analyses. Although many new substitution models have been proposed to relax the assumptions of stationarity, reversibility, and homogeneity of base substitution patterns, as well as to incorporate complex structural constraints and epistatic fitness landscapes, these advanced models are not yet available in popular phylogenetic software packages due to their implicit complexity and onerous computational requirements (Jayaswal et al. 2014;Arenas 2015;Arenas et al. 2015;Usmanova et al. 2015). Therefore, in our reanalysis, we used the models selected as the best-fit models in the source phylogenomic studies as "complex models" (table 1). The "simple models" used were the Jukes-Cantor (JC) model for nucleotide substitutions (Jukes and Cantor 1969) and the Poisson model for amino acid substitutions (Nei and Kumar 2000). Both assume that all substitution types are equally likely at a given site, an assumption that is always violated in reality. To ensure the most powerful contrast, we applied these simplest models without partitioning the sequence alignment by genes, genomic features (e.g., codon positions), or sets of positions with similar substitution patterns (see Materials and Methods).
We also used the same tree topology as in the source publications in our comparisons, because it is a common practice in phylogenomic studies to use the phylogeny reliably inferred by using a sophisticated tree-building method as a fixed topology for molecular dating (e.g., Li et al. 2019;Oliveros et al. 2019). In addition, the use of the same published tree topologies ensured consistent placement of calibrations and eliminated the confounding effects of using alternative phylogenies and calibrations.

A Plant Data Set Analysis
We first present results from a reanalysis of a large-sequence alignment containing 103 plant species (Morris et al. 2018) ("Plants," table 1). Pairwise sequence distances ranged from 0.01 to 3.18 (median ¼ 0.83) nucleotide substitutions per site ( fig. 1a). The GTR model with rate variability among sites (þC) was used in the original analyses and is presented here as the complex model (Morris et al. 2018). For comparison, we analyzed this data set using the JC model as our simple model. The difference in the maximum likelihood (ML) values for the GTR þ C and JC models was very large (DlnL ¼ 14757.9) and highly significant (P < 10 À16 ). Conventionally, the JC model would be a poor choice for molecular dating analysis. Indeed, the use of the JC model led to the underestimation of pairwise evolutionary distances by as much as 73%, and there was severe substitution saturation resulting in a classic curvilinear trend ( fig. 1a, the gray area).
Surprisingly, Bayesian estimates of divergence times obtained using the JC model were very similar to those obtained using the GTR þ C model when the same sequence alignment, topology, and calibrations were used ( fig. 1b). This trend was also observed for divergence time estimates obtained via substitution models of intermediate complexity ( fig. 1c). The linear regression slope between time estimates under the JC and GTR þ C models was 0.97, with low dispersion (R 2 ¼ 0.99). Although time estimates generated using the GTR þ C and JC models showed high overall similarity, a few node times showed local discrepancies (e.g.,  Tao et al. . doi:10.1093/molbev/msaa049 MBE arrows in fig. 1b). However, these node times generated using the JC model fell within the 95% credibility intervals (CrIs) obtained using the GTR þ C model ( fig. 1g). CrIs from the JC and GTR þ C models overlapped for every node ( fig. 1g), suggesting that estimates of divergence times and CrIs from the simplest model will be as useful as those obtained via complex substitution models in downstream biological analyses and hypothesis testing.
We hypothesized that the inclusion of 37 calibration points, and their associated probability densities, in the "Plants" data set constrained the node time estimates and eliminated the bias anticipated to be caused by the use of the simple JC model. So, we compared times obtained using the GTR þ C and JC models after eliminating all the internal calibrations but retaining the original root calibration. The linear pattern persisted (slope ¼ 0.95, R 2 ¼ 0.99, fig. 1d). We then tested the possibility that a well-constrained root calibration caused the observed linear relationship of dates from simple and complex models, even though the root calibration was expected to only dictate the overall time span, rather than the patterns of individual node times. So, we reanalyzed the phylogeny, making the probability density distribution of the root calibration diffused and adding a randomly selected internal calibration in the Bayesian analysis    fig. 1e). Therefore, dates from simple and complex models show good linear relationships in Bayesian analyses with even a few calibrations.
To eliminate the effect of calibrations in mediating the similarity of times obtained using simple and complex models, we estimated divergence times using the RelTime method in which no calibrations and no branch rate model are required. For this data set, divergence times under the JC and GTR þ C models were very similar ( fig. 1f), and the confidence intervals also showed broad overlap. This result suggested that the specification of a branch rate model and calibrations used in the Bayesian analysis for the data set analyzed are unlikely to explain the high similarity in divergence times between the JC and GTR þ C model. Overall, substitution model complexity appears to have a limited impact on divergence time estimates for the "Plants" data set.

The Complexity of the Substitution Model Has Limited Impact on Time Inference for Many Data Sets
We examined the similarity of times estimated via simple and complex models for many other data sets, which contained small or large numbers of species (43-274), had varying evolutionary time depths (102-4,511 My), evolved with slow or fast rates (0.21-2.28 substitution per site per billion years), or employed small or large numbers of calibration points (8-64) (table 1). For all these data sets, the use of simple models underestimated pairwise evolutionary distances and showed curvilinear relationships with distances estimated using complex models ( fig. 2). The curvilinear relationship was particularly dramatic for more ancient divergences due to substitutional saturation ( fig. 2a).
Despite the curvilinear relationship of pairwise distances estimated using simple and complex models, their divergence time estimates showed strong linear relationships for nuclear nucleotide, mitochondrial nucleotide, and amino acid sequence alignments ( The mean of the relative absolute difference between times estimated by simple and complex models was small (<6.2%), but some node times deviated considerably from the 1:1 linear trend for these two models. For example, three nodes in the "Spiders" data set showed 15-23% difference between time estimates obtained from simple and complex models ( fig. 3a). Interestingly, these differences disappeared (<1%) when all the internal calibrations were excluded ( fig. 3b). In contrast to the patterns observed for the "Spiders" data set, divergence estimates from the "Eukaryotes & Prokaryotes" data set showed larger differences between simple and complex models when all the internal calibrations were removed. However, Bayesian CrIs obtained using simple models often contained the point estimates obtained using complex models, and vice versa ( (88-90.6 My) models rejected the evolutionary model in which the last common ancestor of placental mammals appeared after the Cretaceous-Paleogene (K-Pg) event, consistent with conclusions in some previous studies (Hedges et al. 1996;Kumar and Hedges 1998;dos Reis et al. 2012  Dating with Simple versus Complex Models . doi:10.1093/molbev/msaa049 MBE 2018). Divergence times are the ratios of the linear combinations of branch length estimates (and of node-to-tip distances), which allows us to predict that the relative branch lengths between simple and complex models will be similar. Indeed, linear models (through the origin) described the relationship between branch lengths from simple and complex models for all empirical data sets we examined ( fig. 5). This trend is dramatically different from that observed for pairwise distances, where a curvilinear relationship was observed in every case ( fig. 2). Linear regression slopes of branch lengths are all <1 because simple models underestimate sequence divergences. However, branch lengths were uniformly underestimated via simple models for the empirical data sets tested, resulting in similar relative branch lengths between simple and complex model scenarios (fig. 5) as well as similar relative node-to-tip distances (supplementary fig. S5, Supplementary Material online). Because divergence times are a function of the ratios of the linear combinations of branch lengths, and the ratios of node-to-tip distances, they become comparable under simple and complex models.
Despite the linear patterns described earlier, we expected a greater magnitude of underestimation for longer branches and deeper sequence divergences when using simple models (see fig. 2a), which could distort time estimates. This effect is indeed observed for the longest branches in most of the data sets analyzed, as they show significant deviation from the linear trend ( fig. 5). We confirmed this pattern in a systematic analysis of short, long, and intermediate branch lengths (fig. 6a); and shallow, deep, and intermediate node-to-tip distances ( fig. 6c). However, divergence time estimates on deep nodes and branch times on long branches were often not very different between simple and complex models ( fig. 6b and d).
These results suggest that relaxed clock methods automatically adjust evolutionary rates within a phylogeny to produce robust time estimates for the empirical data sets analyzed.
However, this adjustment may not be possible for phylogenies with few species or sparse taxon sampling within some clades, producing isolated long branches and causing time estimates from simple models to differ from complex models. For example, many very long branches may appear in an unbalanced phylogeny (supplementary fig. S6a, Supplementary Material online). Interestingly, we found that the slope of times estimated using simple and complex models was close to 1 when the rates were similar to those observed in empirical data sets (1Â category in supplementary fig. S6b and c, Supplementary Material online), which is consistent with results from our empirical analyses. For data sets where evolutionary rates are extremely fast (e.g., 10Â), or sequence divergences are very large (see Materials and Methods), we found that the use of a single, very shallow calibration often produced overly young estimates (blue dots, 2Â-10Â in supplementary fig. S6c, Supplementary Material online). It is because the underestimation of lengths for long branches is more severe when using a simple model, and the rate adjustment offered by relaxed clock methods is not as effective. However, the use of good calibrations at deeper nodes alleviated the discrepancy and produced less biased time estimates for the simple model (green and pink dots, 2Â-10Â in supplementary fig. S6c, Supplementary Material online). Because reliable alignment of sequences showing large divergences is very challenging (Edwards et al. 1995), researchers will (and should) generally be apprehensive of using highly divergent genes (e.g., 10Â).

Increasing Numbers of Sequences Makes Time Estimates from Simple and Complex Models More Similar
We investigated the effect of the number of sequences in a data set on the similarity of estimated dates from simple and complex models, because a data set with sparse taxon  MBE sampling may show a greater discrepancy. We evaluated the linearity of the relationship of branch lengths generated via simple and complex models for data subsets with increasingly larger numbers of sequences subsampled from the "Plants" data set. We found that when the number of sequences sampled was small (e.g., 10), the linear relationships between branch lengths were weak for some subsets (lower linear coefficient, R 2 ) and strong for others (higher R 2 ) ( fig. 7). With an increasing number of sequences, the dispersion of R 2 values became smaller, resulting in a more robust linear relationship between branch lengths ( fig. 7). In the empirical data sets analyzed, the dispersion of R 2 became very small for data sets that contain as few as 40 sequences. Therefore, a stronger linear relationship of branch length estimates between simple and complex models exists for data sets with many species, resulting in similar divergence time estimates. Our results may explain the inconsistent patterns reported in previous studies (Yang 1996;Schenk and Hufford 2010), which analyzed relatively small data sets (5-25 sequences).

Effects of Irreversibility and Nonstationarity of Substitution Patterns
As mentioned earlier, our primary focus is on complex models used in practical analyses, which are from the GTR class. However, it is important to consider whether a violation of the assumption of model stationarity and time-reversibility has biased time estimates significantly. Because non-GTR substitution models are not available for use in the Bayesian dating software, we used the RelTime method to infer divergence times directly for phylogenies in which branch lengths were estimated under an unrestricted model (in which the time-reversibility in substitution models is not assumed) and a model in which the stationarity of the substitution patterns is not assumed (see Materials and Methods). We found that the JC model and non-GTR models produced similar divergence time estimates and confidence intervals ( fig. 8a and b). It is because branch length estimates obtained using the JC and non-GTR models showed strong linear relationships ( fig. 8c and d). Simple and complex models generated comparable time estimates and, hence, may provide equivalent statistical power for hypothesis testing within the range of evolutionary conditions observed in empirical data sets analyzed for this study.

Discussion
Divergence times estimated using simple and complex substitution models were remarkably similar across a range of phylogenomic data sets, a pattern that we observed for both Bayesian and RelTime methods. More surprisingly, similar estimates were observed even when a small number of calibrations (e.g., only the root calibration) was used in Bayesian and non-Bayesian analyses. We found that three fundamental reasons can explain the observed robustness of time estimates to model complexity in many phylogenomic data sets. First, the estimates of branch lengths and node-to-tip distances under simple and complex models show strong linear relationships, especially for data sets with many sequences. Second, relaxed clock methods are able to automatically adjust evolutionary rates on branches that experience underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. Third, the use of calibrations (especially deep calibrations) further narrows differences in time estimates from simple and complex models, as calibrations often offer strong constraints on node ages. Although divergence time estimates derived via simple and complex models show remarkable overall similarity, use of different substitution models may produce very different point estimates for some nodes (e.g., >20% difference for node 3 in fig. 1g). It is mainly because the distributions of posterior times under simple and complex models differ for those nodes. Even for phylogenomic data sets, divergence time estimates are generally associated with large estimation errors due to the variance of branch length estimates, the degree of evolutionary rate heterogeneity, and the uncertainty related to clock calibration (Zhu et al. 2015;Tao et al. 2020). Therefore, CrIs, which represent the uncertainty surrounding divergence time estimates, are more useful than point estimates in biological hypothesis testing (Warnock et al. 2017). CrIs produced by simple and complex models largely overlapped ( fig. 4 and supplementary fig. S4, Supplementary Material online), and the widths of CrIs around time estimates were also very similar when the same set of calibrations were used ( fig. 1g and supplementary  fig. S3, Supplementary Material online).
Even though we found that simple models produced results comparable to those from complex models across many empirical data sets, we anticipate that there will be situations in which complex models are best suited for molecular dating. This includes the analysis of data sets in which the number of sequences is small, substitution patterns have shifted substantially in some groups, sequences divergences are large, or taxon sampling in some clades is so sparse as to create many long branches. Complex models are also  MBE required when using an external mutation or substitution rate to set evolutionary distances to times because actual branch lengths are best estimated via a complex model. Therefore, complex models are needed if one is interested in estimating not only divergence times but also absolute and relative evolutionary rates. Complex models may also be required when inferring the phylogeny and divergence times jointly, although use of a predetermined topology is a common practice in phylogenomic studies (e.g., Li et al. 2019;Oliveros et al. 2019). In the future, we plan to investigate the impact of substitution model complexity on the joint inference of phylogeny and times. A majority of phylogenomic analyses make a simplifying assumption that the same substitution model applies across all the sites in a concatenated data set or within each data partition. However, evolutionary dynamics and processes differ regionally (and even positionally) in mutation patterns, sequence contexts, and selective pressures   8. Relationships of RelTime divergence times estimated with branch lengths obtained using the JC model and models that are (a) non-timereversible and (b) non-stationary for all nucleotide data sets. Gray solid lines represent 95% confidence intervals. The sum of node ages was used to normalize divergence times and confidence intervals. Relationships of branch lengths obtained using the JC model and models that are (c) nontime-reversible and (d) non-stationary. The gray-dashed line represents the best-fit linear regression through the origin. The slope and coefficient of determination (R 2 ) for the linear regression are shown.
Dating with Simple versus Complex Models . doi:10.1093/molbev/msaa049 MBE Yang and Swanson 2002;Shapiro et al. 2006;Kosakovsky Pond et al. 2008;Bordner and Mittelmann 2014;Jayaswal et al. 2014;Arenas 2015). Therefore, one may imagine that even a seemingly complex substitution model (e.g., GTR þ C) is relatively simple when compared with the actual reality. The observation that the simplest models produce time estimates similar to those obtained using much more complex models may be used to suggest that the current model complexity is appropriate for estimating divergence times. But, extensive analyses of simulated data are required to fully explore the sufficiency of current substitution models and understand whether the difference between simple and complex models can be interpreted as a problem with simple models, which is beyond the scope of this article and an exciting future direction of research.

Empirical Data Acquisition
We selected eight large-scale empirical data sets distributed across the tree of life. Species groups, data types, sequence lengths, sequence counts, calibration counts, branch rate model, the number of partitions, and the original substitution models used in the majority of partitions are summarized in table 1. These studies used complex substitution models (e.g., GTR þ C) along with one or multiple partitions, and none of the studies selected the JC or Poisson models as the best model for any partitions. All empirical data were analyzed initially in MCMCTree (Yang 2007) to estimate the divergence times, except the "Spiders" data that were analyzed in RelTime and then reanalyzed in MCMCTree by Mello et al. (2017). In all analyses, we used the published topologies. We obtained the published divergence times and CrIs from the original studies, except for "Mammals A" and "Eukaryotes & Prokaryotes" data sets. Because the original studies of these two data sets did not provide CrIs of Bayesian time estimates, we reproduced timetrees for these two data sets with the same settings as used in the original studies with MCMCTree (v4.9h).
Because analyses of long sequences can require long computational times, mainly in ML branch length calculations, we used the original alignments when they were shorter than 10,000 sites (see table 1). Otherwise, we randomly selected 10,000 sites from the original alignments (10 K data sets) for all phylogenetic analyses. The exception was the "Plants" data, for which the original study (Morris et al. 2018) showed that similar time estimates were obtained by using the full alignment (856,439 sites) and a trimmed subsample with high site coverage (2,217 sites). We, therefore, used the 2,217 sites data for the "Plants" data set. The subsampled alignments were used in all following analyses of simple and complex models. All empirical data sets are available at https://doi.org/10.6084/ m9.figshare.11873874, last accessed March 9, 2020.

Relationship between Bayesian Time Estimates Using Simple and Complex Models
We estimated divergence times in MCMCTree (v4.9h) using simple models with topologies and calibrations from the original studies. All MCMCTree analyses were conducted using the approximate likelihood calculation. The substitution models used in the original studies were employed as the complex models (table 1). JC and Poisson models without the assumption of rate variation across sites under the gamma distribution were used as the simple model for nucleotide and amino acid sequences, respectively. For the "Plants" data, we also estimated divergence times using Kimura 2-parameter (K2) (Kimura 1980), Hasegawa-Kishino-Yano (HKY) (Hasegawa et al. 1985), and Tamura-Nei (TN) (Tamura and Nei 1993) models with and without a gamma parameter for accounting for rate variation across sites, and the GTR model without a gamma parameter. We used a single partition in all simple and complex model analyses, although multiple partitions might be used in the original studies. However, we used 29 partitions for the "Eukaryotes & Prokaryotes" data analysis since the original research (Betts et al. 2018) showed a strong influence of partitioning on time estimation. We used the same rate models, prior settings (e.g., tree prior and overall rate prior), and calibration constraints and densities as published in the original studies for all analyses. Two independent runs were conducted to ensure convergence and that ESS values were >200 after removing 10% burn-in samples for each run.
We first examined whether the use of a single partition and subsampled alignments would significantly impact the time estimates generated by the complex model analysis. Therefore, we compared the published times obtained using the full data sets and multiple partitions with times estimated using the subsampled alignments and a single partition. Concordant time estimates were found in all empirical data sets (supplementary fig. S7, Supplementary Material online). Thus, we considered the effect of data subsampling and partitioning to be small for the empirical data sets analyzed. Our observations are consistent with studies showing that site subsampling has a limited impact on the accuracy and precision of time estimates (dos Reis and Yang 2013;Zhu et al. 2015). Therefore, we used the divergence times, and CrIs obtained using the subsampled alignments, the original complex substitution model, and a single partition as the inferences from complex models when a reanalysis was needed. We then compared them with the results obtained using the subsampled alignments, simple models, and a single partition to eliminate any site-subsampling bias, and observed good linear relationships ( fig. 3a). We also found similar linear trends in direct comparisons between time estimates from simple models and published times for all data sets (results not shown), indicating that the length of sequences has limited impact on the robustness of time estimates to substitution model complexity. We also computed the mean of relative absolute error between times estimated under simple and complex models for each data set by using 1 where t c_i and t s_i is the node time estimated under the complex and simple model for node i, respectively, and n is the number of nodes. Although we only used MCMCTree for inferring Bayesian divergence times, we expect results to be similar when using other Bayesian dating software, for Tao et al. . doi:10.1093/molbev/msaa049 MBE example, BEAST2 (Bouckaert et al. 2014), because previous studies have shown that different Bayesian dating software packages tend to generate similar time estimates when priors and calibration constraints are consistent (Warnock et al. 2012(Warnock et al. , 2015.

Influence of Calibrations on the Relationship of Bayesian Times between Simple and Complex Models
We re-estimated divergence times in MCMCTree (v4.9h) using both simple and complex models without any internal calibrations to examine whether the use of many calibrations constrained the final time estimates and concealed bias caused by the use of simple models. In this case, only the age of root was constrained. A single partition and the subsampled alignments were used in the analysis. All other priors, including the root calibration, were the same as those used in the original studies. We also investigated whether the use of root calibration caused a linear relationship between time estimates under simple and complex models. We performed Bayesian analysis for each data set using a single internal calibration that was randomly selected from all the internal calibrations used in the original study and only the maximum constraint (t max ) for the root. The use of only a maximum constraint for the root resulted in a diffused uniform density [0, t max ].

Testing the Need for Multiple-Hits Correction in Pairwise Distance Estimation
We tested whether the concordance of times between simple and complex models arose because minimal multiple-hits correction was needed, which was often true for data sets that contained recently diverged or slow-evolving taxa (Nei and Kumar 2000). We estimated pairwise sequence distances using simple and complex models in MEGA X . We used TN and Jones-Taylor-Thornton (Jones et al. 1992) models with a gamma parameter estimated by the ML method as the complex model for nucleotide and amino acid data sets, respectively.

Testing the Relationship of Divergence Times Estimated Using Simple and Complex Models for a Non-Bayesian Approach
We estimated divergence times using the RelTime method (Tamura et al. 2012 in MEGA X. A single partition was used in these analyses. No calibrations are required for RelTime analyses. To directly compare the dates estimated by RelTime when no calibrations are used, we normalized time estimates to the sum of node ages. This normalization is simply a post hoc scaling of relative times and is not the same as assigning calibrations in the Bayesian approaches in which the tree prior interacts with calibration probability densities. Outgroups were removed in the time comparisons because RelTime does not produce time estimates for sequences in the outgroup.
Testing the Relationship of Branch Lengths and Nodeto-Tip Distances Estimated Using Simple and Complex Models For each data set, we estimated ML branch lengths using both simple and complex models and the published topology in MEGA X. A single partition was used in all ML analyses. We compared the branch lengths estimated using simple and complex models to obtain the relationship. We then calculated the node-to-tip distances using the resulting ML tree. For each node, the node-to-tip distance is the sum of the lengths of all paths from this node to all descendent tips divided by the total number of descendant tips.

Testing Relationships of Branch Lengths and Node-to-Tip Distances for Different Branches and Sequence Divergences
For each data set, we compared branch lengths and branch times estimated using simple complex models for short, intermediate, and long branches. Branch length categories were assigned by comparing individual branch length to the mean branch length across a given tree. Long branches were longer than 1 SD from the mean value of all branches. Short branches were those with lengths shorter than the mean value. The remaining branches were classified as intermediate branches.
We also compared node-to-tip distances from simple and complex models for shallow, intermediate, and deep nodes based on the timetree inferred using the complex model and no internal calibrations. The shallow region is the period spanning from 0 My to 30% of the root age. The deep region for all data sets, except for the "Eukaryotes & Prokaryotes" data, is the period spanning from 70% of the root age to the root age. For the "Eukaryotes & Prokaryotes" data, the deep region is the period spanning from 50% of the root age to the root age because all internal nodes are younger than 70% of the root age. The remaining timespan belongs to the intermediate region for all data sets. We computed the slopes of node-to-tip distances and of divergence times for nodes that were located in shallow, intermediate, and deep divergences regions.
Testing the Linearity of Relationships among Substitution Model Complexity, Number of Ingroup Sequences, and the Dispersion of Branch Lengths We first randomly sampled ten ingroup sequences from the full "Plant" data set (99 ingroup þ 4 outgroup sequences). We then used "expanded sampling" to generate data sets with 20, 30, 40, 50, 60, 70, 80, and 90 ingroup sequences, so that the larger data sets always contained the sequences in the smaller data sets. For example, we kept ten sampled sequences and sampled another ten sequences to generate a data set with 20 ingroup sequences. We repeated this procedure 20 times, so we had 20 replicates for each number of ingroup sequences. We estimated branch lengths for each replicate using JC, K2, HKY, TN, GTR, and the model used in the original study, GTR þ C. We compared the branch lengths estimated using the Dating with Simple versus Complex Models . doi:10.1093/molbev/msaa049 MBE simpler models with those estimated using the GTR þ C model and computed the coefficient of determination of linear regression through the origin (R 2 ). Therefore, for each subsampled category, we had 20 R 2 of branch lengths compared between analyses using the GTR þ C and a corresponding simpler model.
Testing the Relationship of Branch Lengths and Times Estimated Using the Simple Model and Models That Are Non-time-Reversible and Non-stationary To examine whether the linear relationships of divergence times and branch lengths between simple and complex models are unique phenomena for models in GTR class, we compared the ML branch lengths estimated using the simple model (JC) and models that are non-time-reversible or non-stationary for all nucleotide data sets. We obtained the branch lengths under non-time-reversible (i.e., unrestricted) model (model ¼ 10) (Yang 1994) and non-stationary model (nhomo ¼ 3) with a single partition in baseml (v4.9h) (Yang 2007). Because the direct usage of non-time-reversible and nonstationary models is not allowed in MCMCTree for estimating divergence times, we obtained time estimates using the RelTime method with the ML trees produced by baseml and without calibrations. We then normalized time estimates produced by RelTime to the sum of node ages to obtain the relationship.

Simulation
To assess the impact of sparse taxon sampling and long branches on divergence time estimation between simple and complex models, we conducted a computer simulation. We used a completely unbalanced tree with 16 tips as the model timetree (root age ¼ 3 time units, supplementary fig. S6a, Supplementary Material online) and simulated sequences under the strict clock with different mean rates. All sequences were simulated using SeqGen (Grassly et al. 1997) under the GTR þ C (a ¼ 0.25) model with 5,000 bp and a biased base composition (T ¼ 0.25, C ¼ 0.33, A ¼ 0.31, and G ¼ 0.11). We set the mean evolutionary rate of 0.1 substitutions per site per time unit as the baseline case (1Â) to make the simulated distribution of pairwise sequence distances (estimated under the TN þ C model) to be similar to that produced from the analysis of all eight empirical data sets analyzed (1Â in supplementary fig. S6b, Supplementary Material online). Then we accelerated the mean rate to be 2-, 4-, 6-, 8-, and 10 times faster. For the fastest rate simulated (10Â), the median of pairwise sequence distances was 4.7 substitutions per site, which is much larger than the empirical value (supplementary fig. S6b, Supplementary Material online). We inferred divergence times using MCMCTree (v4.9h) using the GTR þ C and the JC model. Because the likelihood ratio test rejected the assumption of the strict clock (P < 10 À20 ) when the JC model was used due to uneven underestimation of branch lengths, we relaxed the molecular clock when estimating divergence times. We used three different calibration strategies to investigate the impact of calibration positions on time inference: a precise calibration at a shallow node (o-p divergence in supplementary fig. S6a, Supplementary Material online) and a diffused root calibration; a precise calibration at a middle node (j-p divergence) and a diffused root calibration; and a precise root calibration. All priors (e.g., mean evolutionary rate) were set to be as the true values in all Bayesian analyses. Simulated data sets and prior settings are available at https://doi.org/10.6084/m9.figshare.11873874, last accessed March 9, 2020.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.