## Abstract

Several features of currently used Bayesian methods in phylogenetic analysis are discussed. The distinction between Clade-Bayes and Topology-Bayes is presented and illustrated with an empirical example. Three problems with Bayesian phylogenetic methods––exaggerated clade support, inconsistently biased priors, and the impossibility of hypothesis testing of cladograms––are shown to be the result of using a Clade-based Bayesian approach. Topology-based Bayesian methods do not share these shortcomings.

## Introduction

Bayesian methods in phylogenetics have a 30-year history, tracing back at least to Farris (1973) (discussed in Felsenstein 2004). Explicit attempts to use empirical priors and to create Bayesian estimators of topology were made as early as Wheeler (1991), and recent techniques and implementations have been reviewed by Huelsenbeck et al. (2001). The purpose of this discussion is to examine the application of recent Bayesian techniques to the problem of estimating cladistic relationships among terminal taxa, that is, their branching pattern.

As an optimality criterion to choose among candidate topologies, posterior probability is a fine option, but current use (as in Huelsenbeck and Ronquist 2003) does not follow this path. The entities whose posterior probability is most frequently estimated are clades. The set of clades with posterior probability >one-half is presented as the Bayesian hypothesis of phylogenetic topology. Indeed, many proponents of the methodology have asserted repeatedly that these 50% majority rule consensus topologies reflect the probability of the truth of clade identity (Huelsenbeck et al. 2002), and hundreds of authors employing the methods have repeated this assertion as a justification for the methodological choice. Such an approach differs from that of examining the relative merits of alternative topologies directly, resulting in 3 problems: 1) exaggerated clade support, 2) inconsistently biased priors, and 3) the impossibility of topology hypothesis testing. Here, we discuss these issues and show that these problems with Bayesian phylogenetics are not inherent to Bayesianism per se but to the particular path that has been taken by the majority of the community. We show that the adoption of the Bayesian optimality position––supported by Rannala and Yang (1996), though not adopted by most practitioners––abrogates these problems.

## Topology-Bayes and Clade-Bayes

### Topology-Bayes

A Topology-Bayesian estimator of a phylogenetic hypothesis is that topology ($T_t$) which has the maximum posterior probability (assuming all other topologies have equal cost of error or “loss”). Here, we describe a tree $T$ as composed of a set of vertices $V$ and edges $E$:

$$T = (V, E)$$

Bayes’ theorem allows for the calculation of the probability of the hypothesis (here, the topology) given the data, $pr(T_i \mid D)$. This is desirable as phylogeneticists begin with data and usually seek the best estimate of the topology. If investigating the posterior probability of a topology $pr(T_i \mid D)$, given topological “prior” probabilities $pr(T_i)$, Bayes’ theorem can be written:
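A sketch of the standard form, with the denominator summing over every candidate topology $T_j$ (the index $j$ and the likelihood notation $pr(D \mid T_j)$ are assumed here by analogy with the symbols above):

$$pr(T_i \mid D) = \frac{pr(T_i)\, pr(D \mid T_i)}{\sum_{j} pr(T_j)\, pr(D \mid T_j)}$$

When the topological priors $pr(T_i)$ are uniform they cancel from numerator and denominator, so the posterior probability is proportional to the likelihood and the topology of maximum posterior probability is also the topology of maximum likelihood.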

### Clade-Bayes

The Clade-Bayesian estimator is that set $V_c = \{V_{c,1}, V_{c,2}, \ldots\}$ of clades that have a posterior probability greater than one-half. Because any pair of clades from $V_c$ cannot conflict (two clades that each have posterior probability greater than one-half must, by definition, occur together on some tree of positive probability), the set $V_c$ forms a tree, which we will denote $T_c$. This is the Clade-Bayes topology. Unlike $T_t$, $T_c$ has no associated optimality value even though each clade in $T_c$ does. Most current Bayesian phylogenetic analyses (e.g., MrBayes) produce $T_c$ (Huelsenbeck and Ronquist 2003). Consider the posterior probability of a clade on a topology, given data:
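By direct analogy with the topology case, one way to write this is sketched below. Here $pr(V_{c,i})$ is taken to mean the prior probability of clade $V_{c,i}$ induced by the topological prior, and $pr(D)$ the marginal probability of the data; both notations are assumptions rather than symbols fixed by the text.

$$pr(V_{c,i} \mid D) = \frac{pr(V_{c,i})\, pr(D \mid V_{c,i})}{pr(D)} = \sum_{j \,:\, V_{c,i} \in T_j} pr(T_j \mid D)$$

The second equality states that the posterior probability of a clade is the summed posterior probability of every topology that contains it.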

The prior distribution on $V_c$ is not uniform (Pickett and Randle 2005). Indeed, some clades will be profoundly less probable than others, and it is impossible to assign a label-invariant (i.e., standard) distributional topological prior, uniform or otherwise, that will induce uniformity on $V_c$ (Steel and Pickett 2006). As such, even if $pr(D \mid V_{c,i})$ is maximal (e.g., across all clades), because that value would be multiplied by its prior probability (which may be quite low under uniform topological priors), it may not correspond to the clade of maximum posterior probability. Simply put, the proportionality and relative ordering of likelihood and posterior probability are no longer guaranteed. In any given case, depending on the data and the particular prior employed, the clade of maximum likelihood may also be the clade of maximum posterior probability, but there is no requirement that this be so.

Given that $T_t$ and $T_c$ are different entities, they need not agree. In such a case, $T_t$ and $T_c$ would have clades in conflict and $T_c$ would not represent the Topology-Bayes estimator or the maximum a posteriori probability (MAP) estimate of Rannala and Yang (1996). This could occur, for example, if several suboptimal (i.e., lower posterior probability) topologies share a group not found in the “best” tree. Below, we provide an example.
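Before the empirical comparison, a toy calculation shows how such a disagreement can arise. The sketch below uses a hypothetical posterior distribution over three rooted four-taxon topologies. The probabilities 0.4, 0.3, and 0.3 are invented for illustration and come from no analysis, and the function names are likewise illustrative. Each topology is represented simply as the set of its nontrivial clades.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical posterior over three rooted topologies on taxa A-D.
# Each topology is written as the set of its non-trivial clades.
POSTERIOR = {
    frozenset({frozenset("AB"), frozenset("CD")}): Fraction(4, 10),   # ((A,B),(C,D))
    frozenset({frozenset("AC"), frozenset("ABC")}): Fraction(3, 10),  # (((A,C),B),D)
    frozenset({frozenset("BC"), frozenset("ABC")}): Fraction(3, 10),  # (((B,C),A),D)
}


def topology_bayes(posterior):
    """Topology-Bayes (MAP) estimate: the single most probable topology."""
    return max(posterior, key=posterior.get)


def clade_posteriors(posterior):
    """Posterior of each clade: summed posterior of the topologies containing it."""
    support = defaultdict(Fraction)
    for tree, prob in posterior.items():
        for clade in tree:
            support[clade] += prob
    return dict(support)


def clade_bayes(posterior, threshold=Fraction(1, 2)):
    """Clade-Bayes estimate: every clade whose posterior exceeds the threshold."""
    return frozenset(
        clade for clade, prob in clade_posteriors(posterior).items() if prob > threshold
    )


def conflicts(clade_a, clade_b):
    """Two clades conflict if they overlap without one containing the other."""
    return bool(clade_a & clade_b) and not (clade_a <= clade_b or clade_b <= clade_a)


t_t = topology_bayes(POSTERIOR)
t_c = clade_bayes(POSTERIOR)
clashes = [(a, b) for a in t_t for b in t_c if conflicts(a, b)]
print("T_t clades:", [sorted(c) for c in t_t])
print("T_c clades:", [sorted(c) for c in t_c])
print("conflicting pairs:", [(sorted(a), sorted(b)) for a, b in clashes])
```

In this contrived case the MAP topology is ((A,B),(C,D)) with posterior probability 0.4, yet the only clade with posterior probability above one-half is A + B + C (0.6), which is absent from the MAP topology and conflicts with its clade C + D.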

### Example Comparison

The fact that topologies of maximum likelihood need not correspond to the Clade-Bayes tree has been considered recently (Svennblad et al. 2006). But the fact that the Topology-Bayes approach and the Clade-Bayes approach can result in topological differences has received almost no attention. Here, we demonstrate this potential disparity using empirical data.

Consider a case of arthropod morphological data with 54 taxa and 303 characters (Giribet et al. 2001). If we begin with uniform topological prior probabilities, then $T_t$ will be the maximum likelihood topology. Any model will suffice; here we use the no common mechanism (NCM) model of Tuffley and Steel (1997). Because MrBayes (Huelsenbeck and Ronquist 2003) does not seek the optimal trees (see below), we calculated $T_t$ (the Topology-Bayes or MAP estimate) using POY3 (Wheeler et al. 1996–2005), which searches and saves all identified optimal solutions, given a criterion. The 7 characters treated as additive in the original analysis of Giribet et al. (2001) are treated as nonadditive here. $T_c$ was estimated using MrBayes (Huelsenbeck and Ronquist 2003), implementing the same model of evolution. Searches were performed with the following options:

POY3: buildsperreplicate 10, replicates 10, treefuse, likelihood, likelihoodroundingmultiplier 10,000. This implements 10 random replicates with 10 Wagner builds per replicate followed by tree-bisection-and-regrafting (TBR) branch-swapping and tree fusing (Goloboff 1999) within and between replicates; the NCM (Tuffley and Steel 1997) model of evolution was employed, rounding likelihoods to 5 decimal places.

MrBayes: lset parsmodel = yes mcmc ngen = 10,000,000 samplefreq = 500 nchains = 4. This implements the NCM (Tuffley and Steel 1997) model of evolution with 2 simultaneous runs of 4 chains and 10,000,000 generations each, saving every 500th topology visited to file.

Ten cladograms of −log likelihood 773.02908 were found by POY3. Their strict consensus ($T_t$) is shown in figure 1a. The $T_c$ produced by MrBayes is shown in figure 1b (−log likelihood of the best binary resolution of this consensus cladogram is 773.38128). Overall, the 2 trees are quite similar. They differ, however, in several major taxonomic groupings. Foremost among these is that the Chelicerata are monophyletic in $T_c$ and paraphyletic (pycnogonids basal) in $T_t$.

The 2 topologies are similar, but not identical. Implementation issues aside, this example clearly demonstrates a case where $T_t \neq T_c$.

It is worth noting that the maximum posterior probability topology from MrBayes has a −log likelihood of 773.03, which differs from the score reported by POY3 (773.02908) only in rounding. MrBayes reported only 2 trees of this score; it may have visited the other 8 topologies of highest probability, but because the program is designed to create $T_c$, it does not seek to save all optimal topologies. In fact, one of the 2 optimal topologies visited by MrBayes was found before the end of the burn-in in our second run (tree 1905000) and so would not even be included in a majority rule calculation of the postburn-in topologies. When searching for the optimal solution, however, there would be no burn-in period and no reason to abandon any of the visited topologies.

## Doubling Behavior: Clade-Bayes and Support

One of the properties of all statistical methods is an increase in levels of support with a multiplication of identically distributed data. In other words, larger data sets with the same proportional balance of data in favor of, in opposition to, and indifferent to a hypothesis will assign higher support. Consider the data of table 1. Under NCM (Penny et al. 1994; Tuffley and Steel 1997, p. 587, eqs. 12 and 13), the maximum posterior probability cladogram is that of figure 2, with a posterior probability of 0.5 (eq. 4).

| Taxon | Characters | | | |
|-------|---|---|---|---|
| D | 0 | 0 | 0 | 1 |
| C | 0 | 0 | 1 | 0 |
| B | 1 | 1 | 0 | 0 |
| A | 1 | 1 | 1 | 1 |

In this case, $T_c = T_t$ and the posterior probability of group A + B ($V_{(AB)}$) is equal to 0.5. The evidence in favor of AB is equal to that against. If we were to replicate these data $n$ times, with exactly the same balance of evidence for and against AB, $pr(V_{(AB)})$ would increase with $n$ until arbitrarily close to 1 (eq. 5).
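The doubling behavior can be checked numerically. The sketch below is not the authors' calculation. It assumes the Tuffley and Steel (1997) result that the maximized NCM likelihood of a tree is $\prod_i r^{-(l_i + 1)}$, where $l_i$ is the parsimony length of character $i$ and $r$ is the number of states (2 here), and it assumes uniform priors over the 3 unrooted four-taxon topologies. Character lengths are obtained with Fitch's algorithm, and the function names are illustrative.

```python
from fractions import Fraction

# Table 1 data: taxon -> states of the four binary characters.
DATA = {"A": "1111", "B": "1100", "C": "0010", "D": "0001"}

# The three unrooted four-taxon topologies, rooted arbitrarily between the two pairs.
TREES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch(node, column):
    """Return (state set, step count) for one character on a nested-tuple tree."""
    if isinstance(node, str):
        return {DATA[node][column]}, 0
    left_states, left_steps = fitch(node[0], column)
    right_states, right_steps = fitch(node[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_steps + right_steps
    return left_states | right_states, left_steps + right_steps + 1


def tree_length(tree, n_chars=4):
    """Total parsimony length of the Table 1 matrix on a tree."""
    return sum(fitch(tree, column)[1] for column in range(n_chars))


def posteriors(n=1, r=2, n_chars=4):
    """Topology posteriors under uniform priors with the matrix replicated n times."""
    likelihood = {
        name: Fraction(1, r ** (n * tree_length(tree, n_chars) + n * n_chars))
        for name, tree in TREES.items()
    }
    total = sum(likelihood.values())
    return {name: value / total for name, value in likelihood.items()}


for n in (1, 2, 5, 10):
    print(n, float(posteriors(n)["AB|CD"]))
```

Under these assumptions the tree lengths are 6, 7, and 7 steps, so the posterior probability of AB|CD is $1/(1 + 2^{1-n})$: one-half for the original matrix, two-thirds when it is doubled, and 512/513 after 10 replicates, although the proportion of characters contradicting AB never changes.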

This same behavior will be observed in a bootstrap, jackknife, or other resampling approach to support. Similarly, the same behavior would be observed for any data, contrived or otherwise, that are duplicated identically, regardless of the model employed; our example is presented to permit exact calculations. Although this is reasonable on statistical grounds, even when support levels are very close to unity there are still as many observations contradicting the grouping as supporting it. The point here is that this behavior is an ineluctable outcome of data duplication. It is important to note that the “inflation” of support values described here is not related to the higher values seen in Bayesian over bootstrap or jackknife support (Cummings et al. 2003; Erixon et al. 2003; Simmons et al. 2004), reported from $T_c$. It is also worth noting that if the $T_t$ approach is adopted, the support inflation seen in $T_c$ becomes moot.

## Priors: Uniform, Biased, and Empirical

A central issue surrounding Bayesian techniques is the choice of appropriate prior probabilities. For the purposes here, we can divide these into 3 types: uniform, biased, and empirical. Uniform or ignorance priors are usually employed when there is no useful preexisting information on the entity to be estimated. Phylogenetic analysis tends to adhere to uniform topological priors, and discussions tend to rely on current evidence to draw conclusions, in the admirable belief that the investigator cannot say which hypotheses are more probable a priori. Biased priors attach greater or lesser initial probability to entities based on nonuniform distributions. In many cases of Bayesian analysis, this is not undesirable: it may be well known that processes follow certain distributions, and profitable use can be made of them. In phylogenetic analysis, they are less well regarded because they are not based on biological information. Empirical priors are based, unsurprisingly, on previous experience, data, and knowledge. In general, empirical priors are unobjectionable because they allow the use of previous empirical results.

Some strict Bayesians object to the use of data as priors when the data themselves are not, in fact, temporally and ontologically prior to the other data under consideration. This objection derives from the fundamental difference with frequentist statistics: the Bayesian view that prior events predict future events. We note this strict interpretation here only to point out that as phylogeneticists begin using empirical data as prior assertions, this philosophical objection may loom, especially if the prior data did not occur first, even if they were observed first and thus formed the prior phylogenetic viewpoint (as would be the case, e.g., if morphology were used as priors for molecular data; morphology is not temporally antecedent to DNA and thus would violate this strict Bayesian interpretation of appropriate prior data).

Priors are attached to the entities to be estimated and hence play different roles for $T_t$ and $T_c$. As mentioned above, phylogenetic Bayesian methods generally employ uniform priors; the most commonly used is the “proportional-to-distinguishable-arrangements” distribution (Rosen 1978). This is straightforward with $T_t$ because each topology (as in eq. 3) can be assigned the inverse of the number of topologies. This cannot be done for $T_c$.

Steel and Pickett (2006) have shown that uniform priors cannot be constructed for clades ($V_{c,i}$). In short, clades of size 2 or $n - 2$ (for $n$ taxa) will have higher probabilities than those of intermediate size. This can result in huge prior disparities (orders of magnitude) among clades (Pickett and Randle 2005). In the absence of any data, some groups will be favored over others. This is not a problem that occurs with Topology-Bayesian analysis ($T_t$). The problem only arises when topological priors (and their resultant clade priors) are used to estimate clade posteriors, as in Clade-Bayesian ($T_c$) analyses.
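The size of this disparity can be illustrated with a short calculation. The sketch below assumes unrooted binary trees and a uniform prior over topologies, and uses the standard counting result that the number of unrooted binary trees on $m$ labelled taxa is $(2m - 5)!!$. A particular split of sizes $k \mid n - k$ then occurs on $(2k - 3)!!\,(2(n - k) - 3)!!$ of the $(2n - 5)!!$ trees. The function names are illustrative and are not taken from the cited papers.

```python
from fractions import Fraction


def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1; equals 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def split_prior(n, k):
    """Prior probability, under a uniform prior on unrooted binary trees with n
    labelled taxa, that a tree contains one particular split of sizes k | n - k."""
    if not 2 <= k <= n - 2:
        raise ValueError("k must give a nontrivial split: 2 <= k <= n - 2")
    trees_with_split = double_factorial(2 * k - 3) * double_factorial(2 * (n - k) - 3)
    return Fraction(trees_with_split, double_factorial(2 * n - 5))


n_taxa = 10
for size in range(2, n_taxa - 1):
    print(size, float(split_prior(n_taxa, size)))
```

For 10 taxa, a particular group of size 2 (or 8) has prior probability 1/15, whereas a particular group of size 5 has prior probability 7/1287, roughly 12 times lower, before any data are considered. The gap widens rapidly as the number of taxa grows.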

Empirical priors on topologies are an underexplored area. Wheeler (1991) attempted this, using morphological data to create priors that were then combined with likelihoods based on molecular data. Using an explicit model of evolution, we can employ this approach, using the probability of a topology given a set of morphological data to approximate an empirical prior (eq. 6). Of course, any model of evolution might be invoked for the data that yield the topology prior. Here, we employ NCM.

### Arthropod Example of Empirical Priors

If we use the estimation of empirical priors of equation (6), we can extend this with likelihoods based on molecular data ($D_{mol}$) for the same taxa to:
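A sketch of that extension, assuming the morphology-based posterior simply takes the place of the topological prior in Bayes’ theorem and that the two data sets are conditionally independent given the topology (the symbol $D_{morph}$ and the summation index $j$ are notation introduced here for illustration):

$$pr(T_i \mid D_{mol}, D_{morph}) = \frac{pr(T_i \mid D_{morph})\, pr(D_{mol} \mid T_i)}{\sum_{j} pr(T_j \mid D_{morph})\, pr(D_{mol} \mid T_j)}$$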

## Hypothesis Testing

The central act of phylogenetic analysis is establishing the relative merits of 2 hypotheses. This can be done on a variety of grounds (= optimality criteria) and, as long as the comparisons are transitive, a best solution can be found. Parsimony, likelihood, and Topology-Bayes all do this by default. In each case, an optimality value (cost, likelihood, or posterior probability, respectively) is assigned to each cladogram and reported by the investigator. This value is used to compare and test pairs of hypotheses. No such comparison can be made for Clade-Bayes ($T_c$) trees in the form that they are usually reported. Although it is true that any tree-shaped object upon which characters are plotted can be assigned an optimality score (and thus the posterior probability of any given Clade-Bayes tree could be calculated as a Topology-Bayes hypothesis, thereby abandoning the Clade-Bayes approach), investigators rarely, if ever, calculate or report such optimality scores. As such, few, if any, of the reported Clade-Bayes trees from the literature can be compared with subsequent analyses of the same data, which may give different results. Thus, it is impossible to say that any Clade-Bayes topology is superior or inferior to any other unless subsequent investigators compute the optimality score of the Clade-Bayes trees as topologies. As with jackknife or bootstrap trees, the strength of clade support is presented by the investigator, but not the cladogram optimality. It is also worth noting that any tree that is less resolved than the optimal binary tree (whether Clade-Bayes, jackknife, bootstrap, or other consensus tree) is most likely less optimal (in this case, of lower posterior probability, although strictly they could be equal) because the lack of resolution in the consensus is most likely due to character conflict. Hence, Clade-Bayes trees may perhaps be regarded as statements of support but not as best-supported scientific hypotheses of phylogenetic relationships.

Bayesian methods can have a place in systematic analysis, but this position must be based on the relative quality of topologies, not their constituent parts. This requires the use of the Topology-Bayes approach advocated here.

We would like to acknowledge the important influence of discussions with Andrés Varón in developing this manuscript and helpful criticism of Mike Steel, editor Barbara Holland, and 2 anonymous reviewers.