Abstract

The concomitantly variable codons hypothesis of DNA substitution argues that at any time only a fraction of the codons in a gene are capable of accepting a mutation. However, as mutations are fixed at some positions in a gene, the sites that are potentially variable also change because of changed functional constraints. This hypothesis has been termed the “covarion” hypothesis or when the model is applied to nucleotides, the “covariotide” hypothesis. The covarion-covariotide model has proven to be remarkably difficult to test. Here I examine a covariotide hypothesis for 11 genes using a likelihood ratio test. I show that in nine of the genes examined a covariotide model provides a better explanation of the data than a model that does not allow constraints to change over time.

Introduction

Comparative sequence analysis has revealed a number of important properties about the process of DNA substitution (the combined processes of mutation and fixation in a gene). For example, comparative sequence analysis has revealed evidence of positive selection in some genes (genes showing a preponderance of nonsynonymous substitutions; Hughes and Nei 1988 ). It is also known that the rate of change among nucleotides varies, with transitions generally being more common than transversions (Brown et al. 1982 ), and that the rate of substitution varies across sites (Yang 1996 ). In protein-coding genes, in fact, the rates at different codon positions can be predicted by the redundancy of the genetic code; in general, variation in the rate of substitution across a gene is thought to reflect differences in functional constraints at different positions. Virtually all methods of comparative sequence analysis, however, assume that functional constraints, typically reflected as rates of substitution, remain constant over time. That is, the rate of substitution at a site is considered to be constant over the entire phylogenetic history of a group.

A prediction of the concomitantly variable codons, or covarion, hypothesis of substitution (Fitch and Markowitz 1970 ) is that the rate of substitution should vary over the evolutionary history of a group. The argument is that as mutations are fixed at some sites in a gene, the functional constraints at other positions may change. Hence, the rate at a site should vary over the evolutionary history of a group of organisms if the covarion model of evolution is correct. Attempts to test this model have looked at the variability of specific positions in an amino acid or DNA sequence across different clades of organisms (Lockhart et al. 1998 ); the covarion hypothesis is supported if a site is found to be variable in some clades but not in others. Application of this test to eubacterial tufA gene sequences suggested that a covariotide model does explain the data better than a model with constant rates on the tree (Lockhart et al. 1998 ). Application of a covariotide model to two ribosomal sequences data sets also supported a covarion-like model of substitution (Galtier 2001) .

This paper takes advantage of a formulation of the covarion model first proposed by Tuffley and Steel (1998) that allows the likelihood for an alignment of DNA sequences to be calculated. I test a covariotide model of DNA substitution on 11 genes using a likelihood ratio test. A covariotide model provides a significant improvement in most of the genes examined.

The method used in this study is very similar to that used in a recent paper by Galtier (2001) . Galtier (2001) extends the gamma model of Yang (1993, 1994) that allows the rate at a site to vary across a sequence. Specifically, he allows the rate at a site to change over evolutionary time using the same type of framework as Tuffley and Steel (1998) , estimating the rate of switching (and other parameters) using maximum likelihood. Moreover, he allows the rate of switching to vary across a sequence. In this paper, I directly implement the model of Tuffley and Steel (1998) and a variant thereof that allows for site-specific or gamma-distributed rate variation.

Methods

Data

I examined a covariotide model of DNA substitution for 10 alignments of protein-coding genes and one alignment of a ribosomal gene. The data included 54 Drosophilaadh sequences (Atrian et al. 1998 ), 23 mammalian mtDNA sequences (Arnason, Gullberg, and Janke 1997 ), 29 HIV-1 vif sequences (Yang et al. 2000 ), 23 HIV-1 pol sequences (Yang et al. 2000 ), 32 mammalian mhcII sequences (Takahashi, Rooney, and Nei 2000) , 17 vertebrate β-globin sequences (Yang et al. 2000 ), 42 eubacterial tufA sequences (Lockhart et al. 1998 ), 23 Japanese encephalitis env sequences (Yang et al. 2000 ), 18 Flavivirus ns-5 sequences (Kuno et al. 1998 ), 85 insect 18S and 28S rDNA sequences (Whiting et al. 1997 ), and 9 Leviviridae replicase sequences (Bollback and Huelsenbeck 2001) . Each alignment of DNA sequences is contained in a matrix X = {xij}, where i = 1, …, s and j = 1, …, c (s is the number of sequences and c is the length of each sequence in the alignment). The columns of this matrix represent individual homologous sites. The information at the ith site in the alignment is contained in the vector xi = (x1i, x2i, …, xsi)′.

Phylogenetic Model

The species are assumed to be related by an unrooted bifurcating tree, τi. Each possible tree is numbered 1, …, B(s), where B(s) = (2s − 5)!/[2s−3(s − 3)!] is the number of possible unrooted binary trees for s species. Figure 1 shows an example of an arbitrarily chosen tree of s = 8 sequences. The tips of the tree are labeled 1 to s and the internal nodes are labeled s + 1 to 2s − 2. The internal nodes are labeled consecutively according to a postorder traversal of the tree (i.e., from the tips of the tree to the root). Although the tree is unrooted, for convenience it can be drawn to be rooted at one of the tip sequences. In this paper, the root of the tree is at sequence s. Each phylogenetic tree has 2s − 3 branches. The lengths of the branches are in terms of expected number of substitutions per site and the collection of branch lengths for the ith tree is designated vi = (v1, v2, …, v2s−3).

I use the formulation of the covarion-covariotide model developed by Tuffley and Steel (1998) . A site is considered to be either “on” (1), in which case substitutions may occur, or “off” (0), in which case no substitutions are possible. Hence, two stochastic processes are considered. The first describes the on-off process and the second describes the substitution process when the switch process is on. The switching (on-off) process is described by the rate matrix S:
where s01 and s10 are the rates of switching from off to on and from on to off, respectively. The switch process has stationary distribution π0 = (s10)/(s01 + s10) and π1 = (s01)/(s01 + s10). I assume that substitutions occur according to the GTR model of DNA substitution when the switch process is on (Tavaré 1986 ). The GTR model allows unequal nucleotide frequencies and each substitution type to have a different rate. The instantaneous rates of change under the GTR model are
where rij is the rate of change from nucleotide i to nucleotide j measured relative to the rate of change between G and T (rGT = 1) and π = (πA, πC, πG, πT) are the stationary frequencies of the nucleotides. The diagonal elements of Q are set such that the sum of the elements for each row is zero. The scaling factor, μ, is chosen such that the average rate of substitution is 1, σij π1πiqijμ = 1 for i, j ∈ {A, C, G, T}. Rescaling ensures that branch lengths on the tree are in terms of expected number of substitutions per site. Moreover, the switching rates are measured in terms of substitutions. Rate variation across sites can be accommodated by multiplying the scaled substitution matrix, Q, by a rate parameter ρ. Ten of the genes examined in this study are protein-coding and I account for among-site rate variation by partitioning sites into first, second, and third codon positions and allowing each to have a different rate parameter. For the single ribosomal data set, I assume that the rate at a site is a gamma-distributed random variable with shape parameter α (Yang 1993 ). I use the discrete approximation of the gamma distribution, suggested by Yang (1994) , to approximate the continuous gamma. Four categories are used to approximate the continuous gamma.
Together, the switching and substitution processes form a single time-reversible Markov process with instantaneous rate matrix
(Tuffley and Steel 1998 ). I is the 4 × 4 identity matrix. This process has stationary distribution π = (π0πA, π0πC, …, π1πG, π1πT). Transition probabilities can be obtained using this model by simply exponentiating the product of the rate matrix, R, and the branch length of the phylogenetic tree, v: P(v,θ) = {pij(v,θ)} = eRv. The parameters of the substitution model are contained in the vector. θ = (rAC, rAG, rAT, rCG, rCT, πA, πC, πG, s01, s10).
As an example, consider a covarion model for the Jukes-Cantor (1969) model of DNA substitution. The Jukes-Cantor (1969) model of DNA substitution assumes constant rates among substitution types and equal stationary frequencies for the nucleotides. The instantaneous rates of change for the Jukes-Cantor covarion model are then

The implementation of the covariotide model used in this study is a generalization of the proportion of invariable sites model. The proportion of invariable sites model assumes that some fraction, p, of sites are unable to vary. The proportion of invariable sites model can be obtained from the covariotide model by allowing s01 and s10 to approach 0 with the constraint that (s10)/(s01 + s10) = p.

The probability of observing the data at the ith site involves a summation over all possible ancestral states that can be assigned to the internal nodes on the tree. However, the possible states for the covariotide model are not simply A, C, G, and T, as they normally are for substitution models typically used in phylogenetic analysis, but rather A0, C0, G0, T0, A1, C1, G1, and T1; every nucleotide can be either capable (1) or incapable (0) of substitution. To calculate the likelihood at a site, I use a variable z(j) that contains the conditional probability of the observations above the jth node of the tree, given that the node j was in a particular state [z(j) = (z(j)A0, z(j)C0, z(j)G0, z(j)T0, z(j)A1, z(j)C1, z(j)G1, z(j)T1)]. This variable is initialized for the tip sequences. If xji = C is observed at tip sequence j, for example, then the conditional likelihood vector z(j) is z(j) = (0, 1, 0, 0, 0, 1, 0, 0). In other words, the conditional probability of all entries that correspond to an on or off C are set to 1 and the others are set to 0.

The site probability can then be calculated using Felsenstein's (1981) pruning algorithm. Starting from the tips of the tree, we visit successively deeper nodes in the tree calculating z(j) for each visited node. The conditional probability of the observations on the subtree above node j is
where vL and vR are the lengths of the branches to the left and right of node j and i, kS (S = {A0, C0, G0, T0, A1, C1, G1, T1}) (Felsenstein 1981 ). We do this for every interior node on the tree until the basal interior node is reached (the node directly above the root tip). For this node, the conditional likelihoods are calculated in a similar manner but with three summations:
(vD is the length of the branch connecting the root node, s). Figure 2 illustrates the calculation of conditional likelihoods down the tree using the pruning algorithm. The probability of the observations at the ith site conditional on the tree τ, branch lengths, and substitution model parameters is then
where π = (πAπ0, πCπ0, πGπ0, πTπ0, πAπ1, πCπ1, πGπ1, πTπ1) are the stationary frequencies of the eight states. Assuming independence of the substitutions across sites, the probability of observing the aligned matrix of sequences is
and the likelihood of the model parameters is proportional to the probability of observing the data
where the constant C is arbitrary.
In this study, I am only interested in inferring the switching parameters s01 and s10. I partition the parameters of the substitution model, θ, such that all of the nuisance parameters are contained in the vector η = (rAC, rAG, rAT, rCG, rCT, πA, πC, πG). I eliminate the nuisance parameters of the model using integrated likelihood (Berger, Liseo, and Wolpert 1999 ). The integrated likelihood for the switch rates is
where the summation is over all possible trees and integration is over the space for branch lengths and all of the substitution parameters except the switching rates (η). f(vi, η | τi) is the prior for the branch lengths and substitution model parameters. I use uniform, or uninformative, priors for parameters of the substitution model. Specifically, I assume a uniform (0,50) prior for parameters of the substitution model, a uniform (0,100) prior for the switching rates, and a flat dirichlet (1,1,1,1) prior for the nucleotide frequencies. The prior probability of the trees is taken to be 1/B(s) and the branch lengths of the tree are given a uniform (0,10) prior. These prior probability distributions were chosen because they are flat and cover the range of parameter values observed by the author in earlier phylogenetic studies. The priors accommodate biologically plausible values for the parameters. For example, the branch lengths are given a uniform (0,10) prior. If the true branch lengths were greater than 10, then the sequences would have been difficult, or impossible, to align because of the high substitution rate. This was not the case; the very fact that biologists successfully align their sequences suggests that substitution rates have not reached a level of saturation.

The summation and multiple integration required to calculate the integrated likelihood for s01 and s10 are impossible to perform analytically. I use the program MrBayes 2.1 (Huelsenbeck and Ronquist 2001 ) to approximate the integrated likelihood, ℓ(s01, s10). The program is set up to perform Bayesian inference of phylogeny and uses a variant of Markov chain Monte Carlo to approximate the posterior probability density of parameters. However, when uninformative priors are used, as they are for this study, the posterior distribution of a parameter can be considered a rescaled likelihood surface. All Markov chains were run for 2,000,000 generations and sampled every 100th cycle. Inferences were based on a total of 15,000 sampled points for each chain (the first quarter of the samples was discarded as the portion of the chain sampled when not at apparent stationarity). Chains were initialized using randomly chosen trees.

Testing the Covarion Model

I test the null hypothesis that the rate of substitution over the tree is constant for a site. The null hypothesis, H0, is that the proportion of invariable sites model is true. The proportion of invariable sites model can be obtained from the Tuffley and Steel (1998) covariotide model by allowing s01 and s10 to approach zero with the constraint that (s10)/(s01 + ss10) = p, where p is the proportion of sites that are incapable of substitution. More specifically, the proportion of invariable sites model assumes that a site takes one of two rates: with probability p the site is invariable, meaning that it has a rate of zero; with probability 1 − p the rate at the site is multiplied by the factor 1/(1 − p). Note that rate variation across sites is still allowed under the null hypothesis. In fact, the model of rate variation across sites is quite sophisticated as it combines the proportion of invariable sites model with either the site-specific or gamma model for among-site rate variation. Under the null hypothesis, among-site rate variation is zero with probability p. With probability 1 − p the rate multiplier for a site (ρ, from above) is either drawn from a gamma distribution or sites are divided into first, second, and third codon positions and the rate multiplier, ρ, is estimated separately for each.

The alternative hypothesis, H1, allows the switching rates to vary freely. The likelihood ratio
contains information on the relative tenability of the competing hypotheses. For nested models. Λ ≤ 1 and −2 logeΛ is asymptotically χ2 distributed with ν degrees of freedom. The degrees of freedom is simply the difference in the number of free parameters between H1 and H0. The proportion of invariable sites model is nested in the covariotide model of Tuffley and Steel (1998) , and a likelihood ratio test is appropriate.
Although the models are nested, because the null model involves a restriction of the parameters at the boundary of the parameter space, the χ2 approximation does not hold (Whelan and Goldman 1999 ; Ota et al. 2000 ). For this study the appropriate null distribution is a mixture of χ20, χ21, and χ22 distributions (Self and Liang 1987 ). In particular, there are two parameters on the boundary of the parameter space and the other (nuisance) parameters of the model are not restricted to be on the boundary. Hence, the situation in this paper matches ‘case 7’ discussed in Self and Liang (1987) . The likelihood ratio test statistic, −2 logeΛ, is asymptotically distributed as a mixture of χ20, χ21, and χ22 distributions with mixing probabilities ½ − p, ½, and p, respectively. The mixing parameter, p ≤ 1/2 is calculated as
where the Iijs are the entries of the Fisher information matrix.
In this study the likelihood ratio test statistic was approximated by fitting a bivariate normal distribution to the switching rates sampled by the MCMC algorithm. The likelihood ratio was approximated by comparing the density at the maximum of the bivariate normal to the density at the restriction that s01 = s10 = 0. The posterior probability distribution is asymptotically normally distributed, and the normal approximation is known as the Laplace approximation to the posterior density (Laplace 1986 ; also see Raftery 1995 and Gamerman 1997 ). The normal approximation to the posterior density has been used in model testing of evolutionary hypotheses by Suchard, Weiss, and Sinsheimer (2001) . The bivariate normal has positive density for all possible values, whereas the switching model is restricted to have positive likelihoods for s01, s10 > 0. This should not be a problem, though, as I am only interested in knowing the relative heights of the bivariate normal at the maximum value and at the restriction; that is, I only need to know the likelihoods up to a constant in order to validly test the null hypothesis. For the purposes of determining the mixture of χ2 distributions (discussed previously) the mixing parameter, p, must be estimated. The mixing parameter is calculated using the information matrix, which for the bivariate normal approximation to the posterior density of s01 and s10 is

I checked convergence of the ergodic averages for the switching rates using Gelman's (1995)

\(\sqrt{\mathit{{\hat{R}}}}\)
statistic (the estimated potential scale reduction). This statistic summarizes the ratio of two estimates of the within- and between-chain variances for different Markov chains starting from random points in the parameter space. It is constructed such that the estimated potential scale reduction approaches 1 from above. Values of
\(\sqrt{\mathit{{\hat{R}}}}\)
less than 1.1 or 1.2 are considered to indicate an adequately run MCMC analysis (Gelman 1995 ).

Results and Discussion

Convergence of the Markov chains was checked in several ways. First, at least two chains were run for each data set. Each chain was started from parameter values chosen from the prior. In all cases, the chains converged to the same parameter values. One way of seeing this is summarized in figure 3 in which the log probability of observing the data is plotted against generation number. Note that all of the chains start with a poor combination of parameter values; but the chains quickly found parameter combinations with better likelihoods. Moreover, all of the chains reach a plateau, which indicates that the chains have probably reached stationarity. Based on the plots of the log probability of observing the data, I discarded the samples taken during the first 500,000 cycles of the chains as the “burn-in.” Second, I checked that the inferences that would be drawn about the switching rates did not differ across chains. Figure 4 shows that the points sampled by the independent chains (after burn-in) did not differ substantially. Finally, I confirmed convergence of the ergodic averages of the switching rates using Gelman's (1995)

\(\sqrt{\mathit{{\hat{R}}}}\)
statistic (the estimated potential scale reduction). In this study,
\(\sqrt{\mathit{{\hat{R}}}}\)
was less than 1.2 for all of the genes examined by the 500,000th cycle. All but two of the genes had
\(\sqrt{\mathit{{\hat{R}}}}\)
less than 1.01. Table 1 summarizes the estimated values for the switching rates for each data set. Point estimates were based on the mean of the marginal posterior probability distribution for each parameter. The intervals represent the 95% credible interval for each parameter.

A likelihood ratio test was performed for each gene, comparing the proportion of invariable sites model to the covarion model. The likelihood ratio was estimated by fitting a bivariate normal distribution to the switching rates sampled by the chains (see Gamerman 1997 ). Figure 5 shows the marginal likelihoods of the switching rates and normal plots for each parameter. The marginal distributions of a bivariate normal random variable are normal. The marginal distributions of the switching rates are nearly normally distributed, consistent with the joint distribution of the switching rates being bivariate normally distributed. The normal plots (fig. 5 ) show the 5% cut-offs of the cumulative distribution for the observed marginal distributions and the cut-offs expected if the marginal distributions were normal. If the likelihood surface were truly normal, then all of the points in figure 5 would fall along the diagonal. The observed values are close to the diagonal, showing that the marginal distributions are nearly normally distributed. In this case, it does not make sense to perform a statistical test of normality because the observations drawn using the MCMC procedure are not independent. Table 2 shows the estimated parameters of the bivariate normal distribution for each gene. Table 2 also gives the mixing parameter, p, for each gene.

Calculating the likelihood ratio involves comparing the maximum height of the bivariate normal to the value at (0,0). The likelihood ratio test statistic was significant in nine of the 11 genes examined (table 3 ) after a Bonferroni correction was applied, suggesting that a covariotide model of DNA substitution provides a much better explanation of the observed DNA sequences than a model that did not allow rates of substitution to vary across the phylogeny. One interpretation of this result is that functional constraints in a gene change over its evolutionary history. Note that the critical values for the likelihood ratio test differ for each gene. This is the case because the mixing proportions of χ2 distributions differ across genes.

This study used the integrated likelihood to evaluate models. This is a departure from the normal practice in phylogenetics in which nuisance parameters are eliminated by maximizing the likelihood. The maximization approach for eliminating nuisance parameters is sometimes referred to as the profile likelihood (Berger, Liseo, and Wolpert 1999 ). Berger, Liseo, and Wolpert (1999) provide a number of advantages for using the integrated likelihood over the profile likelihood, where possible. For example, the profile likelihood can be misleading when the likelihood surface has a sharp ridge, inferences based on the integrated likelihood accommodate uncertainty in the nuisance parameters, and integrated likelihoods can be simpler to calculate. Indeed, in this study, MCMC was a much more efficient method for inferring parameters. A simple maximization program used to calculate the profile likelihood on one data set took much longer to run than the MCMC approach.

The null hypothesis tested in this study—that switching rates were zero—has parameter values at the boundaries of the parameter space. This means that the normal asymptotic result, that the likelihood ratio test statistic is χ2 distributed with ν degrees of freedom, does not hold (Whelan and Goldman 1999 ; Ota et al. 2000 ). Instead, the null distribution is a mixture of χ2 distributions. Here, the mixture was (½ − p20, ½χ21, and pχ22, where p is a parameter that is calculated from the Fisher information matrix for the parameters. The 95% critical values for a test of the null hypothesis varied from 4.54 to 4.90 for the 11 genes examined here. The critical values differed because the switching rates were more or less correlated across the genes. If I had ignored the fact that the parameters were at the boundaries of the parameter space under the null hypothesis, I would have used the critical values of a χ2 distribution with ν = 2 degrees of freedom. The 95% critical value for a test of the null hypothesis would have been 5.99. In other words, the test would have been conservative. In this study, the null hypothesis was usually rejected decisively and the extra work spent calculating the exact critical values was probably unnecessary. However, for more subtle problems it may be important to use the exact critical values to avoid inappropriate rejection of the null hypothesis.

The likelihood approach taken here differs from that taken by Tuffley and Steel (1998) . Tuffley and Steel (1998) derived distance measures for the covarion model and noted that in principal the covarion model can be distinguished from a rates across sites model when there are at least four monophyletic groups of species. However, distance methods are generally not as powerful at distinguishing models or inferring trees as likelihood-based methods. The covarion model implemented in this study is somewhat different from the model used by Tuffley and Steel (1998) in that among-site rate variation is accommodated along with rate variation on the tree. Hence, this allows a test of the covarion model while accounting for the fact that different sites may have different inherent rates. Moreover, as implemented in this study, the covarion model is readily applied to phylogeny estimation.

Galtier (2001) implemented a model of rate variation over a tree that allows the rate to switch incrementally, with the rate increments determined by a discrete gamma distribution. The appropriate null hypothesis for the Galtier (2001) model is that rates are gamma distributed, and he tested this hypothesis using likelihood ratio tests. The covarion-like model implemented in this study differs from that implemented by Galtier (2001) , in that rates simply switch between off and on. Rate variation across sites is allowed in the variant of the Tuffley and Steel (1998) model implemented in this study because the rate of substitution, when the switching process is on, can vary from one site to another. Specifically, rates across sites followed either the site-specific model or the gamma model. The null hypothesis tested in this study, then, was the proportion of invariable sites model plus either site-specific or gamma-distributed rate variation. Which covarion-like model best fits biological sequences remains an open and interesting question.

To date, the covarion-covariotide model of DNA substitution has found little application in comparative sequence analysis. This study, and two earlier studies of three genes (Lockhart et al. 1998 ; Galtier 2001 ), suggests that a model that allows rates to change through time may provide a much better explanation of observed DNA sequences. The covarion model may also prove important in phylogeny reconstruction; rates of change that vary over the phylogenetic history of a group may mislead phylogenetic methods that fail to accommodate this process (Lockhart et al. 1998 ). Finally, the general methodology used to accommodate rate variation over time may also prove useful for accommodating other processes that are known to vary over the evolutionary history of organisms, such as the frequency of the nucleotides G and C. Some of these processes are known to mislead phylogenetic methods (Lockhart et al. 1992 ), especially for ancient groups, and any method that can estimate the rate at which substitution parameters change over evolutionary time might lead to new insights about the process of DNA substitution.

Program Availability

The covarion model is implemented in the program MrBayes 2.01. MrBayes performs Bayesian inference of phylogeny using the Metropolis-Hastings algorithm. The program is available via the World Wide Web at http://brahms.biology.rochester.edu (on the software page). The covarion model can be activated using the command “lset covarion=yes.”

Mike Hendy, Reviewing Editor

Keywords: covarion-covariotide model integrated likelihood likelihood ratio Markov chain Monte Carlo maximum likelihood mixed chi-square

Address for correspondence and reprints: John P. Huelsenbeck, Department of Biology, University of Rochester, Rochester, New York 14627. [email protected] .

Table 1 Parameter Estimates for the Switching Rates. Point Estimates are the Mean of the Posterior Probability Distribution and the Intervals Represent the 95% Credible Interval for Each Parameter

Table 1 Parameter Estimates for the Switching Rates. Point Estimates are the Mean of the Posterior Probability Distribution and the Intervals Represent the 95% Credible Interval for Each Parameter

Table 2 The Values of s10 and s01 Sampled by the Markov Chain were Fit to a Bivariate Normal Distribution. The Parameters of the Bivariate Normal Distribution are Given in This Table and are the Means (μs10 and μs01) and Standard Deviations (σs10 and σs01) of Each Parameter and the Correlation Coefficient (ρ). The Appropriate Null Distribution for a Test of the Null Hypothesis that Both Rates Equal 0 is a Mixture of χ2 Distributions: (½ − p02 +½ χ12 + pχ22. p is a Parameter Related to the Correlation in the Parameters that Determines the Mixture of χ2 Distributions

Table 2 The Values of s10 and s01 Sampled by the Markov Chain were Fit to a Bivariate Normal Distribution. The Parameters of the Bivariate Normal Distribution are Given in This Table and are the Means (μs10 and μs01) and Standard Deviations (σs10 and σs01) of Each Parameter and the Correlation Coefficient (ρ). The Appropriate Null Distribution for a Test of the Null Hypothesis that Both Rates Equal 0 is a Mixture of χ2 Distributions: (½ − p02 +½ χ12 + pχ22. p is a Parameter Related to the Correlation in the Parameters that Determines the Mixture of χ2 Distributions

Table 3 Likelihood Ratio Test Statistics for a Test of the Null Hypothesis that Rates are Constant on a Phylogenetic Tree. The Likelihood Ratio, Λ, is the Ratio of the Likelihood of the Null Hypothesis (constant rates over the tree) to the Alternative (rates are allowed to vary through time). The 11 Sampled Genes Varied in the Number of Species (s) and the Number of Sites (c). The Critical Values for Each Gene Differed Because the Correlation in the Switching Rates (Table 2) Differed for Each

Table 3 Likelihood Ratio Test Statistics for a Test of the Null Hypothesis that Rates are Constant on a Phylogenetic Tree. The Likelihood Ratio, Λ, is the Ratio of the Likelihood of the Null Hypothesis (constant rates over the tree) to the Alternative (rates are allowed to vary through time). The 11 Sampled Genes Varied in the Number of Species (s) and the Number of Sites (c). The Critical Values for Each Gene Differed Because the Correlation in the Switching Rates (Table 2) Differed for Each

Fig. 1.—An example of a phylogenetic tree of s = 8 sequences. The tips of the tree are labeled 1, 2, …, s and the internal nodes are labeled s + 1, s + 2, …, 2s − 2. The tree is drawn rooted at sequence 8.

Fig. 1.—An example of a phylogenetic tree of s = 8 sequences. The tips of the tree are labeled 1, 2, …, s and the internal nodes are labeled s + 1, s + 2, …, 2s − 2. The tree is drawn rooted at sequence 8.

Fig. 2.—Likelihoods are calculated by performing a postorder traversal of the tree, visiting internal nodes sequentially. The conditional likelihoods are calculated by either considering the two descendant nodes from node j (top example) or the three nodes incident to the basal internal node 2s − 2 (bottom example)

Fig. 2.—Likelihoods are calculated by performing a postorder traversal of the tree, visiting internal nodes sequentially. The conditional likelihoods are calculated by either considering the two descendant nodes from node j (top example) or the three nodes incident to the basal internal node 2s − 2 (bottom example)

Fig. 3.—Plots of the log probability of observing the data (y-axis) versus generation number (x-axis). For each data set, the complete range of log probabilities explored by the chain and a close-up of the log probabilities visited after stationarity are shown

Fig. 3.—Plots of the log probability of observing the data (y-axis) versus generation number (x-axis). For each data set, the complete range of log probabilities explored by the chain and a close-up of the log probabilities visited after stationarity are shown

Fig. 4.—Sampled values for s01 and s10 for the 11 genes examined in this study. The likelihood surface is peaked about the sampled values

Fig. 4.—Sampled values for s01 and s10 for the 11 genes examined in this study. The likelihood surface is peaked about the sampled values

Fig. 5.—The marginal probability distributions of the switching rates is nearly normal. The first two columns show the marginal probability for s01 and s10 and the last two columns are normal plots

Fig. 5.—The marginal probability distributions of the switching rates is nearly normal. The first two columns show the marginal probability for s01 and s10 and the last two columns are normal plots

I thank P. Lockhart for providing the tufA DNA sequences and D. Swofford and J. P. Masly for helpful suggestions. J.P.H. was supported by NSF grants MCB-0075404 and DEB-0075406.

References

Arnason U., A. Gullberg, A. Janke,

1997
Phylogenetic analyses of mitochondrial DNA suggest a sister group relationship between Xenartha (Edentata) and Ferungulates
Mol. Biol. Evol
14
:
762
-768

Atrian S., L. Sánchez-Pulido, R. Gonzàlez-Duarte, A. Valencia,

1998
Shaping of Drosophila alcohol dehydrogenase through evolution: relationship with enzyme functionality
J. Mol. Evol
47
:
211
-221

Berger J. O., B. Liseo, R. L. Wolpert,

1999
Integrated likelihood methods for eliminating nuisance parameters
Stat. Sci
14
:
1
-28

Bollback J. P., J. P. Huelsenbeck,

2001
Phylogeny, genome evolution, and host specificity of single-stranded RNA bacteriophage (Family Leviviridae)
J. Mol. Evol
52
:
117
-128

Brown W. M., E. M. Prager, A. Wang, A. C. Wilson,

1982
Mitochondrial DNA sequences of primates, tempo and mode of evolution
J. Mol. Evol
18
:
225
-239

Felsenstein J.,

1981
Evolutionary trees from DNA sequences: a maximum likelihood approach
J. Mol. Evol
17
:
368
-376

Fitch W. M., E. Markowitz,

1970
An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution
Biochem. Genet
4
:
579
-593

Galtier N.,

2001
Maximum-likelihood phylogenetic analysis under a covarion-like model
Mol. Biol. Evol
18
:
866
-873

Gamerman D.,

1997
Markov chain Monte Carlo Chapman & Hall, London

Gelman A.,

1995
Inference and monitoring convergence Pp. 131–144 in W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, eds. Markov chain Monte Carlo in practice. Chapman & Hall, London

Huelsenbeck J. P., F. Ronquist,

2001
MrBayes: Bayesian inference of phylogenetic trees
Bioinformatics
17
:
754
-755

Hughes A. L., M. Nei,

1988
Pattern of nucleotide substitution at major histocompatibility complex loci reveals overdominant selection
Nature
355
:
167
-170

Jukes T. H., C. R. Cantor,

1969
Evolution of protein molecules Pp. 21–123 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York

Kuno G., G.-J. J. Chang, K. R. Tsuchiya, N. Karabatsos, C. B. Cropp,

1998
Phylogeny of the genus Flavivirus
J. Virol
72
:
73
-83

Laplace P. S.,

1986
Memoir on the probability of causes of events (translation by S. Stigler)
Stat. Sci
1
:
364
-378

Lockhart P. J., C. J. Howe, D. A. Bryant, T. J. Beanland, A. W. D. Larkum,

1992
Substitutional bias confounds inference of cyanelle origins from sequence data
Mol. Biol. Evol
34
:
153
-162

Lockhart P. J., M. A. Steel, A. C. Barbrook, D. H. Huson, M. A. Charleston, C. J. Howe,

1998
A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages
Mol. Biol. Evol
15
:
1183
-1188

Ota R., P. J. Waddell, M. Hasegawa, H. Shimodaira, H. Kishino,

2000
Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters
Mol. Biol. Evol
17
:
798
-803

Raftery A. E.,

1995
Hypothesis testing and model selection Pp. 163–187 in W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, eds. Markov chain Monte Carlo in practice. Chapman & Hall, London

Self S. G., K.-Y. Liang,

1987
Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions
J. Am. Stat. Assoc
82
:
605
-610

Suchard M. A., R. E. Weiss, J. S. Sinsheimer,

2001
Bayesian selection of continuous-time Markov chain evolutionary models
Mol. Biol. Evol
18
:
1001
-1013

Takahashi K., A. P. Rooney, M. Nei,

2000
Origins and divergence times of mammalian class II MHC gene clusters
J. Hered
91
:
198
-204

Tavare S.,

1986
Some probabilistic and statistical problems on the analysis of DNA sequences Pp. 57–86 in Lectures in mathematics in the life sciences, Vol. 17. American Mathematical Society

Tuffley C., M. Steel,

1998
Modeling the covarion hypothesis of nucleotide substitution
Math. Biosci
147
:
63
-91

Whelan S., N. Goldman,

1999
Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics
Mol. Biol. Evol
16
:
1292
-1299

Whiting M. F., J. C. Carpenter, Q. D. Wheeler, W. C. Wheeler,

1997
The Strepsiptera problem: phylogeny of the holometabolous insect orders inferred from 18S and 28S ribosomal DNA sequences and morphology
Syst. Biol
46
:
1
-68

Yang Z.,

1993
Maximum-likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods
J. Mol. Evol
39
:
306
-314

———.

1994
Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ across sites
Mol. Biol. Evol
10
:
1396
-1401

———.

1996
Among-site rate variation and its impact on phylogenetic analyses
TREE
11
:
367
-372

Yang Z., R. Nielsen, N. Goldman, A.-M. K. Pedersen,

2000
Codon-substitution models for heterogeneous selection pressure at amino acid sites
Genetics
155
:
431
-449