Bayesian Estimation of Nonsynonymous/Synonymous Rate Ratios for Pairwise Sequence Comparisons

The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important measure of the mode and strength of natural selection acting on nonsynonymous mutations in protein-coding genes. The simplest such analysis is the estimation of the dN/dS ratio using two sequences. Both heuristic counting methods and the maximum-likelihood (ML) method based on a codon substitution model are widely used for such analysis. However, these methods do not have nice statistical properties, as the estimates can be zero or infinity in some data sets, so that their means and variances are infinite. In large genome-scale comparisons, such extreme estimates (either 0 or ∞) of ω and sequence distance (t) are common. Here, we implement a Bayesian method to estimate ω and t in pairwise sequence comparisons. Using a combination of computer simulation and real data analysis, we show that the Bayesian estimates have better statistical properties than the ML estimates, because the prior on ω and t shrinks the posterior of those parameters away from extreme values. We also calculate the posterior probability for ω > 1 as a Bayesian alternative to the likelihood ratio test. The new method is computationally efficient and may be useful for genome-scale comparisons of protein-coding gene sequences.


Introduction
The nonsynonymous/synonymous rate ratio (ω = dN/dS) is an important measure of the mode and strength of natural selection acting on protein-coding genes (Kimura 1977). A number of methods have been developed to estimate ω from pairwise sequence alignments, ranging from heuristic counting methods (Li et al. 1985; Nei and Gojobori 1986; Yang and Nielsen 2000) to maximum-likelihood (ML) methods based on an explicit Markov model of codon evolution (Goldman and Yang 1994). ML estimates (MLEs) of ω for thousands of genes are routinely calculated as descriptive statistics in genomic comparisons (Nielsen et al. 2005; Ge et al. 2008; Walters and Harrison 2010; Buschiazzo et al. 2012; Gladieux et al. 2013; Wang and Chen 2013). Although the ML method for pairwise comparisons produces reasonable estimates of ω and sequence distance (t) for most data sets, it suffers from a few problems when the data are extreme. For example, the MLE of ω (ω̂) is 0 when the two compared sequences have only synonymous differences and ∞ when they have only nonsynonymous differences. Similarly, when the sequences are identical, the MLE t̂ = 0 and ω̂ is not unique. When the sequences are very divergent, t̂ may be ∞.
Because of these infinite or undefined estimates, neither ω̂ nor t̂ has a finite mean or variance. Extreme values of ω̂ and t̂ are commonly encountered in genome-level comparisons of thousands of genes, and those extreme estimates cause difficulties in the calculation of summary statistics (such as the mean ω̂ and t̂ across all genes in the genome). An estimation method that always produces finite and reasonable estimates of ω and t is thus desirable. Here, we develop a Bayesian method to calculate the posterior means of ω and t between two sequences, denoted ω̃ and t̃. Using computer simulation, we show that the posterior means of ω and t are well behaved and have better frequentist properties than the MLEs. We then use ML and the new Bayesian method to estimate ω and t from pairwise gene alignments for the genomes of four mammals (human, chimpanzee, mouse, and rat) and three bacterial strains (Escherichia coli O157:H7, E. coli K-12, and Salmonella typhimurium LT2). We show that extreme MLEs of ω and t are common in these data sets, and that the Bayesian method produces finite, well-behaved estimates. The new Bayesian method is computationally efficient and is implemented in the CODEML program of the PAML package (Yang 2007).

New Bayesian Approach to Estimate ω and t
Here, we summarize the main features of the new Bayesian approach. The joint posterior distribution of t and ω given the data x (the pairwise sequence alignment) is

f(t, ω | x) = (1/C) f(x | t, ω) f(t, ω),    (1)

where f(x | t, ω) is the likelihood (the probability of observing the data x given t and ω), f(t, ω) is the prior, and C = ∫∫ f(x | t, ω) f(t, ω) dt dω is the normalizing constant. The posterior is proportional to the product of the likelihood and the prior. If the model involves the transition/transversion rate ratio (κ), its MLE (κ̂) is used. If the model involves nucleotide or codon frequency parameters, they are estimated using the observed frequencies. When the data are informative, the likelihood dominates the posterior; when the data are uninformative, the prior may have a strong influence on the posterior. Here, we use two independent gamma distributions to construct the joint prior of t and ω:

f(t, ω) = G(t | 1.1, 1.1) × G(ω | 1.1, 2.2),    (2)

where the gamma density G(x | α, β) has mean α/β and variance α/β². Here, the prior means of t and ω are 1 and 0.5, respectively, and the shape parameter α = 1.1 indicates that the priors are quite diffuse. This joint prior has a mode away from (0, 0), and the prior density decays to 0 as either t or ω approaches ∞, thus penalizing extreme values. The likelihood is calculated from a pairwise sequence alignment using a codon substitution model (Yang and Nielsen 1998). As point estimates of ω and t we use their posterior means, ω̃ = E(ω | x) and t̃ = E(t | x) (eqs. 3 and 4). The posterior variances and covariance of ω and t can be defined similarly and calculated using standard numerical techniques. We use Gaussian quadrature to calculate all integrals numerically.
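As a concrete check of the default prior described above, the following sketch (helper names are hypothetical; it assumes the gamma parameterizations t ~ G(1.1, 1.1) and ω ~ G(1.1, 2.2), which give the stated prior means of 1 and 0.5) evaluates the joint density and confirms that it penalizes extreme parameter values:

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density G(x | alpha, beta), with mean alpha/beta and variance alpha/beta^2."""
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)

def prior(t, w):
    """Joint prior: t ~ G(1.1, 1.1) and omega ~ G(1.1, 2.2),
    giving prior means E[t] = 1 and E[omega] = 0.5."""
    return gamma_pdf(t, 1.1, 1.1) * gamma_pdf(w, 1.1, 2.2)

# The density has a mode away from (0, 0) and decays toward the extremes,
# penalizing very small and very large values of t and omega.
print(prior(1.0, 0.5) > prior(1e-6, 0.5))   # True
print(prior(1.0, 0.5) > prior(50.0, 0.5))   # True
```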
We use similar techniques to calculate P(ω > 1 | x), the posterior probability that ω > 1, which may be compared with the likelihood ratio test (LRT) of the null hypothesis H0: ω = 1 (see Methods and Materials). We consider five different scenarios in which the numerical calculations of the integrals may differ. We simulated five data sets to represent those five scenarios, each consisting of two sequences of 100 codons, with different numbers of synonymous (S_d) and nonsynonymous (N_d) differences. The posterior and likelihood surfaces for the five cases are shown in figure 1.
Case I: (S_d > 0, N_d > 0). This is the most common case, with both synonymous and nonsynonymous differences observed. The data are quite informative about ω and t, and the posterior distribution resembles the likelihood (fig. 1A′ and A). In our example data set, S = 73.7, N = 226.3, S_d = 18.5, and N_d = 6.5, where S and N are the numbers of synonymous and nonsynonymous sites. The MLEs are t̂ = 0.30 and ω̂ = 0.11, whereas the posterior means are t̃ = 0.31 and ω̃ = 0.13.
Case II: (S_d = N_d = 0). In this case, the two sequences are identical. The likelihood is maximized when t = 0, and when t = 0, ω has no effect on the likelihood, so the MLE of ω is not unique (fig. 1B). In our example, S = 73.3, N = 226.7, and S_d = N_d = 0. The posterior has a single mode, and the posterior means are t̃ = 0.011 and ω̃ = 0.496 (fig. 1B′). Note that the posterior mean of ω is almost equal to the prior mean, since the data are uninformative about ω. Also, the posterior mean is markedly different from the posterior mode, because the posterior distribution is highly skewed.
Case V: (S_d ≫ 0, N_d ≫ 0). The two sequences are so divergent that they look like random sequences (S = 75.9, N = 224.1, S_d = 75, N_d = 175). Here, the likelihood increases with both t and ω, with the MLEs at t̂ = ∞ and ω̂ = ∞ (fig. 1E). In the Bayesian analysis, the prior penalizes large values, and thus the posterior means are t̃ = 10.31 and ω̃ = 0.72 (fig. 1E′). Note that the posterior mean of ω is close to the prior mean, since the data of two nearly random sequences are uninformative about ω.
These five cases illustrate how the prior influences the posterior depending on whether the data are informative. The posterior means of t and ω are finite in all five cases, whereas the MLEs are not. We note that because the MLEs of t and ω may be infinite, their mean square errors (MSEs) are infinite as well. The MSEs of the posterior means are, in contrast, always well defined. In this sense, the posterior mean estimates have better frequentist properties than the MLEs. In the next section, we study the statistical properties of the Bayesian estimates of t and ω using simulated and real data, in comparison with the MLEs. We calculate the MSEs of the MLEs after excluding the infinite estimates.

Analysis of Simulated Data
To examine the statistical properties of the posterior estimates of t and ω, we conducted a computer simulation. The EVOLVER program from the PAML package (Yang 2007) was used to generate pairwise sequence alignments of length Lc = 500 codons. We used t = 0.1, 0.5, 1, and 5 and ω = 0.01, 0.1, 0.5, and 2 (16 combinations), with transition/transversion rate ratio κ = 2 and equal codon frequencies (1/61), to generate the data sets. The number of replicates was 10,000. The simulated data sets were analyzed with both ML and the new Bayesian method using the CODEML program (Yang 2007). The same prior (eq. 2) was used for all data sets, and equal codon frequencies were assumed in the model (the Fequal model). Figures 2 and 3 show the histograms (smoothed densities) of the posterior mean estimates and MLEs of t and ω. As figure 2 shows, the ML and Bayesian results are nearly identical for all combinations of ω = 0.1 and 0.5 and t = 0.5 and 1. However, for ω = 0.01, the Bayesian estimates of ω are shifted to the right (too large) for all t values, as the prior for ω has a mean of 0.5 and affects the posterior estimates. For ω = 2, the posterior estimates of ω are shifted to the left (too small) because of the prior.
Generally, both methods behave best (estimates are most concentrated around the true value) at intermediate distances (t = 0.5 and 1), because sequences of moderate divergence are the most informative. The estimates of t show similar patterns (fig. 3). Although for t = 0.5 and 1 the Bayesian estimates are almost identical to the MLEs, for t = 0.1 the Bayesian results are slightly shifted to the right (too large), and for t = 5 they are shifted to the left (too small).
The means of the Bayesian and ML estimates, the square root of the MSE (√MSE), and the 2.5% and 97.5% percentiles of the estimates from the 10,000 simulations are presented in tables 1 and 2. Those for the ML method are calculated after the infinite estimates are removed. We see that for highly similar (t = 0.1) and highly divergent (t = 5) sequences, the prior has a noticeable impact. For example, when t = 0.1, the mean of the Bayesian estimates of ω is 0.02 when the true ω = 0.01, and is 1.591 when the true ω = 2.0. The mean MLEs are, in comparison, closer to the true values than the means of the Bayesian estimates. However, the means for the MLEs are calculated after data sets in which ω̂ = ∞ are excluded, whereas those same data sets are included in the calculation of the Bayesian estimates. Similar patterns are observed for estimates of t. Moreover, for small and intermediate ω and t, the ML and Bayesian methods have similar MSEs, but for large ω and t the Bayesian method has smaller MSE, indicating that in those cases the Bayesian estimates are preferable to the MLEs. We also considered a test of positive selection, indicated by ω > 1. For ML, an LRT is used to compare H0: ω = 1 against H1: ω > 1, at the 5% significance level. In the Bayesian framework, we require the posterior probability to exceed the threshold, P(ω > 1 | x) > 0.95. For the true ω = 0.01, 0.1, and 0.5, no data sets showed significant positive selection by either method. When the true ω = 2 and t = 0.5, 1, or 5, both methods correctly detect positive selection in almost 100% of the replicate data sets, so that the power of detecting positive selection is high in both methods, with the LRT having more power (table 1).
When ω = 2 and t = 0.1, positive selection is detected in 35% and 61% of data sets by the Bayesian and ML methods, respectively. In this case, given the short sequence distance, the prior has considerable impact on the ability of the Bayesian method to detect selection. In particular, the prior mean (ω = 0.5) is smaller than the true value (ω = 2), so that ω̃ is shrunk away from 1.
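The two decision rules compared above can be stated compactly. The sketch below uses hypothetical helper names and assumes the one-sided 5% critical value of 2.71 for the LRT (the critical value of the 50:50 mixture of 0 and χ² with 1 degree of freedom, as quoted in the table 4 note):

```python
def lrt_positive_selection(lnL1, lnL0, crit=2.71):
    """One-sided LRT of H0: omega = 1 against H1: omega > 1 at the 5% level.
    The statistic 2*(lnL1 - lnL0) is compared with 2.71, the one-sided
    critical value from the 50:50 mixture of 0 and chi-square (1 df)."""
    return 2.0 * (lnL1 - lnL0) > crit

def bayes_positive_selection(p_omega_gt1, cutoff=0.95):
    """Bayesian criterion: declare positive selection if P(omega > 1 | x) > 0.95."""
    return p_omega_gt1 > cutoff

print(lrt_positive_selection(-1000.0, -1002.0))  # True  (2*2.0 = 4.0 > 2.71)
print(bayes_positive_selection(0.35))            # False
```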

Analysis of Real Data
We applied both ML and Bayesian methods to estimate ! and t for pairwise alignments of protein-coding genes from four mammalian genomes (human, chimpanzee, mouse, and rat) and from three bacterial genomes (E. coli O157:H7, E. coli K-12, and S. typhimurium LT2). In all analyses, the codon frequencies were estimated by using the observed codon frequencies in the genes (the F61 model).

Analysis of the Mammalian Data Set
We conducted three sets of pairwise comparisons: human versus chimpanzee, human versus mouse, and mouse versus rat. Figure 4 shows the distributions (smoothed histograms) of the posterior means and the MLEs of t and ω in those comparisons. In the human-chimpanzee comparison, the Bayesian ω estimates are slightly shifted to the right compared with the MLEs for low ω values and shifted to the left for high ω values. The mean, median, and 25% and 75% percentiles of the estimates are summarized in table 3. One gene among the 1,121 with ω̂ > 1 has ω̂ = 1.22 (95% confidence interval, CI: 0.37-4.01) and posterior mean ω̃ = 0.93 (95% credibility interval, CI: 0.36-2.43). This gene has a length of 262 codons and a small evolutionary distance, with t̂ = 0.043 (95% CI: 0.024-0.077) and t̃ = 0.047 (95% CI: 0.027-0.082), so that the prior has an impact. Another gene has ω̂ = 1.27 (95% CI: 0.75-2.16) and ω̃ = 1.13 (95% CI: 0.60-2.13). This gene is 257 codons in length, and the ML and Bayesian distance estimates are 0.17 (95% CI: 0.13-0.24) and 0.18 (95% CI: 0.13-0.24), respectively. The second gene has a similar length to the first, but because the sequence distance is greater, the prior is much less important. In a third gene, of length 1,019 codons, the MLEs are t̂ = 0.041 (95% CI: 0.030-0.056) and ω̂ = 1.27 (95% CI: 0.77-2.07), compared with the Bayesian estimates t̃ = 0.042 (95% CI: 0.031-0.057) and ω̃ = 1.13 (95% CI: 0.59-2.14). In this case, the effect of the prior is unimportant because the gene is long. Among the 1,121 genes with ω̂ > 1, only 78 show statistically significant evidence of positive selection based on the LRT (α = 5%) (table 4). All 78 genes have posterior mean ω̃ > 1. Of these, three showed strong evidence of positive selection in the Bayesian analysis, with P(ω > 1 | x) > 0.95 (table 4). The difference (78 vs. 3 genes) in the number of genes with ω > 1 between the ML and Bayesian methods is consistent with the general expectation that the LRT tends to reject the null more readily than the Bayesian analysis.
It is also consistent with the results observed in the computer simulations for t = 0.1 and ω = 2. We note that the three genes significant in the Bayesian analysis have fairly large sequence divergences, with t̂ ≈ 0.1, whereas the other 75 genes (for which the LRT is significant but the Bayesian evidence is not strong) have highly similar sequences, with t̂ < 0.07 (median 0.021).
In the human-mouse comparison, the ML and Bayesian estimates are very similar. The sequence divergence is intermediate, the data are informative, and the prior does not have a noticeable impact. There are very few cases in which the MLEs are extreme (0 or ∞). Also, the numbers of genes with ω estimates > 1 are nearly the same for the two methods (7 vs. 6), and the same two genes show significant evidence of positive selection by both methods. The mouse-rat comparison shows similar patterns to the human-mouse comparison: in both cases, the sequences are moderately divergent and the data are informative.
To examine the sensitivity of the posterior estimates of t and ω to the prior, we reanalyzed the human-chimpanzee and human-mouse alignments using two alternative priors, AP1 and AP2. The first alternative prior (AP1) is t ~ G(2, 2) and ω ~ G(2, 4). This has the same means as the default prior of equation (2), but it is more informative because of the larger shape parameter (2 vs. 1.1). In the second alternative prior (AP2), we also used 2 for the shape parameter but chose the rate parameter so that the prior mean roughly matches the median of the MLEs over all genes (table 3). Thus, for the human-chimpanzee comparison, AP2 is t ~ G(2, 100), with prior mean 0.02 (the median of the MLEs of t is 0.016), and ω ~ G(2, 10), with prior mean 0.2 (the median of the MLEs of ω is 0.193). For the human-mouse comparison, AP2 specifies t ~ G(2, 3), with prior mean 0.67 (the median of the MLEs is 0.686), and ω ~ G(2, 20), with prior mean 0.1 (the median of the MLEs is 0.089). While it is in general not advisable to use the data to specify the prior, we note that in specific comparisons some prior information may be available. For example, between the human and the chimpanzee, the distance t is very likely to be smaller than 0.1.
Posterior estimates of ω and t from the analyses using the default and alternative priors are illustrated in figures 5 and 6. In the human-chimpanzee comparison, the impact of the prior is apparent. The Bayesian ω estimates under AP1 are higher than those under the default prior for low ω values (ω < 0.5) and lower for high ω values (ω > 0.5) (fig. 5A). With a more informative prior (shape parameter 2), the posterior means are closer to the prior mean of 0.5. For the human-mouse comparison, estimates under AP1 are close to those under the default prior (fig. 5B). The Bayesian estimates of t are less affected by the change of prior in both comparisons, and the estimates are approximately the same for the majority of the genes (fig. 6A and B). Prior AP2 has a more pronounced effect. In both comparisons, the Bayesian estimates of ω are smaller than those obtained with the default prior for almost all genes (fig. 5C and D). The AP2 priors are more informative (shape parameter 2) and have a stronger influence on the posterior estimates.

Analysis of the Bacterial Data Set
We conducted two pairwise comparisons: E. coli K-12 versus E. coli O157:H7, and E. coli K-12 versus S. typhimurium LT2. Note that the two strains of E. coli have the same evolutionary distance from Salmonella. The sequences from the two E. coli strains are very similar, and the prior has an impact on the Bayesian estimates, similar to the comparison of the human and chimpanzee genes. The mean, median, and 25% and 75% percentiles of the Bayesian ω estimates are 0.179, 0.116, and (0.055, 0.252), while the corresponding results for the MLEs are 0.099, 0.034, and (0.001, 0.110). The two methods are thus very different in the analysis of those genes. Also, the MLE ω̂ = 0 in 912 genes and ω̂ = ∞ in 31 genes.
None of the genes with ω̂ > 1 is statistically significant at the α = 5% level according to the LRT, and none has P(ω > 1 | x) > 0.95 (table 4). The gene sequences from E. coli K-12 and Salmonella are quite divergent. For most genes, the two methods produced similar estimates (fig. 4). However, some genes are very divergent, with the MLE t̂ = ∞ in 217 genes.

Discussion
We suggest that if possible one should conduct joint comparative analysis of multiple protein-coding gene sequences on a phylogeny, instead of pairwise comparisons. In particular, a number of LRTs have been developed to detect positive selection that affects particular evolutionary lineages on the phylogeny or individual sites in the protein (see, e.g., Yang [2006a] and Cannarozzi and Schneider [2012], for reviews). To apply such tests of positive selection, it is essential to use multiple sequences, as a pair of sequences hardly contains enough information for the tests to have any power (e.g., Yang 2006b). Some proteins may evolve in an episodic manner and thus adaptive episodes may not be detected in pairwise comparisons, especially when the sequences are distantly related (Messier and Stewart 1997). In a pairwise comparison, positive selection is detected only if the ! averaged over all sites in the protein and over the whole evolutionary history connecting the two sequences is >1. This seems to be an extremely stringent criterion. Analysis of multiple sequences on a phylogeny allows one to detect episodic positive selection that affects a particular branch (Yang 1998).
Nevertheless, we note that pairwise sequence comparisons are widely used, especially in comparative genomics, sometimes to provide summary statistics of the data and sometimes because of the lack of a third genome. The ML method has been used to estimate ω and t in pairwise comparisons of genes (e.g., Nielsen et al. 2005; Ge et al. 2008; Walters and Harrison 2010; Buschiazzo et al. 2012; Gladieux et al. 2013; Wang and Chen 2013). Counting methods are also used because of their simplicity (Garcia-Gil et al. 2003; Schenekar et al. 2011; Graves et al. 2013), even though they were found not to perform as well as ML in computer simulations (Yang and Nielsen 2000). Both counting and ML methods sometimes return 0 or ∞ as estimates, so that neither the expectation nor the variance of the estimates is finite. The infinite estimates of ω appear to be particularly confusing to many users of the methods. To avoid such extreme estimates, some authors (e.g., Novaes et al. 2008; Bajgain et al. 2011; Pellino et al. 2013) added a small arbitrary number (pseudocounts) to the numbers of synonymous and nonsynonymous substitutions before calculating ω. Other authors excluded genes with dS = 0 from their analysis (e.g., Wang and Chen 2013). The Bayesian method implemented here may provide a better procedure than such ad hoc treatments. It always returns finite estimates of ω and t, as the prior penalizes extreme values. Our computer simulation suggests that the Bayesian estimates of ω have good statistical properties, with similar or smaller MSEs compared with the MLEs. The posterior means are close to the MLEs when the data are informative, that is, when the sequences are long and the sequence divergence is intermediate, but the differences can be large when the sequences are short and are either too similar or too divergent. Nearly identical sequences contain little information, while extremely divergent sequences contain too much noise concerning ω. In both cases, the data are not informative and the prior has an impact on the posterior estimates of ω. However, as sequence length increases, the effect of the prior decreases irrespective of the true values of ω and t. Our Bayesian method is used for the analysis of only two sequences; a Bayesian method for the analysis of multiple sequences on a phylogeny requires the calculation of high-dimensional integrals and is not pursued here.

NOTE (table 4).—N_L is the number of genes with statistically significant ω̂ > 1 based on the LRT at the 5% level (one-sided test with critical value 2.71) in the likelihood method, whereas N_B is the number of genes with P(ω > 1 | x) > 0.95 in the Bayesian analysis.

Mol Biol Evol. doi:10.1093/molbev/msu142
We emphasize that the MLE ω̂ = ∞ should not be taken as evidence for positive selection (ω > 1), because the extreme estimate may well be due to chance effects when the numbers of changes are small. Instead, positive selection can be claimed only if the LRT is significant in the ML framework, or if P(ω > 1 | x) > 0.95 in the Bayesian analysis.

Program Availability
The Bayesian method of this article is implemented in the CODEML program in the PAML package. The program allows the user to specify gamma priors for t and !. Although the Bayesian method is computationally more intensive than ML, it remains fast enough for large-scale screening. It takes 1-2 s to analyze one pair of sequences on a modern PC.

Methods and Materials
Theory
We use a simplified version of the model of Goldman and Yang (1994) to model the evolution of codon sequences (Yang and Nielsen 1998). The model accounts for the structure of the genetic code, the transition/transversion rate ratio κ, the codon frequencies, and the dN/dS rate ratio ω. The instantaneous substitution rate from codon i to codon j (i ≠ j) is

q_ij = 0, if i and j differ at more than one codon position,
q_ij = π_j, for a synonymous transversion,
q_ij = κπ_j, for a synonymous transition,
q_ij = ωπ_j, for a nonsynonymous transversion,
q_ij = ωκπ_j, for a nonsynonymous transition,

where π_j is the equilibrium frequency of codon j. Stop codons are not considered (they are assumed not to occur within protein-coding genes). Therefore, the substitution rate matrix Q = {q_ij} is of size 61 × 61 for the standard genetic code. The rate matrix is scaled so that the average rate of codon substitution equals −Σ_{i=1}^{61} π_i q_ii = 1, and thus time is measured by the expected number of nucleotide substitutions per codon site. We use standard theory to calculate the transition probability matrix over time t as P(t) = exp(Qt). The likelihood for a pairwise sequence alignment x is

f(x | t, ω) = Π_{h=1}^{Lc} π_{i_h} p_{i_h j_h}(t),

where i_h and j_h are the observed codons in the two sequences at site h, and Lc is the number of codons. The joint posterior distribution of ω and t is given by equation (1). If κ is a parameter in the model, we replace it with its MLE (κ̂). If the two sequences are identical, so that κ̂ is not unique, we fix κ = 2. Besides the posterior means of ω and t given in equations (3) and (4), we also calculate the posterior variances and covariance:

Var(ω | x) = E(ω² | x) − E(ω | x)²,
Var(t | x) = E(t² | x) − E(t | x)²,
Cov(t, ω | x) = E(tω | x) − E(t | x) E(ω | x).
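The codon model and pairwise likelihood described here can be illustrated numerically. The following is a minimal sketch (not the CODEML implementation; function names are hypothetical), assuming the standard genetic code and equal codon frequencies (Fequal), under which Q is symmetric and exp(Qt) can be computed by eigendecomposition:

```python
import numpy as np

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
SENSE = [(c, AA[i]) for i, c in enumerate(CODONS) if AA[i] != "*"]  # 61 sense codons

def is_transition(x, y):
    return {x, y} in ({"T", "C"}, {"A", "G"})

def rate_matrix(kappa, omega):
    """61x61 rate matrix Q with equal codon frequencies (pi_j = 1/61):
    q_ij = 0 for multi-nucleotide changes; otherwise proportional to pi_j,
    times kappa for a transition and times omega for a nonsynonymous change.
    Q is scaled so that the mean rate -sum_i pi_i q_ii equals 1."""
    n = len(SENSE)
    Q = np.zeros((n, n))
    for i, (ci, ai) in enumerate(SENSE):
        for j, (cj, aj) in enumerate(SENSE):
            if i == j:
                continue
            diffs = [(x, y) for x, y in zip(ci, cj) if x != y]
            if len(diffs) != 1:
                continue  # only single-nucleotide changes have nonzero rate
            r = 1.0 / n                    # pi_j = 1/61
            if is_transition(*diffs[0]):
                r *= kappa
            if ai != aj:
                r *= omega                 # nonsynonymous change
            Q[i, j] = r
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q / -(np.diag(Q) / n).sum()

def transition_probs(Q, t):
    """P(t) = exp(Qt); with equal frequencies Q is symmetric, so the
    matrix exponential follows from an eigendecomposition."""
    w, U = np.linalg.eigh(Q * t)
    return U @ np.diag(np.exp(w)) @ U.T

def log_likelihood(site_pairs, t, kappa, omega):
    """log f(x | t, omega) = sum_h log( pi_i * p_ij(t) ) over codon pairs."""
    P = transition_probs(rate_matrix(kappa, omega), t)
    idx = {c: k for k, (c, _) in enumerate(SENSE)}
    return sum(np.log(P[idx[a], idx[b]] / len(SENSE)) for a, b in site_pairs)
```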
Consider the calculation of the normalizing constant C; all other integrals are calculated in the same way. We write g(t, ω) = f(x | t, ω) f(t, ω). To avoid overflows and underflows, we set h(t, ω) = exp{log[g(t, ω)] − l_max}, where l_max, the maximum of log g(t, ω), is a constant chosen for scaling. The normalizing constant can then be written as

C = exp(l_max) ∫₀^∞ ∫₀^∞ h(t, ω) dt dω.

We use the Gaussian quadrature method to calculate all integrals numerically, which uses Legendre polynomials to approximate a continuous integrand f(x, y):

∫₋₁¹ ∫₋₁¹ f(x, y) dx dy ≈ Σ_{i,j=1}^{n} w_i w_j f(x_i, y_j).

The weights w_i and w_j and the points x_i and y_j at which the integrand is evaluated are predetermined given the total number of points n. In our case, the limits of the integrals are 0 and ∞, so we must use a transformation to map (0, ∞) onto (−1, 1). A much more serious problem is that the integrand g may be spiky (i.e., highly concentrated in a very small interval), and the approximation will be very poor if the sampled points miss the spike. The rationale behind our transformation is to find a probability density function (PDF) with a shape similar to that of the integrand g(t, ω) and then use its cumulative distribution function (CDF) to transform the variables. Note that if the chosen PDF matched the posterior exactly, the new integrand would become perfectly flat after the transformation. The logistic distribution is used for this purpose.
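The scaling device can be illustrated on a toy problem. The sketch below (hypothetical function names) applies Gauss-Legendre quadrature to exp(log g − l_max) for a sharply peaked log-integrand whose direct exponentiation would underflow in double precision:

```python
import numpy as np

def quad2d_log(log_g, a, b, c, d, n=32):
    """log of the integral of exp(log_g(u, v)) over [a,b] x [c,d] by n-point
    Gauss-Legendre quadrature, subtracting l_max = max log_g before
    exponentiating so that the sum never under- or overflows."""
    x, w = np.polynomial.legendre.leggauss(n)   # nodes and weights on (-1, 1)
    u = 0.5 * (b - a) * x + 0.5 * (a + b)       # map nodes to (a, b)
    v = 0.5 * (d - c) * x + 0.5 * (c + d)       # map nodes to (c, d)
    L = np.array([[log_g(ui, vj) for vj in v] for ui in u])
    l_max = L.max()                             # scaling constant
    s = (w[:, None] * w[None, :] * np.exp(L - l_max)).sum()
    return l_max + np.log(s * 0.25 * (b - a) * (d - c))

# A spiky, badly scaled integrand: exp(log_g) underflows to 0 directly,
# but the scaled quadrature recovers the integral on the log scale.
log_g = lambda u, v: -1000.0 - ((u - 1.0) ** 2 + (v - 1.0) ** 2) / 0.08
val = quad2d_log(log_g, 0.0, 2.0, 0.0, 2.0)
# exact value: -1000 + log(2*pi*0.04), about -1001.381 (boundary tails negligible)
```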
Let x₁ = log t ~ Logistic(μ₁, σ₁) and x₂ = log ω ~ Logistic(μ₂, σ₂). For any random variable x ~ Logistic(μ, σ), the CDF is F_L(x) = 1 / (1 + e^{−(x−μ)/σ}). Thus, for equation (10), we use the change of variables

z₁ = 2F_L(log t) − 1,  z₂ = 2F_L(log ω) − 1,

so that t = exp{μ₁ + σ₁ log[(1 + z₁)/(1 − z₁)]} (eq. 12) and ω = exp{μ₂ + σ₂ log[(1 + z₂)/(1 − z₂)]} (eq. 13). Equation (10) then becomes

∫₋₁¹ ∫₋₁¹ r(z₁, z₂) dz₁ dz₂ ≈ exp(l_max) Σ_{i,j=1}^{n} w_i w_j r(z_{1i}, z_{2j}),

where

r(z₁, z₂) = h(t, ω) × [2tσ₁/(1 − z₁²)] × [2ωσ₂/(1 − z₂²)],

and t and ω are given by equations (12) and (13), respectively. We transform all other integrals in equations (3), (4), and (7)-(9) in the same way. Thus, we have

E(ω | x) ≈ (1/A) Σ_{i,j=1}^{n} w_i w_j ω_j r(z_{1i}, z_{2j}),
E(t | x) ≈ (1/A) Σ_{i,j=1}^{n} w_i w_j t_i r(z_{1i}, z_{2j}),
E(ω² | x) ≈ (1/A) Σ_{i,j=1}^{n} w_i w_j ω_j² r(z_{1i}, z_{2j}),
E(t² | x) ≈ (1/A) Σ_{i,j=1}^{n} w_i w_j t_i² r(z_{1i}, z_{2j}),

where A = C exp(−l_max). Notice that the exponential term exp(l_max) cancels out during the calculations. Our Bayesian calculation is performed after the MLEs are obtained. Thus, if both t̂ and ω̂ are finite and away from 0, and the observed p_S and p_N (the proportion of synonymous differences per synonymous site and the proportion of nonsynonymous differences per nonsynonymous site, respectively) are both < 0.74, we set μ₁ = log t̂, μ₂ = log ω̂, σ₁ = (1/t̂)√V(t̂), and σ₂ = (1/ω̂)√V(ω̂). The variances V(t̂) and V(ω̂) are estimated using the method of Nei and Gojobori (1986). Because the Nei and Gojobori method uses the Jukes and Cantor (1969) nucleotide substitution model (JC69) to correct for multiple hits, the use of 0.74 as an upper limit for p_S and p_N guarantees adequate estimation of V(t̂) and V(ω̂).
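The logistic change of variables can be written out directly. A small sketch (helper names are hypothetical; the demo assumes μ = 0 and σ = 1) implements the forward map, its inverse, and the Jacobian factor that appears in r(z₁, z₂):

```python
import math

def logistic_cdf(x, mu, sigma):
    """CDF of the logistic distribution: F_L(x) = 1/(1 + exp(-(x - mu)/sigma))."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / sigma))

def t_to_z(t, mu, sigma):
    """Map t in (0, inf) to z in (-1, 1) via z = 2*F_L(log t) - 1."""
    return 2.0 * logistic_cdf(math.log(t), mu, sigma) - 1.0

def z_to_t(z, mu, sigma):
    """Inverse map: t = exp(mu + sigma*log((1 + z)/(1 - z)))."""
    return math.exp(mu + sigma * math.log((1.0 + z) / (1.0 - z)))

def dt_dz(z, mu, sigma):
    """Jacobian dt/dz = 2*sigma*t/(1 - z^2): the factor that multiplies
    h(t, omega) in the transformed integrand r(z1, z2)."""
    return 2.0 * sigma * z_to_t(z, mu, sigma) / (1.0 - z * z)

# Round trip: the transform is a bijection between (0, inf) and (-1, 1).
for t in (0.01, 0.5, 3.0, 42.0):
    assert abs(z_to_t(t_to_z(t, 0.0, 1.0), 0.0, 1.0) - t) < 1e-9 * t
```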
In all other cases, we find numerically the point (t*, ω*) that maximizes log g(t, ω). We calculate the Hessian matrix at this point using the second-order difference method and use the inverse of the Hessian to estimate the variances V(t*) and V(ω*). We then set μ₁ = log t*, μ₂ = log ω*, σ₁ = (1/t*)√V(t*), and σ₂ = (1/ω*)√V(ω*). Notice that because of our choice of prior, log g always has a mode, so the optimization algorithm returns a point away from (0, 0). We use the same number of points n for both parameters ω and t in the Gaussian quadrature. With n = 32, each sum in equation (15) requires 32 × 32 = 1,024 evaluations of the function r(z₁, z₂). Tests suggest that 32 points achieve high accuracy. Using more points increases the computational time considerably, since each evaluation of r(z₁, z₂) requires a likelihood calculation, which is computationally expensive. Moreover, we use the same techniques described above to calculate the posterior probability

P(ω > 1 | x) = (1/C) ∫₀^∞ ∫₁^∞ f(x | t, ω, κ̂) f(t, ω) dω dt,

as a Bayesian equivalent of the LRT for positive selection indicated by ω > 1.