Large-Scale Comparative Analysis of Codon Models Accounting for Protein and Nucleotide Selection

Abstract There are numerous sources of variation in the rate of synonymous substitutions inside genes, such as direct selection on the nucleotide sequence, or mutation rate variation. Yet scans for positive selection rely on codon models which incorporate an assumption of effectively neutral synonymous substitution rate, constant between sites of each gene. Here we perform a large-scale comparison of approaches which incorporate codon substitution rate variation and propose our own simple yet effective modification of existing models. We find strong effects of substitution rate variation on positive selection inference. More than 70% of the genes detected by the classical branch-site model are presumably false positives caused by the incorrect assumption of uniform synonymous substitution rate. We propose a new model which is strongly favored by the data while remaining computationally tractable. With the new model we can capture signatures of nucleotide level selection acting on translation initiation and on splicing sites within the coding region. Finally, we show that rate variation is highest in the highly recombining regions, and we propose that recombination and mutation rate variation, such as high CpG mutation rate, are the two main sources of nucleotide rate variation. Although we detect fewer genes under positive selection in Drosophila than without rate variation, the genes which we detect contain a stronger signal of adaptation of dynein, which could be associated with Wolbachia infection. We provide software to perform positive selection analysis using the new model.


Introduction
Detecting the selective pressure affecting protein-coding genes is an important component of molecular evolution and evolutionary genomics. Codon models are one of the main tools used to infer selection on protein-coding genes (Koonin and Wolf 2010). This is done by comparing the rate of nonsynonymous substitutions (d N ) that are changing the amino acid sequence with the rate of synonymous substitutions (d S ) that do not affect this amino acid sequence.
Although there is overwhelming evidence of negative and positive selection acting on the amino acid sequence of the proteins (Boyko et al. 2008), synonymous substitutions affecting the protein-coding genes are assumed to be effectively neutral in most current models. This is a reasonable first approximation, especially for species with low effective population size, such as many mammals (Keightley et al. 2005;Romiguier et al. 2014). Therefore the synonymous substitution rate can be used as a proxy for the neutral substitution rate, and comparison between d N and d S can be used to identify selection acting on the level of amino acids (Yang and Bielawski 2000).
A corollary of this assumption has been that most codon models assume that synonymous substitution rates are uniform across each gene. Yet there is no biological reason to assume this uniformity, and actually some evidence against it (for a review, see Rubinstein and Pupko 2012). This is expected to particularly affect more sophisticated models, where x varies between branches and sites. Indeed, violation of the assumption of uniformity of the synonymous rate can affect the performances of the site model (Rubinstein et al. 2011).
There are numerous sources of variation in the rate of synonymous substitutions inside genes. First, the raw mutation rate across each genome varies significantly. One of the strongest effects on the mutation rate in mammals is CpG sites. Transitions at CpG sites are more that 10-fold more likely than transversions at non-CpG sites (Leffler et al. 2013) due to spontaneous deamination, which causes a mutation from C to T, or from G to A. Both mutation frequencies and repair efficiency are highly dependent on the context. For example, the mammalian CpG mutation rate is lower in high GC regions (Fryxell and Zuckerkandl 2000). This is probably related to strand separation and hydrogen bonding in the neighboring region (Segurel et al. 2014). High GC regions themselves are characterized by a higher mutation rate, which is probably caused by less efficient repair by the exonuclease domain. There are other context-dependent effects which are known, many of which lack a mechanistic explanation, such as a higher mutation rate away from T with an increasing number of flanking purines (Hwang and Green 2004, for reviews see Hodgkinson andEyre-Walker 2011 andSegurel et al. 2014).
Mutation rate is also affected by replication time and it has been shown to be higher in late-replicating regions (Stamatoyannopoulos et al. 2009). This effect has been attributed both to the interference between RNA and DNA polymerases (Jørgensen and Schierup 2009) and to variation in the efficiency of mismatch repair (Supek and Lehner 2015). It is not clear that this affects variation within genes, as opposed to between genes, but it could do so in very long genes.
Mutation rates are correlated with recombination rates. Some suggest (Lercher and Hurst 2002;Hellmann et al. 2003Hellmann et al. , 2008) that recombination itself can have a mutagenic effect, possibly through an interaction with indels. Evidence from Drosophila suggests that recombination is not mutagenic, but does influence local N e and thus the efficiency of selection (Castellano et al. 2018). Alternatively, this correlation can be a result of GC-biased gene conversion (GC-BGC), whereby mutations increasing GC content have a higher chance of fixation in the population (Duret and Galtier 2009). Although GC-BGC is a fixation bias, in some cases it can create a pattern which is hard to distinguish from positive selection (Ratnakumar et al. 2010).
Finally, the synonymous substitution rate can be affected by selection at the nucleotide level. First, although synonymous substitutions do not affect the protein sequence, they might affect translation efficiency. This effect is not limited to species with large effective population size, such as Drosophila (Carlini and Stephan 2003), because selection for codon usage was identified even in Homo sapiens (Comeron 2004) and other mammals, especially for highly expressed genes. It has been suggested that bias in codon usage reflects the abundance of tRNAs, and thereby provides a fitness advantage through increased translation efficiency or accuracy of protein synthesis (Bulmer 1991), although in many cases there is no dependency between tRNA abundance and codon frequency, and the source of the bias remains unknown (Plotkin and Kudla 2011). Selection at the nucleotide sequence can be also caused by secondary structure avoidance, as secondary structure can reduce translation efficiency (Kudla et al. 2009;Kertesz et al. 2010). Other potentially important sources of selection on the nucleotide sequence, independent of the coding frame, include splicing motifs located within exons, exon-splicing enhancers (Majewski and Ott 2002), or genes for functional noncoding RNAs, such as miRNAs or siRNAs, which often reside within coding sequences (Mattick and Makunin 2006).
Because of all these mutational and selective effects, it is important to model rate variation not only at the level of protein selection but also at the nucleotide level. There are in principle two different approaches to incorporate rate variation into codon models. We can extend either the Muse and Gaut (1994) model, where both d N and d S are estimated as two independent parameters, or the Goldman and Yang (1994) model, where selection pressure on the protein sequence is represented by a single parameter (x) that defines the ratio of nonsynonymous to synonymous substitutions (d N =d S ). First, it is possible to model synonymous (d S ) and nonsynonymous (d N ) substitution rates separately by extending a two-rate model, as in  and Mayrose et al. (2007). Second, it is possible to incorporate site-specific rates as an independent parameter into singleparameter models (Scheffler et al. 2006;Rubinstein et al. 2011). In the second case, the substitution rate parameter captures biological factors acting on all substitutions, both synonymous and nonsynonymous. These factors can include mutation rates, fixation rates, or nucleotide selection.
Here we focus on the second approach, which is traditionally used for large-scale positive selection analyses in eukaryotes (Clark et al. 2007;Markova-Raina and Petrov 2011;Moretti et al. 2014;Zhang et al. 2014). Spielman et al. (2016) report superior performance of a compound parameter for the estimation of selection strength on the protein.
Although we use a single parameter x to model the selection strength at the protein level, x can vary both between alignment sites and between tree branches.
Although codon models accounting for nucleotide rate variation have been available for more than a decade, they are still rarely used for large-scale selection analyses, such as Kosiol et al. (2008), Moretti et al. (2014), and Zhang et al. (2014). This is probably because these models have even higher computational demands, and the statistical performance of different approaches to nucleotide rate variation was never compared.
Here we extend the Scheffler et al. (2006) model, which captures variation between codons, that is, uses a single rate per codon, and perform a direct comparison with the model of Rubinstein et al. (2011), which captures variation between nucleotides, that is, with three rate parameters per codon. Thanks to the computational efficiency of our method, we can show that synonymous rate variation is pervasive, and impacts strongly the detection of branch-site positive selection.
We also assess the impact of nucleotide rate variation on the BS-REL-family model (Murrell et al. 2015). Models based on Goldman and Yang (1994) typically use the maximum likelihood approach on the two nested models in order to detect positive selection. This way the method can identify genes, whereas individual sites can be detected only using additional posterior analysis. This approach seems to be suitable for large-scale genome analyses, where one is interested in identifying biological functions undergoing positive selection (Clark et al. 2007;Kosiol et al. 2008;Zhang et al. 2014;Daub et al. 2017). We chose Murrell et al. (2015) as a comparison, because it is the only BS-REL model for gene-wide identification of positive selection, whereas other positive selection models in that family are intended for inference of selection at individual sites, and, thus, cannot be compared directly.
We first use simulations to compare different approaches of modeling synonymous rate variation. Then, we use our model to detect positive selection in 12 Drosophila species and in a vertebrate data set.
Comparative Analysis of Codon Models . doi:10.1093/molbev/msz048 MBE We detect positive selection on genes from those two data sets under our new model, and we demonstrate that it is important to take rate variation into account for such inference of positive selection. We investigate factors affecting the nucleotide substitution rate, and we show that the new model successfully detects synonymous selection acting on regulatory sequences within the coding sequence. We also identify what are the gene features that affect rate heterogeneity the most.

New Approaches
We model the process of codon substitution as a Markov process defined by the instantaneous rate matrix Q. In a general case, Q can be written as follows (Rubinstein et al. 2011): : Here, q ðm;pÞ is the substitution rate for pth position (p 2 f1; 2; 3g) of codon site m of an alignment (m 2 1 . . . M, where M is the alignment length in codons). The variable k ij is the substitution factor to change from codon i to codon j. It is typically used to account for the difference between transition and transversion rates (Hasegawa et al. 1985). In this case, k ij depends on the substitution type but does not depend on the codon position in which substitution occurs. The rate q ðm;pÞ is used to account for various effects that are not captured by the variation in x. In particular, it accounts for variation in mutation rate and selection acting on the nucleotide sequence.
In Rubinstein et al. (2011), q ðm;pÞ is modeled using a oneparameter gamma distribution across sites of the alignment, such that the mean relative substitution rate is equal to 1, that is, q ðm;pÞ $ Gammaða; 1=aÞ. Keeping a mean rate of 1 is important to avoid biases in the estimation of branch lengths. There is no implicit assignment of rates to sites, as in the CAT model (Lartillot and Philippe 2004). Instead, a random-effect model is used: The gamma distribution is split into K equally probable discrete categories q k using quantiles, and the site likelihood is computed as the average of the likelihoods for each possible rate assignment. This approach adds only one extra parameter to the model, but it is computationally intensive. Indeed, in order to compute a likelihood for the mth codon in the alignment, it is necessary to compute likelihoods for this codon given all possible rate assignments, that is, q ðm;1Þ ; q ðm;2Þ ; q ðm;3Þ 2 fq 1 ; . . . ; q K g: For K discrete categories, K 3 likelihoods have thus to be computed per codon site.
In Scheffler et al. (2006), unlike Rubinstein et al. (2011), the three positions of each codon have the same rates, that is, q ðm;1Þ ¼ q ðm;2Þ ¼ q ðm;3Þ ; we denote the rate of a codon m as q ðm;ÃÞ . Here a codon belongs to one of three categories, each one represented by a single rate value q k . The rates and their respective proportions are estimated from the data, which leads to the estimation of four different parameters: two rate parameters R 1 < 1 and R 2 > 1 and two proportion parameters 0 < p 1 < 1 and 0 < p 2 < 1. Effective proportions are computed as follows:p 1 ¼ p 1 ;p 2 ¼ ð1 À p 1 Þp 2 , and p 3 ¼ ð1 À p 1 Þð1 À p 2 Þ. Rates are computed as: q 1 ¼ sR 1 , q 2 ¼ s, and q 3 ¼ sR 2 , where s is a scale factor chosen such that the mean rate is equal to one: P 3 i¼1pi q i ¼ 1. This approach is virtually equivalent to adding a branch length multiplier for certain site classes, and therefore likelihood can be computed efficiently.
Here we propose having one rate per codon q ðm;1Þ ¼ q ðm;2Þ ¼ q ðm;3Þ , while allowing this rate to vary following a gamma distribution, q k $ Gammaða; 1=aÞ. The same approach to model rate variation has already been used (Baele and Lemey 2013;Gil et al. 2013), but it was restricted to the assumption of constant selective pressure (x) across all sites and phylogenetic branches.
Our approach is closely related to Scheffler et al. (2006), as we are modeling a single rate per codon. The distinction is that we are using unit gamma distribution to model rates because of its flexibility while being controlled by a single parameter, as opposed to four parameters required for the 3-rate model of Scheffler et al. (2006). The approach described by Rubinstein et al. (2011) also uses gamma distribution to model rates, but those rates are associated with a single nucleotide site, not a single codon site, which substantially increases the computational complexity. In the approaches proposed by  and Mayrose et al. (2007), gamma-distributed rates are also assigned to individual codons. The important distinction is that in those cases synonymous and nonsynonymous rates are modeled separately, which makes estimation of selection as a ratio between the two rates more challenging (see also Spielman et al. 2016).
Using our approach, we extended two widely used codon models: the site model M8  and the branchsite model (Zhang et al. 2005). In principle, our approach could be applied to any GY94-based model. In M8, selection pressure represented by the x parameter varies between the sites of an alignment following a beta distribution, while staying constant over the branches of the phylogenetic tree. In this model, a subset of sites can evolve under positive selection. In the branch-site model, x varies both between the sites of the alignment and the branches of the phylogenetic tree. In this model, a subset of sites can thus evolve under positive selection on a predefined subset of branches. These two models were implemented in Godon, a codon model optimizer in Go, in four variants: no rate variation, site rate variation (Rubinstein et al. 2011), codon 3-rate variation Davydov et al. . doi:10.1093/molbev/msz048 MBE (Scheffler et al. 2006), and codon gamma rate variation as described above.
Four of the eight models were implemented and used for the first time to our knowledge: branch-site models with site rate variation similar to Rubinstein et al. (2011), codon 3-rate variation similar to Scheffler et al. (2006), gamma distributed codon rate variation as proposed above, and M8-based model with gamma-distributed codon rate variation. All models were implemented within a common framework, ensuring fair comparisons.

Site Models
We have simulated four data sets using various flavors of the M8 model: a data set without rate variation, a data set with site rate variation, a data set with gamma-distributed codon rate variation, and a data set with codon 3-rate variation (table 1). We then used the four corresponding models to infer positive selection in those data sets. In all four cases, as expected, the model corresponding to the simulations shows the best result in terms of receiver operating characteristic (ROC, fig. 1 and table 2) as well as accuracy (supplementary table S1, Supplementary Material online), and precision versus recall (supplementary fig. S1, Supplementary Material online).
In the absence of rate variation, the statistical performance of the four methods is very similar, even though the M8 model without rate variation has a slightly better ROC  NOTE.-Bullets and circles indicate which models were used to simulate data sets and which models were used for inference on these data sets. Combinations indicated with bullets are discussed in the main text, whereas combinations indicated with circles are discussed in the Supplementary Material online. M8, M8 model of ; BS, branch-site model of Zhang et al. (2005); BUSTED, BUSTED model from the BS-REL-family (Murrell et al. 2015). Codon gamma variation refers to the proposed parametrization, whereas codon 3-rate variation refers to the parametrization introduced in Scheffler et al. (2006).

Codon gamma variation
Codon 3−rate variation with site rate variation, M8 with codon gamma rate variation, and M8 with codon 3-rate variation) and of BUSTED on data sets (A) without rate variation, (B) with site rate variation, (C) with codon gamma rate variation, and (D) with codon 3-rate variation. Specificity is defined as the proportion of correctly identified alignments simulated under a model with positive selection, and sensitivity is defined as the proportion of correctly identified alignments simulated without positive selection. The dashed diagonal line shows theoretical performance of the random predictor, and the dashed vertical and horizontal lines indicate theoretical performance of the perfect predictor.
Comparative Analysis of Codon Models . doi:10.1093/molbev/msz048 MBE gamma variation performs almost as well as site variation, and clearly better than the model with no variation.
For both data sets with codon rate variation, there is a relatively large decrease in the performance of models both without rate variation and with site rate variation. ROC is decreased ( fig. 1C and D) and FPR is inflated (supplementary fig. S2C and D, Supplementary Material online). Performance of the two variants of codon rate variation is similar. Codon gamma rate variation even increases the FPR above 50% for the model without rate variation at the significance level of 0.05. Stronger rate variation (i.e., smaller a value) causes a higher FPR (supplementary fig. S4, Supplementary Material online).
From this, we can conclude that (i) the performance of models accounting for codon variation is acceptable in all three scenarios, that is, no rate variation, site rate variation, and codon rate variation; (ii) in the presence of codon rate variation in the data, models not accounting for this kind of variation suffer from a notable loss of statistical performance.
We have mainly focused on a realistic scenario in which true branch lengths are unknown. However, to confirm that our results are not biased by the differences in branch lengths, we also fit the models using the true branch lengths. In this case, performance is very similar, both in terms of ROC (sup- More complex models have a computational cost. Analyses with codon gamma rate variation were 3.3 and 2.8 times slower compared with no rate variation for M8 and branch-site models, respectively. The codon 3-rate variation model by Scheffler et al. (2006) provides a similar statistical performance while having a higher computational cost: 3.7 and 6.7 times slower compared with no rate variation, respectively. This increased computational load might be explained by a larger dimensionality of parameter space. Site rate Table 2. AUC for All M8-Based Simulations (see Fig. 1) and for BUSTED.  (table 3). Thus codon rate variation captures biological signal at a much lower computational cost than site rate variation, especially when coupled with a gamma model.

Comparisons with BS-REL
It has been demonstrated (Murrell et al. 2015) that in certain cases the statistical power of BS-REL is superior to other methods, including the branch-site model. Therefore, it is important to study how rate variation affects the performance of those models. The only BS-REL model suitable for the gene-wide identification of positive selection is BUSTED (Murrell et al. 2015), and the current implementation supports neither rate variation nor d S variation (implemented by . The BS-REL framework differs from the branch-site and M8 models by the frequency parametrization. BS-REL is based on the approach of Muse and Gaut (1994), whereas the branchsite and M8 models are based on the approach of Goldman and Yang (1994). During simulations, we used F0 frequencies, that is, identical frequencies of all the codons (p i ¼ 1=61). As F0 can be considered a special case of both Muse and Gaut (1994) and Goldman and Yang (1994), we do not expect any bias caused by this difference between the two approaches.
BUSTED shows significantly inflated rates of false positives in the presence of codon gamma rate variation (supplementary figs. S2C and S7C, Supplementary Material online, simulations under M8 and branch-site, respectively). At a typical significance level of 0.05, the FPR of BUSTED is close to 0.3 and 0.2 for the M8 and branch-site simulations, respectively (i.e., the proportion of false positives can be 4-6 times higher than we expect by chance). This shows that the statistical performance of the BS-REL family models is also affected by not taking into account rate variation.

Vertebrate Data Set
Given the good performance of the codon gamma rate model in the simulations, we applied it to real data. First, we used 767 one-to-one orthologs from vertebrate species. This represents a set of genes with high divergence (more than 450 My), conservative evolution (Studer et al. 2008), and relatively low effective population sizes (although some vertebrates have high N e , see Gossmann et al. 2012), thus relatively weak impact of natural selection. We analyzed them with four variants of the branch-site model: no rate variation, site rate variation, codon gamma rate variation, and codon 3-rate variation. We used the branch-site model to search for positive selection, as it is more sensitive to episodic positive selection (see Supplementary Material online, page 3).
In most cases (each case corresponding to a single branch of a gene tree), the data support the codon rate variation models based on Akaike information criterion (AIC): out of 8,907 individual branches tested, data support codon gamma rate variation in 43% of the tests, codon 3-rate variation in 47%, site rate variation in 9.4%, and no rate variation model was favored only in 12 tests (0.1%).
A large proportion (43%) of branches detected to be under positive selection with the no rate variation model are not detected to be under positive selection with the codon rate variation model (table 4). This effect is even stronger (72%) when multiple testing correction is used (table 4B). The majority of the positive predictions from the standard branchsite model are not supported when both multiple testing and codon rate variation are accounted for. This suggests that evolution on these branches can be explained by nucleotide substitution rate variation without positive selection (see Discussion). We observe relatively high agreement in predictions between the site rate variation and codon rate variation models (after multiple testing correction), as those are two different approaches to model the same evolutionary process. Supplementary figure S12, Supplementary Material online, shows the number of genes detected by the branch-site models with codon gamma rate variation and without rate variation, mapped to the vertebrate species phylogeny. There was no positive selection identified using M8 models, with or without codon gamma rate variation.
Supplementary table S4, Supplementary Material online, shows prediction agreement between each model and the best supported model out of four, confirming the good performance of the codon variation model. Although the codon 3-rate model has a slightly higher proportion of tests which support it, the codon gamma rate variation model has the highest prediction agreement with the best model (supplementary table S4B and C, Supplementary Material online).
Codon gamma rate variation model was 4.7 times slower, whereas codon 3-rate variation model was 8.7 times slower, and site rate variation model was 69 times slower (table 3) than no rate variation.

MBE
With real data, differences between genes are not only stochastic, but more interestingly are expected to be driven by underlying biological differences. It is thus interesting to find which factors affect rate variation as estimated by the model, as well as to know which genes favored the model with codon rate variation the most.
We focused on gene features associated with the underlying evolutionary process, such as recombination rate, GC content standard deviation (indicative of shifts in recombination hotspots; Glemin et al. 2015), and expression level (associated with stronger purifying selection; Drummond et al. 2005;Pal et al. 2006;Kryuchkova-Mostacci and Robinson-Rechavi 2015). Parameters which can directly affect the performance of the method were also included in the linear model to avoid potential biases, for example, number of sequences and alignment length. We also included total intron length and number of exons, because they can affect synonymous selection associated with splicing, or disparity between mutation rates associated with chromosomal localization of exons.
Here and below we used three response variables for our analyses. First, we create linear models using the relative support of the model (based on Akaike weights, see Materials and Methods) as a response variable. These models allow us to understand for which categories of genes the effect of rate variation is the strongest. Second, models using the a parameter of the gamma distribution (codon rate variation) as a response variable allow us to identify gene properties associated with high substitution-rate variance. Finally, a model for the proportion of branches which are inferred to have evolved under positive selection when rate variation is not taken into account, but not when it is, allows us to identify the main causes of discrepancy between the results of the two models.
In this analysis, each group of orthologous genes was treated as a single observation. Estimated parameter values obtained by testing different branches of the tree were averaged.
The relative support of the model with codon gamma rate variation is mostly affected by total branch length, alignment length, and mean GC content of the gene (supplementary table S5, Supplementary Material online). The positive correlation with tree length and alignment lengths is probably related to the increase in total amount of information available for the model. The relation to GC content might be due to the relationships between recombination rates, substitution rates, and GC content (Duret and Galtier 2009;Rudolph et al. 2016;see Discussion).
For the shape parameter of the gamma distribution a, the strongest explanatory variable is GC content (supplementary table S6, Supplementary Material online). As with relative support of the model, this could be related to recombination. We also observe a weak relation with maximal expression level. Highly expressed genes tend to have a higher rate variation, which could be explained by higher nucleotide level selection on certain parts of the gene.
Enrichment analysis did not identify any categories overrepresented among genes detected to be evolving under positive selection with rate variation. This might be due to the small size of the data set. The second real data set we used contains 8,606 one-to-one orthologs from Drosophila genomes. The Drosophila data set is 10-fold larger than the vertebrate data set. As analyses on the simulated and vertebrate data sets show a consistent superiority of codon gamma rate over site variation and codon 3-rate variation, with a much lower computational cost, we ran only no variation and codon gamma variation on the full data set. Therefore for the Drosophila data, we are mainly focusing on comparing models with and without codon rate variation. However, we did run site variation and codon 3-rate variation on a subset of 1,000 genes selected randomly.
Drosophila has large effective population sizes on average (Gossmann et al. 2012), thus stronger impact of natural selection; the genes studied are less biased toward core functions than in the vertebrate data set, and have lower divergence: about 50 Ma for Drosophila (Russo et al. 2013) compared with more than 450 Ma for the vertebrate data set (Betancur-R et al. 2015).
In total, 66,656 branches were tested for positive selection. The model with codon gamma rate variation was supported by the data in 97% (respectively 96%) of the tests when using AIC (respectively likelihood ratio test, LRT). On the smaller subset, on which all the four approaches were applied, codon gamma rate variation was supported in 48% of the tests, codon 3-rate variation in 23%, site rate variation in 26%, and no variation in 2%. As with the vertebrate data set, predictions were not consistent between the models (table 5, comparison of all models for a subset of 1,000 genes in supplementary table S7, Supplementary Material online). The site and the codon gamma rate variation models display a stronger consistency in predictions of positive selection relative to the consistency between the model without rate variation and the model with codon gamma rate variation.
As in vertebrates, when accounting for multiple testing, the vast majority of predictions of positive selection given by the model without rate variation are not supported by the model accounting for rate variation. Supplementary figure S13, Supplementary Material online, shows the number of genes detected by the branch-site models with codon gamma rate variation and without rate variation, mapped to the Drosophila species phylogeny. In addition, there were 4 and 19 genes identified with the M8 models with codon gamma rate variation and without rate variation, respectively (for the full lists, see Availability).
Genes identified to be under positive selection with the model accounting for codon gamma rate variation are enriched for molecular function GO categories associated with dynein chain binding (GO:0045503, GO:0045505, for both terms q-value¼ 0:016, supplementary table S8, Supplementary Material online). Dynein plays an important role in Wolbachia infection (Serbus and Sullivan 2007), and is thus a likely candidate for strong positive selection (Werren et al. 2008). Surprisingly, there are no significant molecular function GO categories identified using the branch-site model without rate variation (for dynein categories q-value ¼ 1, supplementary table S9, Supplementary Material online). Genes associated with dynein chain binding predicted to have evolved under positive selection, and the relevant amino acid positions, are provided in supplementary table S10, Supplementary Material online.
The relative support of the model with codon rate variation is mainly explained by alignment length, number of sequences, and coding sequence length (table 6). Stronger model support associated with increase in the amount of information (increased coding sequence length means less gaps for the same alignment length), expression levels, and mean GC content is consistent with the vertebrate results (supplementary table S5, Supplementary Material online).
We also observe a dependence on the number of exons and on recombination rate. A larger number of exons imply more exon-intron junctions, which might affect variation in levels of nucleotide sequence selection (see below). Recombination might affect GC-BGC, mutation rate, and selection strength acting on synonymous sites (Campos et al. 2014).  NOTE.-Drosophila data set. Significant variables (P-value < 0.05) in italics. Model Pvalue is < 2:2 Â 10 À16 , adjusted R 2 is 0.6478. Model formula: relative model sup-port~number of sequences þ total branch length þ alignment length þ length of coding sequence þ GC content (mean) þ GC content (SD) þ total intron length þ number of exons þ maximum expression þ mean expression þ recombination rate.

MBE
The rate variation parameter a can be explained by several features of genes (supplementary table S11, Supplementary Material online). Most of the effects are not reproduced between the two data sets. Although some of them are strongly significant, generally the effect sizes are not very large. The most consistent effect between the two data sets is dependence of the rate variation on GC content.

Signatures of Selection at the Nucleotide Level
Codon rate variation can be influenced by various factors such as mutation bias, fixation bias (e.g., gene conversion), or selection acting against synonymous substitutions. Notably, it is well known that exon regions adjacent to splicing sites are evolving under purifying selection at the nucleotide level (e.g., see Majewski and Ott 2002). We determined posterior rates for positions of protein-coding gene regions located in the proximity of exon-intron and intron-exon junctions; first exons were excluded from the analysis.
We observe in Drosophila ( fig. 3) that our codon rate variation model captures these selection constraints: The codon substitution rate is lower at the exon-intron junction than at the intron-exon junction, and both have lower rates than the rest of the exon. This is in agreement with splicing motif conservation scores (e.g., see Cartegni et al. 2002), and consistent with negative selection acting on splicing sites.
We also used the model M8 with codon gamma rate variation to simultaneously estimate the effect of factors which affect substitution rates of nucleotide and protein sequences, again in Drosophila. We observed that the model is able to recover opposing trends acting on the 5 0 -region of the protein-coding gene ( fig. 4). These trends are probably a result of the high functional importance of the 5 0 -nucleotide sequence, but low functional importance of the corresponding amino acid sequence (see Discussion). Comparing nucleotide and inverse protein rates (supplementary fig. S14, Supplementary Material online) indicates that the effect is slightly stronger at the protein level; however, the difference is only marginal. It is worth noting that this effect cannot be explained by a dependence between the q and the x estimates, as such a dependence is not observed in other regions (supplementary fig. S15, Supplementary Material online).
We observe that the top 25% most highly expressed genes show both stronger conservation of the amino acid sequen- . As for the decrease in substitution rate, q is defined relative to the gene-wide average substitution rate. Therefore, The rates were estimated using the model M8 with codon gamma rate variation. The left panel depicts rates in 5 0 -exon (prior to the exon-intron junction, negative distances), whereas the right panel depicts 3 0 -exon (rates after the intron-exon junction, positive distances). A rate of 1 corresponds to the average rate of substitution over the gene; thus values above 1 do not indicate positive selection, but simply a rate higher than average for this gene. The blue ribbon indicates the 98% confidence interval of mean estimate. Only alignment positions with <30% of gaps were used in the plot. MBE a stronger relative selection on the 5 0 -region can be detected above the average increase in purifying selection on highly expressed genes. The relation with expression levels is consistent with the assumption that we are measuring natural selection on gene sequences in this case, rather than mutation rates.

Nucleotide Level Selection in Coding Regions
There is strong evidence of selection acting on synonymous substitutions within protein-coding sequences, and the strength of this selection is expected to vary across coding regions (Chamary et al. 2006). In particular, negative selection strongly affects regulatory sequences, such as exonic splicing enhancers or exon junction regulatory sequences (Cartegni et al. 2002). Variation in selection over both synonymous and nonsynonymous substitutions can affect the performance of codon models (Rubinstein et al. 2011), and we show that indeed it strongly impacts the results of the popular branch-site model, as well as the simpler site model (M8).
Although there are multiple ways to account for this variation, for instance by modeling the synonymous and nonsynonymous rates separately , here we focused on modeling the ratio of nonsynonymous versus synonymous rates as a single parameter (x), while allowing the substitution rate (q) to vary along the sequence. Our approach succeeds in recovering a signal of splicing motif conservation jointly with negative and positive selection acting on the protein sequence.
We also demonstrate that our model is able to disentangle opposite trends acting on the same sequence, that is, stronger negative selection acting on the nucleotide sequence combined with weaker amino acid selection toward the beginning of the reading frame.
Selection on the 5 0 -nucleotide sequence is probably due to selection for translation initiation efficiency (Bentele et al. 2013), and is perhaps related to suppression of mRNA structures at the ribosome binding site. At the same time N-terminal amino acids are more likely to be unstructured, and they are relatively less important to protein function and stability compared with the core (Guharoy and Chakrabarti 2005).

Determinants of Rate Variation
The majority of gene alignments in the study indicated better support for the model with codon gamma rate variation. Moreover, the relative probability of the models incorporating codon gamma rate variation increases with the amount of information available, be it number of sequences, alignment length, or total number of substitutions. This indicates that these models are better in describing the underlying evolutionary process, and if we have enough data, these models are favored. We detect a strong signal of nucleotide variation in two quite different data sets. Flies have high effective population size, thus natural selection is relatively strong, including on codon usage or splicing. Vertebrates have higher sequence divergence, which does not appear to mask the signal of nucleotide evolution, despite lower effective population sizes in many species (Gossmann et al. 2012;Romiguier et al. 2014). Thus, the effect of nucleotide rate variation appears quite general, and will probably be found in many other species.
The strongest determinant of the relative support of the model with codon rate variation is GC content. A strong effect of GC content on synonymous rate variation was reported by Dimitrieva and Anisimova (2014), based on an analysis of protein domain coding sequences with a modified site model. It is well known (Fullerton et al. 2001;Marais et al. 2003;Chamary et al. 2006) that regions with high recombination rates have higher GC as a result of GC-BGC, notably in the species studied here. It has also been shown that models accounting for rate variation show significantly better performance than simpler models in the presence of recombination (Scheffler et al. 2006), even if the true tree topology is used.
The effect of recombination rate as measured by Comeron et al. (2012) on the relative support of codon rate variation is comparable to the effect of GC content alone. In mammals, the higher CpG dinucleotide mutation rate (Kong et al. 2012) can increase the disparity in substitution rates and therefore contribute to the dependence between the GC content and the relative model support.
Yet, we observe this dependence in Drosophila, where there are no significant neighboring base contextual effects on the mutation rate (Keightley et al. 2009). One explanation is that the GC content could be acting as a proxy for the average recombination rate over time via GC-BGC. Considering the rapid evolution of recombination hotspots (Ptak et al. 2005), GC content probably captures historical recombination rates, whereas the direct measurement of recombination rate captures only the current state.
Low and moderate levels of recombination are in general tolerated by codon models, especially models that account for rate variation (Anisimova et al. 2003;Scheffler et al. 2006). However, high levels of recombination could inflate the FPR of such models. Therefore, for genes showing the highest rates of recombination, positive selection predictions should be interpreted with caution. Removing the top 30% highest recombination rates has almost no effect on the linear model (supplementary tables S12 and S13, Supplementary Material online).
In both data sets, we observe a significant positive association of rate variation with the maximal expression level. Pressure for translational robustness increases with expression levels (Drummond et al. 2005), and codon choice affects expression level (Bentele et al. 2013). One of the main causes of selection on the codon sequence of highly expressed genes is protein misfolding avoidance ), but there is also selection for efficient translation initiation (Pop et al. 2014). Consistent with this, Dimitrieva and Anisimova (2014) reported more evidence for synonymous rate variation in genes expressed in the brain. Brain-expressed genes are known to be more sensitive to misfolding and under stronger purifying selection (Drummond and Wilke 2008;Kryuchkova-Mostacci and Robinson-Rechavi 2015;Roux et al. 2017). It is reasonable to assume that only certain parts of protein-coding genes will be affected by strong nucleotide sequence Comparative Analysis of Codon Models . doi:10.1093/molbev/msz048 MBE selection, and that this selection will be stronger on more expressed genes. Indeed our results show strong negative selection acting on the coding sequence of translation initiation regions, and the relative selection strength is higher for the 25% highest expressed genes (supplementary fig. S17, Supplementary Material online). This can lead to an overall increase in substitution rate variation.
Given that highly expressed genes have both stronger negative selection and stronger variation in substitution rate on the coding sequence, it is especially important to take this variation into account. These genes can easily have a combination of a low average d S , pulled down by strong purifying selection on some regions of the gene, a subset of codons evolving faster than this mean, and a low average x. This can be mistakenly interpreted as positive selection by models without rate variation. On the other hand, with our model we find a positive correlation between support for rate variation and positive selection in Drosophila, even after correcting for confounding variables such as gene length or GC content (linear model of relative model support with proportion of branches under positive selection: supplementary table S14, Supplementary Material online). It is possible that these genes are evolving under very strong selection, both positive and negative, or that strong recombination affects the performance of the model. We did not find a significant association in vertebrates (P-value ¼ 0.24 for the term in a linear model), which could be explained by a smaller size of the data set. In any case, we cannot confirm a previous report that the genes with the strongest evidence for synonymous rate variation had the less evidence for positive selection (Dimitrieva and Anisimova 2014).

Codon Models and Rate Variation
Widely used mechanistic codon models rely on the assumption of constant synonymous substitution rates. This assumption is often violated due to factors such as mutation bias or nucleotide selection, which vary across the gene. Although substitution rate variation can be caused by multiple factors, we use a single compound rate parameter to model this variation.
We demonstrate that this simple model captures such rate variation, and that it both detects new biological signal and substantially decreases the FPR in positive selection detection. Not only do we observe this effect in simulations (figs. 1 and 2, FPR: supplementary fig. S2, Supplementary Material online), but inconsistency between models is even higher when applied to the vertebrate and fly data sets. The vast majority of the predictions of positive selection obtained using models without rate variation were not supported by the model with codon gamma rate variation. The comparable power of the two methods (sensitivity: supplementary fig. S3A, Supplementary Material online) and the strong support for rate variation from the data suggest that most of those positive selection predictions can be explained by nucleotide rate variation (tables 4 and 5). Thus, they should be considered as probable false positives. The proportion of those potential false positives per gene alignment is positively associated with the amount of available information, such as alignment length (linear models: supplementary tables S15-S17, Supplementary Material online). This confirms that loss of power is not the main cause for the lack of detection by the codon rate model, but rather that the issue is false positive detection by the classical branch-site.
The effect of rate variation on the model performance is stronger in Drosophila than in vertebrates. On the vertebrate data set, after multiple testing correction, about 80% of the positive predictions might be false positives (table 4B), whereas in Drosophila it is 95% (table 5B). This might be a consequence of the higher effective population size (Gossmann et al. 2012) and thus stronger selection at the nucleotide level in Drosophila.
Incorporating rate variation into the model allowed us to identify a strong signal of positive selection acting on the inner dynein arm, which could be a result of selection against Wolbachia infection in Drosophila. With the model without rate variation, this signal is masked by other, presumably false positive, genes. Although there have already been several large-scale analyses of positive selection in Drosophila (Clark et al. 2007;Markova-Raina and Petrov 2011;Cicconardi et al. 2017), none of them reported positive selection affecting inner dynein arm. An association between positive selection on dynein and Wolbachia needs more evidence before it can be confirmed; however, it is promising that this association is better supported with a more stringent and realistic model. Although the gene list obtained with rate variation is shorter and thus provides less signal for anatomical enrichment, testis notably remains highly significant, as expected for selection on sexual conflict in flies (supplementary table S18, Supplementary Material online). Indeed, sexual selection has been repeatedly demonstrated using various approaches (e.g., Lupold et al. 2016).
Both versions of the branch-site test detect a small number of genes under positive selection, relative to expectations from some other approaches (Bierne and Eyre-Walker 2004). It has been suggested that the presence of positive selection on background branches can cause a decrease in power of the branch-site models (Kosakovsky Pond et al. 2011). However, we observe only a marginal decrease of power in our simulations (see Supplementary Material, "Branch-Site Model and Background Positive Selection"). This decrease is small both under the classical branch-site model and under rate variation. Thus, this background selection effect does not seem to explain the small number of genes detected.
An important question is why accounting for rate variation changes the statistical properties of the test. For models with a single x ¼ d N =d S value per alignment, comparison between d N and d S can be viewed as a contrast between the rates before and after the action of selection on the protein, and should not be significantly biased by nucleotide rate variation (Yang 2014). However, when x is allowed to vary, d N =d S overestimation could be caused not only by the variation in d N but also by codon-specific substitution rates. Indeed, having a small percentage of rapidly evolving codons in the gene would not be captured by an overall rate for d S , and therefore would be interpreted as positive selection by Davydov et al. . doi:10.1093/molbev/msz048 MBE models with protein level but without nucleotide level rate variation. Conversely, fully accounting for rate variation allows detecting these codons as rapidly evolving by the signatures of both synonymous and nonsynonymous substitutions.
There is recent evidence that double mutations in coding sequences increase the branch-site model FPR from 1.1% to 8.6% in similar data sets to those investigated here (Venkat et al. 2018). The interaction between this effect and rate variation along the gene is worth investigating.
We compared three different models accounting for rate variation: the site variation model of Rubinstein et al. (2011), the codon 3-rate variation of Scheffler et al. (2006), and our new codon gamma variation model which extends Scheffler et al. (2006). The codon rate variation model can be informally thought of as a special case of the site rate variation model. Despite that, the codon gamma rate variation performs better both in the simulations (M8: . There are probably two reasons for that. First, the fact that we can assign a rate to a particular nucleotide position does not necessary mean that we can reliably estimate it. Only two amino acids allow single nucleotide synonymous substitution associated with the first or second codon positions. This means that individual position rates can be estimated mostly through nonsynonymous substitutions, which are typically rare compared with synonymous ones. Moreover, the branch-site and M8 models allow variation in the nonsynonymous rate over codon positions, which means estimates of x and site rates are not independent. However, there is no visible dependency between codon rate variation ( fig. 3) and x variation (supplementary fig. S15, Supplementary Material online).
Secondly, we expect site rates to be autocorrelated along the sequence, because many factors, such as GC content, recombination rate, or chromatin state change slowly over the gene. Indeed, we see a signal of such autocorrelation in our data (effect size 0.018, P-value < 0.0001). Therefore, having an independent rate for every site is probably redundant.
Statistical performance of codon gamma rate variation and codon 3-rate variation by Scheffler et al. (2006) is comparable (M8 based models: table 2, branch-site based models: supplementary table S2, Supplementary Material online). Also, they are similar in terms of the model support provided by the data. However, codon gamma rate variation provides two important advantages. First, it is up to two times faster than codon 3-rate variation implemented using the same optimizations in Godon (table 3). This is probably caused by a larger dimensionality of parameter space and by nonindependence between model parameters, which can slow down the likelihood optimization. Second, comparison of positive selection predictions between codon gamma variation and the model with the highest support suggests (supplementary table S4, Supplementary Material online) that the codon gamma rate variation model provides a better detection of positive selection even in case of model misspecification.
One of the key advantages of codon variation relative to site variation is computational performance. Having a distinct rate for every position increases the number of site classes for which likelihood computations have to be performed by a factor of K 3 , where K is the number of discrete categories for gamma distribution. Having a rate only for each codon increases the number of site classes by a factor of K. This means that even for four discrete categories, the slowdown of likelihood computation for site rate gamma model will be about 64 times versus only 4 times for codon rate variation model. In practice, this ratio between the two models was respected in simulated and in vertebrate data. This makes codon rate variation models usable in large phylogenomics data sets, as we demonstrate by analyzing 12 Drosophila genomes.
Unlike traditional mechanistic codon models, our new models allow independent estimations of substitution rate at the nucleotide level and of selective pressure on amino acid sequences. It should be noted that individual site rate estimates may be still noisy because of the amount of data available. But given enough data it is possible to have accurate estimates of selection acting on specific regions, for example, splicing motifs, within coding sequences ( fig. 3).

Conclusions
We have performed a large-scale comparison of different approaches to model rate variation. Failure to account for rate variation leads to both type I and type II errors. We also propose an extension to the model of Scheffler et al. (2006), which has a good statistical performance both in the presence and in the absence of rate variation. We also provide a software implementation of the new models. Rate variation is strongly supported by homologous genes both from species with larger (flies) and smaller (vertebrates) effective population sizes. We are able to capture differences in substitution rates caused by nucleotide selection. Importantly, while being more complex these models remain computationally tractable and therefore can be applied to large-scale data sets. These models and their efficient implementation open the opportunity of simultaneous analysis of different layers of selection in phylogenomics.

Sequence Simulations
We simulated eight data sets (table 1) that include either no rate variation across sites (corresponding to the GY94 model), variation between sites (corresponding to the Rubinstein et al. [2011] model), 3-rate variation between codons (corresponding to Scheffler et al. [2006] parametrization), or gamma variation between codons (corresponding to our new approach). Each data set contains 1,000 alignments simulated under the null hypothesis H0 with no positive selection (all x 1) as well as 1,000 alignments under the alternative hypothesis H1 with positive selection (some x > 1). Models used in the study are mainly focusing on detecting alignments (i.e., genes) under positive selection, rather than individual sites. As our data set is balanced in terms of alignments, we Comparative Analysis of Codon Models . doi:10.1093/molbev/msz048 MBE used ROC and AUC as our main performance metrics. All the data sets had between 8 and 12 sequences composed of 100-400 codons and were simulated using our software named cosim (see Availability). The parameters of each simulation, including the alignment length and the number of species, were generated at random from their respective distribution (supplementary tables S19 and fig. S18, Supplementary Material online). Maximum values of x > 1 for H1 were 79 and 15 for the branch-site and M8 models, respectively. Values of a were within the range of values estimated from the real data (supplementary fig. S19, Supplementary Material online), with an emphasis on smaller values where the variation is stronger. For the simulations including rate variation, we used four discrete gamma categories that we assigned either to sites or to codons. The M8 model assumes that the neutral sites and those under purifying selection have an x drawn from a beta distribution and we represented this distribution using five discrete categories. Finally, to simulate evolution under the branch-site model, we randomly selected one "foreground" branch of the phylogenetic tree (either internal or terminal) for every simulated alignment.

Vertebrate and Drosophila Data Sets
We analyzed two biological data sets. Our goals were to compare the fit of the different models on real data, and to study which gene features are contributing to the variation of the substitution rate. First, we used a vertebrate one-to-one orthologs data set (Studer et al. 2008, available at http://bioinfo.unil.ch/supdata/positiveselection/Singleton.html) consisting of 767 genes (singleton data set). This data set was already used in previous studies of codon models (Fletcher and Yang 2010;Gharib and Robinson-Rechavi 2013;Davydov et al. 2017).
We also used a subset of one-to-one orthologs from 12 Drosophila species from the Selectome database (release 6, http://selectome.unil.ch/). This data set consists of 8,606 genes, and the alignments were filtered to remove unreliably aligned codons; the Selectome filtering procedure is based notably on GUIDANCE (Penn et al. 2010) and is described on the Selectome website and the corresponding publication (Moretti et al. 2014). Phylogenetic trees for Drosophila and vertebrates were acquired from TimeTree (Hedges et al. 2006).

Model Parameters Inference
For all the tests on simulated data we used the correct (i.e., simulated) tree topology, but starting branch lengths were estimated using PhyML v. 20131022 (Guindon et al. 2010) with the model HKY85 (Hasegawa et al. 1985). We did not start the optimization from the true branch lengths, by similarity to a real use-case, when only gene sequences are available, and the true branch lengths are unknown. Additionally, we also show results of estimations when true branch lengths were used. Although tree topology is also inferred in real usecases, and wrong topology could impact the inference of positive selection (Diekmann and Pereira-Leal 2015), investigating this is outside the scope of our study. Optimization of all model parameters jointly with branch lengths is not practical and substantially increases the computational load. We instead first estimated branch lengths using the simpler M0 model, which assumes a constant x across branches and sites, and optimized in a second step the model parameters of the M8 or branch-site models with or without rate variation, while fixing branch lengths. A similar approach was used in previous studies (Scheffler et al. 2006;Moretti et al. 2014).
We show that this approach at least in the case of the absence of variation does not decrease significantly the statistical properties of the positive selection inference (supplementary fig. S20, Supplementary Material online).
All model optimizations with the exception of BUSTED were performed in Godon, followed by model selection (see below).
For BUSTED, we used an implementation available in HyPhy v. 2.2.6 . When running BUSTED on M8 simulations, positive selection was tested on all the branches jointly. In the case of branch-site model simulations, only the foreground branch (x ! 1) was tested for positive selection.
For the biological data sets, all the internal branches were tested using the branch-site model for positive selection. Tip branches were not tested to reduce the potential effect of sequencing errors. The M8 model was applied to estimate substitution rates and x for individual sites.

Model Selection
During model selection we had eight models to choose from: four rate variation approaches and, for each, the absence or presence of positive selection. Although LRT can be used to test for positive selection, it is not possible to use it to compare across all eight models that we tested (i.e., any pair of codon rate variation and site rate variation models cannot be represented as a nested pair).
We thus first used the AIC on the alternative model to select one of the four approaches to model rate variation: no rate variation, site rate variation, codon 3-rate variation, or codon gamma rate variation. For the Drosophila data set, when only the no rate variation and codon gamma rate variation models were compared, we used both AIC and LRT. Figure 5 shows the scheme of model selection.
Once the rate variation model was selected, we performed LRT to detect positive selection on the corresponding pair of models, that is, model with x 1 and model without this constraint. A 50:50 mix of a v 2 distribution with one degree of freedom and of a point mass of 0 was used as a null distribution (Yang and dos Reis 2011). False discovery rate was computed using the qvalue package (Storey et al. 2004).

Posterior Rates Inference
In order to estimate substitution rates for individual codons, we used an approach similar to Mayrose et al. (2007) and Rubinstein et al. (2011). First, we estimated the probability of a codon belonging to each rate as P ¼ Prðq ðm;ÃÞ ¼ q k jx m ; gÞ, where q ðm;ÃÞ is the rate of codon m, q k is the kth discrete gamma rate, x m is the data observed Davydov et al. . doi:10.1093/molbev/msz048 MBE at codon m, and g are the parameters of the model (e.g., for M8 g ¼ fp 0 ; p; q; xg). In this approach, g is replaced with the maximum likelihood estimate of model parametersĝ. Thus, codon rates can be estimated as a weighted sum b q ðm;ÃÞ ¼ X K k Prðq ðm;ÃÞ ¼ q k jx m ; b gÞq k .
An alternative would be to use Bayes empirical Bayes (BEB; Yang et al. 2005), instead. However, BEB was developed and tested for site detection in particular codon models, and we do not know how well it is applicable to rate variation. On top of that given the increased parametric space of the model, BEB would be computationally intensive. As we are averaging rates over multiple sites, random noise should not introduce a substantial bias.
Codon site d N =d S ratios in the M8 model can be estimated using a similar approach, while replacing codon rate categories with the x categories.
Posterior codon rate and d N =d S estimation are implemented in Godon. In all cases, we used an alternative model for posterior estimation. As the null model for every pair is a special case of the alternative, we can use the later for parameter estimation without any significant loss of precision.
To test for autocorrelation, we used the average absolute difference between the posterior rates of the neighboring codon positions as a statistic: jq ðm;ÃÞ Àq ðmþ1;ÃÞ j: The null distribution was computed by shuffling the rates within all the genes 10,000 times.

Regression Analysis
To estimate dependencies between various parameters (variables), we used linear models (lm function, R version 3.5.1).
Variables were transformed to have a bell-shaped distribution if possible (see supplementary tables S20 and fig. S21, Supplementary Material online). Subsequently, parameters were centered at zero and scaled so that standard deviation was equal to 1. This transformation allowed us to compare the estimates of the effects. Because in some cases residuals showed strong heteroskedasticity (supplementary figs. S22 and S23, Supplementary Material online, for vertebrates and Drosophila), we used White standard errors (White 1980) implemented in the sandwich R package.
The relative support of codon gamma rate model was computed as a log ratio between Akaike weights (Wagenmakers and Farrell 2004) of the model with codon gamma rate variation and the model without rate variation.

Availability
All the code and the full lists of detected genes are available from https://bitbucket.org/Davydov/codon.rate.variation. Sequence simulator cosim is available from http://bitbucket. org/Davydov/cosim. Codon model parameter estimator Godon is available from https://bitbucket.org/Davydov/ godon. We provide source code as well as precompiled binaries for GNU/Linux (64 bit).
To the best of our knowledge, Godon provides the first implementation of branch-site models (Zhang et al. 2005) incorporating codon rate variation using approaches of Scheffler et al. (2006), Rubinstein et al. (2011), and our own gamma rates extension.

Godon Usage
Godon provides an easy way to perform an analysis given a FASTA-file and a newick-file, for example, the following command will perform branch-site test for positive selection with gamma distributed codon rates on every branch of a tree: FIG. 5. Model selection scheme. Red arrows correspond to LRT and blue arrows correspond to AIC. Rate variation selection (A) was performed only on the full Drosophila data set, whereas (B) was used on all three data sets: the vertebrate data set, the Drosophila data set, and a subset of 1,000 genes from the Drosophila data set.
Comparative Analysis of Codon Models . doi:10.1093/molbev/msz048 MBE godon test BSG --all--branches --ncat-codon--rate 4 input.fst input.nwk Often branch length optimization is performed once using a simpler model and then for every test the branch length parameters are fixed. In Godon, there is an easy way to achieve this behavior: godon test BSG --m0--tree --all-branchesncat-codon-rate 4 input.fst input.nwk

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.