Abstract

The molecular clock hypothesis states that protein-coding genes evolve at an approximately constant rate. However, this is only expected to be true as long as the function and the tertiary structure of the molecule remain unaltered. An important implication of this statement is that significant deviations in the rate of evolution of a gene with respect to the species clock are likely to reflect functional and/or structural alterations. Here, we present a method to identify such deviations and apply it to a data set of 2,929 high-quality coding sequence alignments corresponding to one-to-one orthologous genes from six mammalian species—human, macaque, mouse, rat, cow, and dog. Deviated branches are defined as those that present significant alterations in both the rate of nonsynonymous substitutions (dN) and the selective pressure (dN/dS). Strikingly, we find that as many as 24.5% of the genes show branch-specific deviations in dN and dN/dS, though this is a relatively well-conserved set of genes. Around half of these genes show branch-specific acceleration of evolutionary rates. Positive selection (PS) tests based on divergence data only identify 17.7% of the accelerated branches. Failure to identify PS in accelerated branches with an excess of radical amino acid replacements suggests that these tests are conservative. Interestingly, genes with accelerated branches are significantly enriched in neural proteins, indicating that this type of protein might play a more important role than previously thought in species diversification, although they are generally not detected by PS tests. We discuss in detail several examples of genes that show lineage-specific evolutionary rate acceleration and are involved in synaptic transmission, chemosensory perception, and ubiquitination.

Introduction

The molecular clock hypothesis was formulated in the early 1960s following the observation that the number of amino acid differences between two orthologous proteins was approximately proportional to the divergence time of the two species being compared (Zuckerkandl and Pauling 1962; Margoliash 1963). Subsequently, Kimura and Ota (1974) noted that the rate of protein evolution will be approximately constant as long as the function and the tertiary structure of the molecule remains unaltered. Examples were soon identified in which alterations in the rate of evolution of a protein in a specific lineage marked important functional or structural modifications (Goodman et al. 1975; Baba et al. 1984). If the species are sufficiently close (e.g., within mammals), the comparison of the number of nonsynonymous substitutions per nonsynonymous site (dN, amino acid altering) to the number of synonymous substitutions per synonymous site (dS, silent), also known as omega (ω = dN/dS), is a useful estimator of the gene's selective pressure (if the species are too distant, the rate of synonymous substitutions may become saturated and difficult to estimate). Thus, significant differences in dN/dS in different lineages are strong indicators of changes in the impact of natural selection over time (Czelusniak et al. 1982).

In recent years, the analysis of an increasingly large number of protein sequences has shown that the assumption of a constant rate of evolution is often violated (Ohta and Ina 1995; Ayala 2000; Arbiza et al. 2006; Bedford and Hartl 2008). There are two main reasons for this. First, the specific characteristics of the species or group of species under study, including generation time, historical population size and substructure, and DNA replication fidelity, will all affect rate of change at the level of DNA sequence. For example, the number of fixed nucleotide substitutions per unit time in the mouse lineage is approximately double that of the human lineage, which is likely due in part to the shorter generation time of rodents compared with primates (Waterston et al. 2002). This will result in both higher dN and dS in the rodent lineage. Another example is the small effective population size of humans due to historical population bottlenecks, which is expected to result in an increase in fixation of slightly deleterious, and slightly advantageous, mutations through genetic drift (Ohta 1973). This agrees with the observed higher dN/dS ratio in the human lineage when compared with other lineages that have larger effective population sizes (Nielsen et al. 2005). The second factor that may lead to deviations from the predictions of the molecular clock hypothesis is lineage-specific changes in the strength and type of selection that is acting on each gene over time. For example, the accumulation of a series of advantageous mutations by positive selection (PS) in a given lineage will be observed as an increase in dN/dS in the corresponding branch. A similar effect will be observed if there is a period of relaxation of purifying selection on the protein sequence. Inversely, increased purifying selection in a given branch will be observed as a decrease in dN/dS.

A number of different methods are now available for the automatic estimation of dN and dS in different branches of a tree pertaining to a set of homologous sequences (Ronquist and Huelsenbeck 2003; Seo et al. 2004; Yang 2007). These methods permit investigation and comparison of the selective processes that have acted on different lineages, including internal branches of the tree. In the complete absence of selection, a dN/dS ratio of 1 would be expected. Most genes have a dN/dS value much lower than 1, indicating that negative, or purifying, selection predominates. A dN/dS > 1 is generally taken as indicative of PS, although a test should be performed to ensure that the probability of the observed value is significantly different from the probability of observing a value of 1 (Zhang 2003; Zhang et al. 2005). In general PS is expected to act on, at most, a few sites in a gene, and for only a short period of time, and thus an average dN/dS > 1 is very rarely observed. For this reason, methods that allow testing for PS on individual codons in specific branches of the tree, known as “branch-site (BS) models,” have become increasingly popular (Yang and Nielsen 2002; Zhang et al. 2005).

In a set of 744 core genes from 30 γ-proteobacterial species, Shapiro and Alm (2008) showed that pairs of genes with similar selective patterns were more likely to share the same cellular function, denoting concerted evolution. For example, genes involved in glycolysis and phenylalanine metabolism evolved faster in Idiomarina loihiensis, which could be explained by a shift in carbon source from sugars to amino acids in this species. A study comparing dN/dS values from four yeast species indicated that 13.2% of the genes deviated significantly from clock-like evolution, involving both branch-specific acceleration and deceleration events (Kawahara and Imanishi 2007). Studies in mammals to date have been almost exclusively limited to the identification of branches showing signs of PS. Clark et al. (2003) used human, chimpanzee, and mouse orthologous gene trios to infer branches undergoing PS using maximum likelihood methods, and Nielsen et al. (2005) performed a scan of genes likely to have been targeted by PS at any time during the evolution of humans and chimpanzees. Other studies have investigated the human and chimpanzee branches independently for positively selected genes, using rodents or macaque as the outgroup (Arbiza et al. 2006; Bakewell et al. 2007; Gibbs et al. 2007). In a recent study, Kosiol et al. (2008) performed a scan for PS in a set of mammalian genes including human, chimpanzee, macaque, mouse, rat, and dog. Of approximately 16,500 human genes that had orthologs in at least two of the other species, they found that 400 showed evidence of PS and an additional 144 genes presented lineage-specific PS. Although these studies have yielded quite different results, several of them have found enrichment for proteins involved in functions related to immunity and chemosensory perception, which have been classically associated with adaptive evolution. Together, they provide strong evidence that deviations from the molecular clock are widespread and nonrandom.

The detection of PS in mammalian genes is without doubt valuable to help unravel different molecular adaptations, including those that are specific to humans. However, there are many other aspects of mammalian gene evolution that remain unexplored. For example, it is presently unknown how often significant changes in gene selective pressure have occurred in the mammalian phylogeny. Are the vast majority of genes evolving in a clock-like fashion once we have corrected for general lineage-specific differences? In other words, is the pace of divergence of orthologous genes steady? How many genes that have undergone branch-specific evolutionary rate acceleration also show signs of PS? To address these questions, we have built a high-quality set of ∼3,000 orthologous coding sequence alignments from six mammalian species—human, macaque, mouse, rat, dog, and cow—and estimated the corresponding dN and dS values in each branch of the corresponding phylogenetic tree. In order to identify significant deviations in dN and dN/dS, we compared the observed tree for each gene with the expected tree given the general species-specific differences and the global rate of evolution of the gene. The results show that approximately a quarter of all genes have significant branch-specific deviations, denoting high evolutionary flexibility even in a relatively well-conserved set of one-to-one orthologous genes. We present several examples of how these deviations might be associated with functional modifications in the protein concerned.

Materials and Methods

Orthologous Gene Data set

A 1:1 orthologous gene data set from human, mouse, rat, cow, and dog was obtained using Compara at Ensembl (v49) where orthologs are defined using several different methods of phylogenetic tree reconstruction (Flicek et al. 2008; Vilella et al. 2009). The data set comprised 10,267 orthologous genes, with the longest transcript available taken as the representative protein, as defined in Ensembl. For each orthologous protein family, we performed a multiple sequence alignment of the amino acid sequence using Prank + F (Loytynoja and Goldman 2008). Subsequently, nucleotide coding sequence alignments were obtained based on the Prank + F protein alignments using an in-house Perl program.

Estimation of Evolutionary Rates

For each orthologous gene, we estimated the number of nonsynonymous substitutions per nonsynonymous site (dN), the number of synonymous substitutions per synonymous site (dS), and the dN/dS ratio, using maximum likelihood as implemented in the codeml program of the PAML software package (Yang 2007). Equilibrium codon frequencies of the model were used as free parameters (CodonFreq = 3). dN and dS calculations were performed on the unrooted tree ([human, macaque], [mouse, rat], and [cow, dog]). Subsequently, we labeled the branch from the common ancestor of human and macaque to the common ancestor of all sequences “primate,” and the branch from the common ancestor of mouse and rat to the common ancestor of all sequences “rodent,” thus implicitly assuming the tree topology depicted in figure 1. We also estimated dN, dS, and dN/dS using a model that considers a unique dN/dS ratio for all lineages in codeml (one-ratio model) (Yang 2007). A chi-square test based on the comparison of the likelihood obtained using the free-ratio model and the one-ratio model confirmed that the former was more appropriate in the vast majority of cases (98.4%, P value <0.01).

FIG. 1.

Pipeline used to identify branches showing significant evolutionary rate acceleration or deceleration. The example shown corresponds to the PTC bitter taste receptor (T2R38), showing significant rate acceleration in the primate branch. We compared the branch lengths (Gene Observed) to those expected (Gene Expected) given the typical rates of each branch (Species Observed) and the overall evolutionary rate of that gene (total branch length). We obtained the log2 Observed/Expected for the branch-specific dN/dS, dN, and dS estimated values. The distribution for dS was used to identify branches showing significant log2 Observed/Expected (called SB) for both dN/dS and dN (5% two tailed). Subsequently, the binomial test was applied to branches showing significant SB to determine whether the absolute number of nonsynonymous substitutions over the total number of substitutions was significantly different in the deviated branches with respect to the nondeviated branches of the same tree.

FIG. 1.

Pipeline used to identify branches showing significant evolutionary rate acceleration or deceleration. The example shown corresponds to the PTC bitter taste receptor (T2R38), showing significant rate acceleration in the primate branch. We compared the branch lengths (Gene Observed) to those expected (Gene Expected) given the typical rates of each branch (Species Observed) and the overall evolutionary rate of that gene (total branch length). We obtained the log2 Observed/Expected for the branch-specific dN/dS, dN, and dS estimated values. The distribution for dS was used to identify branches showing significant log2 Observed/Expected (called SB) for both dN/dS and dN (5% two tailed). Subsequently, the binomial test was applied to branches showing significant SB to determine whether the absolute number of nonsynonymous substitutions over the total number of substitutions was significantly different in the deviated branches with respect to the nondeviated branches of the same tree.

Orthologous Data Set Filtering

Several filters were applied to obtain a high-quality orthologous gene data set on which to perform the analysis of sequence substitution rates. We discarded orthologous gene families that included sequences with ambiguous amino acids (496 families). We then filtered out orthologous gene families in which the length of one of the proteins was shorter than half of the length of the longest protein (510 families). Next, we discarded very short alignments of length (not counting gaps) <100 base pairs trees with branches dS > 2 or dN > 2, to avoid the inclusion of nonbona fide orthologs and of branches saturated with substitutions, as well as with dS < 0.01, as such low dS values do not provide reliable estimates of the dN/dS ratio. After applying the above-mentioned filters, the size of the data set was 6,759 orthologous gene families. Finally, we imposed a filter to ensure that we were only aligning orthologous exons. This filter was required because some of the genes had different exon structures in different species, presumably due to incompleteness of transcript annotations, as well as to the presence of incorrectly annotated genes, especially in species with low expressed sequence coverage. We mapped all exon boundaries onto the protein alignments and discarded those alignments in which the region orthologous to any exon showed an overall amino acid sequence similarity of less than 50%. Merely masking poorly aligned regions, as performed in other large-scale studies aimed at the identification of genes under PS (Kosiol et al. 2008), was not considered here because protein incompleteness would have biased the analysis of the evolutionary patterns at the whole protein level. After applying this filter, 2,929 protein sequence alignments, and their associated trees with dN, dS, and dN/dS values, remained.

Identification of Genes with Branch-Specific Selective Biases

The first objective was to obtain a representative set of proteins from the complete data set to build a concatenated sequence alignment to infer dS, dN, and dN/dS values characteristic of each branch, which would encompass any lineage-specific effects on the estimation of these variables. We chose three random sets of 150 proteins for which the standard error of the mean of the dN, dS, and dN/dS values for each branch was smaller than 10% and which showed no significant differences in the distribution of dN, dS, and dN/dS values when compared with the complete data set using the Wilcoxon test (P value >0.4). The branch-specific values of dN, dS, and dN/dS for the concatenated alignments were estimated using codeml from the PAML package (Yang 2007). We stored the values for eight branches of the tree—human, macaque, primate, mouse, rat, rodent, cow, and dog—as explained in more detailed above.

Using the values obtained from the concatenated alignment (Species Observed, fig. 1), we calculated, for each orthologous gene family and its associated tree (Gene Observed, fig. 1), the expected dN, dS, and dN/dS values considering the global rate of evolution of the gene (keeping the total branch length constant) as well as the general species-specific differences (maintaining the proportions of the branches in the Species Observed tree). This tree was termed Gene Expected. Finally, the length of each branch of the Gene Observed tree was divided by the length of the corresponding branch of the Gene Expected tree, and the base 2 logarithm was applied to this ratio (log2 ratio). For dN and dN/dS, the resulting value was termed the selective bias (SB). A value of SB > 0 implied that the branch was longer than expected (accelerated), and a value of SB < 0 that it was shorter than expected (decelerated).

To determine whether the acceleration or deceleration associated with dN/dS and dN (SB) was significant and likely to be caused by differences in selective forces, we used the distribution of the log2 ratio values for dS to calculate 5% two-tailed confidence limits, that is, we used the results of the normalization procedure described above for dS values to control for stochastic variation due to the action of genetic drift alone. If the SB for the dN/dS and dN values of a particular branch of a tree was higher than the dS upper limit for that branch, that particular branch was classified as accelerated (supplementary file S3, Supplementary Material online). Similarly, if the SB for dN/dS and dN values was smaller than the dS lower limit, the branch was classified as decelerated. By imposing that both dN and dN/dS were outwith the confidence limits, we ensured that the deviation in the number of nonsynonymous substitutions in a branch (dN) was associated with changes in the selective pressure (dN/dS). Besides, in three replicates of this procedure, 2,437 trees (83%) showed identical classification of all branches into the groups accelerated, decelerated, or nondeviated. Only those trees with fully consistent branch classification in the three replicates were considered for the counts of accelerated or decelerated branches.

Binomial Test

The next step was to determine if the difference in the estimated number of nonsynonymous substitutions over synonymous substitutions, in the accelerated or decelerated branch, and in the nondeviated branches of the same tree, was sufficiently large to ensure statistical robustness of the SB. This step was introduced because some very short- or slow-evolving genes showed very strong relative differences in branch lengths but this was supported by only a few substitutions. We applied a binomial test, where k was the number of nonsynonymous substitutions in the deviated branch, n was the total number of substitutions in the deviated branch (synonymous plus nonsynonymous), and p was the ratio between nonsynonymous and total substitutions in branches of the same tree that did not show any deviation. To ensure that p was representative of the main tendency of the gene, this test was not applied to trees with more than four deviated branches, which represented a very small fraction of the trees with deviations (4.8%). p was obtained as follows: 

graphic

where, n is the number of branches that show no significant deviations in SB, 4 ≤ n ≤ 7, Ni is the estimated number of nonsynonymous substitution in branchi, and Si is the estimated number of synonymous substitution in branch i.

A P value <0.05 in the binomial test was taken as confirmation that the branch was accelerated or decelerated. No correction for multiple testing was applied at this stage because the branches on which the test was applied had already presented evidence of acceleration or deceleration as measured by the SB. Of the 1,068 trees with at most four branches with significant SB (36.5% of the initial data set), 718 passed the binomial test (24.5% of the initial data set). A Fisher test applied to the same data produced very similar results.

Fit to a Poisson Distribution

We tested whether the number of deviated branches per tree followed a Poisson distribution. If it did, it would mean that the distribution of deviated branches in the different genes was random and that there was no tendency for them to cluster on particular genes. First, we calculated the fitted Poisson distribution with the visualizing categorical data package of the R statistical package, using the MinChisq method. Second, we tested whether the observed counts and the fitted Poisson distribution counts were significantly different. For zero deviated branches, we observed 2,211 trees compared with 2,125 in the fitted Poisson; for one deviated branch, 564 compared with 682; for two deviated branches, 126 to 109; for three deviated branches, 25 to 12; and for four deviated branches, 3 to 1. The difference between the original distribution and the fitted Poisson distribution was highly significant according to a chi-square test (P < 10−9). This was due to an excess of trees with zero, two, three, or four deviated branches, and a lack of trees with one deviated branch, in the observed data set.

Positive Selection Tests

The BS model A in the PAML package was used to test for PS in each branch of each tree of the data set (Zhang et al. 2005). This test allows detection of PS acting on only a few sites in a specific lineage. The branch to be tested was the foreground branch, whereas the rest of the branches were the background branches. In the null model, all codons in the foreground and background branches were evolving under negative selection (dN/dS < 1) or neutrally (dN/dS = 1), whereas in the alternative model, the foreground branch also included sites evolving under PS (dN/dS > 1). PS in the foreground branch was inferred if the likelihood of the observation of the gene sequences was significantly higher under the alternative model than under the null model. In the latter, no codons are evolving under PS in the foreground branch. We used a q value of 0.05, corresponding to a false discovery rate of 5%, to correct for multiple testing (Storey and Tibshirani 2003).

Amino Acid Replacements

We counted the number of observed lineage-specific amino acid replacements in the protein sequence alignments using an in-house Perl program. We focused on replacements that were specific to each lineage, that is, on positions in which all sequences had the same amino acid except for the sequence of interest. For the internal branches (primate and rodent), we counted the positions in which all sequences had the same amino acids except the two sequences corresponding to the derived branches, which shared a different amino acid. The branch of interest was the focal branch and the rest of branches, the background branches. The amino acid replacements were classified as radical if they were associated with a change of polarity group (polar: C, N, Q, S, T, and Y; nonpolar: A, F, G, I, L, M, P, V, and W; positively charged: H, K, and R; and negatively charged: D and E), and nonradical if they did not imply a change of polarity group.

First, we compared the ratio of radical to nonradical amino acid replacements in deviated versus nondeviated branches. We found a significant excess of radical replacements in accelerated branches with respect to nondeviated branches (chi-square test, P < 0.01) and no significant differences in decelerated branches with respect to nondeviated branches. We next compared the relative frequency of all possible types of radical replacements (e.g., replacement of a nonpolar amino acid by a positively charged amino acid) in accelerated versus nondeviated branches. To avoid biases due to the different nature of genes with or without accelerated branches, this comparison was restricted to genes containing accelerated branches only. We calculated the log2 of the ratio of the relative frequency of the replacement in accelerated versus nondeviated branches, so that values over 0 indicated higher frequency in accelerated branches and below 0 higher frequency in nonaccelerated branches. Again, a chi-square test was used to determine if the differences were significant.

Gene Ontology Analysis

A Gene Ontology (GO) (Ashburner et al. 2000) description for each protein in the data set was obtained using BioMart at Ensembl (Flicek et al. 2008). We have to be aware that our initial data set is already biased toward some GO descriptions because it is comprised a set of proteins conserved among six mammalian species. For this reason, the GO analysis was performed comparing the genes that showed some kind of deviation (acceleration or deceleration) with the remaining genes in the data set. The proportion of genes with GO annotations was 83% in the complete data set (2,437 out of 2,929), 92% in the genes with accelerated branches (245 out of 265), and 92% in the genes with decelerated branches (301 out of 306). P values were computed using Fisher's exact test.

Statistical Tests and Graphics

The R statistical software package (R 2007) was used to perform the Wilcoxon test, the chi-square test, the binomial test, and the Fisher exact test. As already mentioned, we also used R to compare the number of deviated branches in the trees to those expected under a fitted Poisson distribution. For the q-value test the QVALUE R library was used. We wrote a specific R script to automate the normalization procedure for dN/dS, dN, and dS values. Graphs were generated using R software.

Results

Identification of Lineage-Specific Evolutionary Rate Deviations

We generated multiple sequence alignments for 10,267 orthologous protein families using Prank + F (Loytynoja and Goldman 2008). This program minimizes the number of incorrectly aligned homologous positions by using a more balanced rate of insertions and deletions than other commonly used multiple alignment programs. This was of great importance to avoid overestimation of the proportion of nonsynonymous, compared with synonymous, substitutions. For each family, we estimated the number of nonsynonymous substitutions per nonsynonymous site (dN) and the number of synonymous substitutions per synonymous site (dS) on each branch of the species tree depicted in figure 1. We then applied a number of rigorous filters to ensure a high-quality data set for further study (see Material and Methods and table 1). This filtering was required because the data set included genomes with varying degrees of gene annotation reliability. Ultimately, we obtained a high-quality data set of 2,929 multiple sequence alignments and their corresponding trees (supplementary file 1, Supplementary Material online). Taking all the branches together, we found a strong positive correlation between all dN/dS and dN values and a very small negative correlation between dN/dS and dS values, indicating that the estimations of these parameters were robust and that variations in dN/dS mainly reflected variations in dN (supplementary file 2, S1, Supplementary Material online).

Table 1.

Filters Applied to the Initial Data Set of Orthologous Coding Sequence Alignments.

 Number of Alignments 
Initial set of 1:1 orthologs 10,267 
Filters 
    Ambiguous sequence 9,771 
    Nonconserved length 9,261 
    Unreliable dN or dS estimates 6,759 
    Poor alignment quality 2,929 
 Number of Alignments 
Initial set of 1:1 orthologs 10,267 
Filters 
    Ambiguous sequence 9,771 
    Nonconserved length 9,261 
    Unreliable dN or dS estimates 6,759 
    Poor alignment quality 2,929 

The vast majority of trees, 2,883 out of 2,929 (98.4%, P value <0.01), did not follow a model of a unique dN/dS ratio for all lineages (Yang 2007). This nonclock-like behavior is the result of two different components: species-specific differences in the efficiency of selection and gene-specific variations in selective pressure in different branches. Species-specific effects were evident when we examined the dN/dS distributions (table 2; supplementary file 2, S2, Supplementary Material online). The average and median dN/dS were larger in the primates (human, macaque, and primate internal branch) than in the other branches, probably indicating weaker purifying selection as a consequence of small effective population size (Ohta 1973; Nielsen et al. 2005). This observation highlights the importance of correcting for species-specific dN/dS deviations when comparing the rate of sequence substitutions of a gene in different branches.

Table 2.

Summary Statistics for Substitution Rates in 2,929 Mammalian 1:1 Gene Orthologs.

 dN/dS
 
dN
 
dS
 
Branch Mean Median SD Mean Median SD Mean Median SD 
Homo sapiens 0.163 0.099 0.208 0.005 0.003 0.005 0.036 0.030 0.023 
Macaca mulatta 0.151 0.089 0.195 0.005 0.003 0.007 0.043 0.037 0.027 
Primate 0.139 0.091 0.162 0.015 0.011 0.015 0.125 0.107 0.080 
Mus musculus 0.131 0.080 0.185 0.011 0.008 0.011 0.098 0.093 0.043 
Rattus norvergicus 0.122 0.079 0.154 0.011 0.008 0.012 0.104 0.100 0.045 
Rodent 0.098 0.074 0.090 0.036 0.027 0.034 0.391 0.351 0.186 
Bos taurus 0.119 0.085 0.120 0.025 0.018 0.025 0.239 0.204 0.145 
Canis familiaris 0.115 0.081 0.116 0.021 0.015 0.022 0.203 0.171 0.132 
 dN/dS
 
dN
 
dS
 
Branch Mean Median SD Mean Median SD Mean Median SD 
Homo sapiens 0.163 0.099 0.208 0.005 0.003 0.005 0.036 0.030 0.023 
Macaca mulatta 0.151 0.089 0.195 0.005 0.003 0.007 0.043 0.037 0.027 
Primate 0.139 0.091 0.162 0.015 0.011 0.015 0.125 0.107 0.080 
Mus musculus 0.131 0.080 0.185 0.011 0.008 0.011 0.098 0.093 0.043 
Rattus norvergicus 0.122 0.079 0.154 0.011 0.008 0.012 0.104 0.100 0.045 
Rodent 0.098 0.074 0.090 0.036 0.027 0.034 0.391 0.351 0.186 
Bos taurus 0.119 0.085 0.120 0.025 0.018 0.025 0.239 0.204 0.145 
Canis familiaris 0.115 0.081 0.116 0.021 0.015 0.022 0.203 0.171 0.132 

To measure the second component associated with the observed nonuniformity of dN/dS, related to gene-specific changes in selective pressure in different branches, we designed a procedure that first corrects for species-specific deviations and then uses the variability associated with dS to control for the stochastic nature of mutations (fig. 1, see Materials and Methods for more details). The Species Observed tree captured any deviations due to species-specific differences alone and was obtained with 150 coding sequences taken at random. The Gene Observed tree was calculated for each of the 2,929 genes in the filtered data set. The Gene Expected tree was obtained by keeping the same total branch length of the Gene Observed tree but relative branch lengths as in the Species Observed tree. This procedure is similar to that described by Shapiro and Alm (2008), although we applied it to trees with branch-specific estimated dN, dS, and dN/dS instead of branch-specific rates of amino acid substitutions. Subsequently, we calculated the log2 ratio of branch lengths (be it dN, dS, or dN/dS) for the Gene Observed and Gene Expected trees for each branch. To estimate the significance of the log2 ratio of observed versus expected for dN and dN/dS in each branch (called SB), we used the distribution of the log2 ratio for dS for that branch. This allowed us to control for the range of variation in substitution rates due to mutational stochasticity. The 2.5% upper and lower tails (5% two tailed) of this distribution were used to identify putatively accelerated and decelerated branches, with significantly high or low SB, respectively (supplementary file 2, S3, Supplementary Material online). These branches were further examined to determine if the absolute proportion of nonsynonymous to synonymous substitutions estimated by codeml was significantly different to the proportion observed in the branches that showed no deviations in dN or dN/dS in the same tree, using a binomial test (fig. 1). At a P value cutoff of 0.05, this test discarded around 30% of the genes previously identified as having accelerated or decelerated branches. These tended to be short- or slow-evolving genes that had too few substitutions for the observed differences to be statistically robust.

Nature of Genes Showing Branch-Specific Evolutionary Rate Deviations

The procedure described above identified 718 trees, out of the 2,929 analyzed (24.5%), which had at least one accelerated or decelerated branch (table 3; supplementary file 3, Supplementary Material online for full details), the majority of which had only one significantly deviated branch. Overall, we found a similar number of accelerated and decelerated branches. As expected, accelerated branches tended to have higher dN/dS than the average dN/dS for that branch across all trees and decelerated branches lower dN/dS values than the average dN/dS (supplementary file 2, S4, Supplementary Material online). A similar number of accelerated branches were observed in human and macaque (70 and 77, respectively; supplementary file 2, S5, Supplementary Material online) as well as in mouse and rat (51 and 55, respectively). Deceleration events tended to be more common in longer branches (such as the rodent branch) than shorter branches. This can be explained by the fact that the binomial test is more powerful in longer branches as they accumulate more substitution events. This is especially so in the case of rate deceleration, involving few changes. We also tested whether the number of deviated branches per tree was randomly distributed. We found that the observed data was significantly different from a fitted Poisson distribution (P < 10−9), and therefore the distribution was not random. Interestingly, there was an excess of trees with two or more deviated branches, indicating that some genes are likely to be especially prone to undergo changes in selective pressure.

Table 3.

Trees with Significant Branch-Specific Evolutionary Rate Deviations.

Total Trees N (%) 
Trees with significant evolutionary rate deviations 718 (24.5) 
    Trees with one deviated branch 564 (19.3) 
        Accelerateda 262 (8.9) 
        Decelerated 302 (10.3) 
Trees with two or more deviated branches 154 (5.3) 
    All accelerated 3 (0.1) 
    All decelerated 29 (1.0) 
    Accelerated and decelerated 122 (4.2) 
Total Trees N (%) 
Trees with significant evolutionary rate deviations 718 (24.5) 
    Trees with one deviated branch 564 (19.3) 
        Accelerateda 262 (8.9) 
        Decelerated 302 (10.3) 
Trees with two or more deviated branches 154 (5.3) 
    All accelerated 3 (0.1) 
    All decelerated 29 (1.0) 
    Accelerated and decelerated 122 (4.2) 
a

Accelerated, higher dN and dN/dS than expected; decelerated, lower dN and dN/dS than expected.

P < 0.05.

The analysis of overrepresented GO terms (Ashburner et al. 2000) in the set of genes associated with accelerated and decelerated branches provided interesting insights into the nature of these genes (table 4, P < 0.05 using the Fisher test). Genes containing accelerated branches were enriched in several terms related to the nervous system, such as neuropeptide signaling pathway or Synapse. In contrast, genes containing decelerated branches were enriched in a different set of terms, including Protein complex assembly and cell–cell signaling. Interestingly, both sets were enriched in membrane proteins, which includes many potential cell receptors. Branch-specific variations in dN and dN/dS in these genes may be related to changes in affinities for different targets.

Table 4.

GO Terms Overrepresented in Genes with Accelerated and Decelerated Branches.

GO Description Genes with Deviated Branches Genes with No Deviated Branches Odds Ratio Fisher Test 
Nda (%) Nnda (%) %d/%nd P value 
Accelerated 
Biological process 
    Neuropeptide signaling pathway 5 (2.46) 11 (0.61) 4.11 0.017 
Cellular component 
    Synapse 13 (6.31) 29 (1.36) 4.89 3.31 × 10-5 
    Cell junction 15 (7.28) 45 (2.11) 3.65 1.31 × 10-4 
    Postsynaptic membrane 7 (3.40) 23 (1.08) 3.23 0.013 
    Cytoplasmic vesicle 7 (3.40) 29 (1.36) 2.55 0.034 
    Membrane 91 (44.17) 739 (34.58) 1.49 0.007 
Molecular function 
    Chromatin binding 5 (2.31) 14 (0.66) 3.55 0.026 
    Ion channel activity 12 (5.56) 41 (1.94) 2.97 0.003 
Decelerated 
Biological process 
    Protein complex assembly 6 (2.39) 13 (0.70) 3.46 0.019 
    Vesicle-mediated transport 9 (3.59) 23 (1.24) 2.95 0.010 
    Cell–cell signaling 11 (4.38) 38 (2.05) 2.18 0.040 
Cellular component 
    Chromatin 5 (1.96) 9 (0.43) 4.58 0.013 
    Soluble fraction 10 (3.92) 31 (1.49) 2.69 0.011 
    Membrane 109 (42.75) 725 (34.92) 1.39 0.015 
Molecular function 
    Structural molecule activity 11 (3.96) 38 (1.83) 2.21 0.039 
GO Description Genes with Deviated Branches Genes with No Deviated Branches Odds Ratio Fisher Test 
Nda (%) Nnda (%) %d/%nd P value 
Accelerated 
Biological process 
    Neuropeptide signaling pathway 5 (2.46) 11 (0.61) 4.11 0.017 
Cellular component 
    Synapse 13 (6.31) 29 (1.36) 4.89 3.31 × 10-5 
    Cell junction 15 (7.28) 45 (2.11) 3.65 1.31 × 10-4 
    Postsynaptic membrane 7 (3.40) 23 (1.08) 3.23 0.013 
    Cytoplasmic vesicle 7 (3.40) 29 (1.36) 2.55 0.034 
    Membrane 91 (44.17) 739 (34.58) 1.49 0.007 
Molecular function 
    Chromatin binding 5 (2.31) 14 (0.66) 3.55 0.026 
    Ion channel activity 12 (5.56) 41 (1.94) 2.97 0.003 
Decelerated 
Biological process 
    Protein complex assembly 6 (2.39) 13 (0.70) 3.46 0.019 
    Vesicle-mediated transport 9 (3.59) 23 (1.24) 2.95 0.010 
    Cell–cell signaling 11 (4.38) 38 (2.05) 2.18 0.040 
Cellular component 
    Chromatin 5 (1.96) 9 (0.43) 4.58 0.013 
    Soluble fraction 10 (3.92) 31 (1.49) 2.69 0.011 
    Membrane 109 (42.75) 725 (34.92) 1.39 0.015 
Molecular function 
    Structural molecule activity 11 (3.96) 38 (1.83) 2.21 0.039 
a

Only GO terms with at least five annotated genes in any data set were analyzed.

Identification of Positive Selection

We investigated in more detail genes showing accelerated branches, as these branches are likely to represent lineage-specific adaptive processes or, alternatively, lineage-specific relaxation of purifying selection. There were 395 accelerated branches (in 13.5% of the initial genes) at P < 0.05. To better understand the possible importance of PS on the accelerated branches, we intersected the accelerated branches with the branches that were significant for PS according to the BS test (Zhang et al. 2005) and using a false discovery rate correction threshold of 5%. As expected, accelerated branches were strongly enriched in events of PS (17.7% in comparison with 2.2% for all branches in the 2,929 trees; supplementary file 2, S6, Supplementary Material online; see also supplementary file 4, Supplementary Material online for a complete list of genes with PS). Curiously, of the 16 accelerated branches with dN/dS > 1, only two showed evidence of PS in the BS test. So, the majority of accelerated branches (82.3%) did not show significant signs of PS. The converse was also true, most branches with evidence of PS according to the BS test (86.6%) were not classified as accelerated. This simply means that we are measuring very different aspects of gene evolution by the two methods. Inspection of branches with signs of PS but that had not been classified as accelerated confirmed that their overall dN/dS were comparable with the rest of branches in the tree.

We noted that the proportion of genes for which there was significant PS varied greatly in different branches. For example, whereas the human branch only had 11 cases of PS in 2,929 genes (0.4%), the macaque branch had a total of 66 cases of PS (2.5%). This difference is difficult to explain in biological terms, neither is it consistent with the very similar proportion of accelerated branches observed in the lineages. The macaque genome assembly is of lower quality than the human genome assembly, and it has been recently shown that the BS method is particularly sensitive to errors in sequence (Mallick et al. 2009; Schneider et al. 2009). Therefore, it seems likely that, in spite of the quality filters employed, the excess of PS in some branches is due to a few remaining sequencing or gene annotation errors scattered throughout the sequence. The lineage with the largest number of branches under PS was the dog (8.4% of the genes). This again may be inflated as a result of errors in the sequence. We also have to consider that the process of domestication of dog breeds is associated with an accumulation of deleterious mutations (Cruz et al. 2008), which might confound estimates of PS.

Focusing on the species with the highest quality gene sequence, human and mouse, we can conclude that the vast majority of accelerated branches did not show evidence of PS using the BS method, even when the false discovery rate correction was not applied (18.6% of PS in human accelerated branches, 11.8% in mouse accelerated branches; supplementary file 2, S6, Supplementary Material online). This may partly be due to conservativeness of the test (Zhang et al. 2005), and partly to cases in which relaxation of selective constraints is the main reason for the observed pattern.

Excess of Radical Amino Acid Replacements in Accelerated Branches

To gain further insights into the nature of the modifications that have taken place in the accelerated branches, we compared the type of replacements observed in the accelerated branch with those of the rest of the branches of the tree. For clarity, we focused on the positions of the alignment on which the sequence/s corresponding to the branch of interest (focal branch) had a different amino acid from the amino acid shared by all the other sequences (background). We grouped the replacements into two types, those that involved a change of polarity group (nonpolar, polar, negatively charged, and positively charged; see Materials and Methods), which we called radical, and those that did not, which we called nonradical in this particular context. It has been found that substitutions for dissimilar amino acids are more likely to be deleterious but are also more likely to be advantageous (Yampolsky et al. 2005; Gojobori et al. 2007). The ratio of radical changes to nonradical changes in the branches that were accelerated was 1.06 (1,361 radical changes and 1,290 nonradical changes), whereas the ratio was 0.85 (3,559 radical changes and 4,202 nonradical changes) in the nondeviated branches of the corresponding trees. Therefore, accelerated branches were associated with significantly more radical changes than nondeviated branches (chi-square test, P < 0.01). As a control, we also compared the decelerated branches with the rest of the branches of the tree, observing no significant difference in the ratio of radical to nonradical changes. We next asked the question whether there were any biases in the replacements between amino acids of different polarity groups in accelerated branches with respect to the remaining branches (table 5). Accelerated branches showed an excess of replacements that involved both the gain or loss of charged amino acids, which are expected to be critical for the function of the protein.

Table 5.

Radical Replacements in Accelerated Versus Nondeviated Branches of the Same Tree.

 Focal Branch
 
Background Branches Neutral Nonpolar Neutral Polar Positively Charged Negatively Charged 
Neutral nonpolar — −0.23** 0.51* 0.51 
Neutral polar −0.05 — −0.25 −0.03 
Positively charged 0.98** −0.05 — 0.84 
Negatively charged 0.39 0.06 −0.09 — 
 Focal Branch
 
Background Branches Neutral Nonpolar Neutral Polar Positively Charged Negatively Charged 
Neutral nonpolar — −0.23** 0.51* 0.51 
Neutral polar −0.05 — −0.25 −0.03 
Positively charged 0.98** −0.05 — 0.84 
Negatively charged 0.39 0.06 −0.09 — 

The log2 ratio between different replacement types is shown. Background branches represent the ancestral state and the focal branch is the one in which the mutation has been fixed. Values higher than 0 indicate that the replacement is more frequent in accelerated branches and lower than 0 in nondeviated branches.

*P < 0.05, **P < 0.01.

Examples of Genes Showing Branch-Specific Evolutionary Rate Acceleration

Table 6 provides a selected list of genes with accelerated branches (binomial test P < 10−3), grouped into several functional classes. Genes in the “nervous system” group are associated with several overrepresented GO terms in accelerated branches, such as Synapse and neuropetide signaling pathway (table 4). Genes in the “interaction with the environment” mostly consist of membrane proteins, also overrepresented in the GO analysis, and include several chemosensory receptors. In contrast to the neural receptors, these genes tend to be very rapidly evolving. “Regulation of transcription” is associated with the overrepresented GO term chromatin binding, which is basically composed of transcription factors. The fourth group is “metabolism,” which, although not particularly enriched in genes with accelerated branches, is included for the purpose of comparison. Finally, the group “other functions” includes a selection of genes involved in a variety of different functions.

Table 6.

Selection of Genes Showing Branch-Specific Evolutionary Rate Acceleration.

Gene name (HGC) Human Ensembl Protein ID Accelerated Brancha dN/dS Acceleration dN/dS Otherb PS (BS)c Description 
Interaction with the environment 
    OR51S1 ENSP00000322754 Human 1.36 0.23 No OR51S1 
    OR4K5 ENSP00000319511 Human 0.99 0.25 No OR4K5 
    OR2A4 ENSP00000319546 Human 0.98 0.19 No OR2A4 
    TAS2R41 ENSP00000375343 Macaque 1.86 0.46 Yes Taste receptor type 2 member 41 
    CXCR3 ENSP00000362795 Macaque 0.86 0.08 No C-X-C chemokine receptor type 3 
    TAS2R38 ENSP00000331291 Primate 1.17 0.38 Yes Taste receptor type 2 member 38 
 RCV1 ENSP00000226193 Primate 0.22 0.06 No Recoverin, implicated in sensory transduction 
    CXCR6 ENSP00000304414 Rat 0.68 0.31 No C-X-C chemokine receptor type 6 
Nervous system 
    GRIN3A ENSP00000355155 Human 0.25 0.07 No GluR (NMDA) subunit 3A 
    CHRNA1 ENSP00000261007 Human 0.29 0.04 Yes Acetylcholine receptor subunit alpha 
    SYT12 ENSP00000310733 Macaque 0.54 0.03 No Synaptotagmin-12 
    CHRM5 ENSP00000331108 Macaque 0.65 0.16 No Muscarinic acetylcholine receptor M5 
    GRIK1 ENSP00000327687 Primate 0.09 0.01 No GluR (NMDA) 5 
    SSTR2 ENSP00000350198 Mouse 0.14 0.06 Yes Somatostatin receptor type 2 
    GABRP ENSP00000265294 Rat 0.21 0.09 Yes Gamma-aminobutyric-acid receptor sub. pi 
    LPHN2 ENSP00000359752 Rat 0.08 0.04 No Latrophilin-2 precursor 
    GRASP ENSP00000293662 Dog 0.10 0.03 Yes General receptor for phosphoinositides 1 
    GABRB1 ENSP00000295454 Dog 0.11 0.02 No Gamma-aminobutyric-acid receptor sub. b1 
    SYNGR1 ENSP00000332287 Dog 0.09 0.03 Yes Synaptogyrin-1 
Regulation of transcription 
    GRHL2 ENSP00000251808 Macaque 0.25 0.03 Yes Transcription factor CP2-like 3 
    ZNF609 ENSP00000316527 Mouse 0.16 0.06 No Zinc finger protein 609 
    ZNF7 ENSP00000320627 Mouse 0.44 0.08 No Zinc finger protein 7 
    DRGX ENSP00000363254 Mouse 0.42 0.06 No Dorsal root ganglia homeobox 
    HEY1 ENSP00000338272 Rat 0.44 0.06 No Hairy related transcription factor 1 
    C19orf2 ENSP00000312530 Rodent 0.33 0.2 No RNA polymerase II sub. 5-mediating prot. 
    NR5A2 ENSP00000356331 Rodent 0.11 0.02 No Alpha-1-fetoprotein transcription factor 
    RUNX1 ENSP00000382189 Rodent 0.05 0.02 Yes Runt-related transcription factor 1 
    ZNF143 ENSP00000379847 Dog 0.12 0.03 Yes Zinc finger protein 143 
    NKX2-3 ENSP00000342828 Dog 0.15 0.06 No Homeobox protein Nkx-2.3 
Metabolism 
    DHPS ENSP00000210060 Human 0.36 0.05 No Deoxyhypusine synthase 
    MGAT5B ENSP00000364138 Human 0.11 0.03 No α-1,6-mannosylglyc. 6-β-N-acetyglucosaminyltransf. B 
    DGAT2L4 ENSP00000276101 Macaque 1.84 0.30 Yes Acyl-CoA wax alcohol acyltransferase 2 
    DNPEP ENSP00000273075 Primate 0.25 0.07 No Aspartyl aminopeptidase 
    B3GALT2 ENSP00000356404 Mouse 0.16 0.03 No Beta-1,3-galactosyltransferase 2 
    GMPS ENSP00000295920 Rat 0.24 0.05 No GMP synthase 
    ALG2 ENSP00000238477 Rodent 0.26 0.12 No Alpha-1,3-mannosyltransferase ALG2 
    MLSTD2 ENSP00000346874 Dog 0.21 0.04 Yes Fatty acyl-CoA reductase 1 
    ALDHX ENSP00000366927 Cow 0.24 0.05 No Aldehyde dehydrogenase X 
Other functions 
    FBX039 ENSP00000321386 Human 1.34 0.07 No F-box only protein 39 
    KIAA0391 ENSP00000250377 Human 2.5 0.25 Yes Mitochondrial ribonuclease P protein 3 
    PLA2G2C ENSP00000247992 Human 1.9 0.18 Yes Phospholipase A2, group IIC 
    IFT74 ENSP00000369402 Human 1.35 0.09 Yes Intraflagellar transport 74 homolog 
    TEKT2 ENSP00000207457 Macaque 0.47 0.14 Yes Tektin-2, cilium biogenesis 
    PDCD4 ENSP00000280154 Macaque 0.21 0.05 No Programmed cell death protein 4 
    CCNL1 ENSP00000295926 Macaque 0.26 0.02 Yes Cyclin-L1 
    LRRC49 ENSP00000260382 Primate 0.33 0.06 No Tubulin polyglutamylase complex subunit 4 
    NUP205 ENSP00000285968 Mouse 0.13 0.06 No Nuclear pore complex protein Nup205 
    RBM45 ENSP00000376298 Mouse 0.27 0.06 No Developmentally-regulated RNA-bind. prot. 1 
    WDR51A ENSP00000296484 Rat 0.30 0.08 Yes WD repeat protein 51A 
    RGS2 ENSP00000235382 Dog 0.22 0.04 Yes Regulator of G-protein signaling 2 
Gene name (HGC) Human Ensembl Protein ID Accelerated Brancha dN/dS Acceleration dN/dS Otherb PS (BS)c Description 
Interaction with the environment 
    OR51S1 ENSP00000322754 Human 1.36 0.23 No OR51S1 
    OR4K5 ENSP00000319511 Human 0.99 0.25 No OR4K5 
    OR2A4 ENSP00000319546 Human 0.98 0.19 No OR2A4 
    TAS2R41 ENSP00000375343 Macaque 1.86 0.46 Yes Taste receptor type 2 member 41 
    CXCR3 ENSP00000362795 Macaque 0.86 0.08 No C-X-C chemokine receptor type 3 
    TAS2R38 ENSP00000331291 Primate 1.17 0.38 Yes Taste receptor type 2 member 38 
 RCV1 ENSP00000226193 Primate 0.22 0.06 No Recoverin, implicated in sensory transduction 
    CXCR6 ENSP00000304414 Rat 0.68 0.31 No C-X-C chemokine receptor type 6 
Nervous system 
    GRIN3A ENSP00000355155 Human 0.25 0.07 No GluR (NMDA) subunit 3A 
    CHRNA1 ENSP00000261007 Human 0.29 0.04 Yes Acetylcholine receptor subunit alpha 
    SYT12 ENSP00000310733 Macaque 0.54 0.03 No Synaptotagmin-12 
    CHRM5 ENSP00000331108 Macaque 0.65 0.16 No Muscarinic acetylcholine receptor M5 
    GRIK1 ENSP00000327687 Primate 0.09 0.01 No GluR (NMDA) 5 
    SSTR2 ENSP00000350198 Mouse 0.14 0.06 Yes Somatostatin receptor type 2 
    GABRP ENSP00000265294 Rat 0.21 0.09 Yes Gamma-aminobutyric-acid receptor sub. pi 
    LPHN2 ENSP00000359752 Rat 0.08 0.04 No Latrophilin-2 precursor 
    GRASP ENSP00000293662 Dog 0.10 0.03 Yes General receptor for phosphoinositides 1 
    GABRB1 ENSP00000295454 Dog 0.11 0.02 No Gamma-aminobutyric-acid receptor sub. b1 
    SYNGR1 ENSP00000332287 Dog 0.09 0.03 Yes Synaptogyrin-1 
Regulation of transcription 
    GRHL2 ENSP00000251808 Macaque 0.25 0.03 Yes Transcription factor CP2-like 3 
    ZNF609 ENSP00000316527 Mouse 0.16 0.06 No Zinc finger protein 609 
    ZNF7 ENSP00000320627 Mouse 0.44 0.08 No Zinc finger protein 7 
    DRGX ENSP00000363254 Mouse 0.42 0.06 No Dorsal root ganglia homeobox 
    HEY1 ENSP00000338272 Rat 0.44 0.06 No Hairy related transcription factor 1 
    C19orf2 ENSP00000312530 Rodent 0.33 0.2 No RNA polymerase II sub. 5-mediating prot. 
    NR5A2 ENSP00000356331 Rodent 0.11 0.02 No Alpha-1-fetoprotein transcription factor 
    RUNX1 ENSP00000382189 Rodent 0.05 0.02 Yes Runt-related transcription factor 1 
    ZNF143 ENSP00000379847 Dog 0.12 0.03 Yes Zinc finger protein 143 
    NKX2-3 ENSP00000342828 Dog 0.15 0.06 No Homeobox protein Nkx-2.3 
Metabolism 
    DHPS ENSP00000210060 Human 0.36 0.05 No Deoxyhypusine synthase 
    MGAT5B ENSP00000364138 Human 0.11 0.03 No α-1,6-mannosylglyc. 6-β-N-acetyglucosaminyltransf. B 
    DGAT2L4 ENSP00000276101 Macaque 1.84 0.30 Yes Acyl-CoA wax alcohol acyltransferase 2 
    DNPEP ENSP00000273075 Primate 0.25 0.07 No Aspartyl aminopeptidase 
    B3GALT2 ENSP00000356404 Mouse 0.16 0.03 No Beta-1,3-galactosyltransferase 2 
    GMPS ENSP00000295920 Rat 0.24 0.05 No GMP synthase 
    ALG2 ENSP00000238477 Rodent 0.26 0.12 No Alpha-1,3-mannosyltransferase ALG2 
    MLSTD2 ENSP00000346874 Dog 0.21 0.04 Yes Fatty acyl-CoA reductase 1 
    ALDHX ENSP00000366927 Cow 0.24 0.05 No Aldehyde dehydrogenase X 
Other functions 
    FBX039 ENSP00000321386 Human 1.34 0.07 No F-box only protein 39 
    KIAA0391 ENSP00000250377 Human 2.5 0.25 Yes Mitochondrial ribonuclease P protein 3 
    PLA2G2C ENSP00000247992 Human 1.9 0.18 Yes Phospholipase A2, group IIC 
    IFT74 ENSP00000369402 Human 1.35 0.09 Yes Intraflagellar transport 74 homolog 
    TEKT2 ENSP00000207457 Macaque 0.47 0.14 Yes Tektin-2, cilium biogenesis 
    PDCD4 ENSP00000280154 Macaque 0.21 0.05 No Programmed cell death protein 4 
    CCNL1 ENSP00000295926 Macaque 0.26 0.02 Yes Cyclin-L1 
    LRRC49 ENSP00000260382 Primate 0.33 0.06 No Tubulin polyglutamylase complex subunit 4 
    NUP205 ENSP00000285968 Mouse 0.13 0.06 No Nuclear pore complex protein Nup205 
    RBM45 ENSP00000376298 Mouse 0.27 0.06 No Developmentally-regulated RNA-bind. prot. 1 
    WDR51A ENSP00000296484 Rat 0.30 0.08 Yes WD repeat protein 51A 
    RGS2 ENSP00000235382 Dog 0.22 0.04 Yes Regulator of G-protein signaling 2 
a

The binomial test was significant at P < 10−3.

b

dN/dS other is the mean dN/dS of the remaining branches of the tree.

c

PS detected by the BS method in PAML (Q < 0.05, false positive rate discovery).

Example 1: Phenylthiocarbamide Bitter Taste Receptor (T2R38)

The phenylthiocarbamide (PTC) bitter taste receptor T2R38 (Chr.7q) belongs to a family of bitter taste receptors (T2R), which are G-protein-coupled receptors with seven transmembrane domains. In humans, 25 such receptors have been found distributed in two clusters on chromosomes 7 and 12 (Conte et al. 2003; Shi et al. 2003). Bitter taste is one of the basic tastes in mammals, together with sweet, salty, sour, and savory/umami (Kinnamon and Cummings 1992; Lindemann 1996). The ability, or lack thereof, to taste PTC, a bitter taste compound, has been known for a long time but has only recently been associated with T2R38 (Kim et al. 2003). As bitter taste is commonly found in toxic substances, the ability to discern bitter taste may have been selectively advantageous (Fischer et al. 2005). Multiple studies have found evidence that bitter taste receptors are likely to have been subject to PS, especially with regards to amino acid replacements in the extracellular regions (Shi et al. 2003; Fischer et al. 2005; Go et al. 2005; Kosiol et al. 2008).

In our analysis, T2R38 was classified as accelerated in the primate branch, with a dN/dS of 1.17 (figs. 1 and 2). We observed six radical amino acid replacements common to the human and macaque sequences in conserved positions in the other sequences. Three of the radical replacements were in the extracellular region and involved changes in the charge of the amino acids concerned, which may have affected ligand recognition. PS in this branch was detected by the BS test, and three of the radical replacements had a Bayesian posterior probability of PS over 0.95 (see fig. 2 for more details).

FIG. 2.

Alignment of the PTC bitter taste receptor. The three domains—extracellular, cytoplasmic, and transmembrane—are indicated. This gene showed specific acceleration in the primate branch. Replacements that are specific to this branch are shown.

FIG. 2.

Alignment of the PTC bitter taste receptor. The three domains—extracellular, cytoplasmic, and transmembrane—are indicated. This gene showed specific acceleration in the primate branch. Replacements that are specific to this branch are shown.

F-box only Protein 39 (FBXO39)

The F-box only protein 39 is a member of the F-box protein family that forms part of the Skp, Cullin, F-box (SCF) ubiquitin ligase complex involved in protein ubiquitylation. The SCF complex is formed by a modular E3 core (CUL1 and RBX1) and a substrate specificity module (SKP1 and a F-box protein) (Cardozo and Pagano 2004). F-box proteins are adaptor proteins that contain at least two domains: a F-box motif to interact with SKP1 and another domain, most frequently a WD40 or leucine rich repeats domain, to bind ubiquitylation targets (Jin et al. 2004). A single F-box protein is able to recognize several substrates, expanding the range of action of the SCF complex (Skaar et al. 2009).

In our data set, FBXO39 had a dN/dS of 1.34 in the human branch and was classified as accelerated in the human and decelerated in the dog. It showed signs of PS according to the BS method (P = 0.02), although it was not significant following false discovery rate (FDR) correction. Of the six radical changes specific to the human branch, five were in the C-terminus (fig. 3). Two of the replacements involved a possible formation of a new cysteine disulfide bond due to the substitution of a W for a C and an F for a C separated by only 21 amino acids. We inspected these positions in the chimpanzee and orangutan orthologs and found that the first C corresponds to a W in orangutan (as in the ancestral state), whereas in chimpanzee, there is a Y. At the second C, we found a C in orangutan and in chimpanzee. So, the first change is human specific, whereas the second appears to be hominoid specific. Interestingly, we found that other F-box only proteins show specific evolutionary rate acceleration in other branches, for example, F-box only protein 28 in macaque and F-box only protein 6 in mouse (supplementary file 1, Supplementary Material online). Given the diversity of this family, these differences are likely to be related to specializations for differing substrates.

FIG. 3.

Alignment of the F-box only protein 39. The two domains—F-box and leucine rich repeat—are indicated. This gene showed specific acceleration in the human branch and deceleration in the dog branch. Replacements that are specific to the human branch are shown.

FIG. 3.

Alignment of the F-box only protein 39. The two domains—F-box and leucine rich repeat—are indicated. This gene showed specific acceleration in the human branch and deceleration in the dog branch. Replacements that are specific to the human branch are shown.

Olfactory Receptors OR2A4 and OR4K5

Olfactory receptors (ORs) form part of a very large family of proteins in which each receptor is able to detect several odorants but some odorants are detected only by a combination of several receptors (Buck and Axel 1991; Zhao et al. 1998; Mombaerts 1999; Malnic et al. 2004). ORs are classified into families, and each family is subdivided into subfamilies. The proteins classified in the same subfamilies have similar structures and can detect similar odorants. Several studies have suggested that the evolution of ORs is strongly affected by PS (Gilad et al. 2003; Nielsen et al. 2005).

In our data set of accelerated genes, we found that OR2A4 (family 2, subfamily A) and OR4K5 (family 4, subfamily K) showed evolutionary rate acceleration in the human branch (supplementary file 2, S7 and S8, Supplementary Material online, respectively). Both genes showed high dN/dS values, 0.9813 and 0.9989, respectively. Although these ratios are close to 1, several lines of evidence indicate that they are not pseudogenes. First, they are not classified as such in the Horde OR database (Safran et al. 2003). Second, they show evidence of PS according to the BS test (P = 0.0274 for OR2A4 and P = 0.0094 for OR4K5), although they do not remain significant following FDR correction. Finally, the pattern of amino acid substitutions is not random, there being an excess of radical replacements in the second extracellular domain in both human receptors. These replacements may have an adaptive function related to the binding of odorant molecules.

Glutamate Receptor Subunit 3A

The glutamate receptor (GluR) subunit 3A (GRIN3A) belongs to the GluR family, involved in the mediation of excitatory synaptic transmission. It has been hypothesized that due to their function, this class of receptors may play a key role in cognitive functions (Riedel et al. 2003). There are two classes of GluRs: ionotropic GluRs and metabotropic GluRs. Ionotropic receptors, which include GRIN3A, are ion channels, whereas metabotropic receptors act in several intracellular signaling pathways. GluRs have been related to brain dysfunction in humans, for example, in Alzheimer's disease (Gong et al. 2009). Goto et al. (2009) analyzed evolutionary changes in GluRs using human and chimpanzee orthologous sequences and found that the dN/dS ratios were smaller than the ratios in genome-wide values, indicating that there are strong functional constraints acting on GluRs. GRIN3A was the member showing the greater divergence between humans and chimpanzees. They found a fixed substitution (D17G) in humans that implied the loss of a N-myristoylation site that, they hypothesized, could lead to changes in synaptic plasticity.

Here, we found GRIN3A had significant dN/dS acceleration in the human branch and in the alignment, one can observe seven human-specific replacements (including the D17G reported by Goto et al. 2009), all of which are radical (supplementary file 2, S9, Supplementary Material online). The replacements included the gain of three positively charged, and one negatively charged, amino acids. Comparison with the chimpanzee sequence showed that three of the seven substitutions predated the human–chimpanzee split, whereas four had become fixed after the separation from chimps. Considering the very strong conservation of this protein in mammals, it is tempting to speculate that these replacements might have functional consequences. Surprisingly, despite all the replacements being radical, the BS test of PS was not significant.

Discussion

Understanding how natural selection drives gene functional diversification across different species and lineages is a key issue in biology. The availability of the full set of genes from several mammalian genomes provides an unprecedented opportunity to study this question at a global scale. Here, we have derived a high-quality set of orthologous complete coding sequence alignments from six mammalian species to obtain a first estimate of the proportion of mammalian genes that show substantial branch-specific selection pressure variations. An important finding of this work is that this proportion is very high (24.5%), indicating that the strength of selection acting on a particular gene often can change substantially at different time periods and in different lineages. This is even more relevant considering that we worked with a relatively well-conserved set of genes, excluding lineage-specific gene duplicates, which are the genes generally associated with processes of functional diversification over short timescales (Ohno 1970; Scannell and Wolfe 2008; Farre and Alba 2009). Our results are in agreement with the previously described overdispersion of the molecular clock (Gillespie 1989; Bedford et al. 2008). Our procedure was designed to discriminate between events of evolutionary rate acceleration and deceleration. Acceleration was associated with an increased number of radical amino acid replacements, and we found that membrane proteins were particularly prone to show lineage-specific evolutionary rate variations. This is in agreement with previous findings by Jordan et al. (2001), who found the largest number of deviations in the rate of change in proteins that had functions at the periphery of the cell and which mediated interactions with the environment. Branch-specific rate acceleration was associated with an excess of replacements involving change of the polarity state and gain or loss of charged residues. Such replacements are subject to increased selective pressure (Yampolsky et al. 2005), which indicate that they often modify protein function. We described several examples of such radical changes in extracellular regions of receptor proteins, which could potentially affect interaction with ligands.

Only a relatively low fraction of the branches showing significant acceleration of dN and dN/dS (11–18% for the highest quality genomes) showed signs of PS according to the BS test (Zhang et al. 2005). Therefore, most accelerated branches are likely to be the result of relaxed purifying selection, probably reflecting partial loss of protein functionality. In such a scenario, the rate of fixation of nonsynonymous substitutions will increase due to negative selective coefficients being closer to 0. However, we also have to consider that, according to the authors, the BS test is conservative, so PS might have been underestimated. We observed that genes with PS detected by the BS test tended to have higher dN/dS than the average gene, which was not the case for genes with significantly accelerated branches using our method (supplementary file 2, S4, Supplementary Material online). This may imply that adaptive changes in very slowly evolving genes might be difficult to detect by the BS method. Several nervous system receptors that showed branch-specific acceleration in our data set belong to this class. One example was the GluR subunit 3A, which is an overall very well-conserved protein that nevertheless contained seven human-specific radical amino acid replacements (to zero nonradical ones) (supplementary file 2, S9, Supplementary Material online). The BS test of PS was negative for the human branch even with no false discovery rate correction.

In spite of the popularity of the BS test to detect PS, the truth is that our ability to detect bona fide events of PS using interspecific sequence comparisons is still limited. For example, the BS method assumes that PS has favored repeated amino acid changes in a short period of time, but it has been argued that this situation might be quite rare except for immune response genes that coevolve with the pathogen (Hughes 2007). A recent study on a set of vertebrate vision genes is particularly illustrative, as it showed that the BS method had a low probability of detecting positions that have been experimentally found to drive functional changes (Nozawa et al. 2009). In some cases, a single amino acid change might suffice to give rise to a new phenotype that is adaptive. A well-known example is a single amino acid replacement in the melanocortin-1 receptor of Florida beach mice that provides a more cryptic skin color patterning (Hoekstra et al. 2006). In other cases, even when the original adaptation is driven by a single amino acid change, subsequent compensatory mutations, for example, to restore protein stability, might result in a burst of nonsynonymous substitutions, causing deviations from the molecular clock (DePristo et al. 2005).

A relatively large number of genes in our data set showed branch-specific evolutionary rate deceleration. Evolutionary rate deceleration has been much less studied than evolutionary rate acceleration, and the functional implications of a dN/dS decrease are less clear. A general tendency toward evolutionary rate deceleration has been observed in several gene families. For example, in vertebrate hemoglobins (Goodman et al. 1975; Czelusniak et al. 1982), and in eukaryotic calmodulin (Baba et al. 1984), the rate of accumulation of nonsynonymous substitutions, when compared with synonymous substitutions, has been found to be much higher in the early stages of the protein's history than later on. This has been interpreted as resulting from a period of rapid changes in which the protein adapts to its basic function followed by a period of stabilization and increased functional constraints. Indirect evidence that this could be a widespread phenomenon is provided by the observation that widely distributed, ancient genes tend to evolve more slowly than phylogenetically restricted, younger genes (Doolittle et al. 1986; Alba and Castresana 2005, 2007; Toll-Riera et al. 2009), indicating that the evolution of the former is, in general, currently more constrained.

In the present study, we used a set of genes enriched in well-conserved core functions. Even so, one quarter of them showed branch-specific evolutionary rate deviations. In 15.5% of genes, we detected branch-specific evolutionary rate deceleration. The functional constraints on these genes thus appear to have increased in one or more branches. How can this be interpreted? One possibility is that it reflects cooption of residues for novel protein functionalities. In this case, no change in the sequence would be required to become involved in a new function (in contrast to the classical PS model, which would be associated with rate accelerations), but the changes would occur in other interacting elements, constraining the evolution of the first protein. We can imagine, for example, that several residues present in protein A become important for a new, advantageous, interaction with protein B, which has undergone a mutation that allows it to interact with protein A. As a result, the constraints on the residues in protein A, which are required for the interaction with protein B will increase, resulting in a decrease in dN/dS in protein A. Indeed, what is adaptive is the interaction between A and B (and not merely the changes in B), and this might lead to opposite observations in relation to the changes in selective pressure in the two interacting proteins.

This study provides a first look at the degree of variability in gene evolutionary rates in multiple mammalian branches as a result of selection. We have shown that many instances of evolutionary rate acceleration, which may be associated with lineage-specific functional modifications, are not identified by the widely used BS test for PS. The sequencing of a larger number of mammalian genomes and, more importantly, improvement of gene annotation quality, will provide increased resolution in studies of mammalian gene evolution. At the same time, experimental investigation will be required to better understand the functional consequences of amino acid replacements in protein sequences and to develop more refined models to detect selection from sequences.

Supplementary Material

Supplementary files 14 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

We thank Jesse Shapiro for fruitful discussions during this work. We received financial support from Ministerio de Ciencia e Innovación (Formación Personal Universitario to M.T.-R, BIO2009-08160), Generalitat de Catalunya (Formació Personal Investigador to S.L.), and Fundació Institució Catalana de Recerca i Estudis Avançats (M.M.A).

References

Alba
MM
Castresana
J
Inverse relationship between evolutionary rate and age of mammalian genes
Mol Biol Evol
 , 
2005
, vol. 
22
 (pg. 
598
-
606
)
Alba
MM
Castresana
J
On homology searches by protein Blast and the characterization of the age of genes
BMC Evol Biol
 , 
2007
, vol. 
7
 pg. 
53
 
Arbiza
L
Dopazo
J
Dopazo
H
Positive selection, relaxation, and acceleration in the evolution of the human and chimp genome
PLoS Comput Biol
 , 
2006
, vol. 
2
 pg. 
e38
 
Ashburner
M
Ball
CA
Blake
JA
, et al.  . 
(20 co-authors)
Gene ontology: tool for the unification of biology. The Gene Ontology Consortium
Nat Genet
 , 
2000
, vol. 
25
 (pg. 
25
-
29
)
Ayala
FJ
Neutralism and selectionism: the molecular clock
Gene
 , 
2000
, vol. 
261
 (pg. 
27
-
33
)
Baba
ML
Goodman
M
Berger-Cohn
J
Demaille
JG
Matsuda
G
The early adaptive evolution of calmodulin
Mol Biol Evol
 , 
1984
, vol. 
1
 (pg. 
442
-
455
)
Bakewell
MA
Shi
P
Zhang
J
More genes underwent positive selection in chimpanzee evolution than in human evolution
Proc Natl Acad Sci U S A
 , 
2007
, vol. 
104
 (pg. 
7489
-
7494
)
Bedford
T
Hartl
DL
Overdispersion of the molecular clock: temporal variation of gene-specific substitution rates in Drosophila
Mol Biol Evol
 , 
2008
, vol. 
25
 (pg. 
1631
-
1638
)
Bedford
T
Wapinski
I
Hartl
DL
Overdispersion of the molecular clock varies between yeast, Drosophila and mammals
Genetics
 , 
2008
, vol. 
179
 (pg. 
977
-
984
)
Buck
L
Axel
R
A novel multigene family may encode odorant receptors: a molecular basis for odor recognition
Cell
 , 
1991
, vol. 
65
 (pg. 
175
-
187
)
Cardozo
T
Pagano
M
The SCF ubiquitin ligase: insights into a molecular machine
Nat Rev Mol Cell Biol
 , 
2004
, vol. 
5
 (pg. 
739
-
751
)
Clark
AG
Glanowski
S
Nielsen
R
, et al.  . 
(17 co-authors)
Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios
Science
 , 
2003
, vol. 
302
 (pg. 
1960
-
1963
)
Conte
C
Ebeling
M
Marcuz
A
Nef
P
Andres-Barquin
PJ
Evolutionary relationships of the Tas2r receptor gene families in mouse and human
Physiol Genomics
 , 
2003
, vol. 
14
 (pg. 
73
-
82
)
Cruz
F
Vila
C
Webster
MT
The legacy of domestication: accumulation of deleterious mutations in the dog genome
Mol Biol Evol
 , 
2008
, vol. 
25
 (pg. 
2331
-
2336
)
Czelusniak
J
Goodman
M
Hewett-Emmett
D
Weiss
ML
Venta
PJ
Tashian
RE
Phylogenetic origins and adaptive evolution of avian and mammalian haemoglobin genes
Nature
 , 
1982
, vol. 
298
 (pg. 
297
-
300
)
DePristo
MA
Weinreich
DM
Hartl
DL
Missense meanderings in sequence space: a biophysical view of protein evolution
Nat Rev Genet
 , 
2005
, vol. 
6
 (pg. 
678
-
687
)
Doolittle
RF
Feng
DF
Johnson
MS
McClure
MA
Relationships of human protein sequences to those of other organisms
Cold Spring Harb Symp Quant Biol
 , 
1986
, vol. 
51
 
Pt 1
(pg. 
447
-
455
)
Farre
D
Alba
MM
Heterogeneous patterns of gene expression diversification in mammalian gene duplicates
Mol Biol Evol
 , 
2009
, vol. 
27
 (pg. 
325
-
335
)
Fischer
A
Gilad
Y
Man
O
Paabo
S
Evolution of bitter taste receptors in humans and apes
Mol Biol Evol
 , 
2005
, vol. 
22
 (pg. 
432
-
436
)
Flicek
P
Aken
BL
Beal
K
, et al.  . 
(59 co-authors)
Ensembl 2008
Nucleic Acids Res
 , 
2008
, vol. 
36
 (pg. 
D707
-
D714
)
Gibbs
RA
Rogersj
J
Katze
MG
, et al.  . 
(176 co-authors)
Evolutionary and biomedical insights from the rhesus macaque genome
Science
 , 
2007
, vol. 
316
 (pg. 
222
-
234
)
Gilad
Y
Bustamante
CD
Lancet
D
Paabo
S
Natural selection on the olfactory receptor gene family in humans and chimpanzees
Am J Hum Genet
 , 
2003
, vol. 
73
 (pg. 
489
-
501
)
Gillespie
JH
Lineage effects and the index of dispersion of molecular evolution
Mol Biol Evol
 , 
1989
, vol. 
6
 (pg. 
636
-
647
)
Go
Y
Satta
Y
Takenaka
O
Takahata
N
Lineage-specific loss of function of bitter taste receptor genes in humans and nonhuman primates
Genetics
 , 
2005
, vol. 
170
 (pg. 
313
-
326
)
Gojobori
J
Tang
H
Akey
JM
Wu
CI
Adaptive evolution in humans revealed by the negative correlation between the polymorphism and fixation phases of evolution
Proc Natl Acad Sci U S A
 , 
2007
, vol. 
104
 (pg. 
3907
-
3912
)
Gong
Y
Lippa
CF
Zhu
J
Lin
Q
Rosso
AL
Disruption of glutamate receptors at Shank-postsynaptic platform in Alzheimer's disease
Brain Res
 , 
2009
, vol. 
1292
 (pg. 
191
-
198
)
Goodman
M
Moore
GW
Matsuda
G
Darwinian evolution in the genealogy of haemoglobin
Nature
 , 
1975
, vol. 
253
 (pg. 
603
-
608
)
Goto
H
Watanabe
K
Araragi
N
, et al.  . 
(12 co-authors)
The identification and functional implications of human-specific “fixed” amino acid substitutions in the glutamate receptor family
BMC Evol Biol
 , 
2009
, vol. 
9
 pg. 
224
 
Hoekstra
HE
Hirschmann
RJ
Bundey
RA
Insel
PA
Crossland
JP
A single amino acid mutation contributes to adaptive beach mouse color pattern
Science
 , 
2006
, vol. 
313
 (pg. 
101
-
104
)
Hughes
AL
Looking for Darwin in all the wrong places: the misguided quest for positive selection at the nucleotide sequence level
Heredity
 , 
2007
, vol. 
99
 (pg. 
364
-
373
)
Jin
J
Cardozo
T
Lovering
RC
Elledge
SJ
Pagano
M
Harper
JW
Systematic analysis and nomenclature of mammalian F-box proteins
Genes Dev
 , 
2004
, vol. 
18
 (pg. 
2573
-
2580
)
Jordan
IK
Kondrashov
FA
Rogozin
IB
Tatusov
RL
Wolf
RL
Koonin
EV
Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins
Genome Biol
 , 
2001
, vol. 
2
  
RESEARCH0053
Kawahara
Y
Imanishi
T
A genome-wide survey of changes in protein evolutionary rates across four closely related species of Saccharomyces sensu stricto group
BMC Evol Biol
 , 
2007
, vol. 
7
 pg. 
9
 
Kim
UK
Jorgenson
E
Coon
H
Leppert
M
Risch
N
Drayna
D
Positional cloning of the human quantitative trait locus underlying taste sensitivity to phenylthiocarbamide
Science
 , 
2003
, vol. 
299
 (pg. 
1221
-
1225
)
Kimura
M
Ota
T
On some principles governing molecular evolution
Proc Natl Acad Sci U S A
 , 
1974
, vol. 
71
 (pg. 
2848
-
2852
)
Kinnamon
SC
Cummings
TA
Chemosensory transduction mechanisms in taste
Annu Rev Physiol
 , 
1992
, vol. 
54
 (pg. 
715
-
731
)
Kosiol
C
Vinar
T
da Fonseca
RR
Hubisz
MJ
Bustamante
CD
Nielsen
R
Siepel
A
Patterns of positive selection in six Mammalian genomes
PLoS Genet
 , 
2008
, vol. 
4
 pg. 
e1000144
 
Lindemann
B
Chemoreception: tasting the sweet and the bitter
Curr Biol
 , 
1996
, vol. 
6
 (pg. 
1234
-
1237
)
Loytynoja
A
Goldman
N
Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis
Science
 , 
2008
, vol. 
320
 (pg. 
1632
-
1635
)
Mallick
S
Gnerre
S
Muller
P
Reich
D
The difficulty of avoiding false positives in genome scans for natural selection
Genome Res
 , 
2009
, vol. 
19
 (pg. 
922
-
933
)
Malnic
B
Godfrey
PA
Buck
LB
The human olfactory receptor gene family
Proc Natl Acad Sci U S A
 , 
2004
, vol. 
101
 (pg. 
2584
-
2589
)
Margoliash
E
Primary structure and evolution of cytochrome c
Proc Natl Acad Sci U S A
 , 
1963
, vol. 
50
 (pg. 
672
-
679
)
Mombaerts
P
Seven-transmembrane proteins as odorant and chemosensory receptors
Science
 , 
1999
, vol. 
286
 (pg. 
707
-
711
)
Nielsen
R
Bustamante
C
Clark
AG
, et al.  . 
(13 co-authors)
A scan for positively selected genes in the genomes of humans and chimpanzees
PLoS Biol
 , 
2005
, vol. 
3
 pg. 
e170
 
Nozawa
M
Suzuki
Y
Nei
M
Reliabilities of identifying positive selection by the branch-site and the site-prediction methods
Proc Natl Acad Sci U S A
 , 
2009
, vol. 
106
 (pg. 
6700
-
6705
)
Ohno
S
Evolution by gene duplication
 , 
1970
New York
Springer-Verlag
Ohta
T
Slightly deleterious mutant substitutions in evolution
Nature
 , 
1973
, vol. 
246
 (pg. 
96
-
98
)
Ohta
T
Ina
Y
Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences
J Mol Evol
 , 
1995
, vol. 
41
 (pg. 
717
-
720
)
R, 2007
R
R: a languange and environment for statistical computing
 , 
2007
Vienna (Austria)
R Fundation for Statistical Computing
Riedel
G
Platt
B
Micheau
J
Glutamate receptor function in learning and memory
Behav Brain Res
 , 
2003
, vol. 
140
 (pg. 
1
-
47
)
Ronquist
F
Huelsenbeck
JP
MrBayes 3: Bayesian phylogenetic inference under mixed models
Bioinformatics
 , 
2003
, vol. 
19
 (pg. 
1572
-
1574
)
Safran
M
Chalifa-Caspi
V
Shmueli
O
, et al.  . 
(16 co-authors)
Human gene-centric databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE
Nucleic Acids Res
 , 
2003
, vol. 
31
 (pg. 
142
-
146
)
Scannell
DR
Wolfe
KH
A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast
Genome Res
 , 
2008
, vol. 
18
 (pg. 
137
-
147
)
Schneider
A
Souvorov
A
Sabath
N
Landan
G
Gonnet
GH
Graur
D
Estimates of positive darwinian selection are inflated by errors in sequencing, annotation and alignment
Genome Biol Evol
 , 
2009
, vol. 
1
 (pg. 
114
-
118
)
Seo
TK
Kishino
H
Thorne
JL
Estimating absolute rates of synonymous and nonsynonymous nucleotide substitution in order to characterize natural selection and date species divergences
Mol Biol Evol
 , 
2004
, vol. 
21
 (pg. 
1201
-
1213
)
Shapiro
BJ
Alm
EJ
Comparing patterns of natural selection across species using selective signatures
PLoS Genet
 , 
2008
, vol. 
4
 pg. 
e23
 
Shi
P
Zhang
J
Yang
H
Zhang
YP
Adaptive diversification of bitter taste receptor genes in mammalian evolution
Mol Biol Evol
 , 
2003
, vol. 
20
 (pg. 
805
-
814
)
Skaar
JR
D'Angiolella
V
Pagan
JK
Pagano
M
SnapShot: F Box Proteins II
Cell
 , 
2009
, vol. 
137
 pg. 
1358
 pg. 
1358
 pg. 
e1351
 
Storey
JD
Tibshirani
R
Statistical significance for genomewide studies
Proc Natl Acad Sci U S A
 , 
2003
, vol. 
100
 (pg. 
9440
-
9445
)
Toll-Riera
M
Bosch
N
Bellora
N
Castelo
R
Armengol
L
Estivill
X
Alba
MM
Origin of primate orphan genes: a comparative genomics approach
Mol Biol Evol
 , 
2009
, vol. 
26
 (pg. 
603
-
612
)
Vilella
AJ
Severin
J
Ureta-Vidal
A
Heng
L
Durbin
R
Birney
E
EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates
Genome Res
 , 
2009
, vol. 
19
 (pg. 
327
-
335
)
Waterston
R
Lindblad-Toh
HK
Birney
E
, et al.  . 
(222 co-authors)
Initial sequencing and comparative analysis of the mouse genome
Nature
 , 
2002
, vol. 
420
 (pg. 
520
-
562
)
Yampolsky
LY
Kondrashov
FA
Kondrashov
AS
Distribution of the strength of selection against amino acid replacements in human proteins
Hum Mol Genet
 , 
2005
, vol. 
14
 (pg. 
3191
-
3201
)
Yang
Z
PAML 4: phylogenetic analysis by maximum likelihood
Mol Biol Evol
 , 
2007
, vol. 
24
 (pg. 
1586
-
1591
)
Yang
Z
Nielsen
R
Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages
Mol Biol Evol
 , 
2002
, vol. 
19
 (pg. 
908
-
917
)
Zhang
J
Evolution of the human ASPM gene, a major determinant of brain size
Genetics
 , 
2003
, vol. 
165
 (pg. 
2063
-
2070
)
Zhang
J
Nielsen
R
Yang
Z
Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level
Mol Biol Evol
 , 
2005
, vol. 
22
 (pg. 
2472
-
2479
)
Zhao
H
Ivic
L
Otaki
JM
Hashimoto
M
Mikoshiba
K
Firestein
S
Functional expression of a mammalian odorant receptor
Science
 , 
1998
, vol. 
279
 (pg. 
237
-
242
)
Zuckerkandl
E
Pauling
L
Kasha
M
Pullman
B
Molecular diseases, evolution and genic heterogeneity
Horizons in biochemistry
 , 
1962
New York
Academic Press
(pg. 
189
-
225
)

Author notes

Associate editor: Hideki Innan