Abstract

Cellular metabolic networks depend on the products of many loci for proper functioning. These interrelationships between loci at the phenotypic level raise the question of whether they evolve independently. Previous research has demonstrated that in the anthocyanin pathway, which produces important secondary metabolites in plants, the genes encoding downstream enzymes show an increased rate of change at nonsynonymous sites when compared with upstream loci due to relaxed constraint. To test whether this pattern exists more broadly, we compared a set of 4 genes encoding enzymes of the carotenoid biosynthetic pathway, which produces a set of distinct colored secondary metabolites in plants. Comparisons between copies of phytoene desaturase, ζ-carotene desaturase, lycopene β-cyclase, and zeaxanthin epoxidase from 6 taxa indicate that the 3 upstream enzymes (phytoene desaturase, ζ-carotene desaturase, and lycopene β-cyclase) have similar proportions of codons under selective constraint, whereas the most downstream enzyme (zeaxanthin epoxidase) has more codons evolving under relaxed constraint. Overall, nonsynonymous substitution rates appear to be highest for zeaxanthin epoxidase, whereas synonymous substitution rates were highest for the intermediate enzyme lycopene β-cyclase. Analysis of codon bias shows that only lycopene β-cyclase may be under slight selection pressure for codon usage. Taken together, these results show that the enzymes of the carotenoid biosynthetic pathway are under strong selective constraint but that the most downstream enzyme is under the least constraint.

The spectacular recent improvements in high-throughput DNA sequencing technology are now providing molecular evolutionists with increasingly large data sets. Importantly, these data sets are both deeper within species and more widespread across taxa, allowing for broad comparisons encompassing whole genomes (Clark et al. 2007). These data sets, together with the concomitant advances in computational methods, now offer an unprecedented scale and sophistication for testing global hypotheses of major causative factors explaining the different rates of protein evolution seen across genes and taxa.

Our knowledge of the chromosomal location, structure, and patterns of expression for many genes has allowed these characteristics to be tested as explanatory variables in studies of protein evolution. To date, the single largest factor influencing a protein's evolutionary rate appears to be a composite measure of expression including codon bias, level of transcription, and amount of protein in the cell (Drummond et al. 2006). Beyond expression, other factors shown to have significant but smaller effects include intron number, genomic position, recombination rate, and essentiality (Pál et al. 2006; Larracuente et al. 2008).

These tests have explicitly considered what we may call structural factors leading to expression, but a gene's biochemical function in the greater cellular context after expression is of equal importance (Cork and Purugganan 2004). Gene products act and interact within a cell in multiple ways. Many proteins function together as enzymes in biochemical pathways, causing interrelationships between loci in the formation of phenotypes. These interactions can also be physical in that the gene products from different loci may participate in macromolecular complexes necessary to carry out a reaction. These groups of loci, however, may be participating in disparate areas of the metabolome, raising the question of whether any of these factors systematically affect a gene's evolutionary rate.

At a gross level, differences can be seen between the mean levels and variation of ω, the ratio of nonsynonymous (dN) to synonymous (dS) mutation rates, across different functional categories (Clark et al. 2007). When taking into account the interactions between gene products, the number of interactions a protein has and its position in the interaction network appear to lead to a decreased dN (Fraser et al. 2002; Hahn and Kern 2005; but see Jordan et al. 2003). Although these studies give broad overviews, they have not looked specifically at how the structure of a metabolic pathway may have influenced protein evolution rates.

The most comprehensive examples to date of molecular evolution studies that explicitly take into account pathway structure have used Drosophila and the anthocyanin biosynthetic pathway in angiosperms. In Drosophila, Flowers et al. (2007) have found that genes at branch points of glucose catabolism show evidence of divergence between species driven by positive selection, which they interpret as selection acting on the parts of the pathway that control metabolic flux rates. More recently, an overall analysis by Greenberg et al. (2008) has confirmed that pathway architecture does appear to affect protein evolutionary rates, although the main finding of this study was that core metabolic enzymes and those that share metabolites evolve very slowly.

An alternative system for these studies is the plant anthocyanin biosynthetic pathway developed by Rausher and colleagues. In the first of 3 studies, broad comparisons between Zea, Antirrhinum, and Ipomoea showed that upstream enzymes in this pathway have reduced rates of nonsynonymous substitution when compared with the downstream enzymes (Rausher et al. 1999). A second study examining multiple species within Ipomoea found the same pattern of increased dN for downstream enzymes and suggested relaxed constraint on downstream enzymes to be the cause, as changes to these enzymes would be less pleiotropic and/or have lesser effects on flux through the pathway (Lu and Rausher 2003). This conclusion was confirmed in the most recent work by Rausher et al. (2008), which showed that within Ipomoea trifida the downstream enzymes in this pathway are under relaxed constraint. Taken together, the Drosophila and anthocyanin studies suggest that pathway structure is another global factor influencing protein evolution rates.

Our goal in this study was to test for the pattern seen in the anthocyanin pathway in another plant system, the carotenoid biosynthetic pathway. The carotenoid biosynthetic pathway provides many parallels to the anthocyanin pathway: the products of both pathways are used for both conserved, housekeeping functions in green tissues (Cuttriss and Pogson 2004; Koes et al. 2005), as well as derived functions important for coloring reproductive structures (Bradshaw and Schemske 2003; Clegg and Durbin 2003; Zufall and Rausher 2003; Howitt and Pogson 2006). In addition, the nutritional importance of carotenoids has led to extensive characterization of the biosynthetic genes from several species, allowing broad comparisons to be made. Here we use a sequence data set of 4 genes from 6 species to ask whether downstream genes in the carotenoid pathway show increased rates of evolution and examine factors that may cause these differences.

Materials and Methods

Sequences and Phylogenetic Tree Construction

In higher plants, there are 10 enzymes in the central carotenoid biosynthetic pathway that convert geranylgeranyl pyrophosphate to the end products neoxanthin or lutein. In this study, we focused on the neoxanthin branch (Figure 1), selecting the enzymes phytoene desaturase (Pds; note that enzyme names and gene symbols vary among organisms), ζ-carotene desaturase (Zds), lycopene β-cyclase (Lcy-B), and zeaxanthin epoxidase (Zep) for analysis. These genes were chosen because there is evidence for ancestral and derived duplication of the other loci in the pathway (Cunningham 2002; DellaPenna and Pogson 2006; Galpaz et al. 2006; Just et al. 2007). Although Lcy-B also appears to have undergone an ancestral duplication, inclusion of this gene was justified based on prior work that allowed for recognition of each ortholog (Livingstone K, unpublished data).

Figure 1

Partial schematic of the carotenoid biosynthetic pathway in higher plants. This pathway shows the formation of the first C40 carotenoid, phytoene, and subsequent conversions to neoxanthin (not shown are some intermediates and the branch at lycopene which makes lutein). The gene symbols used for the enzymes are as follows: phytoene synthase, Psy; phytoene desaturase, Pds; ζ-carotene desaturase, Zds; carotenoid isomerase, Crtiso; lycopene β-cyclase, Lcy-B; β-carotene hydroxylase, βOH; zeaxanthin epoxidase, Zep; neoxanthin synthase, Nxs. (modified from Hirschberg 2001 and Howitt and Pogson 2006).

Figure 1

Partial schematic of the carotenoid biosynthetic pathway in higher plants. This pathway shows the formation of the first C40 carotenoid, phytoene, and subsequent conversions to neoxanthin (not shown are some intermediates and the branch at lycopene which makes lutein). The gene symbols used for the enzymes are as follows: phytoene synthase, Psy; phytoene desaturase, Pds; ζ-carotene desaturase, Zds; carotenoid isomerase, Crtiso; lycopene β-cyclase, Lcy-B; β-carotene hydroxylase, βOH; zeaxanthin epoxidase, Zep; neoxanthin synthase, Nxs. (modified from Hirschberg 2001 and Howitt and Pogson 2006).

Solanum lycopersicum (tomato) was chosen to be the representative species based on extensive previous research characterizing each enzyme (Bramley 2002; Galpaz et al. 2006). We were able to obtain representative complementary DNA sequences for each of the tomato enzymes listed above from GenBank. We then identified similar genes through BLAST searches (TBLASTX) against all plant gene indices from the Institute for Genomic Research (Lee et al. 2004), PlantGDB (Dong et al. 2005), and GenBank databases. Homology of the tomato and database sequences was determined through percent identities and the length of the match between sequences. In addition, we used sequences that were published in studies of the carotenoid enzymes from Chrysanthemum (Kishimoto and Ohmiya 2006) and Daucus carota (Just et al. 2007).

The selected sequences were initially trimmed down to the coding sequences and then translated using BIOEDIT 7.0.9.0 (Hall 1999). We next created alignments of the edited peptide sequences (Hall 2004) using CLUSTALX 2.0.5 (Larkin et al. 2007). We obtained predicted chloroplast leader sequences from the CHLOROP 1.1 server (Emanuelsson et al. 1999), and these results were combined with subjective analysis to demarcate the portion of the alignment that appeared to represent the leader sequences. The DNA sequences corresponding to the predicted leader sequences were then removed, and manual adjustment of aligned DNA sequences was performed as necessary. We also created a concatenated sequence for each species by combining DNA sequence alignments. We constructed a maximum likelihood phylogeny of the 6 species analyzed using the concatenated DNA sequences and the program DNAML, a part of the PHYLIP package (Felsenstein 2008). Sequences from Oryza sativa were used as an out-group, and confidence in the phylogeny was assessed by analyzing 1000 bootstrap replicates.

Comparing Codon Evolution Models to Determine Selective Constraint

We fit 3 models of molecular evolution to the data for each gene using the CODEML program of the PAML suite according to published parameters (Yang 2007). The first model, M0, is a null model that assumes all positions across a given set of sequences have the same levels of dN and dS, and these values and their ratio (ω = dN/dS) are estimated. The parameter ω is a good indicator of protein level selection, with 0 < ω < 1 indicating stabilizing/purifying selection, ω = 1 indicating neutral evolution, and ω > 1 indicating directional/positive selection. The M1a model uses 2 site classes: Class 0 is a proportion of codons with 0 < ω0 < 1 and the remainder of codons form the other class with fixed ω1 = 1. Model M1a estimates both the proportion of Class 0 sites and their ω0. The fit of model M0 versus model M1a can be evaluated by the likelihood ratio test, comparing twice the difference in log likelihoods with a χ2 distribution whose degrees of freedom equals the difference in the number of parameters from each model (Yang 2007). A significantly better fit for model M1a indicates that there are codons evolving neutrally, as well as codons under selective constraint. Model M2a further divides the codons into 3 site classes with 0 < ω0 < 1, ω1 = 1, and ω2 ≥ 1 and estimates both the proportions and variable ω values. Again, the relative fits of model M1a and model M2a can be compared by a likelihood ratio test and accepting model M2a over M1a provides evidence that there is a subset of codons that have experienced positive selection.

To test whether differences in the estimated parameter values between genes were significant, the methods of Lu and Rausher (2003) and Streisfeld and Rausher (2008) were implemented. Briefly, this method uses model M0, but instead of using the program's estimates of ω, fixed values of ω are supplied, and the resulting likelihoods are compared to the model with the original estimate of ω. Values of ω between the values estimated for each gene are used to determine whether an intermediate value of ω exists that is not significantly better than the original estimates for each gene. If such a value exists, then the original estimated values of ω for each gene are said to be not significantly different. Alternatively, if no such intermediate value exists, then the values of ω for the 2 genes are said to be significantly different. Finally, pairwise dN and dS values were also calculated for each gene using the CODEML program from the PAML 4.0 suite (Yang 2007).

Codon Bias Estimation

The effective number of codons (ENC) was calculated using the methods of Wright (1990) as implemented in DNASP 4.50.1 (Rozas et al. 2003). Because codon bias is often associated with GC content at the third positions across codons (GC3), a graphical comparison of ENC versus GC3 was used to control for this possible bias. The observed values of ENC versus GC3, along with a curve describing the expected values, were graphed using the Nc-plot technique (Wright 1990).

Results

Sequences and Alignments

We were able to find a large number of sequences for each enzyme from a diverse set of taxa. Many of these sequences were discarded, however, because they did not contain the majority of the coding region. In addition, this analysis required that each alignment includes the same set of species, so we pruned the set of putative orthologs to a final set including S. lycopersicum, Capsicum annuum, Gentiana lutea, Chrysanthemum x morifolium, Citrus spp., and D. carota (Table 1). We were also able to obtain a complete set of sequences from O. sativa, and these were used as out-group sequences during phylogenetic tree construction. Most of these sequences were generated as part of published studies; however, several were direct depositions into the database without peer review. Although these sequences were not verified, it was assumed that they contained no more error than the sequences associated with publications and that this error rate was low relative to interspecific differences. In no case were putative paralogs of Psy, Pds, or Zep discovered in the databases. It is still possible, however, that species-specific duplications of these loci exist and that they have not been discovered despite directed searches in the case of extensively studied taxa such as S. lycopersicum.

Table 1

Database accession number (GI) and subsequences (nt) of genes selected for analysis

Gene
 
Pds
 
Zds
 
Lcy-B
 
Zep
 
Species GI nt GI nt GI nt GI nt 
Solanum lycopersicum 387886 645–1928 6470254 299–1801 1006672 271–1611 154432876 250–1959a 
Citrus spp. 18073983 385–1668 18073985 266–1768 14550424 577–1917 17402596 226–1941a 
Capsicum annuum 17950 325–1608 6006400 208–1710 999440 419–1759 1673405 238–1947a 
Gentiana lutea 18146804 448–1731 6681691 421–1923 117935906 361–1701 6681687 255–1952a 
Daucus carota 79155624 558–1841 79155661 314–1816 79154898 588–1931a 79155189 438–2150a 
Chrysanthemum x morifolium 87299438 354–1637 87299444 444–1946 87299422 252–1592 87299446 289–2001a 
Gene
 
Pds
 
Zds
 
Lcy-B
 
Zep
 
Species GI nt GI nt GI nt GI nt 
Solanum lycopersicum 387886 645–1928 6470254 299–1801 1006672 271–1611 154432876 250–1959a 
Citrus spp. 18073983 385–1668 18073985 266–1768 14550424 577–1917 17402596 226–1941a 
Capsicum annuum 17950 325–1608 6006400 208–1710 999440 419–1759 1673405 238–1947a 
Gentiana lutea 18146804 448–1731 6681691 421–1923 117935906 361–1701 6681687 255–1952a 
Daucus carota 79155624 558–1841 79155661 314–1816 79154898 588–1931a 79155189 438–2150a 
Chrysanthemum x morifolium 87299438 354–1637 87299444 444–1946 87299422 252–1592 87299446 289–2001a 
a

Some internal nucleotides removed due to indels.

Alignments of the translated coding sequence for each gene showed an amino terminal region with low conservation, followed by a region of high similarity. These genes are encoded by the nucleus, but the enzymes function in the chloroplast, so this pattern could be explained by the presence of chloroplast targeting signal sequences. The CHLOROP server was used to provide evidence for such leaders and approximate the length of the leader sequence for each gene. The strength of predictions and lengths of the leader sequences were extremely variable: CHLOROP predicted each copy of Pds had a leader, but the lengths varied from 7 to 70 amino acids. In other cases, no leader was predicted for 1 to several of the species in an alignment. Thus, CHLOROP predictions were used as a guide, but subjectivity was also employed to determine where sequence alignments were trimmed. In addition, because the alignments must contain the same number of base pairs for each sequence, all sequences were trimmed at the same point in the alignment on the 3′ ends. A small amount of internal sequence, corresponding to 9–10 codons for Lcy-B and 6–11 codons for Zep, was removed because of indel polymorphisms. The amount of the S. lycopersicum coding sequence used was 73% for Pds, 85% for Zds, 89% for Lcy-B, and 84% for Zep. All alignments are available as Supplementary Materials.

The phylogeny constructed from the concatenated sequence alignment (Figure 2) shows evolutionary relationships between the selected species that are in accordance with known phylogenetic data (Soltis P and Soltis D 2004). Bootstrap analysis revealed strong support for this tree, suggesting that true orthologs had been obtained from each species.

Figure 2

Phylogram of the 6 species included in the study obtained from maximum likelihood analysis of the concatenated gene sequences. Numbers above the branches are the bootstrap percentages. Oryza sativa was used as the out-group.

Figure 2

Phylogram of the 6 species included in the study obtained from maximum likelihood analysis of the concatenated gene sequences. Numbers above the branches are the bootstrap percentages. Oryza sativa was used as the out-group.

Sequence Evolution

In order to explore patterns of sequence variation, we fit different models of molecular evolution to each set of sequences. Comparisons of these models showed that M1a provided a better fit for each gene when compared with M0 (P < 0.01) but that model M2a was not a better fit than M1a for any of the genes (Table 2). Rejecting model M2a in favor of model M1a indicates that there is no support for positive selection on any of the codons across all 4 genes. Instead, model M1a includes only 2 classes of sites, a class under strong purifying selection (0 < ω0 < 1) and another class evolving neutrally (ω1 = 1), and both the proportion of codons under purifying selection (p0) and the estimates of ω0 from model M1a are plotted for each gene in Figure 3. This analysis shows that the estimated proportions of codons under purifying selection are high overall, but this proportion appears lower for Zep. The estimates of the strength of purifying selection for these codons, however, do not seem to vary systematically with position in the pathway.

Table 2

Parameter estimates and tests of selection using PAML.

Gene M0
 
M1a
 
M2a
 
lnLa ωb lnLa ω0p0χ2,lnLa ω0p0p2χ2,
Pds −4387.5 0.051 −4382.6 0.043 0.967 P < 0.01 −4382.6 0.043 0.967 0.000 P > 0.05 
Zds −5293.1 0.052 −5227.0 0.029 0.939 P < 0.001 −5227.0 0.029 0.939 0.000 P > 0.05 
Lcy-B −4866.8 0.041 −4826.5 0.031 0.945 P < 0.001 −4826.5 0.031 0.945 0.000 P > 0.05 
Zep −7143.8 0.107 −6990.7 0.050 0.831 P < 0.001 −6990.7 0.050 0.831 0.000 P > 0.05 
Gene M0
 
M1a
 
M2a
 
lnLa ωb lnLa ω0p0χ2,lnLa ω0p0p2χ2,
Pds −4387.5 0.051 −4382.6 0.043 0.967 P < 0.01 −4382.6 0.043 0.967 0.000 P > 0.05 
Zds −5293.1 0.052 −5227.0 0.029 0.939 P < 0.001 −5227.0 0.029 0.939 0.000 P > 0.05 
Lcy-B −4866.8 0.041 −4826.5 0.031 0.945 P < 0.001 −4826.5 0.031 0.945 0.000 P > 0.05 
Zep −7143.8 0.107 −6990.7 0.050 0.831 P < 0.001 −6990.7 0.050 0.831 0.000 P > 0.05 
a

Log likelihood of model.

b

Parameter estimate assuming a single dN/dS ratio per gene.

c

Estimated dN/dS for proportion of codons (p0) under purifying selection; other codons assumed to be evolving neutrally.

d

Estimated proportion of codons under purifying selection.

e

Test of M1a versus M0.

f

Estimated dN/dS for proportion of codons (p0) under purifying selection.

g

Estimated proportion of codons under purifying selection.

h

Estimated proportion of codons under positive selection.

i

Test of M2a versus M1a.

Figure 3

Percentage of codons with 0 < ω < 1 (open circles) and the corresponding ω (closed squares) as a function of enzyme position in the carotenoid biosynthetic pathway. Values shown are estimates from model M1a calculated by CODEML (Yang 2007).

Figure 3

Percentage of codons with 0 < ω < 1 (open circles) and the corresponding ω (closed squares) as a function of enzyme position in the carotenoid biosynthetic pathway. Values shown are estimates from model M1a calculated by CODEML (Yang 2007).

Although model M1a is a better fit to the data than model M0, model M0 has still been shown to be a way to test whether there are significant differences in evolutionary rates between genes in similar analyses (Lu and Rausher 2003; Streisfeld and Rausher 2008). In this study, we tested whether pairwise estimates of an overall ω were different by fixing ω values for each gene in model M0 and then testing whether these values still fit the data well (P > 0.05). We found that a value of ω = 0.047 could be substituted for the estimated value of ω from model M0 for Pds, Zds, and Lcy-B without a significant difference in the model's fit, indicating that these 3 genes were not evolving at significantly different rates. When these 3 genes were compared with Zep, however, it was found that a value of ω = 0.07 could not be substituted for the estimated value for any of the 4 genes (P < 0.05), indicating that Pds, Zds, and Lcy-B showed an evolutionary pattern significantly different from Zep.

The difference between Zep and the set of the other 3 genes seen in the M0 analyses with fixed ω could be due to 1 of 2 differences. As shown in the model M1a analyses, these genes are fit best by a model with 2 classes of codons (under purifying and neutral evolution). Thus, overall differences seen when using a single ω for a gene must be due to differences in either the relative proportions of these 2 classes or varying levels of purifying selection. As shown in Figure 3, the most noticeable difference between Zep and the other 3 genes is in the proportion of codons under purifying selection, making this the most likely explanation for the overall differences seen in the M0 analyses.

To further investigate this pattern, we also calculated all 15 pairwise estimates of dN. These data show that the highest level of nonsynonymous substitutions for each comparison is seen with Zep (Figure 4a). In addition, for all but the closest comparison, S. lycopersicum and C. annuum (Figure 4a, bottom line), the values of dN are more tightly grouped for Pds and Zds and more variable for the downstream genes Lcy-B and Zep. Although the set of all comparisons is not independent, the overall trend across all comparisons is that the highest substitution rate observed is always Zep; therefore, each of the sets of independent contrasts would be expected to provide the same results.

Figure 4

Pairwise estimates of nucleotide substitution rates as a function of enzyme position in the carotenoid biosynthetic pathway. Each set of connected points represents 1 of 15 comparisons between species pairs from the 6 species analyzed. Results are given for (A) dN and (B) dS.

Figure 4

Pairwise estimates of nucleotide substitution rates as a function of enzyme position in the carotenoid biosynthetic pathway. Each set of connected points represents 1 of 15 comparisons between species pairs from the 6 species analyzed. Results are given for (A) dN and (B) dS.

To determine whether this pattern for dN could be explained by increased mutation rates for the downstream genes, we also examined the values for pairwise dS comparisons (Figure 4b). Interestingly, these comparisons reveal a distinct pattern, with generally higher and more variable substitution rates at Lcy-B than Zep. If overall mutation rates were higher for downstream genes, then these 2 graphs would be similar. The differences between the observed patterns of dS and dN suggest that there are not higher overall mutation rates at Zep; therefore, this cannot be an explanatory factor for the higher rate of dN for this gene. Taken together, these data suggest that the majority of codons in these 4 genes are under similar levels of strong purifying selection but that the fraction of codons evolving neutrally is higher in the most downstream enzyme (Zep).

Codon Bias

ENC is a measure of the extent of codon usage, ranging from 20, where codon bias is at a maximum and only 1 codon is used for each amino acid, to 61, where there is no codon bias and all codons are used. However, codon usage can also be affected by factors other than usage bias, such as mutation bias. To distinguish between these alternative explanations, we graphed ENC as a function of GC3, along with the expectation of ENC given GC3 (Figure 5). Overall, the enzymes are tightly clustered in a narrow range of GC3 content (Figure 5 inset). On closer inspection, the only visible trend is that the points for Lcy-B are more tightly clustered than any other enzyme, and in general, they lie furthest from the expected values.

Figure 5

Graph of the ENC versus the percent GC at third codon positions. The expectation of the ENC under the assumption of no selection on codon usage is given by the solid curve.

Figure 5

Graph of the ENC versus the percent GC at third codon positions. The expectation of the ENC under the assumption of no selection on codon usage is given by the solid curve.

Discussion

Our work parallels analyses by Rausher and colleagues of the enzymes that produce anthocyanins, an important class of secondary metabolites in plants (Rausher et al. 1999, 2008; Lu and Rausher 2003). These previous studies revealed that genes encoding downstream enzymes have an accelerated rate of nonsynonymous substitution when compared with upstream loci, in both broad (monocot–eudicot) and more restricted (within Ipomoea) comparisons, because of relaxed constraint on downstream enzymes. Although this pattern was not explicitly addressed in the study of the Drosophila glycolytic pathway (Flowers et al. 2007), data included in that paper could also be interpreted as showing an increase in dN for enzymes in the linear portion of the pathway from glucose to pyruvate. Together, these studies suggest a general pattern of protein evolution influenced by position in a metabolic pathway, along with the causative factor of relaxed selective constraint on the downstream loci. We approached these questions using another pathway that produces colored secondary metabolites in plants and a taxonomic range intermediate to the 3 anthocyanin studies.

Our analysis confirms the pattern that the most downstream locus tends to evolve more quickly. In general, all 4 enzymes are under similarly strong levels of purifying selection over most of their sequence, a finding that is commonplace in genomic comparisons (Clark et al. 2007; Foxe et al. 2008). Each gene does, however, have codons that appear to be evolving neutrally. This fraction of codons appears to be higher for Zep than the other 3 loci, indicating that relaxed selective constraint over a greater portion of Zep is the most likely explanation for its increased rate of protein evolution, as opposed to positive selection for novel enzymatic activities (Copley 2003) and/or physical associations (Winkel 2004).

When the substitutions are divided into nonsynonymous and synonymous substitutions, it is apparent that there are differences between these 2 classes. For nonsynonymous substitutions, there is a trend toward increases in the level and variation of dN for at least the 2 most terminal enzymes (Lcy-B and Zep), although there appears to be a higher level of dS seen for the penultimate enzyme analyzed (Lcy-B). The differences between dN and dS argue against a simple explanation of higher mutation rates for downstream loci.

The results of Drummond et al. (2006) suggest that the observed variation in evolutionary rates between these loci may be due to factors influencing expression. We tested these loci for evidence of selection on codon usage, but our analysis indicates that of the 4 genes, only Lcy-B shows evidence for at most mild selective constraint for codon usage. Lcy-B is also the gene with the highest rate of synonymous substitutions, which is counterintuitive: it is expected that greater selection for codon bias would cause a decrease in synonymous substitution rates, but this same apparent anomaly was seen in the anthocyanin study (Lu and Rausher 2003). It is intriguing to note that in both cases, the genes were the only duplicated genes included in the analysis. Overall, however, the general lack of variation amongst species and between genes implies that codon bias has remained low and relatively stable for genes of the carotenoid biosynthetic pathway, both within and across species.

The expression profiles of these genes do vary, however, because of the different roles carotenoids play in plants. On the one hand, carotenoids are important ancillary members of the light harvesting complex, and studies in developing tobacco chloroplasts, where carotenoids are first needed, have demonstrated similar levels of coordinated expression (Woitsch and Römer 2003). On the other hand, the regulation of carotenoid biosynthetic genes is also coordinated in organs where carotenoid hyperaccumulation occurs, such as petals and fruit. Although coordinately regulated in these tissues, the specific constellation of upregulated genes varies, leading to differences in carotenoid accumulation profiles (Bramley 2002; Kato et al. 2004; Kishimoto and Ohmiya 2006). Zeaxanthin is an important component of the photosynthetic apparatus of green tissues but is not present in all species’ flowers and fruits (Goodwin 1980), leaving open the possibility that increases in dN for Zep could be due to reduced expression in the organs of some species.

Supplementary Material

Supplementary material can be found at http://www.jhered.oxfordjournals.org/.

Funding

Trinity University Summer Undergraduate Research Fellowship to S.A.

We thank Brian Malpede for critical readings and 2 anonymous reviewers for suggestions that strengthened the manuscript.

References

Bradshaw
H
Schemske
D
Allele substitution at a flower colour locus produces a pollinator shift in monkeyflowers
Nature
 , 
2003
, vol. 
426
 (pg. 
176
-
178
)
Bramley
P
Regulation of carotenoid formation during tomato fruit ripening and development
J Exp Bot
 , 
2002
, vol. 
53
 (pg. 
2107
-
2113
)
Clark
AG
Eisen
MB
Smith
DR
Bergman
CM
Oliver
B
Markow
TA
Kaufman
TC
Kellis
M
Gelbart
W
Iyer
VN
, et al.  . 
Evolution of genes and genomes on the Drosophila phylogeny
Nature
 , 
2007
, vol. 
450
 (pg. 
203
-
218
)
Clegg
M
Durbin
M
Tracing floral adaptations from ecology to molecules
Nat Rev Genet
 , 
2003
, vol. 
4
 (pg. 
206
-
215
)
Copley
S
Enzymes with extra talents: moonlighting functions and catalytic promiscuity
Curr Opin Chem Biol
 , 
2003
, vol. 
7
 (pg. 
265
-
272
)
Cork
JM
Purugganan
MD
The evolution of molecular genetic pathways and networks
Bioessays
 , 
2004
, vol. 
26
 (pg. 
479
-
484
)
Cunningham
FX
Regulation of carotenoid synthesis and accumulation in plants
Pure Appl Chem
 , 
2002
, vol. 
74
 (pg. 
1409
-
1417
)
Cuttriss
A
Pogson
B
Davies
KM
Carotenoids
Plant pigments and their manipulation
 , 
2004
Oxford (UK)
Blackwell Publishing
(pg. 
57
-
91
)
DellaPenna
D
Pogson
B
Vitamin synthesis in plants: tocopherols and carotenoids
Annu Rev Plant Biol
 , 
2006
, vol. 
57
 (pg. 
711
-
738
)
Dong
Q
Lawrence
CJ
Schlueter
SD
Wilkerson
MD
Kurtz
S
Lushbough
C
Brendel
V
Comparative plant genomics resources at PlantGDB
Plant Physiol
 , 
2005
, vol. 
139
 (pg. 
610
-
618
)
Drummond
DA
Raval
A
Wilke
CO
A single determinant dominates the rate of yeast protein evolution
Mol Biol Evol
 , 
2006
, vol. 
23
 (pg. 
327
-
337
)
Emanuelsson
O
Nielsen
H
Von Hejine
G
CHLOROP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites
Protein Sci
 , 
1999
, vol. 
8
 (pg. 
978
-
984
)
Felsenstein
J
Phylip (phylogeny inference package). Version 3.6. Distributed by the author
 , 
2008
Seattle (WA)
Department of Genome Sciences, University of Washington
Flowers
J
Sezgin
E
Kumagai
S
Duvernell
D
Matzkin
L
Schmidt
PS
Eanes
WF
Adaptive evolution of metabolic pathways in Drosophila
Mol Biol Evol
 , 
2007
, vol. 
24
 (pg. 
1347
-
1354
)
Foxe
J
Dar
V
Zheng
H
Nordborg
M
Gaut
B
Wright
SI
Selection on amino acid substitutions in Arabidopsis
Mol Biol Evol
 , 
2008
, vol. 
25
 (pg. 
1375
-
1383
)
Fraser
HB
Hirsh
AE
Steinmetz
LM
Scharfe
C
Feldman
MW
Evolutionary rate in the protein interaction network
Science
 , 
2002
, vol. 
296
 (pg. 
750
-
752
)
Galpaz
N
Ronen
G
Khalfa
Z
Zamir
D
Hirschberg
J
A chromoplast-specific carotenoid biosynthesis pathway is revealed by cloning of the tomato white-flower locus
Plant Cell
 , 
2006
, vol. 
18
 (pg. 
1947
-
1960
)
Goodwin
T
The biochemistry of the carotenoids. Volume 1: Plants
 , 
1980
London
Chapman & Hall
Greenberg
AJ
Stockwell
SR
Clark
AG
Evolutionary constraint and adaptation in the metabolic network of Drosophila
Mol Biol Evol
 , 
2008
, vol. 
25
 (pg. 
2537
-
2546
)
Hahn
M
Kern
AD
Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks
Mol Biol Evol
 , 
2005
, vol. 
22
 (pg. 
803
-
806
)
Hall
B
Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences
Mol Biol Evol
 , 
2004
, vol. 
22
 (pg. 
792
-
802
)
Hall
TA
BIOEDIT: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
Nucleic Acids Symp Ser
 , 
1999
, vol. 
41
 (pg. 
95
-
98
)
Hirschberg
J
Carotenoid biosynthesis in flowering plants
Curr Opin Plant Biol
 , 
2001
, vol. 
4
 (pg. 
210
-
218
)
Howitt
C
Pogson
B
Carotenoid accumulation and function in seeds and non-green tissues
Plant Cell Environ
 , 
2006
, vol. 
29
 (pg. 
435
-
445
)
Jordan
IK
Wolf
YI
Koonin
E
No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve more slowly
BMC Evol Biol
 , 
2003
, vol. 
3
 pg. 
1
 
Just
B
Santos
C
Fonseca
M
Boiteux
L
Oloizia
B
Simon
PW
Carotenoid biosynthesis structural genes in carrot (Daucus carota): isolation, sequence-characterization, single nucleotide polymorphism (SNP) markers and genome mapping
Theor Appl Genet
 , 
2007
, vol. 
114
 (pg. 
693
-
704
)
Kato
M
Ikoma
Y
Matsumoto
H
Sugiura
M
Hyodo
H
Yano
M
Accumulation of carotenoids and expression of carotenoid biosynthetic genes during maturation in citrus fruit
Plant Physiol
 , 
2004
, vol. 
134
 (pg. 
824
-
837
)
Kishimoto
S
Ohmiya
A
Regulation of carotenoid biosynthesis in petals and leaves of chrysanthemum (Chrysanthemum morifolium)
Physiol Plant
 , 
2006
, vol. 
128
 (pg. 
436
-
447
)
Koes
RWD
Verweij
W
Quattrocchio
F
Flavonoids: a colorful model for the regulation and evolution of biochemical pathways
Trends Plant Sci
 , 
2005
, vol. 
10
 (pg. 
236
-
242
)
Larkin
M
Blackshields
G
Brown
N
Chenna
R
McGettigan
P
McWillian
H
Valentin
F
Wallace
IM
Wilm
A
Lopez
R
, et al.  . 
CLUSTALW and CLUSTALX version 2.0
Bioinformatics
 , 
2007
, vol. 
23
 (pg. 
2947
-
2948
)
Larracuente
A
Sackton
T
Greenberg
A
Wong
A
Singh
N
Clark
AG
Evolution of protein-coding genes in Drosophila
Trends Genet
 , 
2008
, vol. 
24
 (pg. 
114
-
123
)
Lee
Y
Tsai
J
Sunkara
S
Karamycheva
S
Pertea
G
Sultana
R
Antonescu
V
Chan
A
Cheung
F
Quackenbush
J
The TIGR gene indices: clustering and assembling EST and known genes and integration with eukaryotic genomes
Nucleic Acids Res
 , 
2004
, vol. 
33
 (pg. 
D71
-
D74
)
Lu
Y
Rausher
MD
Evolutionary rate variation in anthocyanin pathway genes
Mol Biol Evol
 , 
2003
, vol. 
20
 (pg. 
1844
-
1853
)
Pál
C
Papp
B
Lercher
M
An integrated view of protein evolution
Nat Rev Genet
 , 
2006
, vol. 
7
 (pg. 
337
-
348
)
Rausher
MD
Lu
Y
Meyer
K
Variation in constraint versus positive selection as an explanation for evolutionary rate variation among anthocyanin genes
J Mol Evol
 , 
2008
, vol. 
67
 (pg. 
137
-
144
)
Rausher
MD
Miller
R
Tiffin
P
Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway
Mol Biol Evol
 , 
1999
, vol. 
16
 (pg. 
266
-
274
)
Rozas
J
Sánchez-Delbarrio
JC
Messegyer
X
Rozas
R
DNASP, DNA polymorphism analyses by the coalescent and other methods
Bioinformatics
 , 
2003
, vol. 
19
 (pg. 
2496
-
2497
)
Soltis
P
Soltis
D
The origin and diversification of angiosperms
Am J Bot
 , 
2004
, vol. 
91
 (pg. 
1614
-
1626
)
Streisfeld
MA
Rausher
MD
Relaxed constraint and evolutionary rate variation between basic helix-loop-helix floral anthocyanin regulators in Ipomoea
Mol Biol Evol
 , 
2008
, vol. 
24
 (pg. 
2816
-
2826
)
Winkel
B
Metabolic channeling in plants
Annu Rev Plant Biol
 , 
2004
, vol. 
55
 (pg. 
85
-
107
)
Woitsch
S
Römer
S
Expression of xanthophyll biosynthetic genes during light-dependent chloroplast differentiation
Plant Physiol
 , 
2003
, vol. 
132
 (pg. 
1508
-
1517
)
Wright
F
The “effective number of codons” used in a gene
Gene
 , 
1990
, vol. 
87
 (pg. 
23
-
29
)
Yang
Z
PAML 4: phylogenetic analysis by maximum likelihood
Mol Biol Evol
 , 
2007
, vol. 
24
 (pg. 
1586
-
1591
)
Zufall
R
Rausher
MD
The genetic basis of a flower color polymorphism in the common morning glory (Ipomoea purpurea)
J Hered
 , 
2003
, vol. 
94
 (pg. 
442
-
448
)

Author notes

Corresponding Editor: John Stommel