Evolution of a Genome-Encoded Bias in Amino Acid Biosynthetic Pathways Is a Potential Indicator of Amino Acid Dynamics in the Environment

Overcoming the stress of starvation is one of an organism’s most challenging phenotypic responses. Those organisms that frequently survive the challenge, by virtue of their fitness, will have evolved genomes that are shaped by their specific environments. Understanding this genotype–environment–phenotype relationship at a deep level will require quantitative predictive models of the complex molecular systems that link these aspects of an organism’s existence. Here, we treat one of the most fundamental molecular systems, protein synthesis, and the amino acid biosynthetic pathways involved in the stringent response to starvation. These systems face an inherent logical dilemma: Building an amino acid biosynthetic pathway to synthesize its product—the cognate amino acid of the pathway—may require that very amino acid when it is no longer available. To study this potential “catch-22,” we have created a generic model of amino acid biosynthesis in response to sudden starvation. Our mathematical analysis and computational results indicate that there are two distinctly different outcomes: Partial recovery to a new steady state, or full system failure. Moreover, the cell’s fate is dictated by the cognate bias, the number of cognate amino acids in the corresponding biosynthetic pathway relative to the average number of that amino acid in the proteome. We test these implications by analyzing the proteomes of over 1,800 sequenced microbes, which reveals statistically significant evidence of low cognate bias, a genetic trait that would avoid the biosynthetic quandary. Furthermore, these results suggest that the pattern of cognate bias, which is readily derived by genome sequencing, may provide evolutionary clues to an organism’s natural environment.


Introduction
Predicting or measuring the natural microenvironment of an organism is a complex and challenging task (Savageau 1983; Ward et al. 1998; Xu 2006). In contrast, sequencing an organism's genome has become routine, and the scientific community continues to sequence organisms from varied environments at an increasing pace. Yet the vast majority of those organisms cannot be cultured by current methods, in part because their natural environment is unknown (Curtis and Sloan 2005; Xu 2006). Genomic data are clearly outpacing environmental data, but the sequences themselves may provide information about the environment from which they were taken. Recent statistical analyses of overall amino acid composition across organisms indicate that the environment is a major evolutionary influence (Chen et al. 2013; Moura et al. 2013). More specifically, the cognate bias hypothesis (Alves and Savageau 2005) suggests that nutritional stress places evolutionary pressure on the composition of the enzymes in the amino acid biosynthetic pathways. (The "cognate amino acid" refers to the amino acid produced by the corresponding biosynthetic pathway, and the "cognate bias" refers to the number of cognate amino acids in the enzymes of the corresponding biosynthetic pathway relative to their number in the proteome of the organism.) The bacterial response to nutritional stress, the well-known stringent response, has been studied for over five decades. In 1961, it was shown that amino acid starvation inhibited the accumulation of stable RNA, and the locus responsible was christened RC, or the RNA Control gene (Stent and Brenner 1961), later renamed relA in reference to the relaxed response of the mutant phenotype (Friesen et al. 1974). Today, it is clear that the stringent response is a general reaction to stress and starvation that is conserved across species (Draper 1996; Harris et al. 1998; van der Biezen et al. 2000; Chatterji and Ojha 2001) and is characterized by increased levels of guanosine tetraphosphate (ppGpp) (Cashel 1969; Wendrich et al. 2002), which has at least 75 known effects in Escherichia coli, including decreased rRNA and tRNA transcription, decreased growth rate, and increased expression of the biosynthetic enzymes for many amino acids (Draper 1996; Magnusson et al. 2005; Potrykus and Cashel 2008; Dalebroux and Swanson 2012). However, the stringent response may not be enough to protect the cell from the shock of starvation. Part of the response is the upregulation of amino acid biosynthetic pathways, but the situation creates a potential catch-22. The missing amino acid could hold up the construction of the enzymes needed to create more of their cognate amino acid, a stalemate from which the cell might not recover. A logical evolutionary defense would be to remove the vulnerability, that is, to bias the biosynthetic enzymes against the use of their cognate amino acid.
Our first hint that organisms might evolve such a molecular mechanism came in the early days of protein sequencing when tryptophan synthetase, an enzyme in the tryptophan biosynthetic pathway, was sequenced and the alpha subunit was found to contain no tryptophan (Yanofsky 1988). This observation was hardly conclusive, and it took many years and a new technology to begin testing the hypothesis more generally. Additional support for what we termed the cognate bias hypothesis was obtained from genomic data for three well-studied bacteria: two enteric bacteria, E. coli and Salmonella enterica (serovar Typhimurium), and the soil-dwelling bacterium Bacillus subtilis (Alves and Savageau 2005). The results suggested a bias toward fewer cognate amino acids in certain amino acid biosynthetic pathways and a profile of bias across amino acids that differed between the two groups, suggesting a possible correlation with the organisms' ecological niches. The cognate bias hypothesis was recently tested and confirmed using a few organisms from the other domains of life: Methanococcus jannaschii, Saccharomyces cerevisiae, and Homo sapiens (Perlstein et al. 2007; Meiler et al. 2012). However, the number of organisms examined to date is too small to support a meaningful statistical test.
Complete genome sequences are now available for more than 1,800 microorganisms. Thus it is an opportune time to go beyond correlations and comprehensively examine the cognate bias hypothesis, with an analysis of the underlying molecular mechanism using a kinetic model of protein synthesis and amino acid starvation, in order to provide a stronger molecular link between the genomic evidence of amino acid composition and the environmental dynamics of amino acid availability.
In particular, we are interested in the environment's effect on the proteome, in terms of three classes of amino acids: 1) For amino acids the organism is never required to create, it could dispense with the biosynthetic pathway entirely (e.g., as occurs with obligate intracellular parasites that receive the amino acid from their host, and with humans that receive their essential amino acids from their diet); 2) for amino acids the organism is always required to create, it could dispense with the regulation and synthesize the amino acids constitutively, without regard for cognate bias; and 3) for all other environments, regulation would be advantageous, and a compensating cognate bias would likely exist for amino acids that experience the most frequent and extreme fluctuations in the organism's natural environment. In order to study the last case more rigorously, we have created a generic model of amino acid biosynthesis and regulation in response to sudden starvation. The results indicate that there are two distinctly different outcomes, partial recovery or full failure, that are dictated by the cognate bias, the number of cognate amino acids in the corresponding biosynthetic pathway relative to the average number of that amino acid in the proteome. Furthermore, we mine the abundant genomic data that are currently available and reveal statistical evidence of cognate bias. The results describe how the natural environment of an organism (or more precisely, the stresses and strains to which the organism is exposed) may leave a genetic fingerprint.

Model of Translation during Starvation
Before describing a larger model of amino acid biosynthesis and regulation, we present a model of translation that accounts for the effect of starvation, or more specifically for a decreased supply of the cognate amino acid of interest. Mathematical models of the translational process have been created, but they are too detailed to be tractable within our larger model of regulation (Gromadski and Rodnina 2004; Elf and Ehrenberg 2005; Gilchrist and Wagner 2006; Fluitt et al. 2007; Shah and Gilchrist 2010; Brackley et al. 2011, 2012). Here, we consider a somewhat simpler approach. Translation proceeds through three well-known steps (Alberts et al. 2002). First, the ribosome, which is attached to the mRNA and a growing peptide chain, exposes an empty A-site. Second, a charged aminoacyl-tRNA (aa-tRNA) fills the empty A-site. Alternatively, the aa-tRNA may leave the A-site, returning the system to the first step. Finally, the ribosome incorporates the amino acid into the peptide chain, discharges the uncharged tRNA, and advances. The sequence of steps is reminiscent of an enzymatic reaction and has in fact been modeled as such in the past by Elf and Ehrenberg (2005). However, they consider the incorporation of an amino acid in isolation (a single instance of those three steps) and use the result to represent the average rate of amino acid consumption in a larger model of global protein synthesis. Here, we consider the sequential incorporation of every amino acid in the protein as a longer sequence of enzymatic reactions, and consequently the combined impact of starvation at multiple steps in the synthesis of a single protein.
When modeling translation as an enzymatic reaction, the ribosome-mRNA complex represents the catalyst. The transition between the first and second steps is treated as a reversible reaction between a complex with an empty A-site and an intermediate complex with an aa-tRNA in the A-site. The transition between the second and third steps (the addition of the amino acid to the peptide chain and the advancement of the ribosome) is treated as the essentially irreversible step in the enzymatic reaction. Figure 1 depicts the transitions between states. We assume that the concentration of aa-tRNA is coupled to the concentration of the free amino acid pool, so that when the supply of amino acid decreases, the concentration of aa-tRNA decreases proportionally, and the rate of its incorporation into the A-site decreases. In the limiting case where none of the particular amino acid is available, the ribosome remains stalled with an empty A-site. We assume that the longer the ribosome is stalled, the more likely it is to prematurely terminate translation and dissociate from the mRNA, which is consistent with models of nonsense errors produced by frameshift, ribosomal drop-off, or release factors (Jørgensen and Kurland 1990; Gilchrist and Wagner 2006; Shah and Gilchrist 2010) and models of ribosome rescue by transfer-messenger RNA (Roche and Sauer 1999; Moore and Sauer 2007; Keiler 2008).
The complete translation of a protein with N amino acids can be modeled as a linked chain of N enzymatic reactions, as shown in figure 2. The entire process begins with initiation, when the ribosome binds to the mRNA, and ends with the release of the completed protein, which requires the presence of release factors. If all of the amino acid concentrations are high, the rate of abortion becomes vanishingly small, and the entire process can be viewed as an unbranched pathway, implying that at steady state, what goes in must come out. Assuming the concentrations of ribosome and all ancillary factors are constant or saturating, then the steady-state rate of protein production, without starvation and abortion, is simply $v_{out} = k_{in} M$, or proportional to the concentration of the specific mRNA. Nearly all models of biosynthetic gene circuits treat the rate of protein synthesis as proportional to the concentration of mRNA, and there is good evidence in the bacterial literature to support this (Guet et al. 2008). On the other hand, if the supply of a particular amino acid is significantly decreased, then the ribosome will stall at each point that requires that amino acid (Pedersen 1984; Varenne et al. 1984; Sørensen et al. 1989), increasing the rate of abortion and slowing the overall rate of protein production. The exact rate of protein production can be determined by an inductive argument. In the simplest case, where the protein only requires a single amino acid that is in short supply, the process has a single branch point at step i, as shown in figure 1. It can be shown that at steady state, $v_{out}$ at step i is a function of $v_{in}$ and the limiting amino acid concentration $S_i$, or

$$v_{out} = v_{in}\,\frac{S_i}{K_m + S_i} \qquad (1)$$

where

$$K_m = \frac{k_4\,(k_2 + k_3)}{k_1 k_3} \qquad (2)$$

More intuitively, $K_m$ is the amino acid concentration for half-maximal velocity through step i. $v_{in}$ is a result of the preceding steps, which in this case form an unbranched pathway, and is therefore $k_{in} M$ at steady state. 
The remaining steps also form an unbranched pathway, meaning that the overall rate of protein production is equal to the velocity through step i. For a protein that requires more than one of the limiting amino acid, $v_{out}$ at the first occurrence is identical to the case with one. $v_{out}$ at the second occurrence is equal to the output of the preceding steps multiplied by the same attenuating factor. As such, for a protein that requires n of the limiting amino acid, the overall rate of protein production is

$$v_{out} = k_{in} M \left(\frac{S}{K_m + S}\right)^{n} \qquad (3)$$

where $S$ is the concentration of the limiting amino acid. The implication of equation (3) is dramatic. Considering the average protein in E. coli contains approximately 300 amino acids, and assuming the 20 amino acids are equally represented, n is on the order of 15, making translation ultrasensitive to the limiting amino acid concentration. If the amino acid concentration drops below some critical threshold, $K_m$, then the rate of translation will practically halt. The submodel above captures the essential features needed in our larger model of amino acid biosynthesis and regulation in the next section. One could conceivably add more detail, and attempt to account for such factors as tRNA abundance, specific constants for the binding of amino acid to various tRNAs and the binding of tRNA anticodons to codons in mRNA, salt and pH concentrations, or any other physicochemical aspect of the intracellular milieu that differs according to cell type (e.g., high pH in extreme halophilic archaea vs. low pH in halophobic bacteria). However, such detailed models would become so complex that analysis would be difficult if not precluded, most values of the parameters would not be available for most organisms, and even if these obstacles were overcome in a particular case, the results would not generalize to other systems.

Figure 1. Model of a single branch point in translation. At step i, the advancement of the ribosome is modeled by a single enzymatic reaction. The required aa-tRNA $S_i$ enters the A-site of the ribosome-mRNA-peptide complex $E_i$ to create a new intermediate complex $I_i$. The process is reversed if the aa-tRNA leaves the A-site. Otherwise, the amino acid is irreversibly linked to the growing peptide chain, advancing the complex to $E_{i+1}$. In the absence of aa-tRNA, the complex can dissociate from the mRNA, creating the aborted complex $X_i$. $k_1$, rate constant of the appropriate aa-tRNA entering the A-site; $k_2$, rate constant of the appropriate aa-tRNA leaving the A-site; $k_3$, rate constant of amino acid i being incorporated into the growing peptide chain; $k_4$, rate constant of the ribosome aborting and dissociating from the mRNA; $v_{in}$, rate of ribosomes advancing to the current position; $v_{out}$, rate of ribosomes proceeding to the next position.

Figure 2. Free ribosome R binds to mRNA M and initiates translation. At each step i, the required aa-tRNA $S_i$ is added to the ribosome-mRNA-peptide complex $E_i$ that already has a peptide chain of length i. The final complex $E_N$ is released by release factor F to create free ribosome R, mRNA M, and protein P. $k_{in}$, rate of translational initiation; $k_{out}$, rate of protein release. $I_i$, $X_i$, and $k_1$-$k_4$ are described in figure 1.
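The ultrasensitivity of the translation submodel can be illustrated numerically. The following is a minimal sketch, assuming the branch-point scheme of figure 1 (reversible A-site binding, irreversible incorporation, and abortion from the empty-A-site complex) and the same attenuating factor at each of the n occurrences of the limiting amino acid. The function names and all numerical rate constants are illustrative placeholders, not values from this work.

```python
# Sketch of the translation submodel under stated assumptions.
# All rate constants are arbitrary illustrative values.

def michaelis_constant(k1, k2, k3, k4):
    """Apparent K_m of one branch point: k4 * (k2 + k3) / (k1 * k3)."""
    return k4 * (k2 + k3) / (k1 * k3)


def step_flux(v_in, s, k1, k2, k3, k4):
    """Steady-state v_out of one step, solved from the mass balances.

    dI/dt = k1*E*s - (k2 + k3)*I            = 0
    dE/dt = v_in - k1*E*s + k2*I - k4*E     = 0
    """
    forward = k1 * s * k3 / (k2 + k3)   # net rate constant E_i -> E_{i+1}
    e = v_in / (forward + k4)           # complex with an empty A-site
    i = k1 * e * s / (k2 + k3)          # complex with aa-tRNA in the A-site
    return k3 * i


def translation_rate(k_in, mrna, s, k_m, n):
    """Overall production rate when n positions need the limiting amino acid."""
    return k_in * mrna * (s / (k_m + s)) ** n


if __name__ == "__main__":
    k1, k2, k3, k4 = 1.0, 1.0, 1.0, 1.0
    k_m = michaelis_constant(k1, k2, k3, k4)
    for s in (10.0 * k_m, k_m, 0.1 * k_m):
        single = step_flux(1.0, s, k1, k2, k3, k4)
        overall = translation_rate(1.0, 1.0, s, k_m, n=15)
        print(f"S/K_m = {s / k_m:5.1f}  one step: {single:.3f}  n = 15: {overall:.3e}")
```

With n = 15, a supply equal to $K_m$ already reduces the overall rate by a factor of $2^{15}$, which is the sense in which translation practically halts below the threshold.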

Model of Amino Acid Biosynthesis and Regulation
Mathematical models of amino acid biosynthetic systems have been developed in the past, in many cases for specific systems, such as Trp biosynthesis in E. coli (Xiu et al. 2002; Alves and Savageau 2005; Elf and Ehrenberg 2005). Many of these models tend to be complex with idiosyncratic features that do not readily generalize to other systems, as discussed above for our submodel. For instance, the pathways may have different numbers of enzymes with very different kinetic properties. Rather, we require relatively simple models that nevertheless retain the essential generic character of amino acid biosynthetic systems, can be readily analyzed to make testable predictions, and can be used to elucidate general design principles. For example, models of inducible and repressible pathways, very similar to the one developed below, were used to make predictions regarding the coupling of expression in elementary gene circuits; the resulting predictions were confirmed experimentally in over 50 specific cases and the predicted coupling rules are now established as a general design principle (Savageau 1996, 1997; Wall et al. 2003, 2004). Figure 3 depicts our model of amino acid biosynthesis, one that includes the transcription and translation of enzymes in the biosynthetic pathway, as well as the synthesis of the cognate amino acid. Feedback repression of the biosynthetic enzymes, which is a prominent control mechanism in bacteria, is also included, as is the ability to import amino acid from the external environment. As was shown in the previous section, the rate of translation of the biosynthetic enzymes depends on the concentration of the free cognate amino acid, which is also depleted by cellular demand.
The model is mathematically described by a conventional system of ordinary differential equations (ODEs):

[equations (4)-(6)]

$X_1$ represents the concentration of mRNA that encodes some critical enzyme of the biosynthetic pathway. Transcription is dependent on the cognate amino acid concentration, $X_3$, and is described by a rational function with a Hill number of $g_{13}$ and a ratio between minimum and maximum rates of $V_{1L}/V_{1H}$. The loss of $X_1$ is dominated by first-order degradation of mRNA. $X_2$ represents the concentration of the critical enzyme, which is assumed to be stable. As was shown in the previous section, the rate of translation, or protein production, is drastically affected by the limiting amino acid concentration, $X_3$, and the exponent $n$ is the number of cognate amino acids in the critical enzyme. The loss of $X_2$ is dominated by first-order dilution in an exponentially growing cell, and therefore its loss rate constant, $\beta_2$, is equal to the growth rate constant, which is in turn affected by the availability of free amino acid; $\mu_M$ is the maximum growth rate constant when $X_3$ is in excess. The free cognate amino acid concentration, $X_3$, can be increased by biosynthesis or import from an external supply, each represented by a positive term. The free cognate amino acid pool is depleted by the cellular demand for amino acid, which we assume is dominated by protein synthesis. If the amino acids are quickly recycled from the aborted ribosome-peptide complex, then intuitively the rate of amino acid utilization is the rate of successful protein production, and the final term of equation (6) is therefore similar to the first term of equation (5). We assume that the exponent $m$ is the average number of cognate amino acids in each protein of the expressed proteome, rather than $n$, the number in the critical enzyme of the corresponding biosynthetic pathway.
Furthermore, if the cell produces P proteins, each with an average number of cognate amino acids $m$, at a rate equal to that of the critical enzyme in the biosynthetic pathway, [equation]. To simplify the analysis, and without loss of generality, the variables and parameters are normalized with respect to initial concentrations and a chosen time constant. The normalized system is described by equations (7-9):

[equations (7)-(9)]

where $A = \beta_1/\mu_M$, $B = \beta_3 X_{80}/(\mu_M X_{30})$, and $C = \alpha_{31} X_{60} X_{20}/(\alpha_{31} X_{60} X_{20} + \alpha_{32} X_{70})$. It should be noted that care is taken when normalizing the system. To create a well-controlled comparison between systems with different parameters, the terms were chosen to ensure that the corresponding reaction rates in each system were equal at the initial conditions. Simple inspection confirms that the gain and loss of each species at the initial conditions is unity, no matter what the parameter values.

Parameter Estimation
The mathematical model described by equations (7-9) contains nine parameters: $A$, $B$, $C$, $g_{13}$, the ratio $V_{1L}/V_{1H}$, $k_{13}$, $k_{23}$, $m$, and $n$. Of the nine, five are aggregates of other parameters. Table 1 lists the estimated parameter values. Where possible, the parameters are estimated based on published data for E. coli. In the remaining cases, reasonable estimates are made to reflect expected operating conditions. The aggregate parameter $A$ is calculated based on published values. Similarly, $B$ is based on published values and a reasonable estimate for P. $C$ is chosen to reflect a heavily repressed biosynthetic pathway during growth in an initial state of amino acid abundance. $k_{13}$ and $k_{23}$ represent the critical thresholds of transcriptional and translational regulation, respectively, and are normalized relative to the initial cognate amino acid concentration, $X_{30}$. Their values are chosen so that the initial rate of transcription, based on $X_{30}$, is near minimum but still within the regulatory regime, whereas the initial rate of global translation, also based on $X_{30}$, is near maximum but still within the regulatory regime.

Dynamic Response to Starvation
We define the cognate bias as $n - m$, or the difference between the number of cognate amino acids in a critical enzyme of the corresponding biosynthetic pathway, $n$, and the average number in the proteome, $m$. If the critical enzyme of the pathway is compositionally similar to the rest of the proteome, then there is no bias, or $n - m = 0$. Low bias represents the case where there are relatively fewer cognate amino acids in the biosynthetic pathway, or $n - m < 0$, whereas high bias represents the case where there are more, or $n - m > 0$. To examine the effect of cognate bias, various systems with different biases were computationally simulated in response to rapid and complete starvation: the normalized external supply of amino acid, $x_7$, was decreased from 1 to 0 at time zero. The results are shown in figure 4. All of the systems immediately experience a rapid drop in free cognate amino acid and a commensurate rise in the mRNA that encodes the biosynthetic pathway. The response is expected, as the large cellular demand quickly depletes the free amino acid reserves. Systems with low (blue) or no (green) cognate bias compensate by derepression and stabilize at a new steady-state amino acid concentration. Systems with a slight high bias (red and light blue) recover more slowly, but eventually stabilize as well (see the final values in fig. 4C). Systems with slightly higher bias (pink and yellow) do not recover, but do stabilize at lower amino acid concentrations. Figure 4C and D clearly show how the final steady-state concentration and response times vary with bias: the higher the bias, the slower the response and the lower the final steady-state concentration of free amino acid. Similar behavior is observed when the parameter values are varied ($A$ between 5 and 100, $B$ between $10^4$ and $10^6$, and $m$ between 4 and 16), indicating that the result is insensitive to the values of the parameters. Furthermore, the behavior changes significantly between $n = 32$ (yellow) and $n = 36$ (black), at which point the amino acid concentration does not appear to stabilize at all, but rather continues on a trajectory toward zero, suggesting the full failure of the system. However, the response time is extremely slow, and the large exponents magnify rounding errors, which may introduce uncertainty in the final steady-state values. Nevertheless, the concentrations do eventually reach zero, which is confirmed by the following analysis of the steady states.

Steady-State Concentrations after Starvation
Evolution of a Genome-Encoded Amino Acid Bias. doi:10.1093/molbev/msu225 MBE

At steady state, the derivatives that represent the changing concentrations vanish in equations (7-9). Also, complete starvation sends the external amino acid supply $x_7$ to zero. Simple inspection of the equations when the time derivatives of the $x_i$ and the supply $x_7$ are zero reveals that $x_3 = 0$ is always a solution, meaning that a steady-state amino acid concentration of zero is a possibility. Subsequent manipulation of equations (7-9) yields the following equation, in terms of the free amino acid concentration $x_3$ and the estimated parameters of the system:

[equation (10)]

There is no closed-form solution for $x_3$ given the general form of equation (10). However, when both sides of the equation, $f_{left}$ and $f_{right}$, are plotted as shown in figure 5A, the intersections identify the values of $x_3$, or the normalized steady-state concentrations, that satisfy equation (10). In figure 5A, $f_{left}$ is drawn for $g_{13} = 2$, whereas $f_{right}$ is drawn for several different values of cognate bias, or $n - m$. Overall, there are six labeled intersections in figure 5A, and a numerical solver was used to verify and refine each of them, producing six potential steady-state concentrations for $x_3$: 0.013, 0.047, 0.090, 0.12, 0.17, and 0.28. However, the number of intersections is not equal to the number of curves: for some values of bias, $f_{left}$ and $f_{right}$ clearly intersect at a single point; for other values, the curves intersect at two points or not at all. Figure 5A reveals three distinct cases of interest. When $n \le 2m$, the system has only one intersection, and therefore one steady state, in addition to the zero steady state that was previously identified. The eigenvalues of the system linearized near the steady states confirm, as the simulations indicated, that the positive steady state is stable whereas the steady state at zero is unstable. This implies only one possible outcome for $n \le 2m$: a low, albeit nonzero, stable steady-state amino acid concentration, which is considered safe. When $n$ is slightly greater than $2m$, but still below some threshold $m_c$, the system has two intersections in figure 5A, or two nonzero steady states, in addition to the ever-present steady state at zero. The eigenvalues of the system linearized near these steady states confirm, as the simulations indicated, that the high steady state is stable, the intermediate steady state is unstable, and the zero steady state is stable. Thus, there are two possible outcomes: recovery or full system failure. When $n$ is much greater than $2m$, or greater than the threshold $m_c$, there are no intersections in figure 5A, but the steady state at zero is still stable, implying that the only possible outcome is full failure. In this final case, the high cognate bias is fatal. Indeed, the steady-state analysis confirms the results indicated by the dynamic simulations. The key determinant of survival is the cognate bias, or the relative values of $n$ and $m$. In particular, the key measure is $n - 2m$, or the "critical bias." Values of $n - 2m \le 0$ are safe, whereas values of $n - 2m > 0$ are potentially fatal. Furthermore, figure 5B shows that even when $n < m_c$, and the system therefore stabilizes at a nonzero steady-state amino acid concentration, the normalized growth rate $\mu/\mu_M = [x_3/(x_3 + k_{23})]^m$ varies with $n$, and the lower the bias, the higher the growth rate.
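The three cases of the steady-state analysis can be summarized in a few lines of code. This is a minimal sketch, not the analysis itself: it assumes the safe region is $n \le 2m$, and it takes the upper threshold $m_c$ as a user-supplied number because no closed form for it is given here. The function names and the value 34 used for `m_c` in the example are hypothetical.

```python
# Sketch: classify the predicted fate from n and m, and evaluate the
# normalized growth rate. The threshold m_c has no closed form in the
# text, so it is passed in; any value used below is hypothetical.

def cognate_bias(n, m):
    """Cognate bias n - m (negative = low bias, positive = high bias)."""
    return n - m


def critical_bias(n, m):
    """Critical bias n - 2m; values <= 0 are considered safe."""
    return n - 2 * m


def predicted_outcome(n, m, m_c=None):
    """Return 'safe', 'bistable', 'fatal', or 'potentially fatal'.

    'bistable' means recovery or failure are both possible. When m_c is
    unknown, anything above the safe region is only 'potentially fatal'.
    The case n == m_c is treated as bistable, which is an assumption.
    """
    if critical_bias(n, m) <= 0:
        return "safe"
    if m_c is None:
        return "potentially fatal"
    return "fatal" if n > m_c else "bistable"


def normalized_growth_rate(x3, k23, m):
    """mu / mu_M = [x3 / (x3 + k23)]^m at a steady-state concentration x3."""
    return (x3 / (x3 + k23)) ** m


if __name__ == "__main__":
    m = 16
    for n in (22, 32, 33, 42):
        print(n, cognate_bias(n, m), critical_bias(n, m),
              predicted_outcome(n, m, m_c=34))  # m_c = 34 is hypothetical
```

Keeping `m_c` as an argument avoids implying that the boundary between the bistable and fatal regimes is known independently of the other parameters.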

Statistical Evidence of Bias across Genomes
A biological design that would protect against full failure during starvation would be a low cognate bias for the critical enzymes of the amino acid biosynthetic pathways. Past work suggests that there is statistical evidence of low cognate bias in the pathways of some organisms, including B. subtilis, E. coli, and S. enterica (serovar Typhimurium) (Alves and Savageau 2005). However, that analysis used a different model and a different measure of bias based on relative percentage of amino acid composition. Our model presented here indicates that the key measure of bias should instead be based on relative number of cognate amino acids. Furthermore, there is now a substantially larger body of genomic data to mine.

To search for evidence of cognate bias in the genomes of sequenced prokaryotes, we utilized the MetaCyc pathway database (Caspi et al. 2011) and the UniProt protein database (UniProt Consortium 2014). In our model, n represents the number of cognate amino acids in some critical enzyme of the amino acid biosynthetic pathway. However, the identification of a critical enzyme is problematic. It could be a rate-limiting enzyme-and yet different enzymes may be rate limiting under different conditions. The selection could be based on other factors, including activity, half-life, normal concentration, or regulative capacity, each of which is relatively uncharacterized when compared with the wealth of sequence data. Furthermore, amino acid biosynthetic pathways are intertwined, which complicates the identification of a critical enzyme for a specific biosynthetic pathway. Nevertheless, it can be argued that the selective pressure for cognate amino acid bias applies to all of the regulated enzymes of the pathway-after all, the entire pathway must be upregulated in response to starvation-and the last enzyme in each pathway can be uniquely and easily identified. Using MetaCyc, we compiled a list of the last enzymes in all of the known pathways leading to one of the 20 fundamental amino acids, shown in supplementary table S1, Supplementary Material online. From UniProt, we downloaded the complete proteomes of 1,816 completely sequenced prokaryotes. Within each proteome, we searched for the biosynthetic enzymes identified in supplementary table S1, Supplementary Material online. If an enzyme was found, we used the number of cognate amino acids in the enzyme to represent n in our model. If more than one enzyme was found for the production of a particular amino acid, we assumed that the organism has multiple alternative pathways and the pathway with the lowest

2871
Evolution of a Genome-Encoded Amino Acid Bias . doi:10.1093/molbev/msu225 MBE number of cognate amino acids is the most resilient; therefore the lowest of the numbers was used to represent n. Finally, we counted the number of cognate amino acids in each protein of the proteome and used the average number to represent m. The cognate bias, as we have shown, is n À m, and the critical bias is n À 2m. The resulting data, used in subsequent analyses, are included in supplementary table S2, Supplementary Material online. Figure 6 displays histograms of the cognate biases measured for each of the 20 amino acids over all 1,816 proteomes. As expected, histograms of the critical biases are similar in shape and shown in supplementary figure S2, Supplementary Material online. There are obvious cases of extreme cognate bias in figure 6: Tryptophan, in almost every case, has a low bias, whereas Arginine, in almost every case, has a high bias. To statistically analyze the significance of the biases, we performed a sign test (see Materials and Methods), a nonparametric test that does not assume a particular population distribution and measures the probability that the values are drawn from a population with a median value of zero, or no bias. The results of the test are listed in table 2. Low P values indicate that the population is biased, and the sample median indicates whether it is biased high or low. The results indicate that there is significant statistical evidence of low cognate bias (P < 0.001) in the biosynthetic pathways of six amino acids: Asparagine, Tryptophan, Proline, Leucine, Serine, and Cysteine. All but one of the pathways-Glutamateshow statistically significant evidence of low, or safe, critical bias. Organisms that have a close phylogenetic relationship might be expected to have similar biases and might not be considered independent samples. 
However, we obtained essentially the same results for the UniProt reference proteomes, which have been selected specifically to provide a wide phylogenetic distribution (UniProt Consortium 2014). Moreover, when clustering on the basis of cognate bias, even organisms that have a close phylogenetic relationship split into different clusters, as shown in the following section.
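The measurement and test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, sequences are assumed to be plain one-letter strings already retrieved from a proteome, and dropping zero-valued biases before the sign test is an assumed convention.

```python
from math import comb
from statistics import mean, median


def count_residue(sequence, residue):
    """Number of occurrences of a one-letter amino acid code in a sequence."""
    return sequence.upper().count(residue.upper())


def cognate_and_critical_bias(proteome, final_enzymes, residue):
    """Return (n, m, n - m, n - 2m) for one amino acid in one organism.

    proteome      -- all protein sequences of the organism (list of str)
    final_enzymes -- sequences of the final pathway enzymes that were found
    residue       -- one-letter code of the cognate amino acid
    """
    if not final_enzymes:
        return None  # no known enzyme found: treated as a missing value
    # With several alternative pathways, the lowest count represents n.
    n = min(count_residue(seq, residue) for seq in final_enzymes)
    # m is the average count over every protein of the proteome.
    m = mean(count_residue(seq, residue) for seq in proteome)
    return n, m, n - m, n - 2 * m


def sign_test(values):
    """Two-sided sign test of the hypothesis that the median is zero.

    Zeros are dropped (assumed convention). Returns (sample median, P value).
    """
    if not values:
        raise ValueError("sign_test needs at least one value")
    pos = sum(v > 0 for v in values)
    neg = sum(v < 0 for v in values)
    trials = pos + neg
    if trials == 0:
        return median(values), 1.0
    k = min(pos, neg)
    # Exact binomial tail with p = 0.5, doubled for a two-sided test.
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return median(values), min(1.0, 2 * tail)
```

Applied per amino acid, `cognate_and_critical_bias` would be run once per proteome, and the resulting list of biases passed to `sign_test`; a negative sample median with a low P value corresponds to a low cognate bias.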

Clustering of Cognate Bias Compared with Taxonomy
Without any prior knowledge, each protein of the proteome would naively be expected to contain equal amounts, or 5%, of each amino acid. However, it is widely known that this is not the case, and figure 7A depicts our calculation of the amino acid composition biases for each of the completely sequenced prokaryotes found in the UniProt database. The composition bias is measured as the average difference between the number of an amino acid in each protein of the proteome and the expected number, or 5% of the length of the protein. A positive number, or high composition bias, indicates a higher than expected number of a given amino acid in the proteins of the proteome; a low bias indicates a lower than expected number. Figure 7A shows that Cysteine, Tryptophan, Histidine, and Methionine are consistently underrepresented in the proteomes, whereas Leucine is consistently overrepresented. The remaining amino acids vary between a high and low composition bias, depending on the organism. The vertical ordering is sorted by the similarity of the organisms across all 20 amino acids, and reveals large and small clusters with similar bias profiles. The adjacent bar indicates the taxonomic phylum of each organism, and in several cases the bias clusters roughly correspond to phyla, especially in the highly represented cases of Proteobacteria, Firmicutes, and Actinobacteria.

Figure caption (beginning not recovered): … (10) (dotted black) and f_right, the value of the right side of equation (10) for n = 1-50 (gray), specifically n = 22 (dashed green), 29 (green), 31 (dark green), 32 (black), 33 (dark red), 35 (red), and 42 (dashed red). The average number of cognate amino acids in the proteins of the proteome, m, is 16. Cases where n < 2m (dashed green, green, and dark green) have a single stable, positive steady state, and are considered safe. Cases where 2m < n < m_c (dark red) have multiple potential outcomes. Cases where n > m_c (red and dashed red) have only one steady state at zero (not evident on the logarithmic axis), and are therefore fatal. (B) Normalized growth rate, μ/μ_M = [x_3/(x_3 + k_23)]^m, given the steady-state amino acid concentrations found for n = 1-33 (gray), specifically for n = 22, 29, 31, 32, and the coincident doublet at 33 (black).

Fasani and Savageau. doi:10.1093/molbev/msu225 MBE
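The composition bias defined in this section (the average, over all proteins, of the observed count minus 5% of the protein length) can be written out as a short Python sketch. The function name and the representation of a proteome as a list of one-letter sequence strings are assumptions made for illustration; this is not the authors' implementation.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition_bias(proteome, expected_fraction=0.05):
    """Per-amino-acid composition bias for one proteome.

    For each amino acid, average over all proteins the difference between
    the observed count and the expected count (5% of the protein length).
    Positive values mean overrepresentation, negative underrepresentation.
    """
    return {
        aa: mean(seq.count(aa) - expected_fraction * len(seq) for seq in proteome)
        for aa in AMINO_ACIDS
    }
```

Each organism then yields a 20-value profile, which is the kind of row that a similarity-based ordering such as the one in figure 7A would operate on.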
On the other hand, figure 7B depicts the measured cognate biases for each of the complete proteomes. A positive number, or high cognate bias, indicates a higher than expected number of a given amino acid in the final enzyme of the biosynthetic pathway; a low bias indicates a lower than expected number. Missing values indicate that none of the known enzymes, and presumably none of the known pathways, was found in the proteome. Note that the cognate bias, by definition, is measured with respect to the composition bias, and so it is remarkable, for example, that Tryptophan, which is already underrepresented in the proteome, tends to have an even lower number in the biosynthetic pathway. The vertical ordering of the cognate bias profiles, based on similarity, reveals fewer obvious clusters than the ordering based on composition bias, although a small cluster in the center includes E. coli and other organisms that can synthesize all 20 amino acids, and a larger cluster at the bottom includes proteomes with a very high Glutamate cognate bias. Likewise,

Discussion
Part of the cell's stringent response to starvation is the upregulation of amino acid biosynthetic pathways, and yet the stringent response might not be enough to protect the cell from the shock of rapid starvation. Cognate bias (relatively fewer cognate amino acids in the corresponding pathway synthesizing the amino acid) could avoid a potential catch-22, in which the emergency response requires the very amino acid that has disappeared from the environment. Past work (Alves and Savageau 2005) suggested that there is a low cognate bias in the pathways of some amino acids, that the bias tends to be lower for the key enzymes in the pathway, and that the profile of bias differs between E. coli, S. enterica (serovar Typhimurium), and B. subtilis, organisms from different environmental niches. However, that analysis used a different model and a different measure of bias based on the relative percentage of amino acid composition, whereas our more detailed model indicates that the key measure of bias should be based on the relative number of cognate amino acids. To illustrate the difference, consider three proteins of 100, 200, and 400 amino acids, each containing two cognate amino acids. The first consists of 2% cognate amino acid, the second of 1%, and the third of 0.5%. Based on relative percentages, the first protein has the highest bias, whereas the third has the lowest bias; but based on relative numbers of cognate amino acids, their bias is the same. The impact of the different definitions can be significant.
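The three-protein illustration can be checked directly. The short Python sketch below uses only the lengths and counts given in the text; the variable and function names are hypothetical.

```python
# The three proteins from the example: (length, number of cognate residues).
proteins = [(100, 2), (200, 2), (400, 2)]


def percent_cognate(length, count):
    """Cognate residues as a percentage of the protein length."""
    return 100.0 * count / length


percents = [percent_cognate(length, count) for length, count in proteins]
counts = [count for _, count in proteins]

for (length, count), pct in zip(proteins, percents):
    print(f"{length} residues: count = {count}, percentage = {pct}%")
```

The percentages differ fourfold between the shortest and longest protein, while the counts, and therefore any count-based bias measured against the same proteome average, are identical.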
Dynamic simulations and a steady-state analysis of our model show that important aspects of the system depend on the cognate bias, the number of cognate amino acids in the corresponding biosynthetic pathway (n) relative to the number in the expressed proteome (m), or n − m. The lower the cognate bias, the faster the system responds to starvation and the higher the recovered concentration of the free amino acid pool will be, potentially creating a selective evolutionary pressure to lower the cognate bias. Furthermore, our results confirm that a key determinant of the cell fate is the cognate bias. The crucial measure, based on cognate bias, is the critical bias, or n − 2m. A low critical bias is always safe, whereas a very high critical bias is fatal. An ambiguous critical bias (slightly high, but below some threshold) can lead to either recovery or failure, depending on the initial conditions and the dynamics of the system. The results suggest immediate predictions that can be experimentally tested. Although single amino acid starvation experiments have been performed in bacteria over the years (Venetianer 1969;Stephens et al. 1975;Shand et al. 1989;Parish 2003;Ohashi et al. 2008), they have not comprehensively tested all of the amino acids. We propose that samples taken from an unstressed culture, grown on rich medium, in steady-state exponential growth, could be used to inoculate a series of cultures grown on 20 different chemically defined media, where each medium has an excess of all amino acids except one. We predict that new cultures that upregulate pathways with low critical bias will recover, whereas those that upregulate pathways with high critical bias may fail, or at least experience a long lag in recovery. Furthermore, the model is general enough to make predictions for a variety of organisms. 
Measures of critical bias should predict relative growth rates and differential recovery times between two organisms starving for the same amino acid or the same organism starving for different amino acids.
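As a rough illustration of how the critical bias would sort predicted outcomes, the sketch below encodes the three regimes described in the text: n < 2m is safe, n above an upper threshold is fatal, and values in between are ambiguous. The upper threshold (written m_c here) comes from the steady-state analysis and is not computed in this sketch; it is passed in as a parameter, and the value used in the example is purely hypothetical. The function name and the handling of exact boundary values are likewise assumptions.

```python
def classify_outcome(n, m, m_c):
    """Predicted fate after sudden starvation for one amino acid.

    n   -- cognate amino acids in the pathway enzyme
    m   -- average cognate amino acids per protein in the proteome
    m_c -- upper threshold from the steady-state analysis (supplied by the
           caller; not derived here)
    """
    critical_bias = n - 2 * m
    if critical_bias < 0:
        return "safe"        # single stable, positive steady state
    if n > m_c:
        return "fatal"       # only the zero steady state remains
    if critical_bias > 0 and n < m_c:
        return "ambiguous"   # outcome depends on initial conditions
    return "boundary"        # exactly at a threshold: left undecided here


# Example with m = 16 and a hypothetical threshold m_c = 34.
for n in (22, 29, 31, 33, 35, 42):
    print(n, classify_outcome(n, m=16, m_c=34))
```

Ranking the 20 amino acids of an organism by n − 2m would give the order in which the proposed single-amino-acid starvation cultures are predicted to recover, under the assumptions of the model.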
A key aspect of our analysis is the changing environment. In a broader context, three classes of amino acids likely correspond to different environmental effects on the proteome: For amino acids that are never required by the organism, the biosynthetic pathway can be dispensed with entirely. For amino acids that are always required by the organism, regulation can be dispensed with, and the amino acids synthesized constitutively without regard for cognate bias. For all other environments, regulation would be advantageous, and a compensating cognate bias would likely exist for amino acids that experience the most frequent and extreme fluctuations in the

Figure 7 caption (beginning not recovered): (A) The clustering is based on the composition bias: The average difference between the number of an amino acid in each protein and the expected number, or 5% of the protein length. High composition bias (red) indicates that the amino acid is relatively overrepresented in the proteins of the proteome; low bias (green) indicates that the amino acid is relatively underrepresented. Each row describes the biases measured for a sequenced organism, and the vertical ordering is based on the similarity of the bias profile across all 20 amino acids, with rows closer to the top being more similar. Each column represents one of the 20 amino acids, and the horizontal ordering is based on the similarity of the bias values for that given amino acid over all of the organisms, with columns to the left being more similar. The phylum of each row, or organism, is plotted in a distinct color to the right. The number of organisms in each phylum is shown in parentheses in the legend. (B) The clustering is based on the cognate bias: The difference between the average number of a given amino acid in each protein of the proteome and the number in the final enzyme of the amino acid biosynthetic pathway. A high cognate bias (red) represents a higher number in the pathway; a low bias (green) represents a lower number in the pathway. Missing values (white) indicate that the biosynthetic pathway was not found in the proteome. The vertical and horizontal ordering are based on similarity, as described in (A). The phylum of each row, or organism, is plotted to the right as in (A).