Abstract

Mutations are central to evolution, providing the genetic variation upon which selection acts. A mutation’s effect on the suitability of a gene to perform a particular function (gene fitness) can be positive, negative, or neutral. Knowledge of the distribution of fitness effects (DFE) of mutations is fundamental for understanding evolutionary dynamics, molecular-level genetic variation, complex genetic disease, the accumulation of deleterious mutations, and the molecular clock. We present comprehensive DFEs for point and codon mutants of the Escherichia coli TEM-1 β-lactamase gene and missense mutations in the TEM-1 protein. These DFEs provide insight into the inherent benefits of the genetic code’s architecture, support for the hypothesis that mRNA stability dictates codon usage at the beginning of genes, an extensive framework for understanding protein mutational tolerance, and evidence that mutational effects on protein thermodynamic stability shape the DFE. Contrary to prevailing expectations, we find that deleterious effects of mutation primarily arise from a decrease in specific protein activity and not cellular protein levels.

Introduction

The fitness landscape model for protein evolution, as first conceptualized by Smith in 1970 (Smith 1970) and generalized by others (Orr 2005), imagines evolution as a process by which a sequence moves by stochastic processes from its wild-type sequence through fitter and fitter sequences until the sequence reaches a local fitness optimum. The nature of the fitness landscape determines the dynamics of evolution and fundamentally shapes what is and is not possible in evolution. Much has been learned from theoretical studies and small-scale interrogations of real fitness landscapes (Eyre-Walker and Keightley 2007). However, we still lack a systematic, assumption-free, experimental determination of the distribution of fitness effects (DFE) for all mutations of a gene performing its native function in its native host. The situation is akin to having a small set of aerial photographs of a geographical area versus having comprehensive satellite coverage such as provided by Google Earth.

We sought to provide a comprehensive, quantitative description of a fitness landscape corresponding to a gene and its nearest neighbors in both DNA and protein sequence space (i.e., the set of all sequences that differ by a single bp, codon, or amino acid) but avoid or ameliorate current limitations of large-scale measurements of fitness. Growth competition experiments or experiments in which alleles are enriched based on a threshold for function are the current state of the art (Fowler et al. 2010; Araya et al. 2012; Deng et al. 2012; McLaughlin et al. 2012; Schlinkmann et al. 2012; Whitehead et al. 2012; Roscoe et al. 2013; Starita et al. 2013). To varying degrees such experiments offer a direct “head-to-head” comparison of alleles but suffer four significant limitations. First, most studies utilize nonnative reporter assays (e.g., phage display, cell surface display, and two-hybrid systems) in which the gene or gene fragment is removed from its native context and host and fused to another gene (but see Deng et al. 2012; Roscoe et al. 2013). Second, population size can affect the measured value of fitness due to stochastic effects. Third, these experiments have limited ability to measure fitness for low fitness alleles because such alleles are depleted during the course of the experiment. For example, Roscoe et al. (2013) were unable to reproducibly measure the fitness of ubiquitin point mutants with a fitness below approximately 40% of wild-type due to their rapid depletion in growth competition experiments. Thus, although such experiments tell us the location of valleys in the landscape, they cannot tell us anything about what the valleys look like. Fourth, the fitness measurements are subject to the extent and underlying form of genotype-by-environment interactions. 
For example, the fitness of an antibiotic resistance gene measured by a growth competition experiment will be a function of the arbitrary selective pressure used in the experiment (the antibiotic concentration). Alleles conferring resistance far above or below the level necessary for growth at one antibiotic concentration may show no fitness difference in that environment yet show significant differences at a different antibiotic concentration closer to their resistance limit. We desired to decouple fitness from genotype-by-environment interactions as much as possible to quantify the underlying landscape and thus better understand a gene’s intrinsic evolutionary potential and limitations.

A key determinant of the fitness landscape is what we are defining here as “gene fitness.” “Fitness” in the traditional biological definition refers to the extent to which an organism is adapted to or able to produce offspring in a particular environment (often measured as growth rate in the fitness landscape of bacteria). Instead, we are referring to “gene fitness” as a phenotypic signature: the suitability of a gene to provide a particular function. Thus, a gene fitness landscape might also be referred to as a phenotypic landscape. Although to a large extent this landscape reflects a protein functional landscape, we use the term “gene fitness” because our landscape encompasses mutational effects at the DNA, RNA, and protein level and is a holistic metric of the ability of an allele to provide a particular function. However, when we average the gene fitness values of synonymous alleles to examine the effect of missense mutations, we refer to this as a protein fitness landscape. “Protein fitness” is similarly defined as the suitability of a protein to provide a particular function. Unless otherwise specified, use of the word fitness in relation to a specific allele refers to “gene fitness,” whereas its use in relation to a specific protein refers to “protein fitness.” These landscapes define the relationship between DNA/protein sequence and biological function. The more that function is related to growth rate in a particular environment, the more direct the relationship between gene fitness and organismal fitness. For example, in the case of antibiotic resistance genes, gene/protein fitness (as measured by minimum inhibitory concentration [MIC] of the antibiotic) and organismal fitness (as measured by growth rate) correlate to some extent (Deris et al. 2013; Jacquier et al. 2013), provided the antibiotic concentration is at the appropriate level for comparing the alleles.
Here, we present a comprehensive gene fitness landscape for the TEM-1 β-lactamase gene and a protein fitness landscape for the TEM-1 protein.

Results and Discussion

Fitness Landscape of TEM-1 β-Lactamase

We chose to measure the DFE of the TEM-1 β-lactamase gene, a convenient model for the study of evolution and the fitness effects of mutations (Salverda et al. 2010; Soskine and Tawfik 2010). TEM-1 is native to Escherichia coli as a plasmid-borne gene (Medeiros 1984), and we examined TEM-1 in this context. TEM-1 confers high resistance to penicillin antibiotics such as ampicillin (Amp). Thus, when E. coli cells bearing TEM-1 are challenged to grow in the presence of Amp, alleles conferring an enhanced ability to degrade the antibiotic will enrich. Amp resistance is therefore a key determinant of organismal fitness in the presence of Amp (Bershtein et al. 2006; Weinreich et al. 2006; Jacquier et al. 2013), although assessing TEM-1 fitness by measuring Amp resistance does not capture organismal fitness differences not associated with antibiotic resistance. Previous partial characterizations of the DFE of TEM-1 (Bershtein et al. 2006; Deng et al. 2012; Jacquier et al. 2013) suffer several significant limitations. These studies did not characterize the relationship between sequence and fitness (Bershtein et al. 2006), used error-prone PCR to generate mutations that were heavily biased to A/T to C/G transitions (80%) (Bershtein et al. 2006; Jacquier et al. 2013), and/or focused on the characterization of high fitness alleles with more than one mutation and assumed additivity for predicting the effect of the individual mutations (Deng et al. 2012). In addition, fitness was measured either by growth competition experiments (Deng et al. 2012), which suffer from the limitations described in the Introduction, or in the coarse-grained manner of a MIC assay (Bershtein et al. 2006; Jacquier et al. 2013). MIC assays suffer the drawbacks of being low-throughput and low-resolution. Alleles with known mutations must be isolated and tested individually, and MICs are measured in discrete values (typically 2-fold increments).
For example, the resolution of the MIC assay in a study of the amoxicillin resistance effects of 18% of the possible amino acid substitutions in TEM-1 was insufficient to capture the effects of synonymous mutations or to identify any beneficial mutations (Jacquier et al. 2013), both of which we readily achieve.

Here, we describe a synthetic biology approach to quantify fitness of TEM-1 in a single experiment that avoids or ameliorates the limitations of growth competition experiments and MIC assays and allows a comprehensive analysis of the DFE. A synthetic biology approach is by definition artificial in at least some aspects, but unlike several previous studies we measure the DFE of the gene in its native host and do not employ gene fusions or artificial reporters of fitness. Additionally, because TEM-1 increases the Amp resistance of E. coli cells over 1,000-fold, the combination of TEM-1 and Amp afforded the opportunity to determine the DFE over a wide range of fitness values. Our approach decouples genotype-by-environment interactions as far as Amp resistance is concerned. We quantify TEM-1’s underlying fitness landscape and thus its intrinsic evolutionary potential and limitations. This gene fitness landscape is a very significant determinant in the organismal fitness landscape for growth of the bacteria in the presence of Amp (Jacquier et al. 2013). However, the two types of landscapes are not equivalent.

We determined the DFE for 98.2% (2,536/2,583) of all point mutations (i.e., all 1-bp changes) and 83.9% (15,167/18,081) of all codon substitutions in the TEM-1 gene (fig. 1). The latter includes all 1-, 2-, and 3-bp changes of the 287 codons of TEM-1. We also determined the DFE for 95.6% (5,212/5,453) of the possible single amino acid substitutions in the corresponding TEM-1 protein (fig. 2). We excluded insertions and deletions (indels), an important class of mutations that generally have more deleterious effects on fitness (Toth-Petroczy and Tawfik 2013), from our analysis of the DFE. The source of TEM-1 variants was a previously described library (CCM-2) designed to contain all possible single codon substitutions in the TEM-1 gene (i.e., each codon position in the gene could be changed to any of the other 63 codons but each allele had only one position changed) (Firnberg and Ostermeier 2012). To measure gene fitness, we first partitioned the CCM-2 library into 13 partially overlapping sublibraries based on relative Amp resistance using a synthetic gene circuit that functions as a tunable bandpass genetic selection for Amp resistance (Sohka et al. 2009) (supplementary fig. S1, Supplementary Material online). Next, we performed deep sequencing on each of the sublibraries, counted how many times each allele appeared in each sublibrary, and used these statistics to quantify each allele’s conferred antibiotic resistance or gene fitness (w) relative to TEM-1 (supplementary fig. S2, Supplementary Material online). We used the fitness effects of synonymous mutations to determine an upper limit on the error of our fitness measurements (supplementary fig. S3 and Supplementary Data, Supplementary Material online). Our method enables the accurate fitness quantification of any allele and avoids population size effects because the alleles are isolated on a plate.
Unlike growth competition experiments, the probability of observing an allele in our experiment is predominantly independent of its conferred fitness. Additionally, the method decouples fitness from genotype-by-environment interactions, at least as far as the major environmental factor affecting fitness is concerned (i.e., the antibiotic concentration). We individually sequenced the alleles from 27 randomly selected colonies from two of the sublibraries and found these alleles’ gene fitness values were in the expected range (supplementary fig. S4, Supplementary Material online).
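The denominators quoted above follow directly from the gene length, and the coverage percentages can be checked with a few lines of arithmetic. The sketch below is only an illustration of that bookkeeping; the function and variable names are hypothetical and are not part of the original analysis. It also enumerates the 63 alternative codons of a single codon by the number of bases changed, which is the sense in which the library contains all 1-, 2-, and 3-bp substitutions.

```python
from itertools import product

BASES = "ACGT"
N_CODONS = 287                  # codons in TEM-1, as stated in the text
GENE_LENGTH_BP = 3 * N_CODONS   # 861 bp

# Each base can change to 3 others, each codon to 63 others,
# and each residue to 19 other amino acids.
possible_point = GENE_LENGTH_BP * 3
possible_codon = N_CODONS * 63
possible_missense = N_CODONS * 19


def neighbors_by_distance(codon):
    """Count the 63 alternative codons by number of base changes (1, 2, or 3)."""
    counts = {1: 0, 2: 0, 3: 0}
    for alt in map("".join, product(BASES, repeat=3)):
        distance = sum(a != b for a, b in zip(codon, alt))
        if distance:  # skip the codon itself
            counts[distance] += 1
    return counts


def coverage(measured, possible):
    """Percentage of possible mutations with a fitness measurement."""
    return 100.0 * measured / possible


print(possible_point, possible_codon, possible_missense)  # 2583 18081 5453
print(neighbors_by_distance("ATG"))                       # {1: 9, 2: 27, 3: 27}
print(round(coverage(2536, possible_point), 1))           # 98.2
print(round(coverage(15167, possible_codon), 1))          # 83.9
print(round(coverage(5212, possible_missense), 1))        # 95.6
```

Only 9 of the 63 alternative codons at any position are reachable by a point mutation, which is why point mutations make up a small share of the codon-substitution library.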

Fig. 1. Distribution of gene fitness effects (DFE) of mutations in TEM-1. (A) The DFE of point mutations (i.e., 1-bp changes in the gene). (B) The DFE of all possible codon substitutions (i.e., all 1, 2, and 3 base changes in the 287 codons of TEM-1). Gene fitness values for conferring ampicillin resistance are presented on a log scale with 0 corresponding to the fitness of TEM-1. The contributions of synonymous (red), missense (gray), and nonsense (blue) mutations to the DFE are indicated. Gene fitness as a function of codon substitution is provided as supplementary data S1 (Supplementary Material online).

Fig. 2. The sequence-function landscape of TEM-1. The heat map indicates the protein fitness values for ampicillin resistance of the indicated amino acid substitution. The Ambler consensus numbering system (Ambler et al. 1991) for class A β lactamases is used. An asterisk indicates key active site residues. For the start codon, fitness values correspond to the average of the codons for the indicated amino acid though methionine is expected to be the amino acid incorporated. Protein fitness as a function of missense mutation is provided as supplementary data S2 (Supplementary Material online).

TEM-1’s DFE (fig. 1) indicated the gene was fairly robust to mutations as nearly one-half (47.3%) of all alleles retained at least 50% of the fitness of TEM-1. Among alleles with point mutations, 63.8% maintained at least 50% of the fitness of TEM-1 (53.2% of the nonsynonymous and 97.2% of the synonymous point mutations), less than a previous estimation of 75% (Soskine and Tawfik 2010). Still, a sizable fraction of the mutants lost more than 90% of their fitness (19.6% of the point mutations and 30.3% of all codon substitutions), roughly in line with previous estimates of the frequency of mutations having a severe deleterious effect (Camps et al. 2007). Among point mutants, only 6% of the alleles completely lost the ability to provide any Amp resistance and 33% of those were nonsense mutations (fig. 1A), which is similar to the approximately 8% inactivating mutations found in a previous study of an error-prone PCR library of TEM-1 (Soskine and Tawfik 2010). Only 7.1% (1,074/15,167) of the alleles and 7.0% (367/5,212) of the missense mutations increased fitness above that of TEM-1 outside the range of the error. The bimodal distribution was qualitatively similar to the DFE of randomly chosen point mutations in DNA and RNA viruses (Sanjuan et al. 2004; Peris et al. 2010), the DFE of a set of induced mutations in yeast (Wloch et al. 2001), the DFE of missense mutations of ubiquitin (Roscoe et al. 2013), a sampling of TEM-1’s DFE for amoxicillin resistance (Jacquier et al. 2013), and estimations of TEM-1’s DFE for Amp resistance (Soskine and Tawfik 2010).

Benefits of the Genetic Code’s Architecture

TEM-1’s DFE provides evidence that the standard genetic code’s architecture minimizes the deleterious effects of mutations and enriches for adaptive mutations. The adaptive theory on the origin of the genetic code states that the genetic code is arranged to minimize the deleterious effects of mutations and mistranslations (Sonneborn 1965; Woese 1965). This theory predicts that point mutations would be less deleterious than 2- or 3-bp substitutions. We have recently shown this prediction held true for mutations in two small genes (HB36 and HB80; Whitehead et al. 2012) that were reengineered for a new function in a nonnative organism (Firnberg and Ostermeier 2013). Here, we find that this prediction is also true of a wild-type gene in its native host. The median changes in relative gene fitness for 1-, 2-, and 3-bp substitutions at a codon position were −0.36, −0.52, and −0.63, respectively. More significantly, the frequency of point mutations among the alleles with a fitness less than 0.1 was 35.3% less than that expected if the point mutations were evenly distributed across all fitness values (P = 1.1 × 10⁻⁴⁰ based on comparison with a hypergeometric distribution). For HB36 and HB80, point mutations were 56.4% and 53.8% depleted from clones with a fitness less than 0.1, respectively (P = 1.35 × 10⁻¹⁸ and 1.27 × 10⁻²¹) (Firnberg and Ostermeier 2013). We interpret this result as evidence that the code’s arrangement minimizes the fitness cost of amino acid substitutions. An alternative explanation is that TEM-1, as a product of millions of years of evolution under the standard genetic code (Hall and Barlow 2004), evolved to minimize the deleterious effects of mutations under the rules of this code.

We also find further support for our hypothesis that the standard genetic code’s architecture enriches for adaptive mutations (Firnberg and Ostermeier 2013). Among the 367 beneficial missense mutations in TEM-1, 41.1% can be achieved by point mutations, 32.5% higher than the 31.0% expected if 367 missense mutations were chosen at random (P = 8.8 × 10⁻⁶ based on comparison to a hypergeometric distribution). Our comprehensive analysis of beneficial mutations in a natural gene in its native host is the strongest evidence yet supporting the hypothesis that the code’s arrangement makes adaptive mutations more likely. The role, if any, such enrichment played in the origin of the genetic code and whether the enrichment is a side effect of the code’s error minimization bias are difficult questions to answer (Firnberg and Ostermeier 2013).
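A hypergeometric comparison of this kind asks how likely it is to draw at least the observed number of point-mutation-accessible substitutions when sampling without replacement from the full set of measured missense mutations. The sketch below illustrates the calculation under stated assumptions: the counts 151 and 1,616 are not reported in the text but are back-calculated here by rounding 41.1% of 367 and 31.0% of 5,212, the test is taken to be one-sided, and the function name is hypothetical. It should therefore be read as an order-of-magnitude check rather than a reproduction of the reported P value.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, of which `successes` have the property."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favorable / total


# Assumed counts, back-calculated from the percentages in the text.
N_MISSENSE = 5212                              # missense mutations with a fitness value
N_POINT = round(0.310 * N_MISSENSE)            # ~1,616 reachable by a point mutation
N_BENEFICIAL = 367                             # beneficial missense mutations
N_BENEFICIAL_POINT = round(0.411 * N_BENEFICIAL)  # ~151 of those reachable by a point mutation

p_value = hypergeom_upper_tail(N_MISSENSE, N_POINT, N_BENEFICIAL, N_BENEFICIAL_POINT)
print(f"P(X >= {N_BENEFICIAL_POINT}) = {p_value:.2e}")
```

Exact integer binomial coefficients are used instead of floating-point approximations so that the very small tail probability is not lost to rounding.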

The Effects of Synonymous Mutations

The effects of synonymous mutations on protein synthesis and fitness have important implications for evolution and biotechnology. However, despite an abundance of plausible hypotheses, we lack a mechanistic understanding of these effects (Plotkin and Kudla 2011). Our systematic strategy provides an assumption-free approach for testing and generating these hypotheses. We first examined the gene fitness values of 725 alleles synonymous to TEM-1. Beneficial and deleterious synonymous mutations distributed differently across the sequence of TEM-1 (fig. 3A). Beneficial mutations occur primarily in positions 15–30 and 130–260, whereas deleterious mutations appeared in clusters in the first half of the gene and were almost absent from the second half of the gene. No trend in the types of substitutions for either beneficial or deleterious effect was apparent other than eight of the ten beneficial mutations at Arg codons being to the rare E. coli codons AGA (2/10) and AGG (6/10). The pattern of beneficial and deleterious synonymous codons indicates the existence of regions of TEM-1 with suboptimal and less robust mRNA properties, respectively.

Fig. 3. Effects of synonymous substitutions. (A) Beneficial and deleterious synonymous mutations in TEM-1 are not evenly distributed. Alleles synonymous to TEM-1 with a gene fitness significantly higher (red circles) or lower (blue squares) than that of TEM-1 are shown. The criterion for significance was that the error did not extend into the range fitness = 1 ± 0.1. Error bars provide an upper estimate of the error on the fitness measurements as described in the text. (B) An analysis of pairs of synonymous alleles with mutations in codon positions 2–10 of the gene (supplementary fig. S6 and Supplementary Data, Supplementary Material online) revealed that codon preferences tended to be for codons with a higher frequency in the Escherichia coli genome. (C) Preferred codons at positions 2–10 of the gene were predicted to result in mRNA with less stable structures around the initiation codon. Red bar is the mean. **P < 0.01, ****P < 0.0001 by Student’s t-test.

We next analyzed the effects of 14,055 synonymous substitutions among the set of 15,167 alleles with gene fitness measurements (supplementary fig. S3, Supplementary Material online). Over the length of the entire gene, CUA (Leu), AGG (Arg), and UCG (Ser) provided an average fitness advantage over some of their synonymous codons (supplementary fig. S5, Supplementary Material online), but the advantage was only approximately 5%. Interestingly, CUA and AGG are rare codons in E. coli. Codon usage often differs in the beginning of the gene from the rest of the gene, which has been hypothesized to result from a selection against 5′ mRNA structure and/or a selection for rare codons that provide a slower elongation time at the 5′ end (Plotkin and Kudla 2011). Our data address both these hypotheses. Positions 2–10 in TEM-1 had an almost 2-fold broader distribution of synonymous effects compared with any other section of the gene (supplementary fig. S3F, Supplementary Material online). Within these nine positions, we observed 26–85% mean fitness increases for certain codons of Ala, Arg, Gly, Leu, Pro, and Ser relative to select synonyms (supplementary fig. S6, Supplementary Material online). These synonymous fitness differences distributed differently among the nine positions (supplementary fig. S7, Supplementary Material online). Contrary to the slow elongation hypothesis, favored codons tended to appear more frequently in the E. coli genome than their corresponding disfavored codon (fig. 3B) (Hilterbrand et al. 2012). However, none of the 16 observed codon preferences were between the most and least frequently used codons within a synonym set suggesting that codon usage was an inadequate explanation for the observed preferences (e.g., as a result of tRNA abundance). We next calculated the folding energy of the mRNA around the initiation codon for alleles exhibiting gene fitness differences (Hofacker 2009). 
In almost all cases, favored codons reduced mRNA stability around the translation start site compared to disfavored codons (fig. 3C). Our findings reinforce recent studies that undermine the slow elongation hypothesis (Supek and Smuc 2010; Charneski and Hurst 2013) and support the theory that mRNA structure at the beginning of genes determines the translation rate (Bentele et al. 2013; Goodman et al. 2013). Like the most recent of these studies (Goodman et al. 2013), our study shows how systematic analysis of large synthetic libraries is a powerful approach for testing competing hypotheses.

Exceptions to the Standard Genetic Code

Among the three stop codons, UAG (amber) exhibited nonsense suppression (supplementary fig. S8A, Supplementary Material online). A 3′ flanking purine after the UAG enhanced this suppression (supplementary fig. S8B, Supplementary Material online), as has been observed with the amber suppressor tRNA allele supE (Bossi 1983; Miller and Albertini 1983). We sequenced the seven tRNAs known to serve as amber nonsense suppressors and found that E. coli strain SNO301 harbors the supE44 allele, which consists of a duplicate copy of the glnV tRNA gene, glnX, with the expected anticodon mutation (thereby inserting glutamine at UAG codons) (Singaravelan et al. 2010). This allele suppressed a UAG with a 3′ flanking purine at a mean efficiency of 7–10% (supplementary fig. S8C, Supplementary Material online).

Substitutions for the AUG start codon that provided significant antibiotic resistance (>5% of that of TEM-1) included seven of the nine point mutants of AUG (supplementary fig. S9, Supplementary Material online), consistent with known native and nonnative alternative initiation codons in E. coli (Sacerdot et al. 1996; Sussman et al. 1996). In addition, we observed that GUA, GUC, and GUU could serve as weak initiation codons (7–14% as efficient as AUG). Initiation from GUA in E. coli has been previously reported (Haggerty and Lovett 1997), but initiation from GUC and GUU has not.

Mutational Tolerance

As the effects of nonsynonymous mutations dwarfed those of synonymous mutations, we combined the gene fitness data of synonymous codons to determine the DFE of missense mutations in the TEM-1 protein (fig. 2). This protein fitness landscape of TEM-1 broadly matched what is known about protein structure in general and TEM-1 in particular. For example, proline was the least tolerated substitution (supplementary fig. S10, Supplementary Material online, which displays TEM-1’s amino acid substitution matrix for Amp resistance), especially in alpha helices, and key TEM-1 active site residues did not tolerate mutation (fig. 2). TEM-1’s signal sequence is required for export via the Sec pathway to the periplasm. The signal sequence (fig. 2) tolerated most mutations, consistent with the pathway’s broad specificity (Gierasch 1989). However, the hydrophobic core of the signal sequence did not tolerate substitution of charged residues, consistent with typical export-defective mutants in Sec pathway signal sequences (Gierasch 1989). Signal sequence residue L21 was a hot spot for beneficial mutations, and L21F is found in some extended-spectrum–resistant TEM alleles (Sougakoff et al. 1989).

The comprehensive protein fitness landscape of missense mutations enables a rigorous determination of a protein’s mutational tolerance in its biological context. We determined the effective number of amino acids at a position (k*), which derives from the fitness entropy that is calculated from the distribution of protein fitness values for the 20 amino acids at that position. This measure of tolerance is more informative than establishing an arbitrary fitness cutoff for deciding whether a mutation is tolerated. Our approach is analogous to how information-theoretical entropy is used to measure variability at a position in a set of aligned sequences (Shenkin et al. 1991). However, our measure of tolerance is specific for TEM-1 and the effect of single amino acid substitutions. This tolerance is a measure of TEM-1’s ability to move a Hamming distance of one on the amino acid level and thus does not include epistatic effects. A k* value of 1 corresponds to a position at which all missense mutations completely inactivate the protein and a k* value of 20 means that all 19 amino acid substitutions provide the same fitness as the wild-type amino acid.
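One way to make this measure concrete is to treat the 20 protein fitness values at a position as weights, normalize them to a probability distribution, and take the exponential of its Shannon entropy. The sketch below assumes exactly that form, k* = exp(−Σ pᵢ ln pᵢ) with pᵢ = wᵢ/Σwⱼ; the precise definition used in the study is not given in this section, so the formula and the function name are assumptions chosen to reproduce the two limiting cases described above.

```python
from math import exp, log


def effective_number_of_amino_acids(fitness_values):
    """Exponential of the Shannon entropy of normalized fitness values.

    `fitness_values` holds the protein fitness of each of the 20 amino
    acids at one position (wild type included), with values >= 0.
    """
    total = sum(fitness_values)
    if total <= 0:
        raise ValueError("at least one amino acid must have positive fitness")
    # Terms with zero fitness contribute nothing to the entropy.
    probabilities = [w / total for w in fitness_values if w > 0]
    entropy = -sum(p * log(p) for p in probabilities)
    return exp(entropy)


# Limiting cases described in the text.
intolerant = [1.0] + [0.0] * 19    # only the wild-type residue is functional
fully_tolerant = [1.0] * 20        # every substitution matches the wild type
# Hypothetical intermediate position: ten substitutions at half fitness.
partial = [1.0] + [0.5] * 10 + [0.0] * 9

print(round(effective_number_of_amino_acids(intolerant), 2))      # 1.0
print(round(effective_number_of_amino_acids(fully_tolerant), 2))  # 20.0
print(effective_number_of_amino_acids(partial))
```

Because the measure uses the whole distribution of fitness values, a position where many substitutions retain partial function scores between the two extremes without any fitness cutoff having to be chosen.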

The distribution of k* was strongly biased toward high values (fig. 4A). Half of all positions accepted 15.5 or more amino acid substitutions. Under the simple assumption of a linear correlation, percent solvent-accessible surface area accounted for 49% of the variance in k* (fig. 4B) and predicted k* better than distance from the active site (fig. 4C) or a k* determined from a sequence alignment of 156 class A β-lactamases (Deng et al. 2012) (fig. 4D). Both a k* based on a multiple sequence alignment (fig. 4D) and previous calculations of k* for TEM-1 (Deng et al. 2012) (supplementary fig. S11, Supplementary Material online) greatly underestimated TEM-1’s mutational tolerance to single amino acid substitutions, presumably because epistatic constraints will further limit what sequence combinations are seen naturally, the set of known functional sequences is only a small subset of all possible functional sequences, and a high stringency was used in selecting functional sequences in the case of the latter study. The tolerance of amino acid position i weakly correlated with that of positions i + 1, i + 3, and i + 4 (correlation coefficient 0.25–0.28, P ≤ 1.8 × 10⁻⁵) but not with that of position i + 2 or positions beyond i + 4. This correlation primarily occurred at residues with high k* values. The eight positions with k* < 2.5 include the four strictly conserved residues involved in the catalytic mechanism (S70, K73, S130, and E166) and four other highly conserved residues (fig. 4E). In contrast, the 42 most tolerant positions (k* > 19) predominantly appeared away from the active site in surface loops and at position 2 in alpha helices (fig. 4E). Alpha helices (mean k* = 13.5 ± 5.4) tolerated substitutions better than beta strands (mean k* = 9.89 ± 4.8) (P = 0.0005, Student’s t-test), perhaps a reflection of the buried nature of the beta strands.

Fig. 4. Tolerance of TEM-1 to missense mutation. Tolerance was measured by the effective number of amino acids at a position (k*), which derives from the distribution of protein fitness values for the 20 amino acids at that position. k* ranges in value from 1 (position is completely intolerant to substitution) to 20 (position tolerates all possible amino acids with no loss in fitness). (A) The distribution of k* values in TEM-1. (B) Correlation of k* with percent solvent-accessible surface area. (C) Correlation of k* with distance from the active site. (D) Correlation of k* with a k* determined from a sequence alignment of 156 class A β lactamases (Deng et al. 2012). (E) Model of TEM-1 (PDB ID 1XPB [Fonze et al. 1995]) indicating the least tolerant positions (k* < 2.5, shown in blue), which include the key active site residues S70, K73, S130, and E166, and the most tolerant positions (k* > 19, shown in red).

Substitution matrices, such as BLOSUM (Henikoff S and Henikoff JG 1992), score the likelihood of substituting one amino acid for another based on alignments of conserved regions of related proteins. A recent study found that the BLOSUM62 matrix best predicted the effect of nonsynonymous mutations in TEM-1 and explained 16% of the variance in the MIC for amoxicillin (Jacquier et al. 2013). We find that BLOSUM62 matrix scores predict 16% of the variance in protein fitness caused by nonsynonymous mutations.

Determinants of Mutational Effects on Protein Fitness

What basic phenomena underlie the DFE? For an enzyme, fitness (w) will strongly depend on the total catalytic activity in the cell (vt), which is a product of the enzyme’s specific catalytic activity (vsp) and the protein abundance (P), which is how much protein is present in the cell in a correctly folded, soluble form.
$$w = f(v_t) \tag{1}$$
$$v_t = v_{sp} \cdot P \tag{2}$$
For many genes, especially essential genes, the functional form of equation (1) is likely complex. For example, an increase in vt may be detrimental for fitness if it negatively perturbs metabolic flux in the cell. In addition, essential genes are likely to have evolved to be buffered against the deleterious effects of mutation (i.e., they possess a vt that is well above a level that would compromise fitness). A study of 27 mutants of the essential enzyme dihydrofolate reductase (DHFR) supports this idea (Bershtein et al. 2012). The mutations were chosen to have little effect on specific catalytic activity but a range of effects on thermostability. The authors found that organismal fitness (i.e., growth rate) only weakly correlated with protein abundance, and large decreases in protein abundance generally had marginal effects on fitness. A follow-up study estimated that DHFR could sustain an 80% cut in protein abundance with little effect on organismal fitness and that the dependence of fitness on protein abundance exhibited Michaelis–Menten-like behavior (Bershtein et al. 2013). This study illustrates the challenge of gaining insight into the basic phenomena underlying the DFE without knowledge of the form of equation (1). TEM-1 offers a simple case for addressing this issue, as TEM-1 fulfills a single cellular role (inactivation of β-lactam antibiotics), and the reaction’s substrate and product are not part of any native E. coli metabolic or signaling pathway. As a result, fitness for TEM-1 is directly proportional to the total antibiotic hydrolysis activity in the cell (Soskine and Tawfik 2010), as shown in equation (3):
$$w \propto v_t = v_{sp} \cdot P \tag{3}$$

and this facilitates an analysis of the relative effects of mutation on both vsp and P.

Experimental evolution studies have shown that vsp and P are equally important targets for adaptive evolution (Counago et al. 2008; Walkiewicz et al. 2012). We expect protein abundance to be a function of the thermodynamic stability (ΔG) as well as protein production rates (i.e., arising from mRNA properties, interactions with chaperones) and degradation rates (i.e., proteolytic susceptibility). Both computational and experimental studies show that, on average, missense mutations decrease thermodynamic stability (Tokuriki et al. 2007). A prevailing hypothesis on the origin of deleterious fitness effects of mutation states that thermodynamic stability is the primary determinant of the DFE through its effect on protein abundance (DePristo et al. 2005; Camps et al. 2007; Tokuriki et al. 2007; Wylie and Shakhnovich 2011). Although the hypothesis is intuitive and appealing, experimental evidence for a significant correlation between protein stability and fitness via an effect on protein abundance is scant (Bershtein et al. 2012). Mutations that reduce function often show decreased protein abundance (Pakula et al. 1986; Schultz and Richards 1986); however, mutations that increase stability can reduce specific activity (Shoichet et al. 1995) and reductions in protein stability often accompany adaptive mutations (Wang et al. 2002). In the aforementioned study of 27 mutations that destabilized DHFR (most of which were buried in the hydrophobic core of the protein) the authors found that although protein abundance correlated with thermostability (r² = 0.41 at 37 °C), organismal fitness changed very little with protein abundance (Bershtein et al. 2012). The study could not address how the deleterious effects of mutations partition between effects on specific catalytic activity and protein abundance because the mutations were selected to be those that do not affect catalytic activity.
A study of 990 missense mutations in TEM-1 found that 15–19% of the variance in amoxicillin resistance could be explained by the computationally predicted change in protein stability caused by the introduction of the mutation in TEM-116 (TEM-116 is TEM-1 with the V84I and A184V mutations) (Jacquier et al. 2013); however, the study did not address the mutations’ effect on protein abundance or specific catalytic activity. A comprehensive, systematic study of 1) the relationship between fitness and thermostability, and 2) the relative contributions of protein abundance and specific activity to the deleterious effects of mutations would more fully address the fundamental phenomena underlying the DFE.

We predicted ΔΔG (ΔGwild-type − ΔGmutant) using Rosetta (Das and Baker 2008; Chaudhury et al. 2010) for 4,783 missense mutations of TEM-1, allowing limited backbone flexibility (fig. 5A). Variants that were predicted to be more stable tended to have higher protein fitness (fig. 5A; supplementary fig. S12A, Supplementary Material online). The larger a mutation’s deleterious effect on fitness, the higher the probability that the mutation produced a very large predicted energy score (supplementary fig. S12A, Supplementary Material online). Predictions of ΔΔG using PoPMuSiC (Dehouck et al. 2011), a more empirical approach than Rosetta to predicting changes in protein stability, produced similar results and indicated that 18% of the variance in protein fitness can be explained by thermostability (supplementary fig. S12B, Supplementary Material online). We compared fitness with experimentally measured melting temperatures of 36 TEM-1 alleles and found a positive correlation with an r² of 0.53 (supplementary fig. S12D, Supplementary Material online). This suggests that limitations in the computational prediction of ΔΔG result in an underestimation of the degree to which thermodynamic stability determines fitness. The lack of a positive correlation between melting temperature and fitness observed in a previous study (Bershtein et al. 2012) underscores the fact that protein stability effects on fitness will be observed only when the fitness function of equation (1) is in a regime where changes to vt affect fitness. Our systematic and comprehensive approach to examining the relationship between protein stability and protein fitness for a single protein, combined with our observed correlation between melting temperature and fitness provides strong experimental evidence that effects on protein stability significantly shape the DFE.

The determinants of protein fitness. (A) Loss of fitness correlates with loss of thermodynamic stability. Protein fitness is shown as a function of change in ΔG as predicted by Rosetta (Chaudhury et al. 2010) for 4,783 missense mutations in wild-type TEM-1. The median predicted ΔΔG for fitness deciles is shown in red triangles. Predicted changes >15 REU are not shown and are not considered in the median calculation. The distribution of ΔΔG for select fitness deciles can be found in supplementary figure S12A, Supplementary Material online. (B) Total cellular catalytic activity determines TEM-1 protein fitness. The average total cellular catalytic activity <vt> and the average protein abundance <P> were experimentally measured for 13 sublibraries of ∼15,000 unique TEM-1 alleles partitioned based on relative fitness. The values of <vt> and <P> are relative to that of TEM-1. The slight sigmoidal form of <vt> is an expected artifact of the methodology (supplementary fig. S14, Supplementary Material online). The error bars represent the standard deviation of six assays from two independent experiments. The lines are guides for the eye. (C) Protein fitness phase space defined by equation (3). The protein abundance and specific catalytic activity (relative to TEM-1) of 26 randomly selected members of sublibraries 6 (red solid circle) and 7 (blue open square) are shown. The dotted line corresponds to an equal decrease in protein abundance and specific catalytic activity. In region 1, a mutation affects specific catalytic activity more than protein abundance. The solid lines are lines of constant fitness at the average expected fitness values of the two sublibraries from which the alleles were randomly selected.
Fig. 5.

Whether mutational reductions in protein abundance as opposed to specific activity are the major cause of loss of fitness has not previously been experimentally addressed. We experimentally addressed this question by analyzing the soluble fraction of cell lysates of the 13 sublibraries and randomly selected alleles from our TEM-1 library. We first established that w and vt are directly proportional as predicted by equation (3) by measuring the mean total hydrolysis activity of the cell <vt> for the 13 sublibraries of CCM2 (fig. 5B). We measured protein abundance by quantitative western blot of the soluble fraction of cell lysates (supplementary fig. S13, Supplementary Material online). We assumed all soluble protein was folded and active. The mean protein abundance <P> of the sublibraries did not decrease nearly as rapidly with decreasing protein fitness as <vt> did (fig. 5B). In addition, an increase in the mean amount of aggregated protein did not accompany a loss of fitness (supplementary fig. S15, Supplementary Material online). These findings suggest that mutational effects on vsp rather than on P may play the larger role in the deleterious effects of mutation. As this interpretation hinges on the distribution of values of P in the sublibraries, we measured P for 27 randomly selected alleles with a protein fitness of about 0.025–0.05 (i.e., the alleles of supplementary fig. S4, Supplementary Material online). We chose this fitness range so that the mutational effects were substantial, but not inactivating. This ensured that our conclusions would not depend on small changes in w and P. We excluded the I13E allele from analysis because this mutation in the signal sequence caused a defect in normal export/proteolytic processing (supplementary fig. S13, Supplementary Material online).
The remaining 26 alleles exhibited a decrease in both protein abundance and predicted thermodynamic stability relative to TEM-1 with the exception of the R244E allele, which showed an increase in both (supplementary fig. S12C, Supplementary Material online). From w and P, we calculated vsp using equation (3) and examined the protein fitness phase space by plotting P versus vsp (fig. 5C). We find that the deleterious effects of mutations, at least for mutations with large deleterious effects, arise more from a decrease in specific activity than from a decrease in protein abundance. Despite the large negative effects on specific activity, the mutated residues of the 26 alleles were not clustered around the active site but were scattered throughout the protein (supplementary fig. S4C, Supplementary Material online). Thus, the dominant effect of mutation on specific activity does not arise because the 26 mutations were biased to be proximal to the active site. We postulate that mutational effects on specific activity may be as important to the DFE at high fitness as at low fitness, but this postulate requires experimental investigation.
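The calculation of vsp from w and P, and the comparison against the dotted line of figure 5C, can be illustrated with a short sketch. It assumes that w, P, and vsp are all expressed relative to TEM-1, so that the proportionality constant in equation (3) is 1 and w = vsp × P. The function names and the example values are hypothetical and are not measurements from this study.

```python
def specific_activity(w, p):
    """Relative specific activity v_sp, assuming w = v_sp * P.

    All quantities are taken relative to TEM-1 (w = P = v_sp = 1 for wild type),
    which is an assumption of this sketch.
    """
    if p <= 0:
        raise ValueError("relative protein abundance must be positive")
    return w / p

def dominant_effect(w, p):
    """Report which factor a mutation reduced more, v_sp or P."""
    v_sp = specific_activity(w, p)
    if v_sp < p:
        return "specific activity"   # region 1 of the phase space
    if v_sp > p:
        return "protein abundance"
    return "equal"                   # on the dotted line

# Hypothetical allele: fitness 0.04 and half the wild-type abundance.
print(specific_activity(0.04, 0.5), dominant_effect(0.04, 0.5))
```

For this hypothetical allele, most of the fitness loss is attributed to specific activity, because the inferred vsp falls further below the wild-type value than P does.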

We do not interpret the diminished role of mutational effects on protein abundance as reducing the role of thermodynamic stability in fitness. Protein stability, in addition to its effect on protein abundance, may exert its effect on fitness through a decrease in a protein’s specific activity. Perhaps, this manifests by perturbing the conformational ensemble away from more active states or by increasing the number of states (i.e., altering protein dynamics). Protein abundance’s relative resilience to decreases in thermodynamic stability is striking but fits the growing appreciation that the cellular environment is not a passive solution at equilibrium (Bershtein et al. 2013). Rather, the cellular environment acts as a buffer for deleterious mutational effects on protein abundance through the effect of chaperones, proteins that facilitate the proper folding of other proteins (Rutherford 2003). Chaperone overexpression can compensate for the deleterious mutational effects on protein abundance (Tokuriki and Tawfik 2009) and fitness (Bershtein et al. 2013). This theory offers an explanation for TEM-1’s stability threshold that buffers the effect of mutations on fitness (Bershtein et al. 2006). We suspect that the relative contribution of protein abundance to fitness may increase with the number of mutations as the protein’s stability margin is exhausted by the cumulative effect of mutations, an effect that is characterized by negative epistasis (Bershtein et al. 2006). As such, negative epistasis may arise in part as a consequence of the beneficial properties of the cellular environment in addition to a protein’s intrinsic stability margin.

Conclusions

The application of synthetic biology to the study of fundamental biological questions, as we have done in this study of gene and protein fitness landscapes, offers a well-defined, systematic approach for testing and generating hypotheses. Our comprehensive determination of the fitness effects of mutation of TEM-1 provides the first detailed maps of fitness landscapes corresponding to a gene and its nearest neighbors at the basepair, codon, and amino acid level. To the extent that TEM-1 is a representative gene, our study provides several important insights. Evolution must traverse fitness landscapes under the constraints of the genetic code, constraints that minimize the effect of mutation and enrich for beneficial mutations. The small fitness effects of synonymous mutations have complex determinants including regional proclivities for synonymous fitness effects in the gene. At the beginning of the gene, fitness effects of synonymous mutations strongly correlate with mRNA stability. Missense mutational effects on thermodynamic stability shape the DFE, but their deleterious effect on specific protein activity tends to exceed that on protein abundance, at least for mutations with large deleterious effects. We hypothesize that TEM-1’s high mutational tolerance may in part derive from the cell’s buffering capacity to mediate the deleterious effects of lost stability on protein abundance, a phenomenon that might give rise to negative epistasis. Further inquiry into the fundamental determinants of the landscape’s topology is necessary to address this hypothesis and substantiate these findings.

Materials and Methods

Fitness Determination

Escherichia coli SNO301 (ampD1, ampA1, ampC8, pyrB, recA, and rpsL) cells harboring the comprehensive codon mutagenesis library CCM2 (Firnberg and Ostermeier 2012) were plated on LB-agar plates supplemented with 13 different Amp concentrations (2-fold increments ranging from 0.25 μg/ml to 1,024 μg/ml) to partition the library into 13 partially overlapping sublibraries based on relative Amp resistance using a synthetic gene circuit that functions as a tunable band-pass genetic selection for Amp resistance (Sohka et al. 2009) (supplementary figs. S1 and Supplementary Data, Supplementary Material online). Barcoded PCR amplicons were prepared from each plate and subjected to 454 deep-sequencing. The 1,325,979 sequencing reads were filtered for quality and for reads that only contained one codon substitution. We tabulated the number of sequencing counts for each allele at each Amp concentration and determined the fitness w relative to TEM-1 from the statistics. As the distribution of growth as a function of Amp is roughly symmetric when plotted as the log2(Amp concentration) (Sohka et al. 2009), we determined the unnormalized fitness f of allele i as
$$f_i = \frac{\sum_{p} c_{i,p} \log_2(a_p)}{\sum_{p} c_{i,p}} \tag{4}$$
in which ci,p is the number of counts of allele i on sublibrary plate p in the deep sequencing data and ap is the concentration of Amp on sublibrary plate p in μg/ml. We normalized all fitnesses by the fitness of wild type as follows:
$$w_i = \frac{2^{f_i}}{2^{f_{WT}}} \tag{5}$$

The result is a normalized fitness wi that is 1.0 for wild-type TEM-1, greater than 1.0 for beneficial mutations, and between 0 and 1.0 for deleterious mutations. We determined the fitness of wild-type TEM-1 (fWT) from equation (4) using the counts of all alleles with a synonymous substitution in TEM-1, because the fitness of these varied very little. As a check, we compared this value with the fitness determined by equation (4) using the counts of all sequencing reads that lacked a mutation. The two values differed by only 2.5%. We determined an upper limit on the error in our fitness measurements from the DFEs of synonymous mutations as a function of the number of times an allele was observed (supplementary fig. S3, Supplementary Material online). Fitness values and error estimates are tabulated in supplementary data S1 and Supplementary Data (Supplementary Material online).
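A minimal sketch of this fitness calculation follows. It assumes that the unnormalized fitness is the count-weighted mean of log2(Amp concentration) across the sublibrary plates and that normalization raises 2 to the difference between an allele's value and the wild-type value. The function names and the example counts are hypothetical; only the 13 two-fold Amp concentrations are taken from the text.

```python
from math import log2

# The 13 plate concentrations in ug/ml: 0.25, 0.5, ..., 1024 (2-fold steps).
AMP_CONCENTRATIONS = [0.25 * 2 ** k for k in range(13)]

def unnormalized_fitness(counts, amp_concentrations=AMP_CONCENTRATIONS):
    """Count-weighted mean of log2(Amp concentration) over all plates.

    counts[p] is the number of sequencing reads of one allele on plate p.
    """
    if len(counts) != len(amp_concentrations):
        raise ValueError("need one count per plate")
    total = sum(counts)
    if total == 0:
        raise ValueError("allele has no sequencing counts")
    weighted = sum(c * log2(a) for c, a in zip(counts, amp_concentrations))
    return weighted / total

def normalized_fitness(f_allele, f_wild_type):
    """Fitness relative to wild type: 1.0 for wild type, <1.0 if deleterious."""
    return 2.0 ** (f_allele - f_wild_type)

# Hypothetical counts: a wild-type-like allele peaking at 1,024 ug/ml and a
# mutant peaking one plate lower, at 512 ug/ml.
f_wt = unnormalized_fitness([0] * 12 + [40])
f_mut = unnormalized_fitness([0] * 11 + [40, 0])
print(normalized_fitness(f_mut, f_wt))
```

Working in log2 space means that a shift of one plate corresponds to a two-fold change in normalized fitness, which matches the two-fold spacing of the Amp concentrations.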

Prediction of mRNA Stability at the Transcript Start

The RNAfold utility of the Vienna RNA Package (version 2.1.2) was used to predict the minimum free energy of RNA sequences (Hofacker 2009). For each allele in supplementary figure S7 (Supplementary Material online), the Gibbs free energy was calculated as the average free energy of every 39 nt window centered on nucleotides from −5 to +10 of the gene start as described (Bentele et al. 2013).
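The window averaging described above can be sketched as follows. The folding step is passed in as a function so that the sketch does not depend on any particular installation. The names `mean_window_energy`, `energy_fn`, and `start_index` are hypothetical, and the sketch assumes that offsets from −5 to +10 are plain integer shifts from `start_index` (including 0), a convention the text does not specify.

```python
def mean_window_energy(seq, start_index, energy_fn, window=39, first=-5, last=10):
    """Average energy_fn over windows centered near the gene start.

    seq         -- transcript sequence as a string
    start_index -- 0-based index in seq taken as the gene start (assumed convention)
    energy_fn   -- callable returning a free energy for a subsequence
    """
    half = window // 2
    energies = []
    for offset in range(first, last + 1):
        center = start_index + offset
        lo, hi = center - half, center + half + 1
        if lo < 0 or hi > len(seq):
            raise ValueError("sequence too short for window at offset %d" % offset)
        energies.append(energy_fn(seq[lo:hi]))
    return sum(energies) / len(energies)
```

With the ViennaRNA Python bindings installed, a callable such as `lambda s: RNA.fold(s)[1]` could plausibly serve as `energy_fn`, since `RNA.fold` returns a structure and its minimum free energy; this pairing is a suggestion rather than the procedure used in the study.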

Mutational Tolerance

The observed effective number of amino acids formula at a position was determined from the protein fitness values of the n missense mutations with fitness data at that position using equations (6) and (7).
(6)
(7)
We obtained the effective number of amino acids k* by normalizing formula to be based on 20 amino acids by equation (8).
(8)
A table of formula and k* is provided as supplementary data S4 (Supplementary Material online).
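Equations (6) to (8) are not reproduced in this copy of the text, so the following sketch should not be read as the authors' formulas. It illustrates one standard entropy-based definition of an effective number: fitness values at a position are normalized to sum to 1, and the exponential of their Shannon entropy is returned. The function name `effective_number` is hypothetical, and the sketch omits the normalization to 20 amino acids that equation (8) applies when fewer than 20 fitness values are available.

```python
from math import exp, log

def effective_number(fitness_values):
    """Exponential of the Shannon entropy of normalized fitness values.

    Returns 1.0 when a single amino acid carries all the fitness and
    len(fitness_values) when every amino acid is equally fit.
    """
    if any(w < 0 for w in fitness_values):
        raise ValueError("fitness values must be non-negative")
    total = sum(fitness_values)
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    probs = [w / total for w in fitness_values if w > 0]
    entropy = -sum(p * log(p) for p in probs)
    return exp(entropy)
```

Under this definition, a position at which all 20 amino acids have equal fitness scores 20 and a position that tolerates only the wild-type residue scores 1, which matches the range given for k* in the legend of figure 4.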

Prediction of Protein Thermodynamic Stability

PyRosetta v3.4.0 r55307 (Chaudhury et al. 2010) was used to compute the difference in score (in Rosetta energy units [REU]) between the mature structures (lacking the signal sequence) of each amino acid mutant and wild-type TEM-1 (Protein Data Bank identifier 1XPB; Fonze et al. 1995). PoPMuSiC predictions of ΔΔG (supplementary fig. S12B, Supplementary Material online) were determined online at http://babylone.ulb.ac.be/popmusic (last accessed February 17, 2014) (Dehouck et al. 2011) using 1XPB.

Protein Abundance and Total Catalytic Activity

Relative protein abundance was quantified from Western blots of the soluble fraction of cell lysates, using Quantity One 1-D analysis software (Bio-Rad), by comparison with a standard curve. Representative Western blots are shown in supplementary figure S13, Supplementary Material online. Catalytic activity of the sublibraries and clones was determined by measuring the rate of hydrolysis (at 486 nm) of 50 μM nitrocefin in 10 mM phosphate buffer pH 7.4 at 37 °C. The initial rate was normalized by the total amount of protein added for each sample.

Acknowledgments

The authors thank Yousif Shamoo and Barrett Steinberg for helpful comments on the manuscript. This work was supported by the National Science Foundation (DEB-0950939 and MCB-0919377) to M.O.

References

Ambler RP, Coulson AF, Frere JM, Ghuysen JM, Joris B, Forsman M, Levesque RC, Tiraby G, Waley SG. A standard numbering scheme for the class A beta-lactamases. Biochem J. 1991, vol. 276 Pt 1 (pg. 269-270).
Araya CL, Fowler DM, Chen W, Muniez I, Kelly JW, Fields S. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc Natl Acad Sci U S A. 2012, vol. 109 (pg. 16858-16863).
Bentele K, Saffert P, Rauscher R, Ignatova Z, Bluthgen N. Efficient translation initiation dictates codon usage at gene start. Mol Syst Biol. 2013, vol. 9 pg. 675.
Bershtein S, Mu W, Serohijos AW, Zhou J, Shakhnovich EI. Protein quality control acts on folding intermediates to shape the effects of mutations on organismal fitness. Mol Cell. 2013, vol. 49 (pg. 133-144).
Bershtein S, Mu W, Shakhnovich EI. Soluble oligomerization provides a beneficial fitness effect on destabilizing mutations. Proc Natl Acad Sci U S A. 2012, vol. 109 (pg. 4857-4862).
Bershtein S, Segal M, Bekerman R, Tokuriki N, Tawfik DS. Robustness-epistasis link shapes the fitness landscape of a randomly drifting protein. Nature. 2006, vol. 444 (pg. 929-932).
Bossi L. Context effects: translation of UAG codon by suppressor tRNA is affected by the sequence following UAG in the message. J Mol Biol. 1983, vol. 164 (pg. 73-87).
Camps M, Herman A, Loh E, Loeb LA. Genetic constraints on protein evolution. Crit Rev Biochem Mol Biol. 2007, vol. 42 (pg. 313-326).
Charneski CA, Hurst LD. Positively charged residues are the major determinants of ribosomal velocity. PLoS Biol. 2013, vol. 11 pg. e1001508.
Chaudhury S, Lyskov S, Gray JJ. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics. 2010, vol. 26 (pg. 689-691).
Counago R, Wilson CJ, Pena MI, Wittung-Stafshede P, Shamoo Y. An adaptive mutation in adenylate kinase that increases organismal fitness is linked to stability-activity trade-offs. Protein Eng Des Sel. 2008, vol. 21 (pg. 19-27).
Das R, Baker D. Macromolecular modeling with Rosetta. Annu Rev Biochem. 2008, vol. 77 (pg. 363-382).
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics. 2011, vol. 12 pg. 151.
Deng Z, Huang W, Bakkalbasi E, Brown NG, Adamski CJ, Rice K, Muzny D, Gibbs RA, Palzkill T. Deep sequencing of systematic combinatorial libraries reveals beta-lactamase sequence constraints at high resolution. J Mol Biol. 2012, vol. 424 (pg. 150-167).
DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet. 2005, vol. 6 (pg. 678-687).
Deris JB, Kim M, Zhang Z, Okano H, Hermsen R, Groisman A, Hwa T. The innate growth bistability and fitness landscapes of antibiotic-resistant bacteria. Science. 2013, vol. 342 pg. 1237435.
Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nat Rev Genet. 2007, vol. 8 (pg. 610-618).
Firnberg E, Ostermeier M. PFunkel: efficient, expansive, user-defined mutagenesis. PLoS One. 2012, vol. 7 pg. e52031.
Firnberg E, Ostermeier M. The genetic code constrains yet facilitates Darwinian evolution. Nucleic Acids Res. 2013, vol. 41 (pg. 7420-7428).
Fonze E, Charlier P, To'th Y, Vermeire M, Raquet X, Dubus A, Frere JM. TEM1 beta-lactamase structure solved by molecular replacement and refined structure of the S235A mutant. Acta Crystallogr D Biol Crystallogr. 1995, vol. 51 (pg. 682-694).
Fowler DM, Araya CL, Fleishman SJ, Kellogg EH, Stephany JJ, Baker D, Fields S. High-resolution mapping of protein sequence-function relationships. Nat Methods. 2010, vol. 7 (pg. 741-746).
Gierasch LM. Signal sequences. Biochemistry. 1989, vol. 28 (pg. 923-930).
Goodman DB, Church GM, Kosuri S. Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013, vol. 342 (pg. 475-479).
Haggerty TJ, Lovett ST. IF3-mediated suppression of a GUA initiation codon mutation in the recJ gene of Escherichia coli. J Bacteriol. 1997, vol. 179 (pg. 6705-6713).
Hall BG, Barlow M. Evolution of the serine beta-lactamases: past, present and future. Drug Resist Updat. 2004, vol. 7 (pg. 111-123).
Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992, vol. 89 (pg. 10915-10919).
Hilterbrand A, Saelens J, Putonti C. CBDB: the codon bias database. BMC Bioinformatics. 2012, vol. 13 pg. 62.
Hofacker IL. RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics. 2009, Chapter 12:Unit12.12.
Jacquier H, Birgy A, Le Nagard H, Mechulam Y, Schmitt E, Glodt J, Bercot B, Petit E, Poulain J, Barnaud G, et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proc Natl Acad Sci U S A. 2013, vol. 110 (pg. 13067-13072).
McLaughlin RN Jr, Poelwijk FJ, Raman A, Gosal WS, Ranganathan R. The spatial architecture of protein function and adaptation. Nature. 2012, vol. 491 (pg. 138-142).
Medeiros AA. Beta-lactamases. Br Med Bull. 1984, vol. 40 (pg. 18-27).
Miller JH, Albertini AM. Effects of surrounding sequence on the suppression of nonsense codons. J Mol Biol. 1983, vol. 164 (pg. 59-71).
Orr HA. The genetic theory of adaptation: a brief history. Nat Rev Genet. 2005, vol. 6 (pg. 119-127).
Pakula AA, Young VB, Sauer RT. Bacteriophage lambda cro mutations: effects on activity and intracellular degradation. Proc Natl Acad Sci U S A. 1986, vol. 83 (pg. 8829-8833).
Peris JB, Davis P, Cuevas JM, Nebot MR, Sanjuan R. Distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage f1. Genetics. 2010, vol. 185 (pg. 603-609).
Plotkin JB, Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet. 2011, vol. 12 (pg. 32-42).
Roscoe BP, Thayer KM, Zeldovich KB, Fushman D, Bolon DN. Analyses of the effects of all ubiquitin point mutants on yeast growth rate. J Mol Biol. 2013, vol. 425 (pg. 1363-1377).
Rutherford SL. Between genotype and phenotype: protein chaperones and evolvability. Nat Rev Genet. 2003, vol. 4 (pg. 263-274).
Sacerdot C, Chiaruttini C, Engst K, Graffe M, Milet M, Mathy N, Dondon J, Springer M. The role of the AUU initiation codon in the negative feedback regulation of the gene for translation initiation factor IF3 in Escherichia coli. Mol Microbiol. 1996, vol. 21 (pg. 331-346).
Salverda ML, De Visser JA, Barlow M. Natural evolution of TEM-1 beta-lactamase: experimental reconstruction and clinical relevance. FEMS Microbiol Rev. 2010, vol. 34 (pg. 1015-1036).
Sanjuan R, Moya A, Elena SF. The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci U S A. 2004, vol. 101 (pg. 8396-8401).
Schlinkmann KM, Honegger A, Tureci E, Robison KE, Lipovsek D, Pluckthun A. Critical features for biosynthesis, stability, and functionality of a G protein-coupled receptor uncovered by all-versus-all mutations. Proc Natl Acad Sci U S A. 2012, vol. 109 (pg. 9810-9815).
Schultz SC, Richards JH. Site-saturation studies of beta-lactamase: production and characterization of mutant beta-lactamases with all possible amino acid substitutions at residue 71. Proc Natl Acad Sci U S A. 1986, vol. 83 (pg. 1588-1592).
Shenkin PS, Erman B, Mastrandrea LD. Information-theoretical entropy as a measure of sequence variability. Proteins. 1991, vol. 11 (pg. 297-313).
Shoichet BK, Baase WA, Kuroki R, Matthews BW. A relationship between protein stability and protein function. Proc Natl Acad Sci U S A. 1995, vol. 92 (pg. 452-456).
Singaravelan B, Roshini BR, Munavar MH. Evidence that the supE44 mutation of Escherichia coli is an amber suppressor allele of glnX and that it also suppresses ochre and opal nonsense mutations. J Bacteriol. 2010, vol. 192 (pg. 6039-6044).
Smith JM. Natural selection and the concept of a protein space. Nature. 1970, vol. 225 (pg. 563-564).
Sohka T, Heins RA, Phelan RM, Greisler JM, Townsend CA, Ostermeier M. An externally-tunable bacterial band-pass filter. Proc Natl Acad Sci U S A. 2009, vol. 106 (pg. 10135-10140).
Sonneborn TM, Bryson V, Voge HJ. Degeneracy of the genetic code: extent, nature, and genetic implications. Evolving genes and proteins. 1965, New York: Academic Press (pg. 377-397).
Soskine M, Tawfik DS. Mutational effects and the evolution of new protein functions. Nat Rev Genet. 2010, vol. 11 (pg. 572-582).
Sougakoff W, Petit A, Goussard S, Sirot D, Bure A, Courvalin P. Characterization of the plasmid genes blaT-4 and blaT-5 which encode the broad-spectrum beta-lactamases TEM-4 and TEM-5 in enterobacteriaceae. Gene. 1989, vol. 78 (pg. 339-348).
Starita LM, Pruneda JN, Lo RS, Fowler DM, Kim HJ, Hiatt JB, Shendure J, Brzovic PS, Fields S, Klevit RE. Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proc Natl Acad Sci U S A. 2013, vol. 110 (pg. E1263-E1272).
Supek F, Smuc T. On relevance of codon usage to expression of synthetic and natural genes in Escherichia coli. Genetics. 2010, vol. 185 (pg. 1129-1134).
Sussman JK, Simons EL, Simons RW. Escherichia coli translation initiation factor 3 discriminates the initiation codon in vivo. Mol Microbiol. 1996, vol. 21 (pg. 347-360).
Tokuriki N, Stricher F, Schymkowitz J, Serrano L, Tawfik DS. The stability effects of protein mutations appear to be universally distributed. J Mol Biol. 2007, vol. 369 (pg. 1318-1332).
Tokuriki N, Tawfik DS. Chaperonin overexpression promotes genetic variation and enzyme evolution. Nature. 2009, vol. 459 (pg. 668-673).
Toth-Petroczy A, Tawfik DS. Protein insertions and deletions enabled by neutral roaming in sequence space. Mol Biol Evol. 2013, vol. 30 (pg. 761-771).
Walkiewicz K, Benitez Cardenas AS, Sun C, Bacorn C, Saxer G, Shamoo Y. Small changes in enzyme function can lead to surprisingly large fitness effects during adaptive evolution of antibiotic resistance. Proc Natl Acad Sci U S A. 2012, vol. 109 (pg. 21408-21413).
Wang X, Minasov G, Shoichet BK. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J Mol Biol. 2002, vol. 320 (pg. 85-95).
Weinreich DM, Delaney NF, Depristo MA, Hartl DL. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006, vol. 312 (pg. 111-114).
Whitehead TA, Chevalier A, Song Y, Dreyfus C, Fleishman SJ, De Mattos C, Myers CA, Kamisetty H, Blair P, Wilson IA, et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat Biotechnol. 2012, vol. 30 (pg. 543-548).
Wloch DM, Szafraniec K, Borts RH, Korona R. Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics. 2001, vol. 159 (pg. 441-452).
Woese CR. On the evolution of the genetic code. Proc Natl Acad Sci U S A. 1965, vol. 54 (pg. 1546-1552).
Wylie CS, Shakhnovich EI. A biophysical protein folding model accounts for most mutational fitness effects in viruses. Proc Natl Acad Sci U S A. 2011, vol. 108 (pg. 9916-9921).

Author notes

Associate editor: Howard Ochman

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Supplementary data