An Experimentally Informed Evolutionary Model Improves Phylogenetic Fit to Divergent Lactamase Homologs

Phylogenetic analyses of molecular data require a quantitative model for how sequences evolve. Traditionally, the details of the site-specific selection that governs sequence evolution are not known a priori, making it challenging to create evolutionary models that adequately capture the heterogeneity of selection at different sites. However, recent advances in high-throughput experiments have made it possible to quantify the effects of all single mutations on gene function. I have previously shown that such high-throughput experiments can be combined with knowledge of underlying mutation rates to create a parameter-free evolutionary model that describes the phylogeny of influenza nucleoprotein far better than commonly used existing models. Here, I extend this work by showing that published experimental data on TEM-1 beta-lactamase (Firnberg E, Labonte JW, Gray JJ, Ostermeier M. 2014. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol. 31:1581–1592) can be combined with a few mutation rate parameters to create an evolutionary model that describes beta-lactamase phylogenies much better than most common existing models. This experimentally informed evolutionary model is superior even for homologs that are substantially diverged (about 35% divergence at the protein level) from the TEM-1 parent that was the subject of the experimental study. These results suggest that experimental measurements can inform phylogenetic evolutionary models that are applicable to homologs that span a substantial range of sequence divergence.


Introduction
Most approaches for the phylogenetic analysis of gene sequences require a quantitative evolutionary model specifying the rate at which each site substitutes from one identity to another. In maximum-likelihood and Bayesian approaches, the evolutionary model is used to calculate the likelihood of the observed sequences given the phylogenetic tree (Felsenstein 1981;Huelsenbeck et al. 2001). In distancebased approaches, the evolutionary model is used to calculate the distances between pairs of sequences (Hasegawa et al. 1985;Saitou and Nei 1987). For all these approaches, inaccurate evolutionary models can lead to errors in inferred phylogenetic properties, including incorrect estimates of evolutionary distances (Halpern and Bruno 1998) and incorrect tree topologies (Felsenstein 1978;Huelsenbeck and Hillis 1993;Lartillot et al. 2007;Rokas and Carroll 2008).
Unfortunately, existing phylogenetic evolutionary models are extreme simplifications of the actual process of mutation and selection that shapes sequence evolution (Thorne et al. 2007). At least two major unrealistic assumptions afflict most of these models. First, in order to make phylogenetic algorithms computationally tractable, it is generally assumed that each site within a gene evolves independently. Second, most widely used evolutionary models compound the first assumption of independence among sites with the second unrealistic assumption that all sites evolve identically-a severely flawed assumption as there is overwhelming evidence that proteins have strong preferences for certain amino acids at specific sites (Halpern and Bruno 1998;Ashenberg et al. 2013). It is the second of these unrealistic assumptions that is addressed by the experimentally informed evolutionary model described here.
A major reason that most phylogenetic evolutionary models assume that sites evolve identically is because it has traditionally been thought that there is insufficient information to do better. In the absence of a priori knowledge about selection on individual sites, the parameters of an evolutionary model must be inferred from the same sequences that are being analyzed phylogenetically. For instance, typical codonlevel models infer parameters describing the equilibrium frequencies of different codons, the relative rates of transition and transversion mutations, the relative rates of nonsynonymous and synonymous mutations, and in many cases the shapes of distributions that allow some of these rates to be drawn from several categories (Goldman and Yang 1994;Muse and Gaut 1994;Yang et al. 2000;Kosiol et al. 2007). These parameters can easily be inferred for a single general model that applies to all sites in a gene, but it is much more challenging to infer them separately for each site without overfitting the available sequence data (Posada and Buckley 2004;Rodrigue 2013). Some studies have attempted to bypass this problem by predicting site-specific substitution rates or classifying sites based on knowledge of the protein structure (Thorne et al. 1996;Goldman et al. 1998;Rodrigue et al. 2009;Kleinman et al. 2010;Scherrer et al. 2012)-however, such approaches are limited by the fact that the relationship between protein structure and site-specific selection is complex, and cannot be reliably predicted even by state-of-the-art molecular modeling (Potapov et al. 2009).
A more recent and more promising alternative approach is to infer the site-specific substitution process directly from the sequence data (Lartillot and Philippe 2004;Le et al. 2008;Wang et al. 2008;Rodrigue et al. 2010;Wu et al. 2013). Fully specifying a different substitution process for each site in a maximum-likelihood framework requires inferring a large number of free parameters (there are 19 Â L parameters for a gene with L codons if selection is assumed to act on the amino acid sequence). Therefore, the details of the site-specific substitution process must be inferred without overfitting the finite available data. Two strategies have been successfully employed to do this: Constraining sites to fall in a fixed number of different substitution-model classes (Le et al. 2008;Wang et al. 2008) or using nonparametric Bayesian mixture models that treat the site-specific substitution probabilities as random variables that are integrated over a statistical distribution estimated from the data (Lartillot and Philippe 2004;Rodrigue et al. 2010). Although both of these strategies are somewhat complex, they yield much more nuanced evolutionary models that eliminate some of the problems associated with the unrealistic assumption that sites evolve identically.
Even more recently, a new type of high-throughput experiment has begun to yield data that enables the creation of site-specific evolutionary models without any need to infer site-specific selection from the naturally occurring gene sequences that are the subject of the phylogenetic analysis. This new type of experiment is deep mutational scanning (Fowler et al. 2010;Araya and Fowler 2011), a technique in which a gene is randomly mutagenized and subjected to functional selection in the laboratory, and then deep sequenced to quantify the relative frequencies of mutations before and after selection. In cases where the laboratory selection is sufficiently representative of the gene's real biological function, these experiments provide information that can be used to approximate the site-specific natural selection on mutations. To date, deep mutational scanning has been used to quantify the impact of most nucleotide or codon mutations to several proteins or protein domains (Fowler et al. 2010;McLaughlin Jr et al. 2012;Traxlmayr et al. 2012;Melamed et al. 2013;Roscoe et al. 2013;Starita et al. 2013;Bloom 2014;Firnberg et al. 2014). For a few of these studies, the experimental coverage of possible mutations is sufficiently comprehensive to define sitespecific amino acid preferences for all positions in a gene.
I have previously shown that such experimentally determined site-specific amino acid preferences can be combined with measurements of mutation rates to create a parameterfree evolutionary model that describes the phylogeny of influenza nucleoprotein far better than existing nonsite-specific models that contain numerous free parameters (Bloom 2014). Here, I extend that work by showing that it is also possible to create an experimentally informed evolutionary model for another gene. I do this using deep mutational scanning data published by Firnberg et al. (2014) that quantify the effects of nearly all amino acid mutations on TEM-1 betalactamase. In this case, no measurements of mutation rates are available, so I construct an evolutionary model that is informed by the experimentally measured site-specific amino acid preferences and also contains a few free parameters representing the mutation rates. I also augment this model with an additional parameter that reflects the stringency of the site-specific amino acid preferences in natural evolution versus the deep mutational scanning experiment used to measure these preferences. I show that this evolutionary model greatly improves the phylogenetic fit to both TEM and SHV beta-lactamases, the latter of which are substantially diverged (about 35% divergence at the protein level) from the TEM-1 parent that was the subject of the deep mutational scanning by Firnberg et al. (2014). These results generalize previous work on experimentally determined evolutionary models and suggest that site-specific amino acid preferences are sufficiently conserved during evolution to be applicable to gene homologs that span a substantial range of sequence divergence.

Evolutionary Model
Summary of Evolutionary Model I have previously described a codon-level phylogenetic evolutionary model for influenza nucleoprotein for which both the site-specific amino acid preferences and the nucleotide mutation rates (assumed to be identical across sites) were determined experimentally (Bloom 2014). This work examines a protein for which the site-specific amino acid preferences have been measured experimentally, but for which the nucleotide mutation rates are unknown. It is therefore necessary to extend the evolutionary model to treat the nucleotide mutation rates as unknown free parameters. Here, I describe this extension.
In the model used here, the rate P r;xy of substitution from codon x to some other codon y at site r is where Q xy denotes the rate of mutation from x to y, and F r;xy gives the probability that a mutation from x to y fixes if it occurs. This equation assumes that mutation rates are uniform across sites, and that the selection on mutations is sitespecific but site-independent (i.e., the fixation probability at one site is not influenced by mutations at other sites).

Fixation Probabilities from Amino Acid Preferences
The fixation probability of a mutation from codon x to y is assumed to depend only on the encoded amino acids A x ð Þ and A y ð Þ, as synonymous mutations are assumed to be selectively neutral. The fixation probabilities F r;xy are defined in terms of the experimentally measured amino acid preferences at site r, where r;a denotes the preference for amino acid a at site r, and the preferences at each site sum to one (1 ¼ P a r;a ). As in previous work (Bloom 2014), I consider two different mathematical relationships between the amino acid 2754 Bloom . doi:10.1093/molbev/msu220 MBE preferences and the fixation probabilities. However, novel to this work, I consider a generalization of these relationships that allows the stringency of the amino acid preferences to differ between natural sequence evolution and the deep mutational scanning experiments used to measure r;a . Specifically, I take the probability of fixation to depend on r;a À Á where is a free parameter (constrained to have values !0) that scales the stringency of the amino acid preferences. A value of = 1 implies equally stringent preferences in natural evolution and the deep mutational scanning experiments. A value of < 1 corresponds to less stringent preferences in natural evolution than in the deep mutational scanning experiments, whereas a value of 4 1 corresponds to more stringent preferences in natural evolution in the deep mutational scanning experiments. Naively, one might conjecture that will typically have values greater than 1, as laboratory experiments tend to be less stringent than natural evolutionary selection. In the following sections, I will describe testing evolutionary models that constrain = 1 versus models that treat as a free parameter.
With the addition of the stringency parameter , I consider the following relationships between the amino acid preferences and the fixation probabilities. The first relationship derives from considering the amino acid preferences to be directly related to selection coefficients and is a generalization of the ninth equation derived by Halpern and Bruno (1998): Note that in equation (2) the mutation terms in the original equation derived by Halpern and Bruno (1998) are all set to be equal as the calculations of the amino acid preferences r;a from the deep mutational scanning experiments already correct for differences in the mutagenesis rates, whereas Halpern and Bruno (1998) were inferring the preferences from natural sequences and so had to account for the mutation rates during natural evolution. Note also that the ninth equation of Halpern and Bruno (1998) contains a typographical error in the denominator which is corrected in equation (2). Halpern and Bruno (1998) derive equation (2) with = 1 by assuming that the sequences are evolving in the weak-mutation limit, and rigorous application of this relationship with = 1 in the context of the current work requires assuming that the effective population size in the deep mutational scanning experiment is equivalent to that for the natural sequences that are being phylogenetically analyzed. The second relationship is based on considering the amino acid preferences to reflect the fraction of genetic backgrounds that tolerate specific mutations rather than selection coefficients in any one genetic background. Specifically, experiments have shown that mutations that are deleterious in one genetic background can sometimes be neutral (or even advantageous) in a related genetic background (Lunzer et al. 2010;Gong et al. 2013). One reason that mutational effects depend on genetic background is that most proteins are under selection to maintain their overall stability above some critical threshold . This type of threshold selection gives rise to the evolutionary dynamics described in Bloom et al. (2007), where stabilizing mutations are always tolerated but destabilizing mutations are only tolerated in a fraction of genetic backgrounds. In this case, mutations to higher preference (putatively more stabilizing) amino acids will always be able to fix without deleterious effect, but mutations to lower preference (putatively less stabilizing) amino acids are only sometimes tolerated. One way to represent these dynamics is to use a relationship equivalent to the Metropolis et al. (1953) sampling criterion: Both of these relationships (eqs. 2 and 3) share the feature that mutations to higher-preference amino acids fix more frequently than mutations to lower-preference amino acids as long as 4 0.

Mutation Rates
The rate of mutation Q xy from codon x to y is defined in terms of the underlying rates of nucleotide mutation. Let R m!n denote the rate of mutation from nucleotide m to n. Then Assuming that the same mutation process operates on both the sequenced and complementary strands of the nucleic acid gives the constraint where m c denotes the complement of nucleotide m, as for example a mutation from A to G on one strand induces a mutation from T to C on the other strand. There are therefore six unique nucleotide mutation rates: In principle, these six mutation rates could be measured experimentally for the system of interest. In my previous work on influenza nucleoprotein (Bloom 2014), I was able to devise an experimental system for measuring the mutation rates for influenza in cell culture. The mutation rates measured for influenza in this system were roughly symmetric (R m!n ¼ R n!m ), which was sufficient to make the overall evolutionary model in equation (1) reversible. However, in general it is unlikely to be feasible to measure the mutation rates for most systems of interest. Furthermore, it is known that mutation process is AT-biased for many species (Hershberg and Petrov 2010), meaning that in general mutation rates will not be symmetric. Therefore, in general (and also for lactamase specifically) it is necessary to infer the mutation rates from the sequence data without assuming that they are symmetric.
In the absence of any constraints on the six mutation rates given above, the overall evolutionary model defined by equation (1) will not necessarily be reversible. However, it turns out (see Materials and Methods) that placing the following constraint on the mutation rates is sufficient to make the overall evolutionary model reversible: This constraint lacks a biological justification and is assumed purely for the mathematical convenience that it makes the model reversible. Although there is no biological reason to believe that equation (6) actually holds for real evolution, it is possible to give interpretations about what assuming this equation implies about the mutational process. One interpretation is that the probability of mutating from C to G through an intermediate mutation to T is equal to the probability of mutating from C to G through an intermediate mutation to A, as equation (6) implies that Another interpretation is that the AT bias is the same for transitions and transversions, as equation (6) implies In the absence of independent information to calibrate absolute values for the branch lengths or mutation rates, one of the rates is confounded with the branch-length scaling and so can be assigned an arbitrary value greater than 0 without affecting the tree or its likelihood. Here, the choice is made to assign so that all other mutation rates are defined relative to this rate. With these constraints, there are now four independent mutation rates that must be treated as unknown free parameters: unknown mutation rate parameters ¼ In practice, these four mutation rate parameters will be estimated at their maximum-likelihood values given the sequences and tree topology.

Equilibrium Frequencies
Calculation of a phylogenetic likelihood requires assigning evolutionary equilibrium frequencies to the possible codons in addition to specifying the transition probabilities given by equation (1). In many conventional phylogenetic models, these equilibrium frequencies are treated as free parameters that are estimated empirically from the sequence data.
However, in reality the equilibrium frequencies are the result of mutation and selection, and so can be calculated as the stationary state of the stochastic process defined by the evolutionary model. Specifically, it can be shown (see Materials and Methods) that for the evolutionary model in equation (1), the equilibrium frequency p r;x of codon x at site r is where q x is given by where N AT x ð Þ is the number of A and T nucleotides in codon x, the proportionality constant is never needed as q x values always appear in the form of ratios, and the simplification from the first to second line follows from equations (5)-(7). The equilibrium frequencies p r;x are therefore completely determined by knowledge of the experimentally determined amino acid preferences r;a , the mutation rate parameters in equation (8), and the value of the stringency parameter .

Experimentally Determined Amino Acid Preferences for Beta-Lactamase
The site-specific amino acid preferences for beta-lactamase were determined using data from a previously published deep mutational scanning experiment performed by Firnberg et al. (2014). Specifically, Firnberg et al. (2014) created nearly all possible amino acid mutants of TEM-1 beta-lactamase and then used antibiotic selection to enrich for functional variants at various antibiotic concentrations. Next, they used highthroughput sequencing to examine how the frequencies of mutations changed during this functional selection. They analyzed their data to estimate the impact of individual mutations on TEM-1 function, and had sufficient data to estimate the impact of 96% of the 297 Â 19 = 5,453 possible amino acid mutations. Firnberg et al. (2014) report the impact of mutations in terms of what they refer to as the "fitness" effects. Firnberg et al. (2014) calculate these fitness values from their deep mutational scanning experiment in such a way that a mutation's fitness effect is approximately proportional to the highest concentration of antibiotic on which bacteria carrying that beta-lactamase variant are able to grow. Therefore, although the fitness values are not calculated in a true population-genetics framework, they certainly reflect effects of specific mutations on the ability of TEM-1 mutants to function.
Here, I use the fitness values provided by Firnberg et al. (2014) to estimate the preferences for each of the 20 amino Bloom . doi:10.1093/molbev/msu220 MBE acids at each site in TEM-1. Specifically, let w r;a be the fitness value for mutation to amino acid a at site r reported by Firnberg et al. (2014) in supplementary data S2 of their article. I calculate the preference r;a for a at site r as where the sum over a 0 ranges over all 20 amino acids, the wild-type amino acid at site r is assigned a fitness of w r;a ¼ 1 in accordance with the normalization scheme used by Firnberg et al. (2014), and the w r;a values for the 4% of mutations for which no value is estimated by Firnberg et al. (2014) are set to the average w r;a of all nonwild-type amino acids at site r for which a w r;a value is provided. The amino acid preferences calculated in this manner are displayed graphically in figure 1 along with information about residue secondary structure and solvent accessibility (see supplementary file S1, Supplementary Material online, for numerical data). As is extensively discussed by Firnberg et al. (2014) in their original description of the data, these preferences are qualitatively consistent with known information about highly constrained positions in TEM-1, and show the expected qualitative patterns of higher preferences for specific (particularly hydrophobic) amino acids at residues that are buried in the protein's folded structure. Here, I focus on using these amino acid preferences in a quantitative phylogenetic evolutionary model as described in the next section.

TEM and SHV Beta-Lactamase Phylogenetic Trees
To test whether evolutionary models informed by the experimentally determined amino acid preferences are superior to existing alternative models, I compared the fit of various models with beta-lactamase sequence phylogenies. Firnberg et al. (2014) performed their deep mutational scanning on TEM-1 beta-lactamase. There are a large number of TEM beta-lactamases with high sequence identity to TEM-1; the next closest group of lactamases is the SHV beta-lactamases (Bush et al. 1995), which on average have 62% nucleotide and 65% protein identity to TEM beta-lactamases. I assembled a FIG. 1. The amino acid preferences for TEM-1 beta-lactamase, calculated from the data of Firnberg et al. (2014). The heights of letters are proportional to the preference for that amino acid at that position in the protein. Residues are numbered using the scheme of Ambler et al. (1991). Letters are colored according to the hydrophobicity of the amino acid. Bars above the letters indicate the secondary structure and relative solvent accessibility as calculated from the crystal structure in PDB entry 1XPB (Fonz e et al. 1995) using DSSP (Kabsch and Sander 1983;Joosten et al. 2011), with maximum solvent accessibilities taken from Tien et al. (2013). The figure was generated using "WebLogo" (Crooks et al. 2004) integrated into the "mapmuts" software package (Bloom 2014). The data and source code used to create this plot are provided through http://jbloom.github.io/phyloExpCM/example_ 2014Analysis_lactamase.html (last accessed July 28, 2014).

2757
Experimentally Informed Evolutionary Model . doi:10.1093/molbev/msu220 MBE collection of TEM and SHV beta-lactamases from the manually curated Lahey Clinic database (http://www.lahey.org/ Studies/, last accessed July 28, 2014). These sequences were aligned to TEM-1, and highly similar sequences (sequences that differed by less than four nucleotides) were removed. The resulting alignment contained 85 beta-lactamase sequences (supplementary file S2, Supplementary Material online), of which 49 were TEM and 36 were SHV.
Maximum-likelihood phylogenetic trees of the TEM and SHV beta-lactamases were constructed using "codonPhyML" (Gil et al. 2013) with the codon substitution model of either Goldman and Yang (1994) or Kosiol et al. (2007). The resulting trees are displayed in figure 2. The two different substitution models give similar tree topologiesthe Robinson-Foulds distance (Robinson and Foulds 1981) between the trees inferred with the two different models is calculated by RAxML (Stamatakis 2006) to be 0.14. In both cases, the trees partition into two clades of closely related sequences, corresponding to the TEM and SHV betalactamases.

Experimentally Informed Models Are Superior for Combined TEM and SHV Phylogeny
To compare the evolutionary models, HYPHY (Pond et al. 2005) was used to optimize the branch lengths and free parameters of the evolutionary models to their maximumlikelihood values on the fixed tree topologies in figure 2. This analysis showed that the evolutionary models informed by the experimentally determined amino acid preferences were clearly superior to commonly used alternative codonsubstitution models.
Specifically, tables 1 and 2 show that the experimentally informed evolutionary models fit the combined TEM and SHV phylogeny with higher likelihoods than any of a variety of commonly used alternative models, regardless of which tree topology from figure 2 is used. This superiority is despite Phylogenetic trees of TEM (red) and SHV (blue) beta-lactamases inferred using "codonPhyML" (Gil et al. 2013) with the codon substitution model of (A) Goldman and Yang (1994) or (B) Kosiol et al. (2007). The scale bars have units of number of codon substitutions per site. The inferred trees are similar for both models; the distance between the trees computed using the measure of Robinson and Foulds (1981) is 0.14. The TEM and SHV sequences each cluster into closely related clades: The average number of nucleotide and amino acid differences between sequence pairs within these clades is 13 and 7 for the TEM sequences, and 10 and 5 for the SHV sequences. There is extensive divergence between these two clades: The average number of nucleotide and amino acid differences between sequence pairs across the clades is 326 and 100. For both substitution models, a single transition-transversion ratio () and four discrete gamma-distributed nonsynonymous-synonymous ratios (!) were estimated by maximum likelihood. The equilibrium codon frequencies were determined empirically using the CF3x4 method (Pond et al. 2010) for the model of Goldman and Yang (1994), or the F method for the model of Kosiol et al. (2007) The data and source code used to create these trees are provided through http://jbloom. github.io/phyloExpCM/example_2014Analysis_lactamase.html (last accessed July 28, 2014). the fact that the alternative models (Goldman and Yang 1994;Kosiol et al. 2007) contain many more free parameters. For instance, the most heavily parameterized alternative model contains 60 empirically estimated equilibrium frequency parameters plus an optimized parameter corresponding to the transition-transversion ratio, two optimized parameters corresponding to a gamma distribution of nonsynonymoussynonymous ratios across sites (Yang et al. 2000), and an optimized parameter corresponding to a distribution of substitution rates across sites (Yang 1994). In contrast, the experimentally informed models only contain four or five free parameters (the mutation rates and optionally the stringency parameter )-yet these experimentally informed models have substantially higher likelihoods. When AIC (Posada and Buckley 2004) is used to penalize parameters, the superiority of the experimentally informed models is even more clear.
The experimentally informed models are superior to the nonsite-specific models even when the stringency parameter is fixed to one; however, the phylogenetic fit is substantially enhanced by treating as a free parameter (tables 1 and 2).
In all cases, fitting of the stringency parameter yields values of 4 1, indicating that natural evolution is more stringent than the experiments in its preferences for specific amino acids. This result is consistent with the notion that selection during real evolution selection is more sensitive than the typical laboratory experiment.
To confirm that the superiority of the experimentally informed models is due to the fact that the deep mutational scanning of Firnberg et al. (2014) captures information about the site-specific amino acid preferences, I tested evolutionary models in which these preferences were randomized among sites or were set to the average frequencies of all amino acids in the lactamase alignments. In the former case, the r;a values are randomly shuffled among sites, where in the latter case r;a for all values of r is set to the average frequency of amino acid a over the entire lactamase alignment. As can be seen from tables 1 and 2, these nonsite-specific models perform substantially worse than even the simplest versions of the models of Goldman and Yang (1994) and Kosiol et al. (2007). This result shows that the improved performance of the experimentally informed evolutionary models is  (2), free b 0.0 -4,020.6 5 (5 + 0) R A!G = 2.3, R A!T = 0.6, R C!A = 0.8, R C!G = 0.7, b = 1.6 Experimental, equation (2), b = 1 46.4 -4,044.8 4 (4 + 0) R A!G = 2.3, R A!T = 0.6, R C!A = 0.8, R C!G = 0.8 Experimental, equation (3), free b 77.3 -4,059.3 5 (5 + 0) R A!G = 2.2, R A!T = 0.6, R C!A = 0.8, R C!G = 0.7, b = 1.2 Experimental, equation (3), b = 1 85.7 -4,064.5 4 (4 + 0)  (3) (3) 1,264.9 -4,654.1 4 (4 + 0) R A!G = 2.3, R A!T = 0.6, R C!A = 0.9, R C!G = 0.9 Randomized, equation (2) 1,474.5 -4,758.9 4 (4 + 0) R A!G = 2.6, R A!T = 0.6, R C!A = 0.9, R C!G = 1.0 NOTE.-The difference in AIC relative to the best model (smaller ÁAIC indicates better fit), the log likelihood, the number of free parameters, and the values of key parameters are shown in this table. For each model, the branch lengths and model parameters were optimized for the fixed tree topology in figure 2A. The "Experimental" models use amino acid preferences derived from the data of Firnberg et al. (2014) plus four mutation rate parameters (eq. 8) and optionally the stringency parameter . For the "Randomized" models, the experimentally measured amino acid preferences are randomized among sites-these models are far worse as the preferences are no longer assigned to the correct positions. For the "Avg. frequencies" models, the amino acid preferences are identical across sites and are set to the average frequency of that amino acid in the entire lactamase sequence alignment-these models are also far worse than the experimentally informed models, as they do not utilize site-specific information. Fitting the stringency parameter to a value of 4 1 improves the fit of the experimentally informed models by enhancing the importance of the site-specific amino acid preferences. Fitting the stringency parameter to a value of < 1 improves the fit of the randomized and Avg. frequencies model by effectively equalizing the preferences across amino acids. "GY94" denotes the model of Goldman and Yang (1994) with nine equilibrium frequency parameters calculated using the CF3x4 method (Pond et al. 2010). "KOSI07+F" denotes the model of Kosiol et al. (2007) with 60 equilibrium frequency parameters calculated using the F methods. All variants of GY94 and KOSI07+F have a single transitiontransversion ratio () estimated by maximum likelihood. Different model variants either have a single nonsynonymous-synonymous ratio (!) or values drawn from four discrete gamma-distributed categories (Yang et al. 2000), and either a single rate or rates drawn from four discrete gamma-distributed categories (Yang 1994). The data and source code used to generate this table are provided through http://jbloom.github.io/phyloExpCM/example_2014Analysis_lactamase.html (last accessed July 28, 2014).

2759
Experimentally Informed Evolutionary Model . doi:10.1093/molbev/msu220 MBE overwhelmingly due to incorporation of information on the site-specific amino acid preferences rather than better modeling of the mutational process.

Experimentally Informed Models Are Superior for Individual TEM and SHV Phylogenies
The foregoing results show that experimentally informed models are superior for describing the combined TEM and SHV beta-lactamase phylogeny. Given that the amino acid preferences were determined by experiments using a TEM-1 parent, it is worth asking whether these preferences accurately describe the evolution of both the TEM and SHV sequences, or whether they more accurately describe the TEM sequences (which are closely related to TEM-1; fig. 2) than the SHV sequences (which only have about 65% protein identity to TEM-1; fig. 2). This question is relevant because the extent to which site-specific amino acid preferences are conserved during protein evolution remains unclear. For instance, although several experimental studies have suggested that such preferences are largely conserved among moderately diverged homologs (Serrano et al. 1993;Ashenberg et al. 2013), a simulation-based study has argued that preferences shift substantially during protein evolution (Pollock et al. 2012;Pollock and Goldstein 2014). If the site-specific amino acid preferences are largely conserved during the divergence of the TEM and SHV sequences, then the experimentally informed models should work well for both these groups-but if the preferences shift rapidly during evolution, then the experimentally informed models should be effective only for the closely related TEM sequences.
To test these competing possibilities, I repeated the analysis in the foregoing section separately for the TEM and SHV clades of the overall phylogenetic tree (the red vs. blue clades in fig. 2). This analysis found that the experimentally informed evolutionary models were clearly superior to all alternative models for the SHV as well as the TEM clade (tables 3-6). In fact, the extent of superiority of the experimentally informed model (as quantified by AIC) was greater for the SHV clade than the TEM clade, despite the fact that the former has fewer sequences. These results suggest that the applicability of the experimentally determined amino acid preferences extends to beta-lactamase homologs that are substantially diverged from the TEM-1 parent that was the specific subject of the experiments of Firnberg et al. (2014).

The Stringency Parameter Generally Improves Experimentally Informed Models
The results in the foregoing sections show that use of the stringency parameter improves the phylogenetic fit of the experimentally informed models of lactamase evolution. Previous work (Bloom 2014) has shown that an experimentally informed evolutionary model without a stringency parameter ( = 1) improves the phylogenetic fit to influenza nucleoprotein sequences relative to nonsite-specific models. To test whether the fitting of a stringency parameter further improves the experimentally informed evolutionary model of nucleoprotein, I repeated the analysis of Bloom (2014) but also included a model variant in which was fit by maximum likelihood.  Kosiol et al. (2007) Rather than That of Goldman and Yang (1994).

ÁAIC
NOTE.-This table differs from table 4 in that the phylogenetic fit is to the SHV sequences using the blue portion of tree topology in figure 2B rather than the blue portion of the tree topology in figure 2A.  Kosiol et al. (2007) Rather than That of Goldman and Yang (1994).

2762
Bloom . doi:10.1093/molbev/msu220 MBE from the fact that deep mutational scanning experiments are generally less stringent than real evolution, and so values of 4 1 help scale the experimentally determined site-specific amino acid preferences to a stringency more reflective of those that constrain real evolution.
Experimentally Informed Models Are Slightly Better for Many Sites The results described above and in Bloom (2014) demonstrate that experimentally informed evolutionary models improve phylogenetic relative to nonsite-specific models when analyzing the entire gene sequences of lactamase or nucleoprotein. It is interesting to investigate which sites are described more accurately by the experimentally informed models. This can be done by examining the differences in per-site likelihoods between models after fixing the branch lengths and model parameters to their maximum-likelihood values for the entire gene. Figure 3 compares the per-site likelihoods for the best experimentally informed evolutionary model for lactamase and for nucleoprotein relative to the best nonsite-specific model for each of these genes (using the models from tables 1 and 7). For both lactamase and nucleoprotein, the experimentally informed models lead to small improvements for many sites. Overall, 72% (207 of 286) lactamase sites are described better by the experimentally informed model (supplementary file S3, Supplementary Material online), and 82% (407 of 498) of nucleoprotein sites are described better by the experimentally informed model (4). There appears to be a slight trend for improvements due to the experimentally informed models to be most common for buried sites, but the experimentally informed models lead to small improvements for sites spanning a range of solvent accessibilities and secondary structures.
Interestingly, for both genes there are also a few sites for which the experimentally informed models are far worse than the nonsite-specific models. Presumably, these sites are modeled poorly because the preferences determined by the deep mutational scanning experiments do not accurately capture the real preferences of natural evolution. Such discrepancies could arise from either a failure of the deep mutational scanning experiments to fully reflect natural selection pressures or epistatic interactions that strongly shift the site preferences during natural evolution away from those measured for the parent gene by the experiments. I was unable to observe any obvious features of the specific sites that were described poorly by the experimentally informed models (supplementary files S3 and S4, Supplementary Material online). It therefore remains an open question why certain sites are described very poorly by the experimentally informed models even though the vast majority of sites are described better by these models.

Comparison of Different Methods for Computing Fixation Probabilities
In the foregoing analyses, two different mathematical relationships were used to mathematically relate the experimentally determined amino acid preferences to the substitution probabilities in the evolutionary models. One relationship (eq. 2) is based on a true population-genetics derivation by Halpern and Bruno (1998) under the assumption that the preferences are reflective of selection coefficients for amino acids at specific sites (an assumption that would only be strictly true only in the unlikely case that individual sites in a gene contribute independently to fitness). The other relationship (eq. 3) is a more ad hoc one that I suggested in previous work (Bloom 2014) on the grounds that the amino acid preferences might be best envisioned not as selection coefficients, but rather as measurements of the  (3), b = 1 391.8 -12,341.1 0 (0 + 0) None GY94, gamma x, gamma rates 1,453.0 -12,858.7 13 (4 + 9) j = 5.9, x shape = 0.2, mean x = 0.2, rate shape = 2.5 GY94, gamma x, one rate 1,616.0 -12,941.2 12 (3 + 9) j = 5.6, x shape = 0.2, mean x = 0.2 KOSI07+F, gamma x, gamma rates 1,845.6 -13,004.0 64 (4 + 60) j = 0.2, x shape = 0.2, mean x = 0.9, rate shape = 1.8 GY94, one x, gamma rates 1,884.1 -13,075.3 12 (3 + 9) j = 5.9, x = 0.1, rate shape = 2.0 GY94, one x, one rate 2,153.2 -13,210.8 11 (2 + 9) j = 5.5, x = 0.  (2014) except including the additional models listed here, which include a model with a stringency parameter. With the exception of the stringency parameter, the experimentally informed evolutionary model used here is derived entirely from experimental measurements, as both the mutation rates and amino acid preferences were measured in Bloom (2014). Only the model using fixation probabilities calculated from equation (3) is reported, as Bloom (2014) shows that this is the best model for influenza nucleoprotein. The data and source code used to generate this table are available through http://jbloom.github.io/phyloExpCM/example_2014Analysis_Influenza_NP_Human_1918_Descended_withbeta.html (last accessed July 28, 2014).

2763
Experimentally Informed Evolutionary Model . doi:10.1093/molbev/msu220 MBE fraction of genetic backgrounds that tolerate a specific mutation, as would be implied by the evolutionary dynamics described in Bloom et al. (2007). Although both relationships share the qualitative feature that mutations to higher-preference amino acids are favored over mutations to lower-preference ones, they differ in their quantitative details. In previous work on influenza nucleoprotein (Bloom 2014), I reported that the relationship in equation (3) outperformed the one in equation (2) derived by Halpern and Bruno (1998). In contrast, for the beta-lactamase sequences studied here, the relationship of Halpern and Bruno (1998) outperforms the one in equation (3) (tables 1-6). The reason for and relevance of these discordant results remain unclear. There are almost certainly differences in the evolutionary conditions (population size, degree of polymorphism, etc.) for influenza nucleoprotein and beta-lactamase that influence the relationship between selection coefficients and fixation probabilities. In addition, there are substantial differences between the experiments of Firnberg et al. (2014) on beta-lactamase and my previous work on nucleoprotein-in particular, Firnberg et al. (2014) examine the effects of single mutations to the parental gene, whereas the nucleoprotein experiments examined the average effects of individual mutations in variants that often contained multiple mutations. Finally, the experimental measurements are imperfect-for nucleoprotein, the preferences determined by independent biological replicates of the experiments only had a Pearson correlation coefficient of 0.79; Firnberg et al. (2014) do not provide data on the consistency of their measurements across biological replicates, but it seems safe to assume that their experiments are also imperfect. Therefore, further work is probably needed to determine if any meaning can be ascribed to the differences in fit for equation (2) versus equation (3), as well as to identify the optimal mathematical relationship for connecting experimentally measured amino acid preferences to substitution probabilities in evolutionary models. However, both the results presented here and in Bloom (2014) strongly suggest that using any reasonable mathematical relationship to inform evolutionary models with experimentally determined amino acid preferences is sufficient to lead to dramatic improvements in phylogenetic fit.  table 7 versus the  best GY94 variant in table 7. The experimentally informed model has a higher log likelihood for 82% of sites. For both genes, the per-site likelihoods were computed after fixing the model parameters and branch lengths to their maximum-likelihood values for the entire gene. Sites are classified in terms of their relative solvent accessibility or secondary structure as computed using DSSP (Kabsch and Sander 1983;Joosten et al. 2011) from PDB structures 1XPB (Fonz e et al. 1995) or 2IQH (Ye et al. 2006), normalizing solvent accessibilities to the values provided by Tien et al. (2013). The per-residue numerical data are in supplementary files S3 and S4, Supplementary Material online. The code and data used to create this figure are provided through http://jbloom.github.io/phyloExpCM/example_2014Analysis_lactamase.html and http://jbloom.github.io/phyloExpCM/example_2014Analysis_ Influenza_NP_Human_1918_Descended_withbeta.html (last accessed July 28, 2014).
It is also interesting to speculate on the precise meaning of the stringency parameter . According to populationgenetics theory, the strength of selection increases with effective population size. It is therefore tempting to interpret as reflecting differences in the effective population size in the deep mutational scanning experiments relative to natural evolution. Indeed, one could imagine even attempting to use the inferred value of to indirectly quantify effective population size. However, it is important to temper this tantalizing possibility with the reminder that the fixation probabilities over the entire phylogeny can be related to the fitness effects of specific mutations only under the unrealistic assumption that the fitness contributions of mutations at different sites are entirely independent, as these probabilities reflect the effect of a mutation averaged over all genetic backgrounds in the phylogeny. So although the fact that using 4 1 consistently improves phylogenetic does indicate that natural evolution is more stringent in its site-specific amino acid preferences than the experiments, the exact populationgenetics interpretation of is unclear.

Discussion
I have shown that an evolutionary model informed by experimentally determined site-specific amino acid preferences fits beta-lactamase phylogenies better than a variety of existing models that do not utilize site-specific information. When considered in combination with prior work demonstrating that an experimentally determined evolutionary model dramatically improves phylogenetic fit for influenza nucleoprotein (Bloom 2014), these results suggest that experimentally informed models are generally superior to nonsite-specific models for phylogenetic analyses of protein-coding genes. The explanation for this superiority is obvious: Proteins have strong preferences for certain amino acids at specific sites (Halpern and Bruno 1998;Ashenberg et al. 2013) which are ignored by nonsite-specific models. The use of experimentally measured site-specific amino acid preferences improves evolutionary models by informing them about the complex selection that shapes actual sequence evolution. Interestingly, inclusion of a parameter that allows the stringency of site-specific amino acid preferences during natural evolution to exceed those measured by experiments further improves phylogenetic fit, suggesting that deep mutational scanning experiments remain less sensitive than actual natural selection.
I have not compared the experimentally informed evolutionary models with more recent site-specific models that infer aspects of the substitution process from the sequence data itself (Lartillot and Philippe 2004;Le et al. 2008;Wang et al. 2008;Rodrigue et al. 2010;Wu et al. 2013). Such a comparison is desirable as these site-specific models will almost certainly compare more favorably with the experimentally informed models used here. Unfortunately, such a comparison is sufficiently technically challenging to be beyond the scope of this work, as it requires comparing between maximum-likelihood and Bayesian approaches. In any case, the experimentally informed evolutionary models described here should not be viewed purely as competitors for existing site-specific substitution models. Rather, one can imagine future approaches that integrate the results of a deep mutational scanning experiment with additional sitespecific details inferred from natural sequence data to create even more nuanced evolutionary models.
An advantage of experimentally informed evolutionary models is that they naturally lend themselves to interpretation in terms of quantities that can be directly related to both specific biochemical measurements and the underlying processes of mutation and selection. This stands in contrast to most existing models, which are phrased in terms of heuristic parameters (such as codon "equilibrium frequencies") that reflect the combined action of mutation and selection and are not accessible to direct experimental measurement. Experimentally informed evolutionary models therefore have the potential to facilitate connections between the phylogenetic substitution processes and the underlying biochemistry and population genetics of gene evolution (Halpern and Bruno 1998;Thorne et al. 2007).
The major drawback of experimentally informed models is their more limited scope. Most existing codon-based evolutionary models can be applied to any gene (Goldman and Yang 1994;Muse and Gaut 1994;Kosiol et al. 2007)-but experimentally informed models require experimental data for the gene in question. However, this requirement may not be as crippling as it initially appears. The first experimentally determined evolutionary model for influenza nucleoprotein required direct measurement of both the site-specific amino acid preferences and the underlying mutation rates (Bloom 2014). However, the model presented here only requires measurement of the amino acid preferences, as the mutation rates are treated as free parameters. Rapid advances in the experimental technique of deep mutational scanning are making such data available for an increasing number of proteins (Fowler et al. 2010;McLaughlin Jr et al. 2012;Traxlmayr et al. 2012;Melamed et al. 2013;Roscoe et al. 2013;Starita et al. 2013;Bloom 2014;Firnberg et al. 2014).
In this respect, it is encouraging that the site-specific amino acid preferences determined experimentally for TEM-1 improve phylogenetic fit to substantially diverged (35% protein sequence divergence) SHV beta-lactamases as well as highly similar TEM beta-lactamases. As discussed in the Introduction, there are two major limitations to most existing evolutionary models: They treat sites identically, and they treat sites independently. Experimentally informed evolutionary models of the type described here have the potential to completely remedy the first limitation as deep mutational scanning defines site-specific selection with increasing precision. However, such models still treat sites independently-and this limitation will never be completely overcome by experiments, as the unforgiving math of combinatorics means that no experiment can examine all arbitrary combinations of mutations (e.g., TEM-1 has only 5,453 single amino acid mutants, but it has 14,815,801 double mutants, 26,742,520,805 triple mutants, and over 10 13 quadruple mutants). The utility of experimentally informed evolutionary models therefore depends on the extent to which site-specific

Equilibrium Frequencies and Reversibility
Here I show that the evolutionary model defined by equation (1) is reversible (satisfies detailed balance), and has p r;x defined by equation (9) as its equilibrium frequency.
First, note that the fixation probabilities F r;xy defined by both equations (2) and (3) as can be verified by direct substitution. This relationship means that if all codon interchanges were equally likely (all Q xy values are equal), then the equilibrium frequency p r;x of codon x would simply be proportional to the stringency-adjusted preference r;A x ð Þ À Á for the encoded amino acid. However, in practice all codon interchanges are not equally likely, so the actual equilibrium frequencies p r;x will also depend on the mutation rate parameters R m!n listed in equation (8). This dependence is given by the q x terms in equation (9). These q x terms can be thought of as the expected equilibrium frequencies of the codons in a hypothetical situation in which there is no selection and all 64 codons are equally fit (all F r;xy values are equal). In other words, the q x terms define the stationary state of the reversible stochastic process defined by the mutation rates Q xy . In order to show q x given equation (10) defines this stationary state, it is necessary to show that There are up to 12 possible types of single-nucleotide mutations that can be made to a codon x to create a different codon y (three possible mutations to each of the four possible nucleotides); however, only four of these types of mutations require independent verification of equation (13). Specifically, the possible types of mutations that require independent verification of equation (13) are when x differs from y by a mutation of A ! T, of C ! G, of A ! C, or of A ! G. The other eight types of mutations do not have to be verified because they are equivalent to one of these first four cases: C ! A is equivalent to A ! C by symmetry (i.e., interchange of the labels of codons x and y), G ! A is equivalent to A ! G by symmetry, T ! A is equivalent to A ! T by symmetry, G ! C is equivalent to C ! G by symmetry, T ! G is equivalent to A ! C because of equation (5), T ! C is equivalent to A ! G because of equation (5), G ! T is equivalent to C ! A by equation (5) and then to A ! C by symmetry, and C ! T is equivalent to G ! A by equation (5) and then A ! G by symmetry. So below I verify equation (13) for the four independent types of mutations, full verifying equation (13). The first case is where x differs from y by a mutation of A ! T, such that Q xy ¼ R A!T and Q yx ¼ R T!A . In this case, we have where the first line substitutes the definitions of equations (10) and (4), the second line follows from equation (5) and the fact that N AT x ð Þ ¼ N AT y ð Þ if x and y differ only by an A ! T mutation, and the final line again substitutes the definitions of equations (10) and (4).