The ability to engineer enzymes and other proteins to any desired stability would have wide-ranging applications. Here, we demonstrate that computational design of a library with chemically diverse stabilizing mutations allows the engineering of drastically stabilized and fully functional variants of the mesostable enzyme limonene epoxide hydrolase. First, point mutations were selected if they significantly improved the predicted free energy of protein folding. Disulfide bonds were designed using sampling of backbone conformational space, which tripled the number of experimentally stabilizing disulfide bridges. Next, orthogonal in silico screening steps were used to remove chemically unreasonable mutations and mutations that are predicted to increase protein flexibility. The resulting library of 64 variants was experimentally screened, which revealed 21 (pairs of) stabilizing mutations located both in relatively rigid and in flexible areas of the enzyme. Finally, combining 10–12 of these confirmed mutations resulted in multi-site mutants with an increase in apparent melting temperature from 50 to 85°C, enhanced catalytic activity, preserved regioselectivity and a >250-fold longer half-life. The developed Framework for Rapid Enzyme Stabilization by Computational libraries (FRESCO) requires far less screening than conventional directed evolution.
Metabolic engineering and industrial biocatalysis increasingly need protein engineering of enzymes to provide the desired catalytic properties, such as regio- and stereospecificity, resulting in high product yields and low losses to side products. For applied biocatalysis, an ideal enzyme has a long shelf life and is stable under practical process conditions, which often includes high temperatures that are needed to solubilize substrates and prevent microbial contamination. Mutations that provide a gain of function often decrease stability, and more than a few of such mutations in a mesostable enzyme result in the loss of folding and expression (Bloom et al., 2006; Besenmatter et al., 2007; Tokuriki and Tawfik, 2009). To improve protein function by mutagenesis, thermostable starting points are preferred, but these are often not available from natural biodiversity. For these reasons, methods to improve the stability of enzymes and other proteins are highly relevant (Schmid et al., 2001; Eijsink et al., 2004; Bommarius et al., 2006; Bornscheuer et al., 2012).
Unless thermostability is associated with reversible unfolding, it is difficult to stabilize a protein by site-directed mutagenesis. For proteins that do unfold reversibly, there is an equilibrium between the folded and unfolded states and the effects of mutations on stability can be modeled relatively accurately. Computational design can produce highly stabilized variants of such model proteins (Malakauskas and Mayo, 1998; Borgo and Havranek, 2012). However, for most proteins inactivation is essentially irreversible, often triggered by an initial unfolding of a particular region of the protein (Eijsink et al., 2005). Also, due to the kinetically complicated mechanisms involved, the effects of mutations on stability are hard to predict (Eijsink et al., 2004; Polizzi et al., 2007). For typical proteins, these complications make it challenging to engineer major stability increases.
Existing protein stabilization strategies normally yield only 2–15°C increase in thermostability of enzymes (Williams et al., 1999; Vazquez-Figueroa et al., 2007; Wijma et al., 2013), which is very modest when compared with the ranges of thermostability observed in natural enzymes. Directed evolution can be applied to improve the stability of enzymes by introducing (random) mutations in the coding gene and screening large libraries (typically ≥104 variants) to find the rare mutations that improve thermostability (Giver et al., 1998; Bommarius et al., 2006; Reetz et al., 2006; Turner, 2009). A serious shortcoming of such a random approach is that it can only be applied to enzymes for which high-throughput expression and activity screens are available. Other methods to stabilize enzymes are consensus design (Lehmann and Wyss, 2001; Bommarius et al., 2006), rational protein engineering (Eijsink et al., 2004), the creation of chimeric enzymes (Romero et al., 2013) and computational design (Korkegian et al., 2005; Gribenko et al., 2009; Joo et al., 2011). The number of stabilizing mutations that are introduced is usually rather low and currently none of these methods work well enough to reliably achieve a large stability increase of a target enzyme.
Here, we present a strategy aimed at dramatically improving the thermostability of an enzyme by computational design. Our idea was that existing protein stabilization methods are mainly limited by their inability to find more than a few stabilizing mutations. Thus, a more successful stabilization method should generate many stabilizing mutations in a short time. The developed stabilization procedure (Scheme 1) employs computational methods to predict a large number of independent stabilizing mutations. Subsequent in silico screening steps eliminate chemically unreasonable mutations as well as mutations which increase protein flexibility. This reduces the number of variants that need to be screened in vitro. The selected mutations are tested experimentally, and the most stabilizing mutations are combined to obtain a highly thermostable enzyme variant. This strategy is referred to as Framework for Rapid Enzyme Stabilization by Computational libraries (FRESCO, Scheme 1).
To explore this strategy, limonene epoxide hydrolase (LEH) from Rhodococcus erythropolis DCL14 was selected as it is a target for protein engineering, aimed at improving its applicability in the production of chiral building blocks (Zheng and Reetz, 2010). Further efforts to engineer the substrate specificity would benefit from the availability of a thermostable enzyme (Bloom et al., 2006; Besenmatter et al., 2007; Tokuriki and Tawfik, 2009), but the TMapp of wild-type (WT) LEH is only 50°C. We show that, by applying the described FRESCO strategy, it is possible to produce extremely stabilized enzyme variants (TMapp +35°C) that are still fully functional.
The relative changes in folding free energy ΔΔGFold due to point mutations and the 3D structures of the corresponding mutant enzymes were predicted with FoldX (foldx.crg.es) (Guerois et al., 2002) and with Rosettaddg (www.rosettacommons.org) (Kellogg et al., 2011) on the basis of the known LEH X-ray structure 1NWW (Arand et al., 2003). The predicted ΔΔGFold equals the ΔGFold for the protein carrying the point mutation minus the ΔGFold for the WT protein. For FoldX, the standard settings of the software, which had been optimized on a large test set, were used, except that the calculation was repeated five times to obtain a better averaging. For Rosettaddg, we used the algorithm described by Kellogg et al. (2011) which includes repacking within 8 Å of the mutated residue using a soft-repulsion energy function (options –ddg::local_opt_only true –ddg::opt_radius 8.0 –ddg::weight_file soft_rep_design -ddg::iterations 50 -ddg::min_cst false -ddg::mean true -ddg::min false -ddg::sc_min_only false -ddg::ramp_repulsive false). To avoid mutations that are likely to interfere with catalysis or substrate binding, only residues that were >10 Å away from the active-site-bound heptamide ligand in 1NWW (Arand et al., 2003) were allowed to mutate.
Selection of potentially stabilizing mutations was based on the following two criteria. Any substitution would be selected if its predicted ΔΔGFold was <−5 kJ mol−1, which corresponds to the approximate error (3.3 kJ mol−1 in ΔΔGFold predictions with FoldX (Guerois et al., 2002). For Rosettaddg, no error was reported (Kellogg et al., 2011), but since the correlation coefficients with experimental data were similar to those reported for FoldX, we assumed that Rosettaddg has a similar error. If the mutation had no significant effect, i.e. its predicted ΔΔGFold was in the range of −5 to +5 kJ mol−1, then it was still selected if it belonged to one of the following types of mutations that are often observed to be stabilizing, i.e. XXX→Arg (Mrabet et al., 1992; Kumar et al., 2000; Sokalingam et al., 2012), XXX→Pro, Gly→XXX (Nosoh and Sekiguchi, 1991). However, none of these −5 to +5 kJ mol−1 selected variants were experimentally stabilizing (see Fig. 1A and B).
The newly written Dynamic Disulfide Discovery (DDD) algorithm uses for the design of disulfide bonds an ensemble of structures that are the snapshots from a molecular dynamics (MD) simulation. For all input structures, the algorithm iteratively searches for residues that are within 7 Å but more than 15 positions away in the primary sequence. If such a neighboring residue is found, the algorithm introduces multiple initial geometries of disulfide bonds with dihedrals θ1 for both donor and acceptor cysteine of −60°, 60°, and 180° (thus nine different combinations). These starting structures are energy minimized with fixed backbone atoms, and the resulting structure is analyzed for molecular mechanics energy of the sulfur atoms (to eliminate unnaturally strained disulfide bonds) and geometry (to eliminate disulfide bonds with uncommon geometries, criteria described below). If a disulfide bond passes all these criteria, it is selected.
The geometric criteria for disulfide bonds (Supplementary Fig. S1, Table SI) were selected by us based on the geometries observed for a large set of disulfide bonds in the protein data bank (Petersen et al., 1999; Pellequer and Chen, 2006). An energy criterion for the maximal molecular mechanics energy of the disulfide bond (10 kJ mol−1) was adopted, which in a test set appeared to identify most of the existing disulfide bonds. For the developed algorithm, 12 out of the 14 disulfide bonds in a small test set consisting of X-ray structures 1CC5, 1CPO, 1CRN, 1HNF, 1HXN, 1QBA, 1RLR and 2LBP were acceptable. While adopting more lenient criteria would allow acceptance of all the existing disulfide bonds in the dataset, such relaxed criteria are also expected to result in a higher fraction of false positives.
MD simulations were carried out to predict the backbone flexibility of WT and designed variants after designs with clearly unreasonable features, expected to destabilize the protein (Nosoh and Sekiguchi, 1991), were filtered out by visual inspection. The encountered unreasonable features are quantified in the Results section. Simulations were carried out under Yasara with the Yamber3 force field, which is an Amber ff99 (Wang et al., 2000) derivative that has been specifically parameterized for increased structural accuracy (Krieger et al., 2004). A rectangular simulation box was used (with periodic boundary conditions, extended 7.5 Å around the protein fully solvated in explicit water with sodium chloride counter ions added to a concentration of 0.5%). Long-range (>7.86 Å) electrostatic interactions were modeled with a particle mesh Ewald algorithm (Essmann et al., 1995) with fourth degree B-spline functions. To remove clashes and conformational strain, an energy minimization was carried out before each MD simulation. This energy minimization was continued until the total energy decreased by <0.05 kJ mol−1, which was tested every 400 fs. The time step during the simulations was 1.25 fs with the electrostatics and Lennard–Jones interactions updated once every two time steps. All MD simulations started with an energy-minimized structure, which was heated from 5 to 298 K in 30 ps. MD simulations were started with the original crystal water present. A Berendsen thermostat was used to control the temperature (Berendsen et al., 1984) under an NPT ensemble (number of particles, pressure, and temperature are constant). The time constant τ of the temperature coupling was set to 0.1 ps. To improve reproduction of the canonical ensemble, modifications to the Berendsen thermostat described elsewhere (Krieger et al., 2004) were used.
To analyze a large number of variants for structural flexibility, five MD simulations with different initial atom velocities (Caves et al., 1998) of 100 (for the individual mutants) or 1000 ps (for the combined mutants) were carried out. The predicted flexibility of the enzyme by MD simulation depends on the initial velocities assigned at the start of the MD simulation; if different velocities are assigned initially, a different trajectory is observed (Caves et al., 1998). This provides a better sampling of conformations than a single long MD simulation, even if sub-trajectories are only 100 ps long (Caves et al., 1998). The root mean square fluctuation (RMSF) obtained from 5 of such 100 ps MD simulations correlated well with those from the X-ray structures [Supplementary Fig. S2A and B, the RMSF of the crystal structures were calculated from their B-factors, with the standard equation RMSF = √(3× B-factor/8π2)]. However, the changes in flexibility due to mutations were difficult to detect from the RMSF (Supplementary Fig. S2C and D), while they could more easily be obtained from structural inspection of the simulated protein (Supplementary Fig. S3). All the structural flexibility effects were analyzed by inspecting the effect of the mutations on the averaged structures obtained from the five different trajectories per variant (Supplementary Fig. S3). If one out of the five MD simulations appeared to sample a very different part of the protein conformational space than the other four, it was ignored (Supplementary Fig. S4). Removing these outliers enabled to compare different variants (Supplementary Fig. S3) because otherwise a protein variant that had such an outlier appeared to be much more flexible. For example, initially such an outlier in the simulation of the WT protein (Supplementary Fig. S4) made it appear during the structural inspection as if almost all the mutants were significantly more rigid than the WT.
A detailed account of all experimental methods is provided in the Supplementary data. A plasmid containing the gene for the LEH was kindly provided by Prof. Dr. M. Arand (University of Zurich). Mutants of LEH were created by QuikChange mutagenesis (Agilent, USA) in a 96-well plate. Most enzyme variants, including those with multiple disulfide bonds, were expressed in Escherichia coli TOP10 cells (Life Technologies, CA, USA). Only variants with single disulfide bonds were expressed in E.coli NEB Shuffle Express (New England Biolabs, USA). The mutations were validated by DNA sequencing. Purification was carried out on Ni–NTA columns with a C-terminal hexa-histidine tag using cell lysate prepared from cells grown in 1 l of Terrific Broth. If needed, further purification was carried out by gel filtration (Supradex 75, Life technologies, UK). This procedure yielded around 50 mg/l protein for variants without disulfide bonds, and 5–10 mg protein per liter culture volume for disulfide variants. Proteins containing disulfide bonds are usually less well expressed in the cytoplasm of E.coli (Marco de, 2009).
To determine the TMapp of variants during screening, the thermofluor method was carried out essentially as reported elsewhere (Ericsson et al., 2006). The analyzed samples consisted of either purified protein or cell-free extract for screening of point mutations. For measurements, 5 μl 100× diluted commercial Sypro Orange solution (Life Technologies, CA, USA) was added to a 20 μl protein sample. The apparent melting temperature (TMapp) was determined by heating the samples from 25 to 90°C at 1°C/min in a MyiQ real-time PCR machine (Bio-Rad, Hercules, CA, USA) while recording the fluorescence with a 490 nm excitation filter and a 575 nm emission filter (Ericsson et al., 2006). The maximum of the relative fluorescence change with respect to the temperature (dRFU/dT) was taken as the apparent melting temperature (TMapp).
The presence of inter-subunit disulfide bonds was analyzed by examining the migration patterns of the WT and mutant proteins by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS–PAGE), both under reducing and non-reducing conditions, since reduction of disulfide bonds should cause a shift in migration behavior. To determine both inter- and intramolecular disulfide bonds, the number of free cysteines, which are not engaged in disulfide bonds, was determined using Ellman's reagent [5,5′-dithio-bis-(2-nitrobenzoic acid)] (Ellman, 1959).
Catalytic activities were measured in 4.5 ml potassium phosphate (50 mM, pH 7.1), and the enzyme was pre-incubated at 30°C for 5 min before adding (4R)-limonene 1,2-epoxide (mixture of (1R,2S,4R) and (1S,2R,4R) isomers) as the substrate. After different incubation times, the reaction mixtures were extracted with ethyl acetate, centrifuged and the organic layers were removed and dried by Na2SO4. The production of diasteromers was analyzed by chiral GC, using a Hydrodex β-TBDAc column (Aurora Borealis, The Netherlands), with a temperature program from 40 to 100°C at 10°C/min, 100–150°C at 5°C/min and finally 150–180°C at 1°C/min. The retention times of (1R,2R,4R)-limonene diol and (1S,2S,4R)-limonene diol under these conditions were 27.5 and 29.6 min, respectively. A reference sample with both these diols was prepared (Wang et al., 2008). The nearly exclusive production of the (1S,2S,4R)-limonene diol was confirmed by 1H-NMR (200 MHz), using the same peak assignments as reported earlier (Blair et al., 2007).
To measure residual catalytic activity versus temperature, enzyme samples (1.0 mg ml−1 in 50 mM potassium phosphate, pH 7.1) were incubated for 15 min at the desired temperatures using a peqSTAR gradient polymerase chain reaction heating block (Peqlab Biotechnologie GmbH, Erlangen, Germany). Samples were allowed to cool down to 4°C for 1 min, and subsequently their catalytic activity was analyzed.
Computational design of enriched libraries
The first step in the FRESCO strategy for engineering protein stability consists of the computational design of potentially stabilizing point mutations and disulfide bonds (Scheme 1). Point mutations were selected by computational tools that predict the resulting change in ΔG of folding (ΔΔGFold). The ΔΔGFold values were calculated with both Rosettaddg and FoldX since the underlying algorithms gave significantly different predictions (Supplementary Fig. S5), resulting in different selected mutations. All residues were allowed to mutate, except those inside or near the active site. Of all 1634 evaluated point mutations, 248 were selected either because they were predicted to decrease the ΔΔGFold <−5 kJ mol−1 or because they introduced a known type of stabilizing point mutation, such as those introducing a proline (see the Methods section for criteria). Of these 248 point mutations, 48% was predicted to be stabilizing only by Rosettaddg, 26% only by FoldX and 25% by both algorithms.
Disulfide bonds were designed employing the newly written DDD algorithm, which uses MD simulations to sample backbone conformational space. Without this sampling of backbone conformational space (i.e. using only the X-ray structure), seven disulfide bonds were predicted to be stabilizing (Supplementary Table SII). The sampling of different backbone positions by the MD simulation (Supplementary Fig. S6) resulted in the design algorithm recognizing an additional 21 possible disulfide bonds, providing a total of 28 potentially stabilizing disulfide bonds (Supplementary Table SII).
In the second step (Scheme 1), 130 of the point mutants (52%) were eliminated because structural inspection revealed features that are typically encountered with destabilizing mutations (Nosoh and Sekiguchi, 1991). A control experiment described below confirmed that this step enriches for stabilizing mutations. Furthermore, such visual inspection to filter out the unreasonable variants is commonly employed in computational protein design (Kiss et al., 2013; Wijma and Janssen, 2013). Here, the main reasons to eliminate point mutants were that a hydrophobic side chain became surface exposed (70%) or that an unsatisfied H-bond donor or acceptor was created (20%). Furthermore, all 16 point mutations (12%) of Pro23 were eliminated because it appeared that the calculations erroneously predicted Pro23 to be unfavorable for folding. Other reasons to eliminate variants were because a proline was introduced inside an α-helix (4%) or because a hydrophobic protein cavity was created (2%). The sum is >100% because for 8% of the eliminated point mutants multiple elimination criteria applied. Of the disulfide bonds, five (18%) were eliminated because they created a large hydrophobic cavity. This left 118 point mutations (36% from FoldX, 32% from Rosettaddg, 31% from both) and 23 disulfide bonds.
MD simulations on the surviving point mutations and disulfide bond variants (Scheme 1, third step) were used to select against variants with increased local flexibility relative to the WT, because regions of increased flexibility in a protein are more prone to (partial) unfolding leading to inactivation (Vihinen, 1987). A control experiment described below confirmed that this step eliminated destabilizing mutations. From these MD simulations, it appeared that some of the designed variants had a significantly lower local flexibility than the WT (Supplementary Fig. S7), which suggests increased stability. With the MD simulations, 54% of the 118 variants were eliminated, which reduced the number of variants that were predicted to be stabilized to 64. This included 17 disulfide bonds and 47 point mutants of which 21 originated from FoldX, 12 from Rosettaddg and 14 point mutations, which were predicted to be stabilizing by both Rosettaddg and FoldX.
When these 64 variants were experimentally screened (Scheme 1, fourth step; Supplementary Fig. S8, Table SIII), 21 variants had an improved TMapp (33%). Of the 17 tested disulfide bond variants, 10 had an increased TMapp, ranging from +4 to +15°C. Seven out of the 10 stabilizing disulfide bonds originated from the additional backbone sampling by MD simulation (Table I, Fig. 2). Of the 47 point mutations, 11 were stabilizing (6 from FoldX, 3 from Rosettaddg, 2 from both, Fig. 1A and B). Point mutations with a ΔΔGFold > −5 kJ/mol had also been tested but none of these 15 were experimentally stabilizing (Fig. 1A and B). Of the point mutations with a ΔΔGFold < −5 kJ/mol, 34% was stabilizing. The catalytic activity of the thermostabilized variants was preserved, and most activities differ less than a factor 2 from the catalytic activity of the WT LEH (Supplementary Table SIV).
|Protein structurea||Cysteine positions|
|Protein structurea||Cysteine positions|
aThe first column indicates which X-ray structure (pdb entry) or MD simulation snapshot (ps after start of simulation) was used for the computational design of disulfide bonds.
bInter-subunit disulfide bond.
The discovery of 21 stabilized variants in a library of only 64 variants is based on the use of the orthogonal in silico screening steps (Steps 2 and 3 in Scheme 1) for eliminating false-positive predictions. When using FoldX calculations only, a large fraction of the mutations predicted to be stabilizing appeared to be destabilizing or neutral when tested experimentally. Selection of the mutations with the best predicted ΔΔGFold using Rosettaddg would only have resulted in the introduction of highly surface-exposed aromatic groups, which is a known problem in computational design (see the Discussion section). The best 18 variants at different positions according to FoldX included 6 variants that did pass through Stages 2 and 3, and thus also were included in the FRESCO library. When the discarded 12 variants were experimentally characterized, none were stabilizing (Fig. 1C, Supplementary Table SV) and half were strongly destabilizing. Of the 12 false-positive predictions, 9 were eliminated at Step 2 of FRESCO. Of these nine variants, four were eliminated because a highly surface-exposed hydrophobic group was introduced (Q7M, E68L, A48F, S111M), three were eliminated because unsatisfied H-bond donors and acceptors were created by the mutations (S12M, T22D, G129S) and two were eliminated because a proline was introduced inside an α-helix (A40P, A41P). The other three false positives were eliminated at Step 3 of FRESCO because they were predicted to have increased flexibility (E49P, Y96W, R9P). The importance of eliminating false-positive predictions through Steps 2 and 3 of the screening is also apparent from protein expression levels, where lack of soluble expression of a mutant suggests lack of stability. Whereas in the FRESCO library, only 2 variants (3%) were not expressed in soluble form, the absence of soluble expression was observed for 4 of the 12 discarded variants. Two of these four variants that could not be solubly expressed belonged to the three variants that had been eliminated by MD flexibility screening. These results confirm that a framework type of computational approach, in which orthogonal screening steps serve to eliminate false-positive predictions, improves the accuracy with which stabilizing mutations are predicted and allows the discovery of multiple stabilizing mutations with minimal experimental screening.
Origins of stabilization
The modeled 3D structures of the improved variants were analyzed to investigate the structural basis for the stabilizing effects. Because of the large number of stabilizing mutations, the effects of individual mutations are described in the Supplementary data. The beneficial mutations appear to introduce better H-bonds that stabilize the local protein structure (A19K, N92K), improved surface charge–charge interactions (A19K, E45K, T76K, N92K, N92R), improved hydrophobic interactions (T85I, T85V, T85L, Y96F, Fig. 3) and entropic stabilization (S15P and all disulfide variants). All of the stabilizing mutations are located at or near the surface of LEH. Furthermore, the most successful mutations appear to be predominantly localized inside or near the flexible N-terminus (residues 3–24) and to a much lesser extent in Helices 3 and 4 (Fig. 4). Many of the mutations are also near to the dimer interface, with the exception of the highly stabilizing S3C-V102C, K4C-A82C and A5C-E84C disulfide bonds that are located at the N-terminus but not at the dimer interface. The unsuccessful mutations are more evenly spread over the protein than the stabilizing mutations. These observations indicate that the N-terminal loop is the most critical region for stability of LEH, and that the successful mutations especially introduced stabilizing interactions in and around this region.
Design of combined variants
Aiming to obtain highly thermostable variants, the most stabilizing mutations (Fig. 2) were rationally combined without construction of intermediate variants. If at a single position multiple stabilizing mutations were available, then the most stabilizing variant was used; for example, N92K with a ΔTMapp of +7°C was preferred over N92R with a ΔTMapp of +2°C. Since the stability of LEH can be increased by improving the local stability of the N-terminal region (see above), different combinations of mutations were screened by MD simulation for their effect on the flexibility of the N-terminus. Disulfide bonds were included and MD simulations were again used to test their compatibility. To test the usefulness of the flexibility predictions by MD simulations (third step of Scheme 1) for variants in which multiple mutations are combined, two combinations of disulfide bonds were characterized, of which one (S3C/I5C/E84C/V102C) was predicted to rigidify the N-terminus and thus be stabilizing, while the other combination (A40C/I44C/E68C/A72C) was predicted to increase the flexibility of the N-terminal region of the enzyme, and thus be destabilizing. Indeed, S3C/I5C/E84C/V102C had a higher TMapp than its parents (66.8°C, ΔTMapp = +15.8°C versus +13.5°C for I5C/E84C and +11.0°C for S3C/V102C), while A40C/I44C/E68C/A72C had a lower TMapp than its parents (54.8°C, ΔTMapp = 3.8 versus +5.0°C for A40C/A72C and +5.5°C for 44C/E68C). These experimental results are in agreement with the predictions from MD simulation, and indicate that MD simulations can increase the chance of successfully combining mutations.
The application of this method to combine multiple mutations simultaneously resulted in the final variants F1, with 12 mutations, and F2 with 10 mutations (Fig. 2). These two variants combine the strongest stabilizing mutations that were predicted by MD to rigidify the N-terminus, whereas combinations that enhance local flexibility were discarded. For example, S15P was omitted from variant F2 because in combination with the other mutations, an increased flexibility of the N-terminal loop was predicted by MD simulation. Furthermore, the highly stabilizing mutation N92K was omitted from variant F2 because it cannot be combined with the A17C/N92C disulfide bonds. Also, maximally two disulfide bridges per enzyme were combined to avoid potential problems with the kinetics of protein folding.
Catalytic properties of combined variants
When tested experimentally, variants F1 and F2 both exhibited a dramatically increased TMapp (ΔTMapp = +34.6 and 35.5°C, Fig. 5A). A variant P which lacked the disulfide bonds of F1 still had a +20°C higher TMapp than the WT (Fig. 2). The apparent melting temperatures of the purified WT enzyme and its variants F1 and F2, as measured by differential scanning calorimetry (DSC), were in agreement with results obtained by the thermofluor method (Fig. 5A and B). Thermal inactivation assays also demonstrated that the variants F1 and F2 are inactivated only above 80°C (Fig. 5C). Titration with Ellman's reagent, the comparison of apparent melting temperature and the electrophoretic mobility on SDS–PAGE gel under oxidizing and reducing conditions (Supplementary data) confirmed that the disulfide bonds were formed in F1 and F2. Furthermore, the temperature optimum for catalytic activity of the mutants F1 and F2 was significantly increased compared with the WT. The WT enzyme had an optimum temperature of 50°C, whereas the mutants F1 and F2 were most active at 80 and 70°C, respectively (Fig. 5D). The irreversible inactivation of WT, F1 and F2 at 55°C was followed over time (Fig. 5E). The fitted rates were 0.31 ± 0.03, ≤80 × 10−6 and (130 ± 40) × 10−6 min−1, respectively. This demonstrates that variants F1 and F2 are inactivated at least 250 times slower than the WT. All these results confirm that F1 and F2 are highly thermostable variants of LEH.
Despite the 10–12 introduced mutations, the catalytic activities of variants F1 and F2 were retained. Both variants had a slightly reduced kcat at 30°C (Table II, Fig. 6), but they were more than twice as catalytically active as the WT at their respective optimum temperatures (Figs. 5D and 6). Moreover, the stereoconvergent selectivity of the WT enzyme, which produces enantiopure (1S,2S,4R)-limonene diol from a diasteriomeric mixture of (1R,2S,4R)- and (1S,2R,4R)-limonene epoxide substrates (Van der Werf et al., 1999), was preserved in variants F1 and F2 (Fig. 5F, Supplementary Fig. S9).
|kcat (s−1)||13.9 ± 0.8||63 ± 4||8.9 ± 0.4||135 ± 6||8.2 ± 0.3||160 ± 7|
|KM (mM)||0.3 ± 0.1||0.6 ± 0.1||<0.25a||0.6 ± 0.1||<0.25a||0.3 ± 0.1|
|kcat/KM (s−1 M−1)||4.6 × 104||1.0 × 105||>3.6 × 104||2.1 × 105||>3.3 × 104||4.9 × 105|
|kcat (s−1)||13.9 ± 0.8||63 ± 4||8.9 ± 0.4||135 ± 6||8.2 ± 0.3||160 ± 7|
|KM (mM)||0.3 ± 0.1||0.6 ± 0.1||<0.25a||0.6 ± 0.1||<0.25a||0.3 ± 0.1|
|kcat/KM (s−1 M−1)||4.6 × 104||1.0 × 105||>3.6 × 104||2.1 × 105||>3.3 × 104||4.9 × 105|
aKM was below the detection limit of 0.25 mM. The kinetic parameters for the hydrolysis of (4R)-limonene 1,2-epoxide were determined both at 30°C and at the optimum temperature of WT and variants F1 and F2.
The results show that the computational design of as many stabilizing mutations as feasible in combination with in silico and experimental screening allowed for rapid engineering of enzyme variants with a dramatically increased thermostability. Essential features of the FRESCO strategy proposed here are the use of computational design methods to create a library of potentially stabilizing mutations, followed by a reduction of library size through orthogonal in silico screening aimed at removing false-positive predictions. An experimental screening of the resulting library is then used to select the most stabilizing variants for combination. The best combined variants exhibited an increase in TMapp of ∼35°C as shown by DSC, thermofluor assays and activity–temperature measurements, while catalytic activity and stereoselectivity were maintained. Natural thermostable enzymes are often far less catalytically active at lower temperatures than their mesostable homologs (Fitter et al., 2001; Cheung et al., 2005). The results here show that even a 35°C increase in TMapp is not necessarily accompanied by loss of catalytic activity at lower temperatures.
The remarkable stabilization of LEH was achieved by experimental testing of <100 variants in just two rounds of mutagenesis. This is a very low number when compared with directed evolution, where often >104 variants need to be screened to obtain a strong stabilization, since the vast majority of the tested mutations are neutral or detrimental for stability (Bloom and Arnold, 2009). In most cases, the screening step is the bottleneck of a directed evolution project (Turner, 2009) and limits its application. Often no rapid expression systems or stability assays are available. High-throughput screening can be unfeasible if slow-growing organisms are required for protein expression or if assays cannot be scaled down. Therefore, the protocols presented here will be an attractive alternative for many proteins.
The experimental results revealed that a critical region for stabilization of LEH is located in the vicinity of the flexible N-terminus (Fig. 4B). This includes both the interface of the dimer and the N-terminus itself which is partly more remote from the interface. For example, both the G89C/S91C disulfide bond at the interface and the A5C/E84C disulfide bond at the N-terminus are highly stabilizing. The residues of the N-terminal S3C-V102C disulfide bond are located >12 Å away from the dimer interface, suggesting that its increased thermostability can be unrelated to improved stability of this dimer interface.
Recent strategies for protein stabilization often only select the most flexible residues for mutagenesis (Reetz et al., 2006; Jochens et al., 2010; Joo et al., 2011), with the rationale that these should be the most critical residues. However, some of the highly stabilizing mutations found here are in a rigid part of the protein. For example, T85V (ΔTMapp +7°C) is close to the flexible N-terminus, even though the mutated residue is in a rigid part of the protein as judged by its B-factors (13 Å2, in X-ray structure 1NWW), which are lower than average (15 Å2 is the average B-factor in 1NWW) and much lower than those of the flexible N-terminus (Ile 4, 27 Å2). Thus, the computational methods generated stabilizing mutations that would have been missed if only highly flexible residues had been selected for mutagenesis.
An essential element of the FRESCO strategy is the elimination of mutations that are suggested by the computational protocols, but lack credibility when their predicted flexibility is taken into account or when their predicted structure is examined. Structural inspection showed that about 50% of the mutations predicted to be stabilizing by the initial calculations (Step 1) are probably false positives. They were discarded in Step 2 (Scheme 1) because they introduce structural features that are expected to destabilize the protein, such as water-exposed hydrophobic side chains. The latter is a known problem of computational design algorithms (Jacak et al., 2012), and this justifies the use of rational criteria to remove false positives that result from imperfect energy functions and sampling in the design algorithms (Kellogg et al., 2011; Leaver-Fay et al., 2013). It is common in computational design to eliminate variants that have clear structural problems (Kiss et al., 2013; Wijma and Janssen, 2013). In the future, this could be automated as has been done for finding errors in X-ray structures (Vriend, 1990), or may no longer be necessary if the energy calculations and sampling methods are further improved. Since local unfolding followed by irreversible aggregation may be as important for enzyme inactivation as overall thermodynamic stability (Polizzi et al., 2006; Reetz et al., 2006; Joo et al., 2011), elimination of false positives was also based on MD simulations which predicted effects on local flexibility (Step 3). It is well established that high flexibility can promote unfolding (Vihinen, 1987). Here, experimental characterization of mutants that were eliminated at the third step of FRESCO because of higher flexibility showed that the discarded mutations were not stabilizing and often were even strongly destabilizing (Fig. 1C).
The efficiency of FRESCO as a strategy is confirmed by the large number of mechanistically different stabilizing mutations that were discovered. The point mutations appear to act through various effects that can stabilize a protein, including the removal of unsatisfied H-bond donors/acceptors, introduction of new H-bonds, better charge distribution (Karshikoff and Ladenstein, 2001; Gribenko et al., 2009), less hydrophobic exposure to solvent and entropic stabilization (Nosoh and Sekiguchi, 1991; Eijsink et al., 2004). Multiple disulfide bonds per protein, like in variants F1 and F2, occur naturally in the proteomes of a few thermophiles (Ladenstein and Ren, 2008). A complete list of the mutations and their proposed effects is given in Supplementary data. The ability to obtain mechanistically diverse types of stabilizing mutations is likely to become essential if the goal is to engineer strongly enhanced stability into any target protein.
The developed computational strategy to stabilize an enzyme is reminiscent of directed evolution, in that a library of potentially stabilizing mutations is experimentally screened before combining the most successful mutations to final variants. A more common approach in computational design of thermostability is to select the best set of mutations purely in silico and only characterize the final combined variants (Malakauskas and Mayo, 1998; Korkegian et al., 2005; Gribenko et al., 2009; Diaz et al., 2011; Joo et al., 2011; Borgo and Havranek, 2012; Miklos et al., 2012; Murphy et al., 2012). The results in Fig. 1B show that such an approach would have missed highly stabilizing mutations (T85V/N92K). Another approach is to use the consensus approach in combination with computational design. Using FoldX for the computations, such an approach resulted in a cellobiohydrolase with a 9°C improved TMapp (Komor et al., 2012). However, in a similar study it was reported that FoldX did not correlate well with thermostabilizing mutations (Polizzi et al., 2006), which is in agreement with our results of finding false positives in the absence of orthogonal screening (Fig. 1C). To allow for larger increases in thermostability, the FRESCO approach uses an experimental screening to verify that the mutations indeed stabilize the enzyme and spare catalytic activity before creating variants in which mutations are combined.
The modeling of backbone flexibility is an important problem in computational protein design. With a rigid backbone, many beneficial mutations will be sterically excluded. The unusually large number of stabilizing disulfide bonds discovered in this study is mainly due to the use of an MD simulation that samples the natural backbone flexibility to generate different realistic starting structures for the design of disulfide bonds. Backbone sampling protocols normally do not incorporate explicit water molecules (Su and Mayo, 1997; Georgiev et al., 2008; Smith and Kortemme, 2008; Havranek and Baker, 2009; Babor et al., 2011; Chitsaz and Mayo, 2013). The MD simulations include the surrounding water hydrogen-bonding network, which should make the sampling of energetically accessible conformations more accurate. This protocol produced 7 out of the 10 successful disulfide bonds, which included all three disulfide bonds that were combined in the final highly thermostable variants (Table I, Fig. 2). We are not aware of previous reports describing a similar large number of stabilizing disulfide bonds. With existing methods to stabilize enzymes, typically one or two stabilizing disulfide bonds are reported (Matsamura et al., 1989; Pikkemaat et al., 2002; Dombkowski, 2003; Chen et al., 2009). Such numbers are similar to the finding of three stabilizing disulfide bonds for LEH (Table I) in the absence of backbone conformational sampling. The significant increase in the number of stabilizing disulfide bonds due to the conformational sampling experimentally shows that MD can generate structures that are accurate enough for computational protein design.
The backbone sampling allowed to predict stabilizing disulfide bonds at positions, where based on the X-ray structure a disulfide bond would not be feasible because the backbone atoms were too far away from each other. For example, in case of disulfide bonds distances of 3.6–7.2 Å occur between their respective Cα atoms (Petersen et al., 1999) in natural proteins, whereas the distance between the Cα atoms of residues 4 and 82 (where a stabilizing disulfide bond could be formed, Table I, Fig. 2) is at least 8.90 in the available X-ray structures (1NWW, 1NU3). During the MD simulation, the distance between the Cα atoms of residues 4 and 82 decreased to 6.52 Å (results not shown). Without backbone conformational sampling, the additional disulfide bonds obtained from MD simulation could only have been discovered if the geometric criteria would have been relaxed. However, in that case the algorithms would also have proposed disulfide bonds that are expected to destabilize the protein because their geometries are far outside the naturally occurring ranges.
The multi-site mutants, which harbored 12 (variant F1) or 10 (variant F2) substitutions, are fully catalytically active, with an increase in kcat at the optimum temperature when compared with WT. The only precaution adopted in the FRESCO protocol was not to introduce mutations at residues close to the active site. Regioselectivity of water attack on the diastereomeric substrate is fully retained, allowing enantioconvergent production of (1S,2R,4R)-limonene diol. The resulting variants are suitable for use in protein engineering aimed at introducing new selectivities.
In conclusion, we show that computational library design can identify many mutations with different stabilization mechanisms to cumulatively obtain a large increase in enzyme thermostability. The computational library design enabled a larger jump in enzyme stability while preserving catalytic activity. The developed FRESCO strategy made it feasible to obtain protein variants with high thermostability in a short time with minimal experimental screening.
This work was supported by the European Union 7th Framework via the Kyrobio project (KBBE-2011-5, 289646) and the Metaexplore project (KBBE-2007-3-3-05, 222625) and by NWO (Netherlands Organization for Scientific Research) through an ECHO grant.