Abstract

HotSpot Wizard 2.0 is a web server for automated identification of hot spots and design of smart libraries for engineering proteins’ stability, catalytic activity, substrate specificity and enantioselectivity. The server integrates sequence, structural and evolutionary information obtained from 3 databases and 20 computational tools. Users are guided through the processes of selecting hot spots using four different protein engineering strategies and optimizing the resulting library's size by narrowing down a set of substitutions at individual randomized positions. The only required input is a query protein structure. The results of the calculations are mapped onto the protein's structure and visualized with a JSmol applet. HotSpot Wizard lists annotated residues suitable for mutagenesis and can automatically design appropriate codons for each implemented strategy. Overall, HotSpot Wizard provides comprehensive annotations of protein structures and assists protein engineers with the rational design of site-specific mutations and focused libraries. It is freely available at http://loschmidt.chemi.muni.cz/hotspotwizard.

INTRODUCTION

The development of tailor-made enzymes for industrial applications is facilitated by understanding the molecular mechanisms of protein function. However, despite significant advances in recent decades, it is not yet clear how a protein's sequence encodes its function (1,2). Traditional directed evolution circumvents this problem by using repeated rounds of random mutagenesis and screening of large sequence libraries to explore the mutational landscape and find proteins with desired properties (2–5). This approach has the advantage of requiring no prior knowledge of the protein's structure or understanding of its structure–function relationships (6), but necessitates the laborious and costly screening of very large libraries (4). The efficiency of directed evolution experiments can be significantly improved by creating smaller, higher quality libraries that are more likely to yield positive results. Such ‘smart’ libraries can be generated by focusing mutagenesis on a limited number of ‘hot spot’ positions that are likely to affect the property of interest, or by selecting a limited set of substitutions (1–5).

The optimal strategy for identifying hot spots depends on the property being targeted. Catalytic properties such as activity, specificity and stereoselectivity are often related to amino acid residues that mediate substrate binding, transition-state stabilization or product release (7,8). Such residues can be identified using tools for predicting and analyzing enzyme-ligand interactions (9–11) or detecting binding pockets or access tunnels (12–14). Strategies for improving protein stability include rigidification of flexible sites, cavity-filling, tunnel engineering, consensus and ancestral mutation methods, or redesigning of surface charges (15–17). While hot spots for some of these strategies can be identified straightforwardly using a single computational tool (18), others require multi-step analyses or the use of molecular modelling methods (19). Having obtained a set of promising sites for manipulating the desired property, the next challenge is to draw up a list of allowed substitutions at individual positions. This can be done by considering the amino acid distribution at the corresponding positions in sequence homologs (20,21), by using reduced sets of amino acids with either specific desired physicochemical properties or a balanced set of these properties (22,23), or on the basis of the predicted effects of specific substitutions on the protein's properties (24,25). Finally, an appropriate degenerate codon covering the specified set of amino acids must be selected for each targeted position. Ideally, these codons should exhibit minimal amino acid bias and minimize the frequency of premature stop codons (26). Several tools are available to facilitate this task and to calculate the size of the designed library (27).

Here, we present HotSpot Wizard 2.0, a web server for the automated identification of hot spots and design of smart libraries for engineering protein stability, enzymatic activity, substrate specificity and enantioselectivity. Compared to its predecessor (28), HotSpot Wizard 2.0 introduces several major improvements, extending the scope and quality of its analyses. It implements four different established protein engineering strategies, enabling the user to selectively target sites affecting the protein's stability and catalytic properties. Users can easily select suitable substitutions for individual hot spots based on predictions of tolerated amino acids or amino acid distributions in sequence homologs, and suitable degenerate codons for these substitutions can be designed automatically via the HotSpot Wizard interface. A new graphical user interface provides an intuitive and comprehensive overview of the results of the analysis, allowing users to think directly about the obtained designs. The resulting pipeline of twenty integrated tools and three databases represents a unique one-stop solution that makes library design accessible even to users with no prior knowledge of bioinformatics.

MATERIALS AND METHODS

The workflow of HotSpot Wizard is outlined in Figure 1. In order to explore the mutational landscape and find the most promising mutagenesis targets, a protein selected by the user is annotated using several prediction tools and databases (Phase 1). With this knowledge in hand, four protein engineering strategies are used to identify suitable hot spots for improving desired protein properties (Phase 2). Finally, suitable substitutions and appropriate degenerate codons are proposed for each selected hot spot, enabling the design of a smart library (Phase 3).

Workflow of HotSpot Wizard.
Figure 1.

Workflow of HotSpot Wizard.

Phase 1: annotation of the protein

The first step in the workflow requires the user to specify the protein structure of interest, either by providing its PDB ID or by uploading a suitable PDB file. If possible, the biological assembly of the target protein is automatically generated by the MakeMultimer tool (http://watcut.uwaterloo.ca/tools/makemultimer), and information about ‘essential residues’ directly involved in catalysis or binding is obtained from the Catalytic Site Atlas (29) and UniProtKB/Swiss-Prot (30) databases. The DSSP algorithm (31) is then used to assign the protein's secondary structure, and its accessible surface area is computed using the Shrake and Rupley algorithm (32) with BioJava (33). The average B-factors are computed for the protein's amino acid residues (34). The raw B-factor values are accompanied by residue rankings ranging from 1–100%; rankings of 1–25%, 26–75% and 76–100% indicate high, moderate and low levels of relative structural flexibility, respectively. Protein pockets are then identified with Fpocket (35). For each chain, the pocket containing the greatest number of essential residues is identified as the catalytic pocket. If there are two or more pockets that satisfy this criterion, a decision is made according to the Fpocket score. Having identified the putative catalytic pockets, their centers of mass are determined and used as starting points to identify access tunnels with CAVER (36). Sequence homologs of the target protein are then obtained by performing a BLAST (37) search against the UniRef90 (38) database, using the target protein sequence as a query. All identified homologs are aligned with the query protein using USEARCH (39). By default, sequences whose identity with the query is below 30% or above 90% are excluded from the list of homologs. The remaining sequences are then clustered using UCLUST (39), with a 90% identity threshold to remove close homologs. The cluster representatives are sorted based on the BLAST query coverage and by default, the first 200 of them are used to create a sequence data set. A multiple sequence alignment of the resulting sequence data set is created with Clustal Omega (40) and used to (i) estimate the conservation of each position in the protein based on the Jensen–Shannon entropy (41); (ii) identify correlated positions using an ensemble of the MI (42), aMIc (43), OMES (44), SCA (45), DCA (46), McBASC (47) and ELSC (48) methods; (iii) predict the tolerated amino acids at each position in the protein sequence using RAPHYD (see Supplementary Data 1); and (iv) analyze amino acid frequencies at individual positions within the protein. The conservation scores are used to assign mutability values to individual residues. To facilitate interpretation, these values are divided into three groups: values of 1–3, 4–5 and 6–9 indicate low, moderate and high mutability, respectively.

Phase 2: identification of mutagenesis hot spots

Based on the comprehensive annotation of the target protein, four protein engineering strategies are used to identify different types of hot spots: (i) functional hot spots, (ii) stability hot spots based on structural flexibility, (iii) stability hot spots based on sequence consensus and (iv) correlated hot spots. Some examples illustrating the use of these strategies to engineer selected properties in 12 different proteins (34,49–62) are shown in Figure 2. Functional hot spots correspond to highly mutable residues located in the catalytic pockets or tunnels connecting these pockets with the bulk solvent. Residues located in close proximity to the active site have been identified as good mutagenesis targets for engineering activity, enantioselectivity and substrate specificity (52,63,64). To prevent mutagenesis at positions that are indispensable for protein function, all essential residues are designed immutable and thus excluded from the list of potential hot spots. Supplementary Data 2 shows that HotSpot Wizard provides a significantly greater proportion of viable mutants than random mutagenesis. Stability hot spots are identified by analyzing structural flexibility and sequence consensus. The former approach aims to rigidify flexible protein regions by mutating residues with high average B-factors (34). B-factor provides a metric for flexibility which is due in part to inherent flexibility of the macromolecule, but also includes stabilizing/destabilizing energy from packing in the crystal lattice. The rationale for targeting these flexible residues is that they have relatively few contacts with neighbors, so their substitution can produce more interactions (34,54,55). In contrast, the sequence consensus protocol implements majority and frequency ratio approaches, both of which suggest mutations at positions where the wild-type amino acid differs from the most prevalent amino acid (i.e. the consensus residue) at a given position in the multiple sequence alignment. The assumption that the most common amino acid is likely to be stabilizing has proven to be very successful at creating more stable proteins (56–58,65). By default, if the consensus residue is present in at least 50% of all analyzed sequences, the corresponding position is identified as a hot spot in the majority approach. The frequency ratio approach has a less strict criterion for the consensus residue's frequency – the default value is 40%, but it must also be at least five times more frequent than the wild-type residue as a hot spot. The final strategy involves searching for coordinated changes of the amino acids at two separate positions within the protein. Such pairs of positions are referred to as correlated hot spots, and arise when one amino acid substitution has an unfavorable effect that is compensated for by a second mutation of a residue that is located in close structural proximity to the first. This second, correlated mutation typically helps to maintain protein function, stability or folding (66). Methods developed for identifying correlated pairs have revealed mutations responsible for modulating substrate specificity (67), enantioselectivity (68) and mutagenesis targets for stability engineering (69). The identification of correlated positions in HotSpot Wizard is based on an ensemble of seven prediction tools. Each tool generates a raw score for each pair of residues in the protein that measures the pair's degree of correlation. The mean and standard deviation of the degrees of correlation for all pairs of residues in the protein are then calculated and the raw scores are converted into Z-scores, which measure the number of standard deviations by which each pair's raw score deviates from the mean. Based on the work of Martin et al. (70), a pair is considered to be correlated if its average Z-score ≥ 3.5 and both of its positions have at least a moderate degree of mutability – by definition, highly conserved positions cannot co-evolve (71).

Some notable applications of the four protein engineering strategies implemented in the HotSpot Wizard web server.
Figure 2.

Some notable applications of the four protein engineering strategies implemented in the HotSpot Wizard web server.

Phase 3: design of the smart library

The efficiency of directed evolution experiments can be improved by focusing mutagenesis on a limited number of hot spots, but also by restricting the number of allowed substitutions at individual positions using appropriate codons (20–25). For each protein engineering strategy, HotSpot Wizard provides a way to prioritize amino acids at the randomized positions (Table 1) and identifies degenerate codons encoding all desired amino acids with the minimum redundancy and the smallest possible ratio of stop codons. Alternatively, the SwiftLib tool (73) can be used to calculate optimal degenerate codons while keeping the library diversity within the specified limits (the default 10 000). Although the resulting library may not necessarily fully cover the desired set of amino acids, the probability of omitting the important amino acids is relatively low as their weights are set according to selected prioritization method (e.g. based on amino acid distributions in sequence homologs). For both approaches, the most common metrics, such as expected coverage or library size, are computed with TopLib (72).

Methods for selecting substitutions at hot spot positions identified using the four different protein engineering strategies

Table 1.
Methods for selecting substitutions at hot spot positions identified using the four different protein engineering strategies
Selection modeAvailability in strategiesDescription
Amino acid frequencyFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal frequency in the multiple sequence alignment
Mutational landscapeFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal probability of preservation of protein function
Sequence consensusCONSsuggests amino acid residues fulfilling the criteria of at least one of approaches implemented in sequence consensus strategy: (i) majority approach or (ii) frequency ratio approach
Correlated positionsCORRELsuggests amino acid residues fulfilling the criterion of minimal frequency of co-occurrence with some other specific residue from coupled position
ManualALLmanual selection of amino acid residues
Selection modeAvailability in strategiesDescription
Amino acid frequencyFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal frequency in the multiple sequence alignment
Mutational landscapeFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal probability of preservation of protein function
Sequence consensusCONSsuggests amino acid residues fulfilling the criteria of at least one of approaches implemented in sequence consensus strategy: (i) majority approach or (ii) frequency ratio approach
Correlated positionsCORRELsuggests amino acid residues fulfilling the criterion of minimal frequency of co-occurrence with some other specific residue from coupled position
ManualALLmanual selection of amino acid residues

FUNC – Analysis of functional hot spots; FLEX – Analysis of stability hot spots/structural flexibility approach; CONS – Analysis of stability hot spots / sequence consensus approach; CORREL – Analysis of correlated hot spots.

Table 1.
Methods for selecting substitutions at hot spot positions identified using the four different protein engineering strategies
Selection modeAvailability in strategiesDescription
Amino acid frequencyFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal frequency in the multiple sequence alignment
Mutational landscapeFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal probability of preservation of protein function
Sequence consensusCONSsuggests amino acid residues fulfilling the criteria of at least one of approaches implemented in sequence consensus strategy: (i) majority approach or (ii) frequency ratio approach
Correlated positionsCORRELsuggests amino acid residues fulfilling the criterion of minimal frequency of co-occurrence with some other specific residue from coupled position
ManualALLmanual selection of amino acid residues
Selection modeAvailability in strategiesDescription
Amino acid frequencyFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal frequency in the multiple sequence alignment
Mutational landscapeFUNC, FLEXsuggests amino acid residues fulfilling the criterion of minimal probability of preservation of protein function
Sequence consensusCONSsuggests amino acid residues fulfilling the criteria of at least one of approaches implemented in sequence consensus strategy: (i) majority approach or (ii) frequency ratio approach
Correlated positionsCORRELsuggests amino acid residues fulfilling the criterion of minimal frequency of co-occurrence with some other specific residue from coupled position
ManualALLmanual selection of amino acid residues

FUNC – Analysis of functional hot spots; FLEX – Analysis of stability hot spots/structural flexibility approach; CONS – Analysis of stability hot spots / sequence consensus approach; CORREL – Analysis of correlated hot spots.

DESCRIPTION OF THE WEB SERVER

Input

The only required input to the web server is a tertiary structure of the query protein, provided either as a PDB ID or a PDB file. The user can then choose a predefined biological unit generated by the MakeMultimer tool or manually select chains for which the calculation should be performed. The calculations can be configured in either basic or advanced mode. Basic mode directs the user's attention to the most important parameters, providing an overview of the identified essential residues and highlighting the main parameters involved in the identification of pockets and tunnels. The designation of essential residues is a key step in the functional strategy because these residues are excluded from the list of potential hot spots and are also used to detect catalytic pockets and access tunnels. The user should therefore inspect the automatically generated list of essential residues and correct it if necessary. If no essential residues are detected, the user should specify them manually. In basic mode, the user can specify three parameters: (i) the probe radius, which is used in pocket identification and defines the minimum radius of an alpha sphere in a pocket (default 2.8 Å); (ii) the minimum probe radius, which defines the minimum radius of a putative tunnel (default 1.4 Å); and (iii) the clustering threshold, which determines how the hierarchically clustered tunnels are cut and thus affects the number of tunnels that can be identified (default 3.5 Å). Advanced mode allows expert users to fine-tune parameters of individual calculations in the pipeline to achieve more specialized objectives.

Output

Upon submission, a unique identifier is assigned to each job to track the calculation. The ‘Results browser’ panel provides information on the status of individual steps in the computational pipeline (Figure 3A). Once the job is finished, the navigation panel provides links to the results obtained using each of the four different protein engineering strategies (Figure 3B). The result pages for each strategy are all organized in the same way, which is described below.

HotSpot Wizard's graphical user interface, showing results obtained for the haloalkane dehalogenase LinB (PDB ID: 1CV2). (A) The ‘Report’ panel shows the status of the calculations in the individual steps of the computational pipeline. (B) Results obtained using the four protein engineering strategies. (C) The ‘Residue features’ panel, which provides an overview of the identified hot spots. (D) The ‘Residues selected for mutagenesis’ panel, which presents a user-adjustable list of residues representing targets for mutagenesis. (E) The JSmol viewer allows interactive visualization of the protein and the identified tunnels and pockets. (F) The ‘Residue details’ pop-up window, which provides comprehensive information on the residue's annotations, organized under several tabs. (G) The ‘Library design’ panel, which shows the list of substitutions and appropriate codons for randomization of selected positions.
Figure 3.

HotSpot Wizard's graphical user interface, showing results obtained for the haloalkane dehalogenase LinB (PDB ID: 1CV2). (A) The ‘Report’ panel shows the status of the calculations in the individual steps of the computational pipeline. (B) Results obtained using the four protein engineering strategies. (C) The ‘Residue features’ panel, which provides an overview of the identified hot spots. (D) The ‘Residues selected for mutagenesis’ panel, which presents a user-adjustable list of residues representing targets for mutagenesis. (E) The JSmol viewer allows interactive visualization of the protein and the identified tunnels and pockets. (F) The ‘Residue details’ pop-up window, which provides comprehensive information on the residue's annotations, organized under several tabs. (G) The ‘Library design’ panel, which shows the list of substitutions and appropriate codons for randomization of selected positions.

Residue features

The ‘Residue features’ panel lists all of the identified hot spots together with information relevant to the selected protein engineering strategy (Figure 3C). Several checkboxes can be found at the top of this panel, allowing users to reduce the list of hot spots by applying additional criteria such as excluding buried residues, correlated positions or residues forming a catalytic pocket. The ‘Show all residues’ button enables users to inspect any residue of the target protein and possibly select hot spots based on their own criteria. Importantly, a pop-up window containing detailed information about a given residue is displayed after clicking the ‘book’ icon in the last column of the table. Users can visualize individual residues within the protein structure by selecting the ‘eye’ icon in the first column, and can add residues to the list of mutagenesis hot spots by clicking the ‘plus’ icon in the second column. All selected mutagenesis hot spots listed in the ‘Residues selected for mutagenesis’ panel (Figure 3D) can be used for designing a smart library by clicking the ‘Design library’ button.

Residue details

The information in the ‘Residue details’ panel is organized into several tabs (Figure 3F): (i) ‘Overview’, which provides basic information on the residue's characteristics such as its mutability, average B-factor and secondary structure; (ii) ‘Annotations’, describing the residue's function (only available for essential residues); (iii) ‘Tunnels and Pockets’, which lists the pockets and/or tunnels of which the residue is a part; (iv) ‘Sequence consensus’, listing potential consensus mutations for a given position; (v) ‘Amino acid frequencies’, providing the distribution of amino acids in the corresponding column of the multiple sequence alignment; (vi) ‘Mutational landscape’, quantifying the probability of preservation of protein function for individual substitutions at a given site; and (vii) ‘Correlated positions’, listing all positions correlated with the site in question.

Design of smart library

The ‘Library design’ panel allows the user to select a set of substitutions and design degenerate codons for systematic mutagenesis of the selected positions (Figure 3G). An automatic method for prioritizing amino acids suitable for the chosen protein engineering strategy will be pre-selected. The panel contains two tabs, each corresponding to one library optimization mode. In the ‘Standard mode’, users can manually define their own set of required substitutions for individual positions if they so desire. After any change in the list of amino acids, HotSpot Wizard automatically identifies the most suitable codons covering all desired amino acids with the lowest possible redundancy, and the library size corresponding to the specified expected coverage. The parameters of the library can be modified interchangeably, allowing the user to adjust the final library based on its size or preferred degree of its coverage. In the ‘SwiftLib mode’, users specify the maximum acceptable library diversity and the method reports the optimal combination of codons with the minimal redundancy of amino acids. However, this efficiency is often achieved at the price of omitting some of desired amino acids with lower weights. The initial amino acid weights derived from the selected prioritization scheme can be changed by selecting the ‘Edit amino acid weights’. Additionally, users can request multiple solutions and thus inspect also the solutions which are considered as less optimal by the method, but may better meet the users’ needs. Finally, users can generate a nucleotide sequence from the designed amino acid sequence based on the codon usage of selected organism (default is Escherichia coli) with the European Molecular Biology Open Software Suite (EMBOSS) Backtranseq tool (74).

Protein visualization

The protein of interest is interactively visualized in the web browser using the JSmol applet (http://wiki.jmol.org/index.php/JSmol). Users can display individual amino acid residues as well as identified tunnels and pockets (Figure 3E). The hot spot residues are colored in red, residues in tunnels and pockets in yellow and all other residues in grey.

Structural features

The main characteristics of all pockets and access tunnels are presented in the ‘Pockets’ and ‘Tunnels’ panels, respectively. These panels allow users to visualize individual pockets and tunnels in the structure and to open a pop-up window showing a list of all the residues comprising the chosen structural feature.

CONCLUSIONS AND OUTLOOK

HotSpot Wizard 2.0 is a web server for the automatic identification of hot spots and the design of site-specific mutations and mutant libraries for engineering protein stability, catalytic activity, substrate specificity and enantioselectivity. The server provides a unified interface allowing users to apply four well-established protein engineering strategies that combine structural, functional and evolutionary information to identify suitable positions for mutagenesis. Moreover, HotSpot Wizard integrates several schemes for automatic prioritization of mutations and codon optimization for selected hot spot positions to facilitate the design of smart libraries. The automation of the multi-step procedure makes the process of library design accessible to users without expertise in bioinformatics because it eliminates the need to select, install and evaluate tools, optimize their parameters, perform conversions between different data formats, and interpret intermediate results.

In the future, we plan to implement a protocol for structure prediction based on homology modeling, extending the applicability of HotSpot Wizard to proteins for which no experimental structure is yet available. Additionally, we aim to assess other established protein engineering strategies and, if they prove suitable, to develop new modules so they can be added to the server's portfolio of methods.

The authors would like to express many thanks to Dr Antonin Pavelka (Masaryk University, Brno, Czech Republic) for valuable discussions and Dr Yuval Nov (University of Haifa, Haifa, Israel) for kindly providing the source code of TopLib. Uwe Bornscheuer (University Greifswald), Marco Fraaije (Groningen University) and Moshe Goldsmith (Weizmann Institute of Science) are sincerely acknowledged for constructive comments on the tool. MetaCentrum and CERIT-SC are acknowledged for providing access to computing and storage facilities [LM2015085 and LM2015042].

FUNDING

Ministry of Education of the Czech Republic [LO1214, LQ1605, LM2015055 and LM2015047]; Grant Agency of the Czech Republic [GA16-06096S]; European Commission REGPOT [316345]; Horizon 2020 Research Infrastructure ELIXIR-EXCELERATE [676559]; Brno University of Technology [FIT-S-14-2299 to M.M.]. Funding for open access charge: Grants from Czech Ministry of Education [LO1214, LQ1605, LM2015055 and LM2015047].

Conflict of interest statement. None declared.

REFERENCES

1.

Romero
P.A.
Arnold
F.H.
Exploring protein fitness landscapes by directed evolution
Nat. Rev. Mol. Cell Biol.
2009
10
866
876

2.

Currin
A.
Swainston
N.
Day
P.J.
Kell
D.B.
Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently
Chem. Soc. Rev.
2015
44
1172
1239

3.

Cheng
F.
Zhu
L.
Schwaneberg
U.
Directed evolution 2.0: improving and deciphering enzyme properties
Chem. Commun. (Camb.)
2015
51
9760
9772

4.

Lutz
S.
Beyond directed evolution–semi-rational protein engineering and design
Curr. Opin. Biotechnol.
2010
21
734
743

5.

Acevedo-Rocha
C.G.
Reetz
M.T.
Nov
Y.
Economical analysis of saturation mutagenesis experiments
Sci. Rep.
2015
5
10654

6.

Lo Surdo
P.
Walsh
M.A.
Sollazzo
M.
A novel ADP- and zinc-binding fold from function-directed in vitro evolution
Nat. Struct. Mol. Biol.
2004
11
382
383

7.

Denard
C.A.
Ren
H.
Zhao
H.
Improving and repurposing biocatalysts via directed evolution
Curr. Opin. Chem. Biol.
2015
25
55
64

8.

Bornscheuer
U.T.
Huisman
G.W.
Kazlauskas
R.J.
Lutz
S.
Moore
J.C.
Robins
K.
Engineering the third wave of biocatalysis
Nature
2012
485
185
194

9.

Xie
Z.-R.
Hwang
M.-J.
Methods for predicting protein-ligand binding sites
Methods Mol. Biol.
2015
1215
383
398

10.

Yuan
Y.
Pei
J.
Lai
L.
Binding site detection and druggability prediction of protein targets for structure-based drug design
Curr. Pharm. Des.
2013
19
2326
2333

11.

Lavecchia
A.
Di Giovanni
C.
Virtual screening strategies in drug discovery: a critical review
Curr. Med. Chem.
2013
20
2839
2860

12.

Sebestova
E.
Bendl
J.
Brezovsky
J.
Damborsky
J.
Computational tools for designing smart libraries
Methods Mol. Biol.
2014
1179
291
314

13.

Brezovsky
J.
Chovancova
E.
Gora
A.
Pavelka
A.
Biedermannova
L.
Damborsky
J.
Software tools for identification, visualization and analysis of protein tunnels and channels
Biotechnol. Adv.
2013
31
38
49

14.

Zhang
Z.
Li
Y.
Lin
B.
Schroeder
M.
Huang
B.
Identification of cavities on protein surface using multiple computational approaches for drug binding site prediction
Bioinformatics
2011
27
2083
2088

15.

Bommarius
A.S.
Paye
M.F.
Stabilizing biocatalysts
Chem. Soc. Rev.
2013
42
6534
6565

16.

Wijma
H.J.
Floor
R.J.
Janssen
D.B.
Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability
Curr. Opin. Struct. Biol.
2013
23
588
594

17.

Yu
H.
Huang
H.
Engineering proteins for thermostability through rigidifying flexible sites
Biotechnol. Adv.
2014
32
308
315

18.

Folkman
L.
Stantic
B.
Sattar
A.
Zhou
Y.
EASE-MM: Sequence-based prediction of mutation-induced stability changes with feature-based multiple models
J. Mol. Biol.
2016
428
1394
1405

19.

Bednar
D.
Beerens
K.
Sebestova
E.
Bendl
J.
Khare
S.
Chaloupkova
R.
Prokop
Z.
Brezovsky
J.
Baker
D.
Damborsky
J.
FireProt: Energy- and evolution-based computational design of thermostable multiple-point mutants
PLoS Comput. Biol.
2015
11
e1004556

20.

Reetz
M.T.
Wu
S.
Greatly reduced amino acid alphabets in directed evolution: making the right choice for saturation mutagenesis at homologous enzyme positions
Chem. Commun. (Camb)
2008
43
5499
5501

21.

Jochens
H.
Bornscheuer
U.T.
Natural diversity to guide focused directed evolution
Chembiochem
2010
11
1861
1866

22.

Pines
G.
Pines
A.
Garst
A.D.
Zeitoun
R.I.
Lynch
S.A.
Gill
R.T.
Codon compression algorithms for saturation mutagenesis
ACS Synth. Biol.
2015
4
604
614

23.

Reetz
M.T.
Kahakeaw
D.
Lohmer
R.
Addressing the numbers problem in directed evolution
Chembiochem
2008
9
1797
1804

24.

Goldsmith
M.
Tawfik
D.S.
Enzyme engineering by targeted libraries
Methods Enzymol.
2013
523
257
283

25.

Chaparro-Riggers
J.F.
Polizzi
K.M.
Bommarius
A.S.
Better library design: data-driven protein engineering
Biotechnol. J.
2007
2
180
191

26.

Gaytán
P.
Contreras-Zambrano
C.
Ortiz-Alvarado
M.
Morales-Pablos
A.
Yáñez
J.
TrimerDimer: an oligonucleotide-based saturation mutagenesis approach that removes redundant and stop codons
Nucleic Acids Res.
2009
37
e125

27.

Nov
Y.
Probabilistic methods in directed evolution: library size, mutation rate, and diversity
Methods Mol. Biol.
2014
1179
261
278

28.

Pavelka
A.
Chovancova
E.
Damborsky
J.
HotSpot Wizard: a web server for identification of hot spots in protein engineering
Nucleic Acids Res.
2009
37
W376
W383

29.

Furnham
N.
Holliday
G.L.
de Beer
T.A.P.
Jacobsen
J.O.B.
Pearson
W.R.
Thornton
J.M.
The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes
Nucleic Acids Res.
2014
42
D485
D489

30.

UniProt Consortium
UniProt: a hub for protein information
Nucleic Acids Res.
2015
43
D204
D212

31.

Kabsch
W.
Sander
C.
Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features
Biopolymers
1983
22
2577
2637

32.

Shrake
A.
Rupley
J.A.
Environment and exposure to solvent of protein atoms. Lysozyme and insulin
J. Mol. Biol.
1973
79
351
371

33.

Prlić
A.
Yates
A.
Bliven
S.E.
Rose
P.W.
Jacobsen
J.
Troshin
P.V.
Chapman
M.
Gao
J.
Koh
C.H.
Foisy
S.
et al.
BioJava: an open-source framework for bioinformatics in 2012
Bioinformatics
2012
28
2693
2695

34.

Reetz
M.T.
Carballeira
J.D.
Vogel
A.
Iterative saturation mutagenesis on the basis of B factors as a strategy for increasing protein thermostability
Angew. Chem. Int. Ed Engl.
2006
45
7745
7751

35.

Le Guilloux
V.
Schmidtke
P.
Tuffery
P.
Fpocket: an open source platform for ligand pocket detection
BMC Bioinformatics
2009
10
168

36.

Chovancova
E.
Pavelka
A.
Benes
P.
Strnad
O.
Brezovsky
J.
Kozlikova
B.
Gora
A.
Sustr
V.
Klvana
M.
Medek
P.
et al.
CAVER 3.0: a tool for the analysis of transport pathways in dynamic protein structures
PLoS Comput. Biol.
2012
8
e1002708

37.

Camacho
C.
Coulouris
G.
Avagyan
V.
Ma
N.
Papadopoulos
J.
Bealer
K.
Madden
T.L.
BLAST+: architecture and applications
BMC Bioinformatics
2009
10
421

38.

Suzek
B.E.
Wang
Y.
Huang
H.
McGarvey
P.B.
Wu
C.H.
UniProt Consortium
UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches
Bioinformatics
2015
31
926
932

39.

Edgar
R.C.
Search and clustering orders of magnitude faster than BLAST
Bioinformatics
2010
26
2460
2461

40.

Sievers
F.
Wilm
A.
Dineen
D.
Gibson
T.J.
Karplus
K.
Li
W.
Lopez
R.
McWilliam
H.
Remmert
M.
Söding
J.
et al.
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
Mol. Syst. Biol.
2011
7
539

41.

Capra
J.A.
Singh
M.
Predicting functionally important residues from sequence conservation
Bioinformatics
2007
23
1875
1882

42.

Korber
B.T.
Farber
R.M.
Wolpert
D.H.
Lapedes
A.S.
Covariation of mutations in the V3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis
Proc. Natl. Acad. Sci. U.S.A.
1993
90
7176
7180

43.

Lee
B.-C.
Kim
D.
A new method for revealing correlated mutations under the structural and functional constraints in proteins
Bioinformatics
2009
25
2506
2513

44.

Kass
I.
Horovitz
A.
Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations
Proteins
2002
48
611
617

45.

Lockless
S.W.
Ranganathan
R.
Evolutionarily conserved pathways of energetic connectivity in protein families
Science
1999
286
295
299

46.

Weigt
M.
White
R.A.
Szurmant
H.
Hoch
J.A.
Hwa
T.
Identification of direct residue contacts in protein-protein interaction by message passing
Proc. Natl. Acad. Sci. U.S.A.
2009
106
67
72

47.

Olmea
O.
Rost
B.
Valencia
A.
Effective use of sequence correlation and conservation in fold recognition
J. Mol. Biol.
1999
293
1221
1239

48.

Dekker
J.P.
Fodor
A.
Aldrich
R.W.
Yellen
G.
A perturbation-based method for calculating explicit likelihood of evolutionary co-variance in multiple sequence alignments
Bioinformatics
2004
20
1565
1572

49.

Pavlova
M.
Klvana
M.
Prokop
Z.
Chaloupkova
R.
Banas
P.
Otyepka
M.
Wade
R.C.
Tsuda
M.
Nagata
Y.
Damborsky
J.
Redesigning dehalogenase access tunnels as a strategy for degrading an anthropogenic substrate
Nat. Chem. Biol.
2009
5
727
733

50.

Gopal
S.
Rastogi
V.
Ashman
W.
Mulbry
W.
Mutagenesis of organophosphorus hydrolase to enhance hydrolysis of the nerve agent VX
Biochem. Biophys. Res. Commun.
2000
279
516
519

51.

Watkins
L.M.
Mahoney
H.J.
McCulloch
J.K.
Raushel
F.M.
Augmented hydrolysis of diisopropyl fluorophosphate in engineered mutants of phosphotriesterase
J. Biol. Chem.
1997
272
25596
25601

52.

Reetz
M.T.
Wang
L.-W.
Bocola
M.
Directed evolution of enantioselective enzymes: iterative cycles of CASTing for probing protein-sequence space
Angew. Chem. Int. Ed Engl.
2006
45
1236
1241

53.

Reetz
M.T.
Torre
C.
Eipper
A.
Lohmer
R.
Hermes
M.
Brunner
B.
Maichele
A.
Bocola
M.
Arand
M.
Cronin
A.
et al.
Enhancing the enantioselectivity of an epoxide hydrolase by directed evolution
Org. Lett.
2004
6
177
180

54.

Cerdobbel
A.
De Winter
K.
Aerts
D.
Kuipers
R.
Joosten
H.-J.
Soetaert
W.
Desmet
T.
Increasing the thermostability of sucrose phosphorylase by a combination of sequence- and structure-based mutagenesis
Protein Eng. Des. Sel.
2011
24
829
834

55.

Jochens
H.
Aerts
D.
Bornscheuer
U.T.
Thermostabilization of an esterase by alignment-guided focussed directed evolution
Protein Eng. Des. Sel.
2010
23
903
909

56.

Sullivan
B.J.
Nguyen
T.
Durani
V.
Mathur
D.
Rojas
S.
Thomas
M.
Syu
T.
Magliery
T.J.
Stabilizing proteins from sequence statistics: the interplay of conservation and correlation in triosephosphate isomerase stability
J. Mol. Biol.
2012
420
384
399

57.

Pey
A.L.
Rodriguez-Larrea
D.
Bomke
S.
Dammers
S.
Godoy-Ruiz
R.
Garcia-Mira
M.M.
Sanchez-Ruiz
J.M.
Engineering proteins with tunable thermodynamic and kinetic stabilities
Proteins
2008
71
165
174

58.

Amin
N.
Liu
A.D.
Ramer
S.
Aehle
W.
Meijer
D.
Metin
M.
Wong
S.
Gualfetti
P.
Schellenberger
V.
Construction of stabilized proteins by combinatorial consensus mutagenesis
Protein Eng. Des. Sel.
2004
17
787
793

59.

Akasako
A.
Haruki
M.
Oobatake
M.
Kanaya
S.
Conformational stabilities of Escherichia coli RNase HI variants with a series of amino acid substitutions at a cavity within the hydrophobic core
J. Biol. Chem.
1997
272
18686
18693

60.

van den Heuvel
R.H.H.
Fraaije
M.W.
Ferrer
M.
Mattevi
A.
van Berkel
W.J.H.
Inversion of stereospecificity of vanillyl-alcohol oxidase
Proc. Natl. Acad. Sci. U.S.A.
2000
97
9455
9460

61.

Killick
T.R.
Freund
S.M.
Fersht
A.R.
Real-time NMR studies on folding of mutants of barnase and chymotrypsin inhibitor 2
FEBS Lett.
1998
423
110
112

62.

Encell
L.P.
Friedman Ohana
R.
Zimmerman
K.
Otto
P.
Vidugiris
G.
Wood
M.G.
Los
G.V.
McDougall
M.G.
Zimprich
C.
Karassina
N.
et al.
Development of a dehalogenase-based protein fusion tag capable of rapid, selective and covalent attachment to customizable ligands
Curr. Chem. Genomics
2012
6
55
71

63.

Reetz
M.T.
Bocola
M.
Carballeira
J.D.
Zha
D.
Vogel
A.
Expanding the range of substrate acceptance of enzymes: combinatorial active-site saturation test
Angew. Chem. Int. Ed Engl.
2005
44
4192
4196

64.

Morley
K.L.
Kazlauskas
R.J.
Improving enzyme properties: when are closer mutations better?
Trends Biotechnol.
2005
23
231
237

65.

Lehmann
M.
Loch
C.
Middendorf
A.
Studer
D.
Lassen
S.F.
Pasamontes
L.
van Loon
A.P.G.M.
Wyss
M.
The consensus concept for thermostability engineering of proteins: further proof of concept
Protein Eng.
2002
15
403
411

66.

de Juan
D.
Pazos
F.
Valencia
A.
Emerging methods in protein co-evolution
Nat. Rev. Genet.
2013
14
249
261

67.

Kuipers
R.K.P.
Joosten
H.-J.
Verwiel
E.
Paans
S.
Akerboom
J.
van der Oost
J.
Leferink
N.G.H.
van Berkel
W.J.H.
Vriend
G.
Schaap
P.J.
Correlated mutation analyses on super-family alignments reveal functionally important residues
Proteins
2009
76
608
616

68.

Nobili
A.
Tao
Y.
Pavlidis
I.V.
van den Bergh
T.
Joosten
H.-J.
Tan
T.
Bornscheuer
U.T.
Simultaneous use of in silico design and a correlated mutation network as a tool to efficiently guide enzyme engineering
Chembiochem
2015
16
805
810

69.

Wang
C.
Huang
R.
He
B.
Du
Q.
Improving the thermostability of alpha-amylase by combinatorial coevolving-site saturation mutagenesis
BMC Bioinformatics
2012
13
263

70.

Martin
L.C.
Gloor
G.B.
Dunn
S.D.
Wahl
L.M.
Using information theory to search for co-evolving residues in proteins
Bioinformatics
2005
21
4116
4124

71.

Fodor
A.A.
Aldrich
R.W.
Influence of conservation on calculations of amino acid covariance in multiple sequence alignments
Proteins
2004
56
211
221

72.

Nov
Y.
When second best is good enough: another probabilistic look at saturation mutagenesis
Appl. Environ. Microbiol.
2012
78
258
262

73.

Jacobs
T.M.
Yumerefendi
H.
Kuhlman
B.
Leaver-Fay
A.
SwiftLib: rapid degenerate-codon-library optimization through dynamic programming
Nucleic Acids Res.
2015
43
e34

74.

Li
W.
Cowley
A.
Uludag
M.
Gur
T.
McWilliam
H.
Squizzato
S.
Park
Y.M.
Buso
N.
Lopez
R.
The EMBL-EBI bioinformatics web and programmatic tools framework
Nucleic Acids Res.
2015
43
W580
W584

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Supplementary data

Comments

0 Comments
Submit a comment
You have entered an invalid code
Thank you for submitting a comment on this article. Your comment will be reviewed and published at the journal's discretion. Please check for further notifications by email.