MutationExplorer: a webserver for mutation of proteins and 3D visualization of energetic impacts

Abstract The possible effects of mutations on the stability and function of a protein can only be understood in the context of its 3D structure. The MutationExplorer webserver maps sequence changes onto protein structures and allows users to study variation by entering sequence changes. As the user enters variants, the 3D model evolves, and estimated changes in energy are highlighted. In addition to a basic per-residue input format, MutationExplorer also accepts an entire replacement sequence. Such an upload, previously the purview of desktop applications, can back-mutate a PDB structure to its wild-type sequence in a single step. Another supported variation source is human single nucleotide polymorphisms (SNPs), provided as genomic coordinates in VCF format. Structures are flexibly colorable, not only by energetic differences but also by hydrophobicity, sequence conservation, or other biochemical profiles. Coloring by interface score reveals mutation impacts on binding surfaces. MutationExplorer strives for efficiency in user experience; for example, we have prepared 45 000 PDB depositions for instant retrieval and initial display. All modeling steps are performed by Rosetta. Visualizations leverage MDsrv/Mol*. MutationExplorer is available at: http://proteinformatics.org/mutation_explorer/.


Introduction
Mutations are essential to the persistence of life itself, enabling the evolution of species in response to environmental pressures, a process harnessed in modern protein engineering. Mutations can also cause challenging diseases. Throughout the biological sciences, we encounter what is perhaps the fundamental biophysical question: how does a mutation (or set of mutations) contribute to an observed or desired phenotype? Specific flavors of this question include the search for explanations of phenotype variation, bacterial drug resistance, or human genetic disorders, to name only a few ( 1 ). Many questions remain unanswered for cancer tissue, where evolution towards cell proliferation is greatly accelerated. In cancer, the molecular impact of mutation demands particular attention to understand disease etiology and progression.
Sequencing technologies have advanced rapidly in recent decades. As a result, over 50 000 organisms have been sequenced since 2018. Next generation sequencing (NGS) has reduced the cost of a complete human genome to less than US$1000 ( 2 ), and today the Genome Aggregation Database (gnomAD 3.1.2) includes over 76 000 complete human genomes and over 125 000 exomes (sequenced protein-coding regions) ( 3 ). The most frequently observed changes in proteins are missense mutations, in which one amino acid is substituted by another ( 3 ). Except for proline, all amino acids in a peptide chain share the same repeating backbone atoms: -N-Cα-C-. Thus, the downstream effects of missense mutations arise from the biochemical changes introduced by the new side chain(s). Where a new side chain introduces substantial energetic strain, it can be hypothesized that the protein is destabilized, perhaps to the point that it can no longer fold correctly.
In cancer, NGS increasingly informs risk assessment, diagnosis, prognosis, and therapeutic strategies ( 4 ). Even though NGS is not yet frequently employed in routine laboratory and clinical diagnostics (only special indications are directly diagnosed by NGS), the trend towards increased patient sequencing will continue, given cost reductions in sequencing and advances in personalized medicine.
While sequencing can reveal, and even predict, risks of phenotypic effects or disease, sequencing alone cannot, as of today, explain the molecular mechanisms by which missense mutations drive changes in protein function. Analyzing these mechanisms requires attention to the protein's 3D structural context at atomic resolution. Indeed, amino acids far apart in sequence can interact closely in a folded protein.
Predicting the stability changes that mutations cause in a protein can be computationally expensive and complex. Several webservers have been developed to facilitate this task using different approaches ( 5-8 ). However, some of these are not interactive or cannot iteratively modify the protein being analyzed.
Here we present the MutationExplorer webserver, which unites advances in gene sequencing with advances in protein structure determination and cements them with state-of-the-art computational methods and 3D graphics visualization. For any 3D protein model, MutationExplorer allows both the fluid exploration of mutation sets and a more systematic treatment of mutations through sequence uploads. Supported uploads include data from sequencing experiments, multiple sequence alignments, or reference sequences. On the server's back end, Rosetta software ( 9 ) performs the mutations and calculates per-residue energies. On the web front end, mutated PDBs are displayed with the Mol* viewer ( 10 ). Coloring scales reflect the differences in energy or hydrophobicity between the original and the mutated proteins.
Through this flexible architecture, MutationExplorer integrates sequence and sequence variation with the exploding availability of 3D protein structures (both experimentally determined and modeled).
The Protein Data Bank (PDB) (https://www.wwpdb.org) stores coordinates and related information on experimentally determined structures. However, most protein sequences are not directly represented in the PDB. Template-based modeling has long been available, and DeepMind's AlphaFold has now leveraged machine learning to model the full length of most transcripts ( 11 ). This opens the door to new modeling approaches that take protein structure as an input for further analysis. In particular, MutationExplorer benefits greatly from AlphaFold, as our webserver can now reveal predicted mutation energetics for a far wider array of sequence inputs.

Rosetta
The Rosetta software suite performs all computational modeling steps in MutationExplorer. Rosetta debuted from David Baker's group as a tool for structure prediction. Over the years, Rosetta has been extensively reworked and extended to encompass a variety of tools for protein design, docking, and other applications ( 9 ,12 ). Rosetta incorporates both physics-based and knowledge-based energy potentials, the latter derived from principles of statistical physics and gleaned from the analysis of high-resolution structures in the PDB.
When predicting structures from sequence, Rosetta executes a full Monte Carlo conformational search for the given sequence. Many protein design protocols alternate between a design stage, in which the backbone conformation is held fixed while a Monte Carlo algorithm replaces the varying side chains, and a stage in which the conformation of the newly designed sequence is explored. The design step of exchanging amino acids is accompanied by an overall side chain energy optimization that allows the local environment to adjust to the new arrival. Because this optimization also relaxes residues that were not mutated, a pre-minimization is required before the design step. Without it, mathematical artifacts are more likely to arise, because the substitution and optimization could otherwise be driven by differences between the starting structure and the minimum of the Rosetta energy function rather than by the mutation itself. The pre-minimization typically moves the coordinates by only fractions of an Ångström, well within the error of experimental detection.
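The need for pre-minimization can be made explicit with a simple energy decomposition. This is our own illustration, not a formula taken from the MutationExplorer implementation. Write E^dep for the Rosetta energy of the deposited coordinates and E^opt for the energy after optimization. The naive difference between an optimized mutant and the unoptimized wild type then splits into the mutation effect plus a minimization term that has nothing to do with the mutation:

```latex
\underbrace{E^{\mathrm{opt}}_{\mathrm{mut}} - E^{\mathrm{dep}}_{\mathrm{wt}}}_{\text{naive difference}}
  = \underbrace{\left(E^{\mathrm{opt}}_{\mathrm{mut}} - E^{\mathrm{opt}}_{\mathrm{wt}}\right)}_{\text{mutation effect}}
  + \underbrace{\left(E^{\mathrm{opt}}_{\mathrm{wt}} - E^{\mathrm{dep}}_{\mathrm{wt}}\right)}_{\text{minimization artifact}}
```

Pre-minimizing the wild type makes the comparison start from E^opt_wt. The artifact term is then approximately removed, so the reported difference mainly reflects the mutation.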
For the side chain optimization, Rosetta employs a rotamer library gleaned from prior exhaustive conformational analysis of the PDB.Compared to gradient-based optimization on the rugged landscape of a vast conformational space, this rotameric approach drastically reduces the search space and automatically filters out many unfavorable side chain conformations from consideration.
Rosetta writes per-residue energies into each output PDB file. MutationExplorer writes these values into the B-factor column of the output file, allowing the entire protein molecule to be colored by per-residue energy or hydrophobicity, either as absolute values or as differences. Consequently, an individual PDB file is created for each coloring scheme.
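A minimal Python sketch of the B-factor approach follows. It assumes fixed-column PDB ATOM/HETATM records and a score dictionary keyed by (chain ID, residue number). The function name `write_scores_to_bfactor` and the clamping range are our own illustrative choices, not MutationExplorer's code. The column positions are those of the standard PDB format.

```python
def write_scores_to_bfactor(pdb_lines, scores, default=0.0):
    """Return PDB lines whose B-factor field carries a per-residue score.

    scores maps (chain_id, residue_number) -> float, e.g. a per-residue
    energy or a MUT - WT difference; residues without a score get `default`.
    """
    out = []
    for line in pdb_lines:
        line = line.rstrip("\n")
        if line.startswith(("ATOM", "HETATM")):
            padded = line.ljust(80)
            chain = padded[21]            # column 22: chain identifier
            resnum = int(padded[22:26])   # columns 23-26: residue sequence number
            value = scores.get((chain, resnum), default)
            # Columns 61-66 hold the B-factor as a 6-character %6.2f field;
            # clamp so the value never spills into the neighboring columns.
            value = max(-99.99, min(999.99, value))
            padded = padded[:60] + f"{value:6.2f}" + padded[66:]
            out.append(padded.rstrip())
        else:
            out.append(line)              # non-atom records pass through unchanged
    return out
```

Joining the returned lines with newlines gives one PDB file per coloring scheme. Every atom of a residue receives the same value, so a viewer that colors by B-factor colors whole residues uniformly.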

Interface score
One of the visualization options shows the estimated effect of a mutation on binding energies in protein-protein complexes. To obtain these estimates, the InterfaceAnalyzer Rosetta mover ( 13 ) is called through PyRosetta ( 14 ) to calculate per-residue binding energies of the mutated and wild-type structures. The difference (MUT - WT) is then calculated for every residue. InterfaceAnalyzer calculates the binding energy by scoring the input structure twice: first as a complex, and second after moving the two sides of the interface away from each other, exposing the interface. The Rosetta energy scores of the unbound state are then subtracted from those of the bound state. Both the bound (input) and unbound structures have their side chains optimized prior to scoring. Since side chain optimization is stochastic, the mover is run three times, and the median score is taken for every residue.
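The arithmetic applied to the InterfaceAnalyzer output can be sketched in plain Python. This sketch assumes the per-residue scores of each run are already available as dictionaries, one for the bound state and one for the unbound state per run. It does not call PyRosetta, and the function names are hypothetical.

```python
from statistics import median


def per_residue_binding(bound, unbound):
    """Binding energy per residue: bound-state score minus unbound-state score."""
    return {res: bound[res] - unbound[res] for res in bound}


def median_over_runs(runs):
    """Median per-residue value over repeated stochastic runs."""
    return {res: median(run[res] for run in runs) for res in runs[0]}


def interface_delta(wt_runs, mut_runs):
    """Per-residue MUT - WT difference of median binding energies.

    wt_runs / mut_runs: lists of (bound, unbound) score dicts, one pair per run,
    with identical residue keys in every dict.
    """
    wt = median_over_runs([per_residue_binding(b, u) for b, u in wt_runs])
    mut = median_over_runs([per_residue_binding(b, u) for b, u in mut_runs])
    return {res: mut[res] - wt[res] for res in wt}
```

With three runs per structure, as described above, `interface_delta` returns one MUT - WT value per residue. Taking the median rather than the mean keeps a single outlying stochastic run from dominating the result.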

RaSP
RaSP is a new deep-learning-based tool that rapidly estimates protein stability changes. RaSP predictions correlate strongly with scores from Rosetta calculations, which demand longer compute times ( 15 ). With RaSP, MutationExplorer presents the user with a quick initial estimate of a mutation's (de)stabilizing effect, without waiting for the longer full minimization process. RaSP consists of two networks applied in sequence: a self-supervised 3D convolutional neural network that has learned representations of protein structures, followed by a fully connected neural network that maps these internal representations to Rosetta protein stability changes. RaSP is optimized for accuracy in the range [-1.0, 7.0] kcal/mol, which is most relevant for revealing the loss-of-stability mutations underlying many diseases ( 16 ).

Mol* viewer
Structure predictions are visualized in MutationExplorer with an adapted and extended version of Mol* ( 10 ), the successor of the NGL Viewer ( 17 ). The extensions of Mol* from MDsrv ( 18 ,19 ) have been incorporated into MutationExplorer so that sequence alignments are displayed alongside structure visualizations. During development we were in close contact with the maintainers of Mol*, and we are incorporating our extensions into the Mol* main code branch. MutationExplorer's structure visualization supports analysis in several complementary ways. After the mutations are modeled, MutationExplorer highlights structural changes with a color gradient. Alternatively, through the 1D sequence alignment control, the user can easily highlight and analyze the modified regions of the structure. To make introduced mutations easier to examine, the alignment view from MDsrv was integrated and its capabilities were extended. In MutationExplorer, this view now shows all the sequences of mutated versions of a protein and highlights the mutated loci in each sequence. Additionally, we developed a mutation highlighting approach that is active in the 3D structure visualization. Each mutated locus can be highlighted with a semi-transparent sphere, reducing search time and helping users orient themselves as they rotate and translate the structure. The MutationExplorer back end conveys not only per-residue energetic changes to the user interface, but also hydrophobicity and sequence conservation metrics. The expert can select any of these values for coloring the mutated protein. MutationExplorer uses an adaptable color scale to visualize these differences, so that the expert can tailor the interpretation of impact to the various values in the context of the protein of interest. When the mouse cursor hovers over any amino acid, a panel appears that details the various computed values for the underlying residue. Clicking on a single amino acid selects it and zooms in to the selected region. The side chain is displayed as well, to support regional visual inspection. From RaSP, every amino acid position is supported by an estimate of the global structural impact should that position be mutated. The predicted values are shown by pressing the Ctrl key and left-clicking on the amino acid.

Sequencing data pipeline (VCF input)
For human variants provided in GRCh38 genomic coordinates ( 20 ), the chromosome, position, and alternate alleles may be input in Variant Call Format (VCF) ( 21 ).
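For orientation, a minimal VCF input can be generated as below. This Python sketch assumes the variants are already known as (chromosome, 1-based GRCh38 position, reference allele, alternate allele) tuples. It writes the eight fixed VCF columns and uses '.' for the unused ID, QUAL, FILTER and INFO fields. Whether MutationExplorer expects chromosome names as '20' or 'chr20' is not stated here, so the naming is an assumption. The example coordinates are the two chromosome 20 variants quoted in the RTEL1 example.

```python
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def to_vcf(variants):
    """Format (chrom, pos, ref, alt) tuples as a minimal VCF text.

    Positions are 1-based; '.' is the VCF marker for a missing value.
    """
    lines = list(VCF_HEADER)
    for chrom, pos, ref, alt in variants:
        lines.append("\t".join([str(chrom), str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


# Assumed chromosome naming ('20' rather than 'chr20').
print(to_vcf([("20", 63662544, "A", "G"), ("20", 63695619, "G", "A")]), end="")
```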
We employ the Ensembl Variant Effect Predictor (VEP) ( 22 ) to analyze the VCF file and return the impacted Ensembl transcript IDs and amino acid variants. The VEP often returns IDs for computationally predicted splice variants that have not been experimentally verified. We therefore retain only Ensembl transcripts that cross-reference to Swiss-Prot curated, canonical UniProt transcript IDs ( 23 ). Finally, MutationExplorer loads AlphaFold models ( 11 ) by UniProt ID and launches the analysis tools as if the mutations had been input at the protein level.
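The transcript filter can be sketched as follows. The record layout (`transcript_id`, `consequence`, `protein_position`, `amino_acids`) is a simplified, hypothetical stand-in for VEP output, not VEP's exact schema. `canonical_map` stands for an Ensembl-to-UniProt cross-reference table assumed to contain only canonical Swiss-Prot entries. `missense_variant` is the Sequence Ontology term VEP uses for amino acid substitutions.

```python
def filter_to_canonical_swissprot(consequences, canonical_map):
    """Keep missense hits whose transcript maps to a canonical Swiss-Prot entry.

    consequences: iterable of dicts (hypothetical simplified VEP records).
    canonical_map: Ensembl transcript ID -> canonical UniProt accession.
    Returns {accession: [mutation strings such as 'M492I']}.
    """
    by_protein = {}
    for record in consequences:
        if record["consequence"] != "missense_variant":
            continue  # only amino acid substitutions are modeled
        accession = canonical_map.get(record["transcript_id"])
        if accession is None:
            continue  # unverified or non-canonical isoform: drop it
        wt, mut = record["amino_acids"].split("/")  # e.g. 'M/I'
        by_protein.setdefault(accession, []).append(
            f"{wt}{record['protein_position']}{mut}"
        )
    return by_protein
```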

Webserver
The MutationExplorer webserver is written in Python using the Flask framework, with additional JavaScript, and is freely available to all users.

AlignMe
AlignMe is a software package and webserver for detecting similarities between proteins that are too low to be detected at the sequence level using standard methods ( 24 ). AlignMe is optimized for membrane proteins, but not limited to them. AlignMe offers a link on its result page that allows users to send their results to MutationExplorer for 3D visualization, placing the alignment in its structural context.
The same JavaScript interface is available to other sequence alignment servers for forwarding their results to MutationExplorer for 3D visualization. A link to MutationExplorer can be integrated with a few lines of code; further details are in the FAQs of MutationExplorer.

Usage
There are two ways to define mutations for a structure when using the MutationExplorer web page.
The Upload Structure and Mutations option is the way to go if the user wants to upload their own file or use a structure from a database such as the PDB or AlphaFold.
First, a 3D protein structure is selected in one of three ways: by uploading the file itself, by entering a PDB ID, or by entering an AlphaFold ID. Chains can be removed by clicking the Filter Structure button to clean up the structure. Additionally, the user can choose to minimize structures, calculate RaSP predictions, and provide an email address to be notified when calculations are complete.

In the second step, mutations are defined in one of three ways: by manual definition, by sequence alignment, or by definition of a target sequence. Manual definition specifies which amino acid to switch at which position, using the syntax [chain]:[residue number][target amino acid], for example 'A:12G,B:134T'. If a sequence alignment is provided, the user selects the chain in the structure to which the first sequence of the alignment is matched. A target sequence is defined by uploading a FASTA file, by writing out the target sequence, or by providing a UniProt ID for the chain that is to be modified. When this is done, the structure is ready to be mutated and the results will be calculated.
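The manual syntax is simple enough to validate with a regular expression. The Python sketch below parses strings such as 'A:12G,B:134T' into (chain, residue number, target amino acid) tuples. It assumes single-character chain IDs, integer residue numbers without insertion codes, and one-letter codes for the 20 standard amino acids. The parser is illustrative, not the server's implementation.

```python
import re

# chain ID, ':', residue number (may be negative in PDB files), one-letter target
_MUTATION = re.compile(r"([A-Za-z0-9]):(-?\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_mutations(spec):
    """Parse 'A:12G,B:134T' into [('A', 12, 'G'), ('B', 134, 'T')]."""
    mutations = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and blank entries
        match = _MUTATION.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        chain, resnum, target = match.groups()
        mutations.append((chain, int(resnum), target))
    return mutations
```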

The VCF upload
Alternatively, a user can upload human SNPs as sequencing data in VCF format. For this second option, missense variants are obtained from the Variant Effect Predictor output, and an AlphaFold model is selected automatically. In addition to the file upload, the user only has to decide on minimization, on the RaSP calculation, and whether to provide an e-mail address.

Via the AlignMe server
It is also possible to import data from an AlignMe session. After the results have been calculated on the AlignMe web server, they can be exported to MutationExplorer. Selecting the export button on the server imports the Clustal file from the calculation into MutationExplorer. The user then selects which sequences from the Clustal file should serve as the base and target sequences, and provides the corresponding structure file, either by giving the ID of a PDB or AlphaFold structure or by uploading the file. Additionally, the user can choose whether the structures should be minimized and whether RaSP predictions should be calculated.
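Turning a base/target pair from an alignment into point mutations can be sketched as below. The Python function walks the aligned columns, numbers residues along the base sequence, and emits mutations in the same chain:residue+target syntax used for manual input. It assumes consecutive residue numbering in the structure starting at `first_resnum`, and the function name is hypothetical. It skips gapped columns because insertions and deletions are not point mutations. How MutationExplorer itself handles gaps is not described here.

```python
def mutations_from_alignment(base_aln, target_aln, chain, first_resnum=1):
    """List substitutions that turn the aligned base sequence into the target.

    base_aln / target_aln: equal-length aligned strings, '-' marking gaps.
    Residue numbers follow the base (structure) sequence.
    """
    if len(base_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    resnum = first_resnum
    for base, target in zip(base_aln, target_aln):
        if base == "-":
            continue  # insertion in the target: no structure residue to mutate
        if target != "-" and target != base:
            mutations.append(f"{chain}:{resnum}{target}")
        resnum += 1  # advance only on columns that consume a base residue
    return mutations
```

For example, aligning base 'MKT-LV' against target 'MRTALA' on chain A yields 'A:2R' and 'A:5A'. The inserted alanine is ignored.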

Navigating variants
MutationExplorer maintains a tree-structured list of previously explored variants. A user is free to return to any model in this tree and define further mutations, which are added to the growing tree. The workflow is sketched in Figure 1. Newly created mutations are highlighted by spheres in the 3D view and in bright red in the sequence view.
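The variant tree can be modeled with a small node class. In this hypothetical Python sketch, each node stores only the mutations added relative to its parent. The full mutation set of any model is recovered by walking back to the root ('mut_0' in Figure 1), and a later mutation at the same position overrides an earlier one. The class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality avoids recursing through parent/children
class VariantNode:
    name: str                                   # e.g. 'mut_0' for the original protein
    mutations: List[str]                        # mutations added relative to the parent
    parent: Optional["VariantNode"] = field(default=None, repr=False)
    children: List["VariantNode"] = field(default_factory=list, repr=False)

    def branch(self, name: str, mutations: List[str]) -> "VariantNode":
        """Create a child model that adds `mutations` on top of this model."""
        child = VariantNode(name, list(mutations), parent=self)
        self.children.append(child)
        return child

    def cumulative_mutations(self) -> List[str]:
        """Effective mutations relative to the root, oldest first."""
        steps = []
        node = self
        while node is not None:
            steps.append(node.mutations)
            node = node.parent
        effective = {}
        for step in reversed(steps):            # walk from the root down to self
            for mutation in step:
                effective[mutation[:-1]] = mutation  # key 'A:12' -> latest 'A:12G'
        return list(effective.values())
```

Branching twice from the same parent produces sibling models. Both inherit the parent's mutations unless they redefine the same position, which mirrors how returning to an earlier model and defining new mutations grows the tree.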

Coloring
All scores (energy, hydrophobicity, sequence conservation and interface binding) are available for coloring. For each score type, the user can switch between wild-type scores, mutant scores and their differences.
Informed by this residue-wise coloring flexibility, users can quickly hypothesize mechanistic impacts of mutation for entire molecular systems or intermolecular binding interfaces.

Navigation via sequence or 3D structure
A sequence alignment window is displayed beneath the structure window. Clicking on a residue in one window highlights it in the other, and vice versa.

Visualization
Hovering reveals information about each residue. Left-clicking displays side chain atoms. The 'Change Visualization' button at the very bottom replaces the default sequence window with an interactive fine-tuner for color scales and overlaid spheres. The 'View Alignment' button at the bottom left re-displays the sequence window.
Additionally, explanations of the current color scheme are shown in an information panel. For example, sequence conservation colorings indicate the severity of an amino acid substitution using a range from white (no mutation) through light blue (mutations within amino acid groups) to dark blue (mutations across groups). Sequence gaps are shown in red.
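The white/light blue/dark blue/red scheme amounts to a small classification of each aligned position. The grouping of amino acids in this Python sketch is an assumption for illustration, because the groups used by MutationExplorer are not listed here. Glycine, proline and cysteine are deliberately left ungrouped, so any substitution involving them counts as across-group.

```python
# Assumed physicochemical groups (illustrative only); G, P and C are ungrouped.
AA_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("STNQ"),   # polar, uncharged
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
]

COLORS = {"identical": "white", "within_group": "lightblue",
          "across_groups": "darkblue", "gap": "red"}


def group_of(aa):
    """Index of the group containing `aa`, or None if it is ungrouped."""
    for index, group in enumerate(AA_GROUPS):
        if aa in group:
            return index
    return None


def substitution_class(wt, mut):
    if wt == "-" or mut == "-":
        return "gap"
    if wt == mut:
        return "identical"
    group = group_of(wt)
    if group is not None and group == group_of(mut):
        return "within_group"
    return "across_groups"


def conservation_colors(wt_seq, mut_seq):
    """One color name per aligned position of two equal-length sequences."""
    return [COLORS[substitution_class(a, b)] for a, b in zip(wt_seq, mut_seq)]
```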
For any color scheme, a color gradient bar provides a key to the score range. Changing the bar's flanking numeric values adjusts the displayed hue range.
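The effect of the flanking values can be illustrated with a linear color map. In this Python sketch, the blue-white-red palette and the function name are assumptions. Scores are clamped to the [low, high] interval set by the flanking values and then interpolated. Narrowing the interval therefore spreads the full hue range over a smaller band of scores.

```python
def value_to_rgb(value, low, high):
    """Map a score onto a blue-white-red gradient between flanking values.

    Scores at or below `low` are pure blue, at or above `high` pure red,
    and the midpoint of [low, high] is white.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    t = (min(max(value, low), high) - low) / (high - low)  # clamp, then scale to 0..1
    if t < 0.5:                       # lower half: blue -> white
        s = t / 0.5
        return (round(255 * s), round(255 * s), 255)
    s = (t - 0.5) / 0.5               # upper half: white -> red
    return (255, round(255 * (1 - s)), round(255 * (1 - s)))
```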

Mutation energy preview
Whenever RaSP calculations are included at launch (the default), RaSP estimates of energetic effects are available via a Ctrl-left click on any amino acid residue.

Continuous mutation
From the results page, you are free to add new variants to the current model. Previously entered variants are automatically carried over to subsequent model generations.

Energy optimization
Thorough global energy optimization before protein mutation is essential for high-quality results. Otherwise, in our experience, Rosetta will find mathematically 'better' side chain conformations that are merely distracting noise arising from the many small discrepancies between deposited atom coordinates and the Rosetta computational framework. Proper pre-optimization helps ensure that reported energetic differences are more directly related to the mutation itself. However, minimizing proteins, especially large ones, is time-consuming. MutationExplorer offers three kinds of minimization. First, we provide a database of pre-minimized proteins from the PDB, for which both backbone and side chain minimizations were performed. Second, for proteins not contained in the database, the server offers either a short or a long side chain optimization using the fixbb application. Third, the user can upload their own minimized models. Command lines are available in the Tutorial section of the website.

A precalculated database
To build the pre-minimized database, we used a list of culled, non-redundant PDB identifiers from the PISCES server ( 25 ). For the proteins in this list, we performed backbone minimization using the Rosetta relax application, followed by side chain optimization using the fixbb application. This yielded a database of pre-minimized structures, currently containing 45 000 models, that speeds users towards mutant exploration.
Since the minimization calculations can be very time-consuming, the results for PDB and AlphaFold structures that have not been optimized in advance are added to the database, so that the optimization does not have to be repeated. The same is done for RaSP results.
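Reusing minimization results amounts to a cache keyed by the input structure and the protocol. This Python sketch assumes a directory of PDB files serves as the database. It keys entries by a SHA-256 hash of the uploaded coordinates plus the protocol name (e.g. 'short' or 'long'). `minimize` is a hypothetical placeholder for the Rosetta run, and MutationExplorer's actual storage layout is not described here.

```python
import hashlib
from pathlib import Path


def cache_key(structure_bytes, protocol):
    """Key = protocol name plus a hash of the input coordinates."""
    digest = hashlib.sha256(structure_bytes).hexdigest()
    return f"{protocol}-{digest}"


def minimize_cached(structure_bytes, protocol, cache_dir, minimize):
    """Return a minimized structure, running `minimize` only on a cache miss.

    minimize(structure_bytes, protocol) -> bytes is a hypothetical stand-in
    for the expensive Rosetta minimization.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (cache_key(structure_bytes, protocol) + ".pdb")
    if path.exists():
        return path.read_bytes()      # cache hit: reuse the stored model
    result = minimize(structure_bytes, protocol)
    path.write_bytes(result)          # store for later requests
    return result
```

Hashing the file content rather than the PDB or AlphaFold ID also distinguishes an edited upload from the deposited entry. The cost is a recomputation whenever a file differs only cosmetically.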

Limitations
Currently, MutationExplorer has no potentials targeted at membrane proteins. Especially for residues facing the membrane, a dedicated score function and preparation would be desirable. Other residues, in particular those outside the membrane, can be investigated with MutationExplorer without limitation. Moreover, some mutations are generally challenging for Rosetta, foremost those from or to proline. Mutations from glycine may require conformational adaptation beyond the protocols used here. Mutations to glycine may make proteins more flexible because of glycine's larger accessible backbone torsion space. Ligands present in the PDB are kept in a rigid conformation; custom ligands are removed. For advanced ligand handling, see ( 26 ). MutationExplorer cannot handle very large structures, such as cryo-EM structures available only in CIF format. Partial AlphaFold models for transcripts with more than 2700 amino acids are not supported. We also caution that for large structures our default minimization protocols might be insufficient; users should minimize these on their local computer before upload. For this purpose, we recommend the corresponding tutorial on Rosetta energy minimization via the command line, which can be found on our website. The fundamental challenge is the limited sampling of individual Monte Carlo runs. A more reliable minimization strategy is to create an ensemble of minimized models.
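Ensemble minimization can be organized as below. In this Python sketch, `minimize(structure, rng)` is a hypothetical stand-in for one stochastic Rosetta minimization run that returns a (model, total score) pair. Keeping the lowest-scoring model is one common choice, not necessarily the one the website tutorial uses. The sorted scores indicate how much the individual runs disagree.

```python
import random
from typing import Callable


def best_of_ensemble(minimize: Callable, structure, n_models: int = 10, seed: int = 0):
    """Run a stochastic minimizer n_models times; return the best run and all scores.

    minimize(structure, rng) -> (model, total_score); lower scores are more favorable.
    """
    rng = random.Random(seed)                  # seeded for reproducible sampling
    runs = [minimize(structure, rng) for _ in range(n_models)]
    runs.sort(key=lambda run: run[1])          # ascending: lowest energy first
    return runs[0], [score for _, score in runs]
```

A wide spread between the best and worst scores suggests that single runs are undersampled for that structure.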

Examples
The examples described below can be found on the server in the corresponding section of the main menu. In addition, more examples, such as modeling multiple conformations of a protein and back-mutating PDB structures to wild type, are available on the website under the Applications section of the tutorial.

RTEL1 VCF input example
As of this writing, 248 mutations in Telomere elongation helicase 1 (RTEL1) are classified as either 'Pathogenic' or 'Likely pathogenic' in the ClinVar database ( 27 ). Of these, 13 are missense mutations. No experimental structure of RTEL1 is presently deposited in the PDB. To find structural trends in pathogenicity, we uploaded the 13 ClinVar SNPs to MutationExplorer in VCF format. Following our Rosetta-based long-minimization and mutation protocols, MutationExplorer displays the AlphaFold model for canonical transcript Q9NZ71-1. Eleven residues are highlighted in red in the 3D structure and sequence viewers. Two of the ClinVar variants (Chr20:63662544 A > G and Chr20:63695619 G > A) do not impact the canonical UniProt transcript and thus are not depicted. All 11 variants depicted by MutationExplorer's red spheres lie in obviously well-structured (high-confidence) stretches of the AlphaFold model. Many of these variants involve mutations to or from proline, and MutationExplorer generally reports large energetic differences for these, consistent with general intuition about such mutations. Indeed, several fall at secondary-structure transition points near the ends of alpha helices. Perhaps more surprisingly, our Rosetta protocol calculated quite large energetic impacts (> 25 Rosetta energy units, REU) for M492I ( 28 ) and I699M ( 29 ), both implicated in severe dyskeratosis congenita. More subtle energetic changes are also reported, reminding us that energetic analysis by itself, while necessary, is often insufficient. As an example, the impact of V516L was computationally analyzed with an energetics-free model based instead on 3D clustering patterns of pathogenic compared to benign variants ( 30 ). In that study, V516L clustered strongly with pathogenic variants and was hypothesized to subtly disrupt DNA binding by adjacent surface residues. Link to example: http://proteinformatics.org/mutation_explorer/examples/rtel1/.

The GPCR autoproteolysis inducing domain
The GPCR autoproteolysis inducing (GAIN) domain is an extracellular domain of adhesion G-protein coupled receptors that mediates receptor activation by releasing an autoproteolytically cleaved peptide from within itself ( 31 ,32 ). The GAIN domain therefore serves two functions, autoproteolysis and peptide release, which can be modulated by mutating the residues surrounding its cleavage site, the GPCR proteolytic site (GPS). In the rat ADGRL1 GAIN domain (PDB ID: 4DLQ), mutation of a conserved Trp residue, W804, has been demonstrated to abolish cleavage ( 31 ). This is reflected in MutationExplorer by substantial increases in energy for the W804S mutation, which removes the bulky side chain; mutating another conserved Trp (W815S) has a similar effect. In the GAIN domain, two flexible regions termed 'flaps' have been shown to mediate solvent exposure of the GPS ( 32 ). Point mutations that disrupt interactions of Flap 2 (residues 805-810), e.g. Y806A, also increase energies not only within the flap but also in the interacting helix, hinting at a targeted disruption of the regions' interactions that likely affects overall GAIN domain dynamics. This example demonstrates the ability of MutationExplorer to predict the impact of point mutations on protein regions. Link to example: http://proteinformatics.org/mutation_explorer/examples/agpcr_demo.
The MDsrv implementation on which MutationExplorer is built is available at: http://proteinformatics.org/mdsrv-web.

Acknowledgements
We would like to express our gratitude to the team of developers and maintainers of Mol*, especially Alexander Rose, for many valuable remarks. The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

Figure 1. At the top, the result page is shown above a bubbled workflow schematic for the 'upload structure and mutations' mode of the MutationExplorer server. Below is a schematic of the (internal) workflow for structure and mutation preparation. There, the user steps (the light green VIEW bubbles) are HTML pages displayed in the browser. Shown in slate purple bubbles are the Rosetta modeling steps executed in the background on the server. The user first uploads (or selects) a protein structure or model. Second, mutations are selected. Then a 'waiting room' page is displayed while the mutations are performed. The result screen (top large rectangle, light green border) contains the 3D viewer with the model of the mutated protein and, below it and linked to the 3D viewer, the sequence viewer showing the alignment. The same mutations are highlighted in red in both viewers. Hovering over or clicking on a residue in either viewer highlights the residue in both. Additionally, a panel with details about the selected residue opens, including the energy/hydrophobicity value of the residue. A left mouse click adds an all-atom licorice representation in the vicinity of the selected residue. A 'Ctrl-left mouse button' click opens a window with a fast estimate, calculated using RaSP ( 15 ), of the energetic effect that mutations at the given position would cause. The left section of the window has a selection tree of all created models, rooted at the original protein ('mut_0') and branching to all subsequent mutations. 'Color by' choices include absolute or relative energies, as well as hydrophobicity. The user can continue exploring mutations by selecting a model and defining new mutations. Additionally, there is an information section displaying the total energy of the currently displayed variant and the parent from which the current model was derived, together with the mutations that were performed on the parent. Finally, the user can adjust the color scale and toggle on/off the spheres highlighting the mutated positions.

Funding
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through CRC 1423 [421152132], sub-projects Z04 to P.W.H.; We also acknowledge the Federal Ministry of Education and Research of Germany (BMBF) and the Sächsische Staatsministerium für Wissenschaft, Kultur und Tourismus in the programme Center of Excellence for AI-research "Center for Scalable Data Analytics and Artificial Intelligence Dresden / Leipzig", project identification number: ScaDS.AI. This study was further funded by the Protein Interactions and Stability in Medicine and Genomics (PRISM) centre funded by the Novo Nordisk