AlphaKnot 2.0: a web server for the visualization of proteins’ knotting and a database of knotted AlphaFold-predicted models

Abstract The availability of 3D protein models is rapidly increasing with the development of structure prediction algorithms. With the expanding availability of data, new ways of analyzing those predictions, especially their topology, are becoming necessary. Here, we present the updated version of the AlphaKnot service that provides a straightforward way of analyzing structure topology. It was designed specifically to determine knot types of predicted structure models; however, it can be used for all structures, including those solved experimentally. AlphaKnot 2.0 gives users the knowledge necessary to assess the topological correctness of a model. Both probabilistic and deterministic knot detection methods are available, together with various visualizations (including a trajectory of simplification steps to highlight the topological complexities). Moreover, the web server provides a list of proteins similar to the queried model within AlphaKnot’s database and returns their knot types for direct comparison. We pre-calculated the topology of high-quality models from the AlphaFold Database (4th version) and there are now more than 680 000 knotted models available in the AlphaKnot database. AlphaKnot 2.0 is available at https://alphaknot.cent.uw.edu.pl/.


Introduction
Machine learning models, especially large-scale ones, such as AlphaFold (1), ESMFold (2) or RoseTTAFold (3), make predicting 3D structures of proteins an enticing alternative to solving them experimentally. One can use them to predict the structure of a protein of choice, or just dive right into the wealth of structures already predicted by the creators to showcase the models' capabilities (with the AlphaFold Database (4) boasting more than 200 million modeled structures). Such a vast amount of newly available data opens the way for a broad analysis of different structural characteristics. Additionally, the ease and accessibility of predicting protein structures nowadays draw many non-experts into the field of structural biology. This makes accessible tools that help assess model quality a pressing need. Here, we present AlphaKnot 2.0, which answers that need by giving users an easy-to-use tool for topological analysis of protein structures, in particular their ability to form knots.
Knots, as protein motifs, were discovered almost 50 years ago (5,6). The motifs were found to be conserved within protein families (7,8), the single exception being the ATC/OTCase family, which has both knotted and unknotted proteins (9). For many years, the set of knot types found in proteins, which differ in complexity as measured by the number of crossings, was limited to just four possibilities: 3_1, 4_1, 5_2 and 6_1. Only recently was a new type discovered and verified (3_1#3_1), showing that double knots can exist in natural proteins (10,11). The pace of discovering topologically novel proteins has now increased significantly thanks to the power of machine learning algorithms. Analyses based on AlphaFold predictions show that even more types of knotted proteins exist than were previously known (12,13). There are reports not only of new families but also of new, and often more complex, knot types never before seen in a protein. An excellent example is the protein with a 7_1 knot (UniProtKB ID: Q9PR55; also present in our AlphaKnot database), first predicted by the AlphaFold model (12) and soon confirmed using X-ray diffraction (14). Moreover, these analyses also discovered families with probable dual topology (with both knotted and unknotted members) (15).
The topology of the protein is an important factor in any structure analysis. Studies show that non-trivial topology, such as knots, influences the structure's ability to withstand degradation (11,16-18) and increases its thermodynamic stability (19-22). Moreover, the knot can be a vital part of the protein's active site and provide the cleft for the ligand to bind (23,24). As such, deepening our understanding of how topologically different proteins perform the same function can lead to the conception of novel antimicrobial compounds. One such well-studied pair consists of the knotted bacterial (TrmD) and unknotted eukaryotic (Trm5) methyltransferases. Most of the studies focus on TrmD, a vital enzyme that is universal for bacteria (including those that the World Health Organisation lists as pathogens of global priority due to their antibiotic resistance). There have already been several attempts to design inhibitors of the TrmD protein that act by targeting the knot and blocking protein function (25-27). A successful inhibitor that is selective for TrmD (not binding to Trm5) might be the start of a new type of antibiotics. However, even though knotted proteins are studied from many different perspectives, fundamental questions remain open: Is there a common role that a knot plays in protein structure? What is the evolutionary origin of a knotted structure? Did it emerge from an unknotted protein? We believe that immediate access to protein structures and their topology will aid in answering these important questions.
AlphaKnot 2.0 is an easy-to-use web server that simplifies the analysis of topological characteristics of structure models (proteins, nucleic acids, or other polymers) and gives users visual aids to help understand their complexities. It also includes a database of already analyzed AlphaFold-predicted structures that appear to contain topological knots. The AlphaKnot 2.0 service is free and open to all users and there is no login requirement.

Topology detection
One of the crucial functionalities of the server side of AlphaKnot 2.0 is topology detection. It is realized mainly using the Topoly (28) package, in particular the HOMFLY-PT polynomial. The user can select the type (and, if applicable, the number) of chain closures (required to transform an open protein chain into a proper mathematical knot) to optimize either the computation time or the accuracy of the result. Similarly, if the user chooses to calculate the knot map (knot matrix), its density can also be controlled by the accuracy parameter. As a result, we report either the knot type (including composite knots) or 'Unknown' for topologies with > 12 crossings (appearing in extremely complex structures).
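The probabilistic side of this procedure can be illustrated with a minimal Python sketch. It is not the Topoly implementation: the closure below simply appends two random points on a large sphere (assuming coordinates roughly centred on the origin), and the per-closure knot calls are taken as given, since computing the HOMFLY-PT polynomial is out of scope here. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def random_point_on_sphere(radius, rng):
    """Draw a point uniformly from the surface of an origin-centred sphere."""
    z = rng.uniform(-1.0, 1.0)  # uniform height gives uniform surface area
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r_xy = math.sqrt(max(0.0, 1.0 - z * z))
    return (radius * r_xy * math.cos(phi),
            radius * r_xy * math.sin(phi),
            radius * z)


def close_chain(coords, radius, rng):
    """Append two far-away points so the open chain becomes a closed polygon.

    The last returned point is implicitly joined back to the first one.
    This is a simplified stand-in for a random chain closure.
    """
    return list(coords) + [random_point_on_sphere(radius, rng),
                           random_point_on_sphere(radius, rng)]


def knot_probabilities(calls):
    """Turn per-closure knot calls into a {knot_type: fraction} mapping."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {knot: n / total for knot, n in counts.items()}


def most_likely_knot(calls):
    """Return the dominant knot type and the fraction of closures behind it."""
    probs = knot_probabilities(calls)
    best = max(probs, key=probs.get)
    return best, probs[best]
```

With hypothetical calls from ten closures, eight reporting `3_1` and two reporting `0_1`, `most_likely_knot` returns `("3_1", 0.8)`. Using more closures narrows the uncertainty of such a fraction at the cost of computation time, which is the trade-off exposed to the user.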
To give a more detailed description of the topology, we also use the knot_pull (29) package to calculate the Dowker-Thistlethwaite (30) notation, which describes how the protein chain, an inherently open curve, realizes the closed, mathematical knot. This approach is deterministic, thus there are no additional parameters to specify. The knot_pull tool can be accessed via the submission server for new structures or after using the recompute option for records already available in the database. The results appear in the bottom part of the summary site as a new visualization. The trajectory generated by knot_pull can be easily downloaded using the designated button.
The input file should be in a PDB or mmCIF file format (either single- or multi-chain). Alternatively, the user can specify a protein ID from the MGnify database (31), and the corresponding prediction file from the ESM Atlas will be automatically pulled. If the input file contains the pLDDT data in the B-factor column (e.g. mmCIF files from the AlphaFold Model), it will be used for the assessment of the structure and of the validity of the topological prediction. The topology is calculated based on the chain simplified to just the Cα atoms.
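As an illustration of this input handling, the following Python sketch reduces a PDB-formatted text to its Cα trace and reads the pLDDT values from the B-factor column, relying on the fixed-column layout of PDB ATOM records. It is a simplified assumption of how such a step might look, not the server's code: it ignores alternate locations and multiple models, and the function names are hypothetical.

```python
def ca_trace_with_plddt(pdb_text, chain_id=None):
    """Return [((x, y, z), b_factor), ...] for the C-alpha atoms of a PDB text.

    Relies on fixed PDB columns: atom name 13-16, chain 22,
    coordinates 31-54 and temperature factor 61-66 (1-based).
    """
    trace = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and other records
        if line[12:16].strip() != "CA":
            continue  # keep only C-alpha atoms
        if chain_id is not None and line[21] != chain_id:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        trace.append((xyz, float(line[60:66])))
    return trace


def mean_plddt(trace):
    """Average the per-residue pLDDT stored in the B-factor column."""
    return sum(b for _, b in trace) / len(trace)
```

The coordinates from the returned trace would then be passed to the knot detection step, while the averaged B-factor values give a quick indication of how much the model, and hence its predicted topology, can be trusted.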

Database of pre-calculated structures
Along with the server, we include a database of pre-analyzed models. It includes all AlphaFold DB v4-predicted protein structures for which models with an average full-chain pLDDT value higher than or equal to 70 were proposed, and for which nontrivial topology has been detected (> 680 000 structures). This was done in two steps: first detecting the knot through 100 random chain closures for all high-quality models, and then recalculating with 500 closures the structures that appeared to have knots. Additionally, for all structures of < 400 amino acids in length that are knotted according to the AlphaFold prediction, we ran the ESMFold model (2) to generate predicted 3D structures, applied the same knot identification method as for the AlphaFold structures, and compared the resulting topology. Since calculating the full knot maps for all those structures would be infeasible, we have left users the opportunity to request a more detailed analysis of a database structure through a two-click 'Recompute in Web Server' tool. For comparison, the first version of the AlphaKnot database had knot maps calculated for all its proteins, but it contained topological data for only 21 proteomes from the AlphaFold Database v1.
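The two-step screening can be sketched in Python as follows. This is only an outline of the control flow described above: `detect` is a hypothetical callback standing in for the actual knot detection, and discarding models whose second pass comes back unknotted is an assumption rather than something stated in the text.

```python
UNKNOT = "0_1"


def two_stage_screen(models, detect, min_plddt=70.0,
                     fast_closures=100, careful_closures=500):
    """Screen models cheaply, then re-check only the apparent knots.

    models: iterable of (model_id, mean_plddt) pairs.
    detect: hypothetical callback (model_id, n_closures) -> knot type string.
    Returns {model_id: knot_type} for models still knotted after step two.
    """
    knotted = {}
    for model_id, plddt in models:
        if plddt < min_plddt:
            continue  # low-quality models are not analyzed at all
        if detect(model_id, fast_closures) == UNKNOT:
            continue  # step one: quick pass with few closures
        refined = detect(model_id, careful_closures)  # step two: more closures
        if refined != UNKNOT:  # assumption: drop models that turn out unknotted
            knotted[model_id] = refined
    return knotted
```

The design keeps the expensive 500-closure computation for the small fraction of models that look knotted in the first pass, which is what makes processing the whole AlphaFold Database tractable.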

Visualization
A customized and extended version of the PDBe Mol* Viewer v3.1 (32) is used for visualizing the protein structures and the simplification trajectory from knot_pull. Knot maps (knot matrices) (33) are drawn using the Matplotlib (34) library for Python. To help users see the topological complexities in the structure, we are using knot_pull (29) to generate a trajectory of structure simplification steps.
Figure 1. Web Server features. (A) Submission options: default and additional parameters. The user can choose from several options of how the topology will be calculated in the structure (including the accuracy). New features provided with the updated AlphaKnot 2.0 are marked with blue and the AlphaKnot's logo. Please note that the calculation time is directly related to the selected parameters but also to the length and complexity of the structure. For a single structure, it varies from a few seconds to several hours. (B) Frames showing simplification of the structure using the knot_pull algorithm (based on UniProtKB ID: F1QYU7).

Submission server
The server performs an analysis of the topology of the input structure, which can be given as a PDB/CIF file with one or more chains, or as a MGnify (31) protein ID. All correctly formatted structure files can be submitted; however, the ones generated with the AlphaFold (or similar) algorithm will return more information, such as quality analysis (based on pLDDT values). The submitted structure will be processed according to the parameters chosen by the user (Figure 1).
The output is a detailed, interactive web page with many different structure visualizations. We present the user with a knot map that shows where (based on the sequence indices) the knots are located within the structure. This matrix shows the protein's knot fingerprint and is useful for finding slipknots (7,35). A protein chain has a slipknot when it is unknotted overall but contains an internal knot (found on some sub-chain). Many slipknotted proteins are found among transmembrane ion transporters (with S3_1 topology) (36). Given that the main knot detection algorithm we use is probabilistic, the user can use a slider to see how the probability cutoff changes the knot landscape on the map. This is particularly useful for detecting knots of lower probability. Next, the user can mark the position of the knot (knot core) directly on the structure. With the pLDDT button, the marked region is colored by pLDDT values to show the quality of the modeling. With the Rainbow button, the knotted region is colored in a rainbow gradient (from red to blue). All the knots found by the algorithm within the structure can be visualized this way.
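To make the roles of the knot map, the probability cutoff and the slipknot definition concrete, here is a small Python sketch. The data layout (a dictionary from sub-chain index ranges to a knot call and its probability) is an assumption chosen for illustration and is not the server's internal format; the knot core is taken here as the shortest sub-chain that is still called knotted.

```python
UNKNOT = "0_1"


def apply_cutoff(knot_map, cutoff):
    """Keep sub-chains with a non-trivial knot call of probability >= cutoff.

    knot_map: {(first_residue, last_residue): (knot_type, probability)}
    """
    return {span: (knot, prob)
            for span, (knot, prob) in knot_map.items()
            if knot != UNKNOT and prob >= cutoff}


def knot_core(knot_map, cutoff):
    """Return the shortest sub-chain still called knotted, or None."""
    knotted = apply_cutoff(knot_map, cutoff)
    if not knotted:
        return None
    return min(knotted, key=lambda span: span[1] - span[0])


def is_slipknot(knot_map, full_span, cutoff):
    """True if the whole chain is unknotted but some sub-chain is knotted."""
    full_knot, _ = knot_map[full_span]
    return full_knot == UNKNOT and bool(apply_cutoff(knot_map, cutoff))
```

Lowering the cutoff in `apply_cutoff` plays the role of moving the slider: more sub-chains, including those with weakly supported knot calls, become visible on the map.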
More often than not, the knot is not an easily identifiable part of the protein structure. In some cases, the knot spans hundreds of amino acids, which can be difficult to recognize, even for a trained eye. Therefore, in the updated version of the web server, we use a new tool for simplifying the structure, knot_pull (29). With its trajectory of progressively simpler representations of the structure, one can determine exactly how the knot manifests in the structure. Thanks to the knot type calculations provided by knot_pull, we also give the user more detailed information on where the gap in the knot is (i.e. the implicitly connected ends of the protein chain), based on the crossing order in the Dowker-Thistlethwaite notation. This can facilitate an even more precise comparison between different realizations of the same knot type.
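For readers unfamiliar with the Dowker-Thistlethwaite notation, the following Python sketch shows how the code is built from the order in which crossings are visited while traversing a knot diagram. It illustrates the notation only and is not the knot_pull implementation; the sign rule used (an even label becomes negative when the strand passes over at that crossing) is one common convention and is an assumption here.

```python
def dowker_thistlethwaite(passes):
    """Compute a Dowker-Thistlethwaite code from a diagram traversal.

    passes: list of (crossing_id, is_over) in the order the crossings are
    met while walking along the closed curve; each crossing appears twice.
    Returns the even labels paired with the odd labels 1, 3, 5, ...
    """
    visits = {}
    for label, (crossing, is_over) in enumerate(passes, start=1):
        visits.setdefault(crossing, []).append((label, is_over))

    code = {}
    for crossing, seen in visits.items():
        if len(seen) != 2:
            raise ValueError(f"crossing {crossing!r} must be visited twice")
        (a, a_over), (b, b_over) = seen
        if a % 2 == b % 2:
            raise ValueError(f"crossing {crossing!r} needs one odd and one even label")
        odd, even, even_over = (a, b, b_over) if a % 2 else (b, a, a_over)
        # Assumed convention: negative sign when the even label is an over-pass.
        code[odd] = -even if even_over else even
    return [code[odd] for odd in sorted(code)]
```

For an alternating trefoil diagram traversed as a, b, c, a, b, c (starting with an over-pass), the function returns `[4, 6, 2]`. For an open protein chain, the position of the implicit closing segment within this traversal order indicates where the gap between the chain ends falls.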
Thanks to the database that exists alongside the server, we now also provide additional information regarding other calculated protein structures. We list the most similar proteins in the AlphaKnot database (by default with 70% sequence identity) and their model's knot type, enabling a comparison of the topology between the model of the queried protein and its homologs. The topology is thought to be conserved between similar proteins (7,36), thus any differences might indicate that the model is not topologically correct and should be treated cautiously. All new features provided by the web server are easily accessible for all database proteins via a two-click "Recompute in Web Server" submission button on each protein's page.

Database
The size of the AlphaKnot database (33) increased a hundredfold (from 6000 to > 680 000 protein entries). This was done by calculating the topology of most of the models available in the latest (4th) version of the AlphaFold Database (4). To provide high-quality data, we processed only models with the average full-chain pLDDT above 70. A minimal protein page consists of two tabs: one with the latest AlphaFold model (Knotting Data - AF v4) and the Protein information tab. Additional tabs include models of the protein either available in the first version of the AlphaFold Database (AF v1) or modeled by us with the ESMFold algorithm (ESM v1). Figure 2 shows the contents of a protein page and the Knotting Data - AF v4 tab, with additional elements marked with dashed lines. These will be available after recomputing the protein in the Web Server. From each page, the user can get the most important information about the knot present in the protein model, such as the position of the knot in the structure (knot core), its quality (knot pLDDT) and type (main knot).

Distinctive features of updated AlphaKnot
AlphaKnot was updated to include tools that help the user assess the topological correctness of their modeled structure and gather information about all the knotted models predicted with AlphaFold. In particular, the two components of AlphaKnot, the web server and the database, are intrinsically connected. The web server uses information from the database, and the database pages provide a quick option to submit a job to the web server (to unlock more information about the protein model than the default, like the knot map, simplified structure, or the topology of homologs).

Visualization of the simplified structure
Knots in proteins are usually difficult to observe in the 3D structure due to their complexity. In AlphaKnot 2.0 we utilize a tool that gradually simplifies the structure, knot_pull (29). It smooths the protein chain by approximating the process of pulling atoms towards the chain ends (N- and C-terminus). Starting from the native structure, the visualization shows each step of the simplification and ends with a string-like protein chain with a knot. The user can view the visualization in the form of a movie, which can be downloaded in the mp4 format, or select a specific frame for closer analysis. Additionally, the structure can be colored with a rainbow gradient for even easier knot identification.
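A rainbow gradient of this kind can be produced by sweeping the hue along the chain. The Python sketch below, using the standard `colorsys` module, assigns one RGB color per residue from red at the first residue to blue at the last. It is an illustrative assumption and not necessarily how the viewer computes its colors; the function name is hypothetical.

```python
import colorsys


def rainbow_colors(n_residues):
    """Return one (r, g, b) tuple per residue, running from red to blue."""
    if n_residues < 1:
        raise ValueError("n_residues must be positive")
    colors = []
    for i in range(n_residues):
        # Position along the chain, 0.0 at the first residue, 1.0 at the last.
        fraction = i / (n_residues - 1) if n_residues > 1 else 0.0
        hue = fraction * (2.0 / 3.0)  # hue 0 is red, hue 2/3 is blue
        colors.append(colorsys.hsv_to_rgb(hue, 1.0, 1.0))
    return colors
```

Because the color encodes the position along the sequence, two spatially close segments with very different colors are far apart in the chain, which helps in following the strand through the knot.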

Analysis of topology of homologous proteins
AlphaKnot 2.0 analyzes knotted homologs of the queried protein submitted to the Web Server and returns their knot types. In particular, this feature can be used to find knotted homologs of new proteins with de novo designed sequences. The homology is based on sequence identity (a user-specified parameter with a default value of 70%). The analysis is done locally using the MMseqs2 tool (37) and is based on proteins available in AlphaKnot's database, which are therefore knotted and have an average model pLDDT > 70. As a result, the user is provided with a list with direct links to the homologs in the database, and information about their topology (AlphaFold model and ESMFold model where available) and length. This feature offers an easy way of noticing differences in knot types of similar proteins, which might be an indication of incorrect modeling. This is especially important in the field of knotted proteins, since the machine learning algorithms were trained on a small number of such proteins and might produce artifacts leading to non-trivial topologies (13).
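The comparison step itself is simple and can be sketched in Python as below. The sequence search is assumed to have been done already (for example by MMseqs2), so the input is a plain list of hits; the tuple layout, function name and identifiers are hypothetical.

```python
def flag_topology_mismatches(query_knot, homologs, min_identity=0.70):
    """Split homologs into those considered and those with a different knot.

    homologs: iterable of (protein_id, sequence_identity, knot_type), with
    sequence_identity given as a fraction between 0 and 1.
    Returns (considered, mismatches); a non-empty mismatch list suggests the
    query model, or a homolog model, deserves a closer look.
    """
    considered = [hit for hit in homologs if hit[1] >= min_identity]
    mismatches = [hit for hit in considered if hit[2] != query_knot]
    return considered, mismatches
```

For example, a query predicted as `3_1` whose close homologs are all `3_1` gives an empty mismatch list, whereas a homolog at 85% identity predicted as `4_1` would be flagged for inspection.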

Information about > 680 000 knotted AlphaFold models
The database of AlphaKnot is now expanded to contain hundreds of thousands of proteins whose models predicted by AlphaFold are knotted. Due to the large number of new entries, manual verification of the models (which we did for AlphaKnot 1.0) is no longer possible. However, we believe that any manual confirmation of the automatically generated models will benefit the user. Therefore, with the new version, we enabled users to leave comments on each protein model and to vote by choosing one of three categories: Knot, Unsure and Artifact.

Extensive database filtering
The vast number of knotted models in the AlphaKnot 2.0 database provides an opportunity for large-scale analyses of knotted proteins. For convenient access to the data we store, the user can specify precisely what they seek by using the expanded Advanced search functionality, which applies filters to the database entries and is also available as an API. In particular, there are three main categories of queries, which relate to information about the protein, the model, or the knot. The search criteria for the protein include protein name, gene name, taxonomy (kingdom, family, organism), and cross-references (to UniProt, InterPro, Pfam, PDB and EC databases). For model-related searches, the user can specify average pLDDT values (either for the whole chain or only the knotted region), model category, and chain length. Lastly, the criteria for the knot are its type, probability, and the knot core's range and length. It is possible to join the search criteria with logical operators (AND, OR, NOT).
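The way criteria combine through logical operators can be illustrated with a short Python sketch that filters locally held records. It does not use or describe the actual AlphaKnot API; the record fields (`knot_type`, `knot_plddt`, `length`) and helper names are hypothetical.

```python
def field_equals(name, value):
    """Criterion: the entry's field equals the given value."""
    return lambda entry: entry.get(name) == value


def field_at_least(name, threshold):
    """Criterion: the entry's numeric field is present and >= threshold."""
    return lambda entry: entry.get(name) is not None and entry[name] >= threshold


def all_of(*criteria):
    """Logical AND of several criteria."""
    return lambda entry: all(c(entry) for c in criteria)


def any_of(*criteria):
    """Logical OR of several criteria."""
    return lambda entry: any(c(entry) for c in criteria)


def negate(criterion):
    """Logical NOT of a criterion."""
    return lambda entry: not criterion(entry)


def search(entries, criterion):
    """Return the entries that satisfy the combined criterion."""
    return [entry for entry in entries if criterion(entry)]
```

A query such as "knot type 3_1 OR 5_2, AND knot pLDDT of at least 70, AND NOT length of at least 400" is then written as `all_of(any_of(field_equals("knot_type", "3_1"), field_equals("knot_type", "5_2")), field_at_least("knot_plddt", 70.0), negate(field_at_least("length", 400)))`.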

Comparison to other services
As for the server section of the AlphaKnot 2.0 Database, we are aware of an alternative, 'Protein Knot server', which also allows users to detect topology in the provided protein structure (38). The service provided via AlphaKnot 2.0, however, gives the user much more flexibility in parameters, allowing for the detection of not only knots (with two different methods) but also slipknots, and for the calculation of the full knot map and simplified structure via knot_pull. Additional tools may also help with differentiating artifacts and identifying similar proteins already available in the database. For data already available in AlphaKnot 2.0, the user is also provided with additional biological annotations. 'KnotGenome' (39), on the other hand, allows for topological detection in chromosomal data, and 'KymoKnot' (40) in linear and circular polymers. We are also aware of the 'PyKnot' plugin for visualization and characterization of knots in proteins (41). Also in this case the user cannot detect slipknots or construct knot maps, which limits analysis capabilities. As for the alternatives to the Topoly package (28), we acknowledge the following packages: 'GISA' (42), 'Knoto-ID' (43), 'SKMT algorithm' (44) and 'TEPPP' (45).
From the database perspective, the closest available alternative is the KnotProt Database (46). In comparison, however, the AlphaKnot Database analyzes proteins provided in the AlphaFold Database (protein predictions) instead of the PDB Database (protein structures obtained by experimental methods), thus differing in both data type and size. Additionally, while both servers provide the option to calculate the topology of a given protein structure, AlphaKnot offers more options to choose from, including more detailed parameters and the use of the simplification algorithm. Among further alternatives, we acknowledge 'PconsFam', which also provides simplified topological information for Pfam families' representatives (47). In this database, protein structure predictions come from the CONFOLD algorithm (48) and the focus of the database is shifted more towards predicting contact maps.

Summary
AlphaKnot 2.0 is a web server for detailed visualization of knotting in user-provided structures, as well as a database of entanglements in AlphaFold-predicted protein models. This updated service offers several new features that aid with the analysis of the topology and help the user assess the topological correctness of their query.
Nucleic Acids Research, 2024, Vol. 52, Web Server issue
The web server gives detailed topological information on the structure (provided by the user as either an mmCIF- or PDB-formatted file), as well as a visual guide to understand it, with a trajectory of simplification steps calculated by the knot_pull package. Additionally, it is integrated with the AlphaKnot database to provide information about knot types of proteins (from the database) similar in sequence to the query. Given that topology should be conserved between similar proteins, if the query has a different knot type it might indicate that the model is not topologically correct and should be treated with caution. All new features provided by the web server are easily accessible for all database proteins via a two-click job submission button on each protein page.
The database of AlphaKnot was also significantly updated. By using data from the latest (4th) version of the AlphaFold Database, we expanded the number of knotted models to more than 680 000 (1 140 000 models including those generated with ESMFold). With such a vast amount of data, manual verification is not possible, thus we provide the users with the option to leave a comment and vote (choosing 'Knot', 'Artifact' or 'Unsure') on each model. To further help the user, each protein shorter than 400 amino acids has an ESMFold model generated to allow for comparison between predictions of these two ML methods.

Figure 2. Database protein page: default and additional features. Additional features available after recomputing the model in the Web Server are marked with dashed lines. New features provided with the updated AlphaKnot are marked with the AlphaKnot 2.0 logo. (A) UniProtKB ID of the protein, model number, and protein name. (B) Available tabs. (C) Categories voted by users. (D) Button for submitting the current model to our Web Server. (E) Basic information about the protein and the model. (F) Knot map with the slider applying different knot probability cutoffs. (G) 3D structure of the model. (H) Trajectory of simplification steps. (I) Information about found knots. (J) Comment section.