DaReUS-Loop: a web server to model multiple loops in homology models

Abstract Loop regions in protein structures often have crucial roles, and they are much more variable in sequence and structure than other regions. In homology modeling, this leads to larger deviations from the homologous templates, and loop modeling of homology models remains an open problem. To address this issue, we have previously developed the DaReUS-Loop protocol, leading to significant improvement over existing methods. Here, a DaReUS-Loop web server is presented, providing an automated platform for modeling or remodeling loops in the context of homology models. This is the first web server accepting a protein with up to 20 loop regions, and modeling them all in parallel. It also provides a prediction confidence level that corresponds to the expected accuracy of the loops. DaReUS-Loop facilitates the analysis of the results through its interactive graphical interface and is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop/.


Simultaneous modeling of multiple loops
The original DaReUS-Loop protocol is based on modeling a single loop at a time: it requires an initial model and a single loop region, which is re-modeled.
In contrast, the web server can accept homology models with multiple loop regions, up to 20. As shown in the flowchart (see Figure 1), it is still based on considering a single loop region at a time, but it runs in parallel, predicting one loop while keeping the other loops constant. The server supports three different approaches on how to treat these other loops, in order to avoid clashes among them. Note that this is irrelevant if there is only one loop region.
• Remodeling: The other loops are kept in their initial configuration (from the input structure file).
• Modeling: The server first builds a consensus model, choosing the top candidate of each loop. Note that other candidates might be considered in case of loop-loop clashes between top candidates. Then final models for every loop are built using this consensus structure.
• Advanced modeling: All loops are modeled independently, and all other loops are omitted as gaps. In this mode the loop accuracy is slightly improved at the cost of introducing gaps in the final models.
All three modes were tested on the CASP11 and CASP12 test sets. The average backbone flanked RMSD between the best of the top 10 predictions and the native targets is shown in Table S3. Note that results are reported for the high-confidence predictions. In summary, we found remodeling mode to be (on average) the most accurate, followed by advanced modeling mode, followed by modeling mode. Starting from the initial homology models, remodeling mode performs slightly better than our previous single-loop modeling method, DaReUS-Loop. On the other hand, starting from gapped homology models, advanced modeling performs better than the modeling mode and on par with the remodeling and single-loop modes. We speculate that remodeling works best because the initial model often contains decent (albeit suboptimal) conformations for most of the loops. If the user has expert knowledge of which loops are initially poor, we believe that advanced loop modeling works better. Another use case for advanced loop modeling arises when experimental information is available. In that case, after advanced loop modeling, the Cartesian combination of all loop candidates could be considered, applying the experimental data as a highly selective filter.
Comparison is between the three prediction modes of the web server. All the values reported in this table correspond to the best flanked RMSD (Å) over top 10 models.
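The idea of combining loop candidates and filtering them with experimental data can be illustrated with a minimal sketch. It is not part of the server: the function name `combine_loop_candidates`, the candidate labels and the toy filter are all hypothetical, and a real filter would score each combination against actual restraints.

```python
from itertools import product

def combine_loop_candidates(candidates_per_loop, passes_filter):
    """Yield every combination (one candidate per loop) accepted by the filter.

    candidates_per_loop: dict mapping a loop id to its list of candidates.
    passes_filter: callable taking a {loop_id: candidate} dict and returning
    True or False (a stand-in for a check against experimental data).
    """
    loop_ids = sorted(candidates_per_loop)
    pools = [candidates_per_loop[loop_id] for loop_id in loop_ids]
    # itertools.product enumerates the full Cartesian combination of the pools.
    for combo in product(*pools):
        assignment = dict(zip(loop_ids, combo))
        if passes_filter(assignment):
            yield assignment

# Hypothetical input: two loops with placeholder candidate labels.
candidates = {"loop1": ["a1", "a2"], "loop2": ["b1", "b2", "b3"]}

def toy_filter(assignment):
    # Placeholder rule: pretend the data only agree with candidate "a2".
    return assignment["loop1"] == "a2"

accepted = list(combine_loop_candidates(candidates, toy_filter))
```

The number of combinations grows multiplicatively with the number of loops and candidates, so with 20 loops and 10 candidates each an exhaustive enumeration is not practical; a selective filter applied per loop first would be needed in that case.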
We report the performance of the server for modeling loops that connect different secondary structures in Table S4. For this, all loops in the benchmark were divided into three main groups according to the secondary structures of their flanks: (i) α−α, (ii) α−β and (iii) β−β. The results of each modeling mode are reported for every group. The results suggest that performance is best for loops connecting two α-helices, and is better for loops joining an α-helix to a β-strand than for loops connecting two β-strands.

Limitations of the web server
Initial homology model: DaReUS-Loop requires an initial homology model to (re)model the loops. The model must already have the correct sequence, i.e. raw template structures are not accepted. The residue numbering must start from 1. DaReUS-Loop can model the loops of a single protein chain, which may contain only standard amino acids. DaReUS-Loop will model missing side chain atoms using OSCAR-star [3], but all backbone atoms must be present.
Protein sequence: The protein sequence may contain only standard amino acids and must follow exactly the same numbering scheme as the structure. For example, if the protein has residue "ALA 16", the 16th letter of the sequence (including gaps) must be "A".
Figure S1: Distribution of loops in the test set. (a) Number of loops for every CASP target is reported. (b) The size of the loops is within the range of 5-30 amino acids and their frequencies are depicted here. Targets shown (CASP11 and CASP12): T0762, T0766, T0773, T0796, T0800, T0807, T0817, T0818, T0821, T0854, T0856, T0861, T0879, T0889, T0891, T0902, T0909, T0920, T0928, T0944, T0945. Axes: Number of loops, Frequency.
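The numbering rule for the protein sequence (residue "ALA 16" requires "A" as the 16th letter) can be checked before submission with a short script. The sketch below is an illustration under assumptions: the function name `numbering_mismatches` is hypothetical, and the residue list is given by hand here rather than parsed from a structure file.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def numbering_mismatches(sequence, residues):
    """Return the structure residues that disagree with the sequence.

    sequence: full sequence string, gap characters included, so that
    position i (1-based) corresponds to residue number i.
    residues: iterable of (residue_number, three_letter_name) pairs
    taken from the structure.
    Each mismatch is reported as (number, name, letter_found_in_sequence).
    """
    mismatches = []
    for number, name in residues:
        expected = THREE_TO_ONE.get(name.upper())  # None if non-standard
        in_range = 1 <= number <= len(sequence)
        found = sequence[number - 1] if in_range else None
        if expected is None or found != expected:
            mismatches.append((number, name, found))
    return mismatches

# Hypothetical 18-residue sequence whose 16th letter is "A".
sequence = "MKTAYLLEGSVKVTQAGL"
consistent = numbering_mismatches(sequence, [(15, "GLN"), (16, "ALA"), (17, "GLY")])
shifted = numbering_mismatches(sequence, [(16, "GLY")])
```

In this example `consistent` is empty, while `shifted` reports residue 16 because the structure names it GLY but the sequence has "A" at that position.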
Loop length: Loops are represented as gaps in the input sequence or structure. Between two gaps, at least four non-gap residues must be present. The minimum loop length is 2, and the maximum loop length is 30. DaReUS-Loop cannot model N- or C-termini. It is still recommended to provide full sequences rather than truncated ones, since this leads to better sequence profiles. Otherwise, missing N- or C-termini in sequence or structure are ignored. The maximum number of loops in the structure is 20.
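The loop-length constraints above can be expressed as a small pre-submission check. This is a sketch, not the server's own validation: the "-" gap character, the function names `find_loops` and `check_loops`, and the choice to skip terminal gaps entirely are assumptions made for illustration.

```python
import re

MIN_LOOP, MAX_LOOP = 2, 30   # allowed loop length
MAX_LOOPS = 20               # maximum number of loops
MIN_SPACER = 4               # non-gap residues required between two gaps

def find_loops(gapped_sequence, gap_char="-"):
    """Return (start, end) of each internal gap run, 1-based and inclusive.

    Gap runs touching either end of the sequence are skipped, on the
    assumption that they are missing termini, which are ignored.
    """
    loops = []
    for match in re.finditer(re.escape(gap_char) + "+", gapped_sequence):
        if match.start() == 0 or match.end() == len(gapped_sequence):
            continue
        loops.append((match.start() + 1, match.end()))
    return loops

def check_loops(gapped_sequence, gap_char="-"):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    loops = find_loops(gapped_sequence, gap_char)
    if len(loops) > MAX_LOOPS:
        problems.append(f"{len(loops)} loops, maximum is {MAX_LOOPS}")
    for start, end in loops:
        length = end - start + 1
        if not MIN_LOOP <= length <= MAX_LOOP:
            problems.append(f"loop {start}-{end}: length {length} not in "
                            f"{MIN_LOOP}-{MAX_LOOP}")
    # Compare each loop with the next one to measure the residues between them.
    for (_, prev_end), (next_start, _) in zip(loops, loops[1:]):
        spacer = next_start - prev_end - 1
        if spacer < MIN_SPACER:
            problems.append(f"only {spacer} residues between gaps ending at "
                            f"{prev_end} and starting at {next_start}")
    return problems

valid = check_loops("MKTAY---GLSPE--KLV")   # loops 6-8 and 14-15
invalid = check_loops("MKTAY-GLS---KLVQ")   # loop of length 1, spacer of 3
```

Here `valid` is empty and `invalid` contains two messages, one for the single-residue gap and one for the three-residue spacer.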
Initial model quality: DaReUS-Loop assumes that, other than the (gapped) loop regions, the initial homology model is of decent quality (TM-score > 0.5). In particular, the flank regions (the four residues adjacent to each gap) must be accurate. It is highly recommended to define gaps such that all flanks are in a helix or sheet region of the homology model.
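The recommendation on flanks can be checked in the same spirit. The sketch below assumes that a per-residue secondary structure string is available for the homology model, aligned position by position with the gapped sequence and using "H" for helix and "E" for strand (DSSP-like letters). The function name `weak_flanks` and the "-" gap character are hypothetical.

```python
import re

def weak_flanks(gapped_sequence, secondary_structure, flank=4, regular="HE"):
    """Return the internal gaps whose flanks are not fully helix or strand.

    gapped_sequence: sequence with "-" at loop positions.
    secondary_structure: string of the same length; the letters at gap
    positions are not inspected.
    Each flagged loop is given as (start, end), 1-based and inclusive.
    """
    if len(gapped_sequence) != len(secondary_structure):
        raise ValueError("sequence and secondary structure differ in length")
    flagged = []
    for match in re.finditer(r"-+", gapped_sequence):
        if match.start() == 0 or match.end() == len(gapped_sequence):
            continue  # terminal gaps are not treated as loops
        left = secondary_structure[max(0, match.start() - flank):match.start()]
        right = secondary_structure[match.end():match.end() + flank]
        complete = len(left) == flank and len(right) == flank
        if not complete or any(s not in regular for s in left + right):
            flagged.append((match.start() + 1, match.end()))
    return flagged

# Hypothetical 19-residue example with one loop at positions 9-11.
seq = "MKTAYLLE---GSVKVTAA"
good_ss = "CHHHHHHHCCCEEEECCCC"   # flanks are HHHH and EEEE
poor_ss = "CHHHHHCCCCCEEEECCCC"   # left flank is HHCC

ok_result = weak_flanks(seq, good_ss)
poor_result = weak_flanks(seq, poor_ss)
```

A flagged loop suggests widening the gap until both flanks sit inside a helix or strand, subject to the loop-length limits.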