The recent expansion in our knowledge of protein–protein interactions (PPIs) has allowed the annotation and prediction of hundreds of thousands of interactions. However, the function of many of these interactions remains elusive. The interactions of Eukaryotic Linear Motif (iELM) web server provides a resource for predicting the function and positional interface for a subset of interactions mediated by short linear motifs (SLiMs). The iELM prediction algorithm is based on the annotated SLiM classes from the Eukaryotic Linear Motif (ELM) resource and allows users to explore both annotated and user-generated PPI networks for SLiM-mediated interactions. By incorporating the annotated information from the ELM resource, iELM provides functional details of PPIs. This can be used in proteomic analysis, for example, to infer whether an interaction promotes complex formation or degradation. Furthermore, details of the molecular interface of the SLiM-mediated interactions are also predicted. This information is displayed in a fully searchable table, as well as graphically with the modular architecture of the participating proteins extracted from the UniProt and Phospho.ELM resources. A network figure is also presented to aid the interpretation of results. The iELM server supports single protein queries as well as large-scale proteomic submissions and is freely available at http://i.elm.eu.org.
The interactions of Eukaryotic Linear Motif (iELM) web server facilitates the exploration of short linear motif- (SLiM) mediated interfaces within protein–protein interaction (PPI) networks (1). The importance of SLiMs in the regulatory and signalling mechanisms of the cell is becoming increasingly apparent, as highlighted by their use as molecular switches coordinating phase transitions in the cell (2) and their increasing association with disease (3–5). SLiMs are key components in a wide range of biological pathways and are known to act as sites for post-translational modifications such as phosphorylation or ubiquitination, as targeting signals for particular subcellular locations and as ligand-binding sites for protein recruitment (6,7). The majority of known motifs bind onto the surface of globular domains and exhibit specificity for a particular subgroup of a domain family (1). SLiMs tend to be just 3–10 amino acids in length with only 2–5 residues responsible for the majority of the binding affinity and specificity (6). This means that discriminating bioinformatically between a stochastic match and a result of biological relevance is fraught with difficulties (8).
A number of resources have undertaken the task of annotating experimentally validated SLiM classes with the most notable examples being the Eukaryotic Linear Motif (ELM) (3), MiniMotif (9) and ScanSite (10) databases. These resources also allow searching of protein sequences for novel instances of these annotated classes using regular expression patterns or position-specific scoring matrices. However, due to the high likelihood of motifs occurring in a stochastic manner, the use of pattern matching alone produces a large number of false positive hits (6). Methods have, therefore, been developed to incorporate additional filters based on the attributes of SLiMs, including sequence conservation (11–13), structural availability (14–16), biophysical feasibility (17) and biological keywords (18). Recently, a number of de novo motif prediction tools have also emerged, capable of predicting new classes of SLiMs (19–22). However, difficulties arise in removing the experimental bias towards medically relevant proteins as well as biases due to evolutionary relationships (12).
A number of resources have been developed using PPI data, to help predict the SLiM functional class associated with a particular protein-binding domain. Dilimot (21) and SLiMFinder (19) use the over-representation of sequence motifs in proteins, known to interact with a particular globular domain, to predict the regular expression of the binding SLiM; the ADAN database (23) uses high-resolution structures to predict SLiM-mediated interactions for well-known modular protein domains (SH3, SH2, WW, etc). In contrast, NetworKIN (24) employs interaction data to predict which kinase is responsible for a particular phosphorylation site. The identification of SLiM-mediated interactions within PPI data on the fly has however, to the best of our knowledge, not been investigated. To alleviate this, we introduce the iELM server that uses the annotated ELM regular expressions, especially trained Hidden Markov Models (HMMs) based on the manual annotation of SLiM-binding domains and PPI data to identify SLiM-mediated interactions. In addition, iELM takes into consideration many of the important attributes of SLiMs identified in some of the aforementioned studies, including the tendency of SLiMs to occur in regions of intrinsic disorder (25) and the propensity of functional motifs to be evolutionary conserved (6,13). The iELM web server allows the identification of SLiM-mediated interactions associated with a protein of interest or within a users’ PPI network.
THE iELM ALGORITHM
The iELM algorithm has been previously described and benchmarked (1) and can be summarized as follows: iELM assesses binary protein associations for SLiM-mediated interactions using the ELM annotated regular expressions together with HMMs (26) trained on manually annotated SLiM-binding domains and their orthologs. The assessment of whether or not a binary interaction is SLiM-mediated can be divided into four sections. The first module uses the 3DID database (27) to check if the two proteins interact via a domain–domain interaction. If they do not, the second and third parts occur simultaneously to assess each protein for SLiM and SLiM-binding domain matches. In the second part, SLiMs are identified using the regular expressions annotated by the ELM resource, and scored using the SLiMSearch algorithm (12) based on the conservation of the motif in a multiple sequence alignment of the queried protein and its orthologs (12). The predicted SLiM and its surrounding amino acids are also assessed by the IUPred algorithm (14) for their propensity to be in a region of intrinsic disorder (14). In the third part, SLiM-binding domains are identified by HMMs trained to recognize SLiM-binding domains using the HMMSearch programme (26). An option is also available to search using Pfam HMMs (28); however, these domains do not take into account the specificity of motifs for subgroups of a domain family. If a complementary SLiM and SLiM-binding domain partnership exists within the two associated proteins, the algorithm uses a cut-off system based on the results from the benchmarking data sets (see Supplementary Figure S1), as well as recommendations present in the respective papers (1,12,14). The respective cut-offs are 0.3 for disorder scores, 0.6 for motifs scores and 0.35 for domain scores. Any scores below these values will not be returned by the web server.
The calculations by iELM are time-consuming and therefore, to ensure the results from the iELM server are returned in a reasonable time, the majority of the data is precalculated. The HMMs for SLiM-binding domains (1) were used to scan the human UniProt database (29) and all hits above a predefined cut-off were recorded. The precalculated conservation scores were calculated using the SLiMSearch algorithm based on a multiple sequence alignment of orthologous proteins identified using the Gopher programme (30) from a database of 70 complete EnsEMBL proteomes (Ensembl 59) (31). The SLiMSearch algorithm used all the SLiM classes annotated within the ELM database. Disorder scores for each motif were calculated using IUPred. All the protein–protein associations annotated within the STRING database (version 9.0 – STRING score >0.6) (32) were assessed by iELM for SLiM-mediated interactions.
Technical details of the web server
The iELM web server is freely available at http://i.elm.eu.org with no login required. The server aims to provide a user-friendly interface for exploring a protein or proteome of interest for SLiM-mediated interactions. The server can be queried in two ways: ‘protein iELM’ searches the precalculated high-quality associations (score >0.6) from the STRING resource for SLiM-mediated interactions, whereas ‘proteomic iELM’ allows users to explore their own protein–protein interactome of interest for SLiMs. The server also provides a list of all 835 annotated linear motif-binding domains that can be freely downloaded at http://i.elm.eu.org/domains.
For a single query protein, the ‘protein iELM’ server searches a precalculated database, based on results of the iELM algorithm using the high-quality interactions from the STRING database (see Figure 1).
A single protein ID is required as input, with a drop-down menu available to specify the type of sequence ID, which is subsequently used to query the ID mapping service provided by UniProt. The user can also choose between the especially trained iELM HMMs and the Pfam HMMs. Upon submitting the job, precalculated data are searched ensuring results are returned promptly.
The output is divided into a tabular and a graphical display:
The tabular output (see Figure 2A) consists of the two tables: the first table (if applicable) consists of SLiMs found within the query protein; the second table (if applicable) consists of SLiM-binding domains found within the query protein. Both tables are divided into three parts: the left part contains the UniProt ID of the motif-containing protein, the motif type (ELM functional class), the location of the motif, its sequence and the associated scores. The central portion shows the UniProt ID of the protein containing the motif-binding domain, the domain name (Pfam) and the domain score. The final part provides a link to Pepsite (17), via the ‘Structure’ button, for a structural prediction of the interaction and a biophysical feasibility assessment (if applicable). The table is fully searchable and can be copied to the clipboard, printed or downloaded as a comma-separated values (CSV) document.
The graphical output (see Figure 2A) displays a representation of the predicted SLiMs and SLiM-binding domains along with the modular architecture of the query protein extracted from the UniProt database and the annotated phosphorylation sites from Phospho.ELM (33). The modular architecture predicted by the iELM method is divided by colour into SLiM functional types, as classified by ELM (Ligand, Targeting, Cleavage and Modification), with the annotated instances from ELM outlined in green. A key describing these types is fully clickable enabling the filtering of both the graphical and tabular content. Tool tips are also integrated to allow the user to gain immediate information on individual SLiMs, SLiM-binding domains and UniProt domains, as well as to help the user interpret the output. The annotated domains are linked to the UniProt database and the individual predicted motifs are also clickable resulting in the filtering of the tabular content.
This section allows the user to input an individualised PPI network that is searched using the iELM algorithm (see Figure 1).
The user may submit either a tabulated list of interactions or a list of IDs that will be searched in an all-against-all manner. Once again, a drop-down menu is available to specify the type of ID that the user wishes to input and for the type of HMMs the user wishes to use. There is a limit of 75 000 interactions for a tabulated list and 400 IDs for an all-against-all search. Upon submitting the job, the user is redirected to a wait page while the results are calculated. The waiting time is normally less than 5 min.
As with the iELM section, the output is divided into two sections:
The tabular output is of the same structure as described in ‘protein iELM’ (see Figure 2A), except only one table is displayed containing all the interactions, the originally queried protein is displayed next to the converted UniProt ID and there is an additional button called ‘Interaction’. Clicking on this button leads to the production of a graphical representation of the PPIs linked to this interaction. If there are associations that are not predicted to be SLiM-mediated, an additional table is displayed in the left-hand column for users’ inspection. In the same column, if any of the IDs submitted fail to be converted, a link is displayed that connects to a page displaying these proteins.
The graphical output contains the modular architecture as outlined above, as well as a network of all the connecting interactions in one connected cluster of up to 75 proteins (Figure 2B). On the initial production of the results page, a network is displayed based on the best scoring SLiM-mediated interaction; pressing the aforementioned ‘Interaction’ button in the table can alter this. The edges of the network are coloured depending on the type of interaction (ELM type) and the nodes are coloured depending on whether they contain a SLiM, a SLiM-binding domain or both. Clicking on the ‘Interaction’ button also reveals the globular architecture of the interacting proteins of interest (as described in ‘Protein iELM’ Section).
Currently, only the human proteome is fully searchable, however, in the near future we plan to include additional model organisms. We also wish to incorporate an additional section that will allow users to search PPIs with their own regular expression and SLiM-binding domains. We will update iELM regularly to ensure newly annotated binding domains are incorporated into the precalculated data. To further facilitate our annotation process, we have included a form in the domains section, which allows users to inform us of known linear motif-binding domains that are not presently annotated in iELM.
The iELM web server is, to the best of our knowledge, the first algorithm that facilitates the exploration and identification of SLiM-mediated interactions within PPI networks on the fly. The user-friendly platform allows enquiries at the single protein level as well as within large-scale proteomic studies. The iELM resource can, therefore, be useful in guiding experimental studies and facilitating the analysis of pathways within PPI networks. To accommodate a wide range of users, the server supports multiple database types as input format and allows the download of results as easily parsable CSV data file. The web server is freely available at http://i.elm.eu.org.
Supplementary Data are available at NAR Online: Supplementary Figure 1.
EMBL international PhD program fellowship to R.J.W. Funding for open access charge: EMBL.
Conflict of interest statement. None declared.
We would like to thank Kim Van Roey for his critical reading of the manuscript and Norman Davey for providing the SLiMSearch algorithm. We would also like to thank Leonardo Trabuco for providing the REST link to the Pepsite web server.