ELM 2016—data update and new functionality of the eukaryotic linear motif resource

The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org) is a manually curated database of short linear motifs (SLiMs). In this update, we present the latest additions to this resource, along with more improvements to the web interface. ELM 2016 contains more than 240 different motif classes with over 2700 experimentally validated instances, manually curated from more than 2400 scientific publications. In addition, more data have been made available as individually searchable pages and are downloadable in various formats.


INTRODUCTION
Short linear motifs (SLiMs) are small protein-interaction-mediating modules that have unique properties (1,2). They play an important role in biological systems (3–6), and the analysis of motifs in protein sequences remains an important step in protein research. The ELM database provides manually curated classes of motifs as well as instances thereof. Furthermore, it provides a web interface that allows users to explore possible instances of annotated classes in proteins of interest. More than 10 years after its inception (7,8), it remains a popular resource in continual use by scientists worldwide. It has proved invaluable for investigating interactions mediated by short linear motifs, be it for individual proteins (9–11), large-scale analyses (12–14), algorithm development (15), prediction of novel motif-mediated interactions (16) or investigation of host-virus interactions (17–20). Here we give an overview of the latest developments and new features introduced to the ELM resource (see Figure 1) since the last update (21).

RESOURCE DESCRIPTION
The ELM resource provides two main services: a database of short linear motif annotations and a tool that uses this information to explore possible instances of motifs in any given protein sequence. The main database content consists of annotations of motif classes, hand-curated from the scientific [...] (27), Pfam (28) and KEGG (29).
An ELM motif class is described by a unique regular expression pattern, such as '...([ST])P.' (meaning any three residues followed by a serine or threonine, followed by proline and another wildcard residue) for the DOC_WW_Pin1_4 class. Ideally, each motif class has multiple example instances annotated, where an instance is a match to the regular expression pattern of the ELM motif class in a protein sequence. For each instance entry, ideally, multiple sources of experimental evidence are recorded (identifying participant, detecting motif presence and detecting interaction), and, following annotation best practices, a reliability score is given by the annotator.
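As an illustration of how such a pattern can be matched against a protein sequence, the following minimal Python sketch scans a sequence with the Pin1 class pattern quoted above. The function name, the toy sequence and the choice to report overlapping matches with 1-based inclusive positions are assumptions made for this example; it is not the implementation used by the ELM server, and a match found this way is only a candidate, not a validated instance.

```python
import re

# Regular expression quoted in the text for the Pin1 WW-domain docking class.
PIN1_PATTERN = re.compile(r"...([ST])P.")


def find_motif_matches(pattern, sequence):
    """Return (start, end, matched_text) for every match of `pattern`.

    Positions are 1-based and inclusive. A lookahead wrapper is used so
    that overlapping matches are reported as well (one per start position).
    """
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    matches = []
    for m in lookahead.finditer(sequence):
        # Group 1 is the whole motif match captured inside the lookahead.
        matches.append((m.start(1) + 1, m.end(1), m.group(1)))
    return matches


# Hypothetical toy sequence, not a real protein.
TOY_SEQUENCE = "MKAAASPLGGTPEK"
print(find_motif_matches(PIN1_PATTERN, TOY_SEQUENCE))
# → [(3, 8, 'AAASPL'), (8, 13, 'LGGTPE')]
```

The second match overlaps the first by one residue, which is why a plain non-overlapping scan would miss it.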

Interface
The HTML interface has been updated and new functionality has been added. A multi-tier navigation menu has been implemented to provide improved accessibility to the individual pages. Also, on each page, a unified search box is available, providing auto-completion and faster access to the ELM database content via simple keyword search (see Figure 2).
Several new webpages have been introduced to allow easier access to the database content. There are now individual pages for interaction domains, motif-mediated switches, PDB structures, ELM methods and GO terms. This allows easy access to individual content searching and selecting data by user provided keywords, fostering data dissemination and re-use.

Database content
The ELM database has been updated and existing data types have been enriched with more data, see Tables 1 and 2 [...] experimental methods annotated for these instances. Furthermore, new data types have been added: many motif instances have been annotated from large-scale studies (30) for which known mutations in the motif contribute to diseases. These can be collectively viewed at the ELM disease page (http://elm.eu.org/infos/diseases.html). Each annotated disease is described by a short abstract, links to the sequence variation causing the condition (linked to Swiss-Prot (31)) in the motif-bearing protein, as well as reference article(s). Each entry is linked to the corresponding ELM instances page.

Database and web server optimization
The ELM resource is implemented using a PostgreSQL (https://postgresql.org) relational database as a backend to store all annotations and associated data, while the frontend web interface makes use of the Django web framework (https://djangoproject.com). Recent improvements include an updated annotation system that aids annotators in inserting novel entries and in correcting or updating existing ones. Figure 1 illustrates how this annotation system has helped increase the database content: since its implementation in 2014, many new ELM classes and instances have been annotated, and a large number of classes and instances have been revised. The high degree of data integration and interconnectedness caused some ELM database queries to become slow. To remedy this, an HTTP cache/reverse proxy has been deployed that caches rendered HTML pages, which significantly speeds up page delivery and improves the user experience.

Downloads
The data annotated in the ELM database are freely available to the scientific community, and the ELM team tries to make these data as easily accessible as possible. Several new pages are now available providing more download formats and options; the best starting point for ELM downloads is the web page http://elm.eu.org/downloads.html. Novel pages include the experimental methods used during annotation, the PDB structures associated with motif-domain interactions, and all linked GO terms. All of these are also available for download in computer-parseable tab-separated format. In addition, a simple timestamp has been implemented, which allows clients to update their data only when newer data are available. Further formats and options can be implemented upon request; the authors welcome suggestions for new features.
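A client consuming these downloads might combine the two ideas, tab-separated parsing and a timestamp check, roughly as in the following Python sketch. The column names, the '#' comment convention, the timestamp format and the sample content are all assumptions made for illustration; the actual layout of the ELM files and the way the timestamp is exposed should be checked on the downloads page.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample of a tab-separated export; column names are assumed.
SAMPLE_TSV = (
    "# hypothetical export header line\n"
    "Accession\tELMIdentifier\tStart\tEnd\n"
    "P00000\tDOC_WW_Pin1_4\t3\t8\n"
)


def parse_tsv(text):
    """Parse tab-separated text into a list of dicts.

    Blank lines and lines starting with '#' are skipped (assumed convention).
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return list(reader)


def needs_update(local_timestamp, remote_timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if the remote data is newer than the local copy.

    `local_timestamp` may be None when nothing has been downloaded yet.
    The timestamp format is an assumption, not the documented ELM format.
    """
    if local_timestamp is None:
        return True
    remote = datetime.strptime(remote_timestamp, fmt)
    local = datetime.strptime(local_timestamp, fmt)
    return remote > local


rows = parse_tsv(SAMPLE_TSV)
print(rows[0]["ELMIdentifier"])
print(needs_update("2015-06-01 00:00:00", "2016-01-04 12:00:00"))
```

Comparing parsed timestamps rather than raw strings keeps the check correct even if the textual format does not sort lexicographically.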

SHORT LINEAR MOTIFS IN BIOLOGICAL PATHWAYS
The relevance of linear motifs is well known in signaling pathways (32). Furthermore, the accumulation of more experimental evidence demands a systematic analysis of all biological pathways for the presence of linear motifs. The ELM resource assists in this endeavor by providing a distinct color mapping of different ELM classes on the proteins involved in different pathways; the pathway information is obtained from KEGG (29). We have used the Wnt signaling pathway (hsa04310) to illustrate this feature (Figure 6). Signaling in this pathway starts from Wnt proteins that transduce the signal from the extracellular part of the cell to the interior via the Frizzled receptor, which in turn transmits it to other regulatory proteins. A key component of Wnt signaling is β-catenin, which needs to accumulate in the cytoplasm in order to be translocated into the nucleus, where it subsequently induces a cellular response via gene transcription (33). This accumulation, however, is regulated by the APC/Axin1 destruction complex, in which β-catenin is ubiquitinylated (34). Thereafter, modified β-catenin is degraded via the proteasomal machinery. Short linear motifs play a prominent role in this process, as exemplified by the presence of at least four different motifs in β-catenin (annotated at the ELM resource). It contains modification sites for casein kinase 1 (CK1) and glycogen synthase kinase 3 (GSK3), and the sequential modification of these sites by CK1 and GSK3 generates a phosphodegron at the N-terminus of this protein. This phospho-regulated degron acts as a recognition site for β-TrCP, a subunit of the SCF-β-TrCP E3 ligase enzyme (35). After binding to the activated degron, β-TrCP ubiquitinylates β-catenin, which is subsequently degraded by the proteasomal machinery (36).
Among other classes, β-catenin contains ligand binding (LIG) motifs for 14-3-3 proteins, and the binding of 14-3-3 along with Chibby protein has been suggested to facilitate the nuclear export of β-catenin, which puts an end to its signaling (37). The presence of each of these motifs is necessary for modulation of this pathway, and the regulation of β-catenin by different motif classes clearly underlines the important role of short linear motifs in signaling pathways.
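The idea of giving each ELM class its own color when overlaying motifs on pathway proteins can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the evenly spaced hue scheme and the class identifiers in the example are placeholders chosen here, and the sketch does not reproduce how the ELM pathway view actually assigns colors.

```python
import colorsys


def assign_class_colors(annotations):
    """Map every motif class in `annotations` to a distinct hex color.

    `annotations` is a dict: protein name -> list of motif class identifiers.
    Classes are sorted so the mapping is deterministic, and hues are spaced
    evenly around the color wheel.
    """
    classes = sorted({c for class_list in annotations.values()
                      for c in class_list})
    n = len(classes)
    colors = {}
    for i, name in enumerate(classes):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return colors


# Placeholder identifiers for the four motif types discussed in the text
# (CK1 and GSK3 modification sites, the beta-TrCP degron, a 14-3-3 ligand).
EXAMPLE_ANNOTATIONS = {
    "beta-catenin": ["MOD_CK1", "MOD_GSK3", "DEG_TrCP", "LIG_14-3-3"],
}

for motif_class, color in assign_class_colors(EXAMPLE_ANNOTATIONS).items():
    print(motif_class, color)
```

Evenly spaced hues stay distinguishable for a handful of classes; for dozens of classes on one pathway a curated palette would likely be a better choice.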

CONCLUSION AND FUTURE DIRECTIONS
The number of motifs that have been experimentally validated to date is still small: annotated instances number in the low thousands, whereas the estimated total might exceed a million (38). Hence, we expect many more motifs and motif-mediated interactions to be discovered (for guidelines on motif discovery see (39)). The ELM resource will continue to support the scientific community with a repository of high-quality annotations and to facilitate the linear motif analysis of protein sequences.