The Death Domain (DD) superfamily, which is one of the largest classes of protein interaction modules, plays a pivotal role in apoptosis, inflammation, necrosis and immune cell signaling pathways. Because aberrant or inappropriate DD superfamily-mediated signaling events are associated with various human diseases, such as cancers, neurodegenerative diseases and immunological disorders, studies in these fields are of great biological and clinical importance. To facilitate the understanding of the molecular mechanisms by which the DD superfamily is associated with biological and disease processes, we have developed the DD database (http://www.deathdomain.org), a manually curated database that aims to offer comprehensive information on protein–protein interactions (PPIs) of the DD superfamily. The DD database was created by manually curating 295 peer-reviewed studies; the current version documents 175 PPI pairs among the 99 DD superfamily proteins. The DD database provides a detailed summary of the DD superfamily proteins and their PPI data. Users can find in-depth information that is specified in the literature on relevant analytical methods, experimental resources and domain structures. Our database provides a valuable tool that assists researchers in understanding the signaling network that is mediated by the DD superfamily.
The Death Domain (DD) superfamily is one of the largest and most studied classes of protein–protein interaction (PPI) modules and comprises four subfamilies: DD (1), Death Effector Domain (DED) (2), CAspase Recruitment Domain (CARD) (3) and PYrin Domain (PYD) (4). Functional and structural similarity is the key feature that defines the DD superfamily (1–4). Subfamilies within the superfamily are classified mainly by sequence homology (3–7). Accumulating structural information, however, indicates that each subfamily has unique structural characteristics, such as a more flexible and exposed third helix in the DDs, the presence of an RxDL-motif in the DEDs, an interrupted first helix in the CARDs and a relatively small third helix (or no well-defined third helix) in the PYDs. Taken together, the amino acid sequence and the structural characteristics define each subfamily in the DD superfamily (5–7).
The DD superfamily plays a pivotal role in apoptosis, inflammation, necrosis and immune cell signaling pathways (5–9). Upon receipt of signals that trigger apoptosis or inflammation, caspase-activating complexes assemble via the DD superfamily (10–12). The DD superfamily is also involved in recruiting downstream effectors for immune cell receptor signaling, intracellular pathogen sensing and DNA damage responses (13,14). Because DD superfamily-mediated signaling events are associated with various human diseases, such as cancers, neurodegenerative diseases and immunological disorders, the DD superfamily has emerged as a promising target for therapeutic intervention (15–17). A detailed account of the role and importance of the DD superfamily in cellular signaling pathways is beyond the scope of this article, and the reader is referred to recent review articles (5–9).
In the human genome, 37 proteins with DDs, 7 proteins with DEDs, 33 proteins with CARDs and 22 proteins with PYDs have been identified (5,6). Because specific DD superfamily-mediated PPIs are critical for determining downstream events, the investigation of PPIs among the DD superfamily proteins (i.e. proteins that contain a DD superfamily domain) will facilitate the understanding of DD superfamily-mediated molecular and cellular processes and their related diseases.
PPI modules, including the DD superfamily, have been intensively studied because most proteins form complexes to achieve specific functions. Accumulating information on the PPIs in many cellular signaling pathways provides crucial insights into the molecular mechanisms and their related disease processes. Because PPI interfaces have emerged as promising drug targets, there has been substantial progress in developing small-molecule compounds that competitively interfere with PPIs (18).
Although there are several well-known global PPI network databases, such as DIP (19), IntAct (20), MINT (21) and STRING (22), these databases do not provide sufficient information to scientists who want to focus their research on specific protein families. Thus, it is desirable to construct extensively curated databases that provide in-depth information on specific molecules or molecule families; such databases will be more effective than the present global PPI network databases in stimulating the formulation of new knowledge, hypotheses or experiments (23).
Here, we present the DD database, a manually curated database that aims to provide comprehensive information on PPIs of the DD superfamily. This database has a user-friendly interface with many useful features, including a search engine, an interaction map and a function for cross-referencing useful external databases. Our DD database will provide a valuable tool to assist in understanding the molecular interaction and signaling network of the DD superfamily. The DD database can be accessed at http://www.deathdomain.org.
CONTENTS OF THE DATABASE
Our DD database contains 175 PPI pairs among the 99 DD superfamily proteins. The PubMed database was used to collect information on the DD superfamily and its PPIs. After reviewing hundreds of articles, we manually selected 295 peer-reviewed articles to build the database. Users can access all the DD superfamily proteins, their PPI pairs and the selected literature via the ‘Statistics’ page on the website (Supplementary Figure S1). An overview of the PPI information described in this article is presented in the form of a matrix (Table 1). Based on common research schemes of PPI studies (24), columns are labeled with the following three categories: validation for interaction, characterization and functional consequence (Supplementary Table S1). Validation for interaction is subdivided into in vitro and in vivo. These categories answer three basic but critical questions about PPIs. Validation addresses how a PPI was identified and confirmed, characterization addresses the biochemical properties of the PPI, and functional consequence addresses the biological meaning of the PPI. Each category has a brief summary of experimental results or information on experimental resources.
Analytical methods and their related categories are represented by gray boxes.
Validation refers to the type of experimental methods that are used to verify the DD superfamily PPIs. Validation for the PPIs is classified either as in vitro or in vivo. In vitro interaction refers to the methods that use assays with recombinant proteins, whereas in vivo interaction represents those processes that use assays with endogenous or overexpressed proteins. The DD database provides a detailed summary of experimental resources, including the gene constructs and expression systems that are used for recombinant protein preparation and cell/tissue types. Characterization refers to the data regarding the biochemical properties of PPIs, such as binding region mapping, stoichiometry or affinity among interacting proteins as specified in the articles. We curated the mapping data for binding regions with regard to the gene constructs that are used and corresponding amino acid regions. In the DD database, stoichiometry is described in quantitative terms and affinity is defined by dissociation constants. Because the DD superfamily proteins have apoptotic or non-apoptotic functions, we classified and curated their functional roles into two subcategories: death-related and death-unrelated.
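The curation categories described above can be pictured as a small record type. The following Python sketch is only an illustration of that scheme and not the database's actual schema: the class names, field names and the `availability` helper are assumptions, and the example values are illustrative rather than copied from the DD database.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

# Placeholder for data that the source article does not report (assumed label).
NOT_SPECIFIED = "Not specified"


class Subfamily(Enum):
    DD = "DD"
    DED = "DED"
    CARD = "CARD"
    PYD = "PYD"


class FunctionalRole(Enum):
    DEATH_RELATED = "death-related"
    DEATH_UNRELATED = "death-unrelated"


@dataclass
class Validation:
    # Assays performed with recombinant proteins.
    in_vitro: List[str] = field(default_factory=list)
    # Assays performed with endogenous or overexpressed proteins.
    in_vivo: List[str] = field(default_factory=list)


@dataclass
class Characterization:
    binding_region: str = NOT_SPECIFIED   # construct / amino acid region
    stoichiometry: str = NOT_SPECIFIED    # quantitative description
    kd_molar: Optional[float] = None      # dissociation constant, if reported


@dataclass
class PPIRecord:
    protein_a: str
    protein_b: str
    subfamily: Subfamily
    validation: Validation = field(default_factory=Validation)
    characterization: Characterization = field(default_factory=Characterization)
    functional_roles: List[FunctionalRole] = field(default_factory=list)

    def availability(self) -> Dict[str, bool]:
        """Report which curation categories hold any data for this pair."""
        c = self.characterization
        return {
            "in_vitro": bool(self.validation.in_vitro),
            "in_vivo": bool(self.validation.in_vivo),
            "characterization": (
                c.binding_region != NOT_SPECIFIED
                or c.stoichiometry != NOT_SPECIFIED
                or c.kd_molar is not None
            ),
            "functional_consequence": bool(self.functional_roles),
        }


# Illustrative entry only; the assay name is an example value.
record = PPIRecord(
    protein_a="Fas",
    protein_b="FADD",
    subfamily=Subfamily.DD,
    functional_roles=[FunctionalRole.DEATH_RELATED],
)
record.validation.in_vivo.append("co-immunoprecipitation")
summary = record.availability()
```

Keeping validation, characterization and functional consequence as separate fields mirrors the three questions the curation is meant to answer, and makes it straightforward to report which categories are populated for a given pair.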
The DD database provides information on the amino acid sequences and the domain boundaries of the DD superfamily proteins, which is retrieved from the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases (25) and is available in FASTA, EMBL and GenBank formats (Supplementary Figure S2). External databases that describe specific DD superfamily proteins and their PPIs are linked at the ‘External Database Link’ page on the website. Because the structural information of the DD superfamily is important for understanding PPIs at an atomic level, we also included any available 3D structure information for each DD superfamily protein. The details of the procedures for structure determination are also available in our database via the ‘3D Structure’ page. In addition, our database provides information on natural mutations and related diseases via the ‘Disease’ tab. Because the DD superfamily proteins are involved in many diseases through certain types of mutation, this information might be particularly useful for understanding mutation-based diseases.
DATA COLLECTION AND CURATION
The PubMed database was used as the primary source for collecting information and constructing the DD database. After finding synonyms for each of the 99 DD superfamily proteins using UniProtKB (25) and Entrez Gene (26), we obtained a list of articles using each name of the proteins and its synonyms on a PubMed search, and we selected the articles that contained evidence for physical binding among the proteins denoted. We also manually screened information that was in other databases, such as DIP, IntAct, MINT, STRING and Entrez Gene. All of the 295 articles used for database construction are listed on our database website (Supplementary Figure S1).
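The synonym-based literature search described above can be sketched as a query builder. This is a minimal illustration under stated assumptions: the function name `build_synonym_query` is hypothetical, the article does not specify the exact query syntax that was used, and the sketch only assembles a search string (quoted phrases joined with the Boolean operator OR) without contacting PubMed.

```python
from typing import Iterable


def build_synonym_query(names: Iterable[str]) -> str:
    """Join a protein name and its synonyms into one OR query string.

    Blank entries are dropped and duplicates are removed case-insensitively,
    keeping the first spelling encountered.
    """
    seen = set()
    terms = []
    for name in names:
        cleaned = name.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            # Quote each term so that multi-word names are searched as phrases.
            terms.append(f'"{cleaned}"')
    return " OR ".join(terms)


# Illustrative input: FADD and one of its common synonyms, MORT1.
query = build_synonym_query(["FADD", "MORT1", "fadd", " "])
print(query)
```

The articles returned by such a query would still need the manual screening step described in the text, because a name match alone is not evidence of physical binding.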
We curated the data on the basis of analytical methods, experimental results, resources (e.g. genes or proteins, primary cells, tissues and cell lines) and nomenclature. The curation of the analytical methods and experimental results is partly described in the previous section, ‘CONTENTS OF THE DATABASE’. We manually curated the information on the gene constructs that were denoted in the literature with respect to species and amino acid region. With regard to the nomenclature that describes the proteins, we adopted common names that are widely used in the literature. If information on the gene constructs was not found in an article, but relevant references were provided, we gathered the information from the referenced articles. If no appropriate data were available, the entry was designated as ‘Not specified’ in our database. Because the nomenclature describing primary cells or tissues is mostly unified, we cited primary cell or tissue names exactly as they were designated in the literature.
The DD database contains several useful features, such as a search engine, an interaction map and a tool that cross-references external resources. Navigation of the DD database is illustrated in Figure 1, and the website also provides instructions in the ‘Tutorial’ section. Users can browse the database by clicking the name of a subfamily of the DD superfamily (CARD, DD, DED or PYD) or the name of an individual DD superfamily protein. If users begin their search from a specific subfamily, they obtain all the information related to the proteins in that subfamily, including lists of the PPI pairs of the subfamily and the availability of data on the validation of the interaction, characterization and functional roles of each PPI, in the ‘At a glance’ section on the website (Supplementary Figure S3). Users can also access all of the detailed PPI information for a given subfamily by clicking the ‘In detail’ section on the website (Supplementary Figure S3). If users browse by the name of a specific DD superfamily protein, they see a brief introduction to the protein, its amino acid sequence, domain boundaries and PPIs on the front page (Figure 1C and D). Users can also click on the arrow between two interacting DD superfamily members to get information on the specific interaction. UniProtKB IDs and the availability of 3D structures for a specific DD superfamily protein can also be accessed by clicking ‘External Database Link’ and ‘3D structure’, respectively (Figure 1E).
The DD database provides a full-text search tool for searching target proteins, PPIs and relevant literature (Figure 1B). When users enter a query (i.e. protein names or UniProtKB IDs) into the search form, the system presents the corresponding information on all matching proteins and PPI pairs. In addition, the system can provide a list of indexed entries that contain the search word in the title, author, affiliation or abstract. The DD database also provides a graphical display of the complicated network of DD superfamily-mediated PPIs for better visualization. The proteins are represented as round nodes that are labeled with their names, and PPIs are illustrated as edges connecting the protein nodes (Figure 1F). When a node is clicked, all of the PPI pairs that involve that protein are highlighted. Detailed information on a PPI can be obtained by double-clicking the representative node. Users can also drag the nodes for better visualization. A control panel on the left side of the toolbar allows users to choose each of the DD subfamilies and PPI categories.
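The node-and-edge representation behind such an interaction map can be sketched as an undirected graph. The Python example below is a minimal illustration, not the implementation used by the website: the `InteractionMap` class and its methods are hypothetical names, and the four pairs are well-known interactions chosen for illustration rather than data exported from the DD database.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


class InteractionMap:
    """Undirected PPI graph: proteins are nodes, interactions are edges."""

    def __init__(self, pairs: Iterable[Tuple[str, str]]) -> None:
        self._edges: Set[FrozenSet[str]] = set()
        self._neighbors: Dict[str, Set[str]] = defaultdict(set)
        for a, b in pairs:
            # A frozenset makes (a, b) and (b, a) the same edge.
            self._edges.add(frozenset((a, b)))
            self._neighbors[a].add(b)
            self._neighbors[b].add(a)

    @property
    def edge_count(self) -> int:
        return len(self._edges)

    def highlight(self, protein: str) -> List[Tuple[str, ...]]:
        """Return every PPI pair that involves the given protein."""
        partners = self._neighbors.get(protein, set())
        return sorted(tuple(sorted((protein, p))) for p in partners)


# Illustrative pairs only (one per subfamily interaction type).
ppi_map = InteractionMap([
    ("Fas", "FADD"),          # DD-DD
    ("FADD", "Caspase-8"),    # DED-DED
    ("Apaf-1", "Caspase-9"),  # CARD-CARD
    ("ASC", "NLRP3"),         # PYD-PYD
    ("FADD", "Fas"),          # duplicate in reverse order, stored once
])
highlighted = ppi_map.highlight("FADD")
```

Here `ppi_map.highlight("FADD")` plays the role of clicking the FADD node: it returns the pairs that would be highlighted, and an unknown protein name simply yields an empty list.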
To complement the DD database and to provide additional information on each gene that is contained in it, we provide hyperlinks to other useful databases, including UniProtKB, DIP, IntAct, MINT, STRING and KEGG. Thus, the DD database serves as a central hub for detailed structural, functional and signaling information on the DD superfamily and its PPIs.
DISCUSSION AND FUTURE DIRECTIONS
Studies on PPI modules and their PPIs in many cellular signaling pathways are critical for understanding the molecular mechanisms and their related disease processes. Because publications that are related to the DD superfamily-mediated signaling pathways have increased remarkably in recent years, additional effort to organize the comprehensive information of the DD superfamily and its PPIs is still needed. In addition, because global PPI databases do not provide sufficient information to scientists who want to focus their research on specific protein families, it is desirable to construct specific databases that contain detailed information on relevant analytical methods, experimental resources and summarized results.
Our DD database provides in-depth information on the DD superfamily and its PPIs, which assists in developing an understanding of the molecular and signaling network that is mediated by the DD superfamily. Our database documents the experimentally validated PPIs among the DD superfamily proteins. Manual curation is currently the best method for constructing reliable biological databases and can capture detailed information, such as the species of the genes, the domain boundaries of the proteins, the relevant experimental methods and the structural information. The advantages and accuracy of manual curation become even clearer when our database is compared with other PPI databases (Supplementary Table S2). Therefore, the aim of this database is to provide the scientific community with a comprehensive and integrated tool for efficiently, conveniently and accurately extracting information about PPIs of the DD superfamily.
We also plan to extend the contents of our database beyond the DD superfamily. Because several more cell death-related domains, including the CIDE, BH3 and BIR domains, have been identified and characterized, we plan to include these domains in the DD database. Additionally, we are developing computational methods for discovering unidentified or possible PPI pairs using network analysis tools. Finally, we believe that our database constitutes a step toward optimizing and generalizing DD superfamily PPI search tools so that they are accessible beyond the specialized research community.
Supplementary Data are available at NAR online: Supplementary Figures 1–3 and Supplementary Tables 1 and 2.
Funding for open access charge: Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0003406 and 2011-0025697 to H.H.P., 2008-05943 to I.S. and 2011-0022437 to D.K.).
Conflict of interest statement. None declared.