Abstract

The database iPfam, available at http://ipfam.org, catalogues Pfam domain interactions based on known 3D structures that are found in the Protein Data Bank, providing interaction data at the molecular level. Previously, the iPfam domain–domain interaction data was integrated within the Pfam database and website, but it has now been migrated to a separate database. This allows for independent development, improving data access and giving clearer separation between the protein family and interactions datasets. In addition to domain–domain interactions, iPfam has been expanded to include interaction data for domain bound small molecule ligands. Functional annotations are provided from source databases, supplemented by the incorporation of Wikipedia articles where available. iPfam (version 1.0) contains >9500 domain–domain and 15 500 domain–ligand interactions. The new website provides access to this data in a variety of ways, including interactive visualizations of the interaction data.

INTRODUCTION

Systems biology is the examination of the organization and dynamics of cellular and organismal function (1,2). Modern sequencing techniques have provided large sets of genes and proteins, with the complete, or near complete, repertoire of sequences known for many source organisms. However, to understand an organism at the system level, one requires knowledge of how these sets of proteins interoperate with other cellular components, such as other proteins, DNA or small molecular ligands, to achieve their function.

High-throughput experiments have provided large datasets of protein–protein interaction in a variety of organisms [e.g. (3–9)]. However, such experiments do not provide the molecular details that are often needed to understand function. Such data only comes from high-resolution 3D structures, but the number of known structures remains several orders of magnitude lower (<100 000), than the number of number of proteins (tens of millions). Therefore, it is important to develop methods that capture as much data as possible from the high-resolution data, thereby enabling the transfer of information and annotations to larger datasets [reviewed in (10)].

As well as protein–protein interactions, it is important to capture other types of interactions at a similar level of detail. The Protein Data Bank (PDB) (11) is the central repository of high-resolution 3D structures, and contains many protein bound small molecule ligands (simply referred to as ligand hereafter), some of which are important clinical drugs. Bound ligands can perform a functional role in the domain–ligand complex, such as the zinc ion that helps stabilize a zinc finger, while other ligands may actually disrupt function by (i) occluding protein binding sites (e.g. (12)), (ii) being analogous to metabolites (e.g. (13)) or (iii) binding to other functionally important regions (e.g. (14)).

To aid the detailed understanding of protein interactions, we have combined information from the Pfam database (which classifies protein sequence space into conserved evolutionary units) (15) and the PDB, to capture high resolution interaction data. This interaction data is made available in the database iPfam (16). iPfam was first released in 2004 (Pfam, release 12.0), as part of the Pfam database, providing detailed domain–domain interactions (DDI) data. In 2007, the Pfam web site underwent a complete redesign [described in (17)], aimed at improving Pfam data access and consistency across the Pfam mirror sites. Due to the exponentially larger sizes of both iPfam and Pfam datasets, Pfam was restricted to storing and displaying just DDI information, without residue, atom or bond information. Herein, we describe the first release of the standalone version of the iPfam database, which returns the details of domain interactions at residue and atomic resolutions.

iPfam has been expanded to include both DDI and domain–ligand interactions (DLI), whereas most similar, domain centric, resources exclusively provide either DLI (18–22) or DDI data (23–30). The new iPfam site provides easy access to the interaction data using a combination of a protein family (Pfam accession or identifier), a ligand and PDB identifier. We present visualizations for analyzing the interactions from both sequence and structure perspectives.

METHODS

In the following section, we briefly describe the process of identifying domain interactions (both DDI and DLI). To identify Pfam protein family interactions in 3D structures, we have to take the Pfam data that is principally defined on UniProt sequences and transfer those annotations to the PDB atom coordinates, while taking account of the data in both resources being in flux.

Mapping Pfam Entries to PDB Entries

The first step of the new, independent, iPfam update cycle is to synchronize the underlying databases, PDB and Pfam. iPfam version 1.0 is built using PDB entries as of 26 August 2013 and Pfam version 27.0, released in March 2013. Due to the asynchronous data release of the source databases, Pfam entries are mapped to PDB entries using a two-stage process.

The first stage depends on the high-quality Pfam–PDB mapping data provided by Pfam, which are constructed from a residue-to-residue mapping between PDB and UniProt called SIFTS (31). This approach of transferring annotations between sequence and structure using SIFTS has the advantage of being able to annotate the many protein fragments in the PDB, which are otherwise too short to significantly match a Pfam model. This allows the capture of many DDI what would otherwise look like a domain–peptide interaction. The Pfam–PDB mapping data is typically months out of date, so we process the mapping to remove references to obsolete PDB entries.

Table 1.

Breakdown of the domain interactions that are available in iPfam version 1.0, according to the interaction type

 Homodomain Heterodomain Ligand 
Intramolecular 528 2492  
Intermolecular 3993 5753 5168 
 Homodomain Heterodomain Ligand 
Intramolecular 528 2492  
Intermolecular 3993 5753 5168 

The second stage addresses the fact that many new PDB entries will have been added since the Pfam mapping was generated and will, therefore, not be annotated with Pfam domains using the first stage. To make iPfam more comprehensive, we supplement the Pfam sourced Pfam–PDB mappings with data from RCSB PDB (32), which uses the HMMER web service (33) to annotate the PDB chain protein sequences with Pfam domains using the hmmscan search tool. This list of Pfam domain annotations is accessible via an API provided by RCSB PDB, and is used as a supplementary mapping data for structures not annotated by the first stage. Note, the RCSB PDB mapping may contain overlapping Pfam domains, the results are filtered to remove any overlaps, disregarding all but the most significant E-value match. Overall, this two-stage mapping approach resulted in 272 900 structural domains (from 6634 distinct Pfam entries) being identified in 87 386 different PDB entries.

Defining Protein Family Interactions

Having performed the Pfam–PDB mapping, each PDB entry containing a Pfam protein family is analyzed for interactions. This is achieved by taking each domain and testing to see if any atoms in the domain are in close enough proximity to form a bond with any atom from another domain and/or a small molecule ligand found within the structure. We define the following bond types—covalent, electrostatic (or salt bridge), amino acid side chain and main chain (backbone) hydrogen bonds and van der Waals. All bonds are defined based on both the geometric (distance and angle) and chemical properties (charge, polarity and size) of the atoms involved, [as defined in (34), Chapter 4]. An interaction between a domain and another domain or ligand is called when one or more of these bonds are formed between the two molecules. Briefly, a covalent bond is determined if the distance between two atoms is less than the sum of the distance of the covalent radii of the atoms involved. An electrostatic bond is determined if the distance between the atoms is greater than the sum of the covalent radii and distance is less than the sum of the van der Waals radii, with one atom positively charged and the other negatively charged. A hydrogen bond is determined between two atoms that do not satisfy either the covalent or electrostatic criteria, and if one of the two atoms is a donor atom (D) and the other an acceptor atom (A). The D and A atoms must also satisfy the following geometric criteria: the D–A distance must be less than 3.6Å, and the angle between the vectors formed between D–A and A and amino acid residue atom, is between 90o and 140o. A van der Waals bond is determined if the distance between the two atoms is greater than the sum of the atom’s van der Waal radii (R) and less than R+ 1.5 Å. The list of atom properties (charge, donor/acceptor, charge, covalent radii and van der Waals radii) used in the calculations is available on the iPfam FTP site (atomData.csv).

In iPfam, a small molecule ligand is defined as any molecule found in the PDB HETAM record, excluding nucleic acids, aqueous solvents and covalently modified residues. We use the 3-character alphanumeric code that the PDB assigns to each ligand in the Chemical Component Dictionary (http://www.wwpdb.org/ccd.html).

In the last release of iPfam, which was embedded in Pfam and contained the interaction data at the molecular level (release 20.0, April 2006), bonds arising from van der Waals interactions were stored as a property of the atoms forming that bond. Van der Waals bonds are general attractive forces that vary in strength based on the distance between two atoms. Thus, the close proximity of two amino residues tends to yield many van der Waals interactions. As a result, 96% of the 18 million recorded bonds were van der Waals. Since 2006, both the PDB and Pfam databases have grown substantially, and in an effort to make the production of iPfam more sustainable, the decision was taken to store van der Waals bonds as a property of the residues involved, rather than the actual atoms. Consequently, in iPfam 1.0, the protein-family atom-bond data table contains just 2.6 million records.

iPfam STATISTICS

Overall, iPfam version 1.0 (referred to as 1.0 as it is the first release of the independent version of the database) catalogues 278 678 DDI structures, corresponding to 9516 distinct different domain pairings. Of these pairings, 43% (4074) are homo-domain interactions (i.e. both interacting domains belong to the same Pfam entry) and 57% (5442) correspond to hetero-domain interactions. Of the DDI, 6595 are inter-chain interactions (i.e. found on two separate chains/proteins in the PDB entry), 1354 are intra-chain and 1567 are found as both intra- and inter-chain. There are 529 800 structures of DLI, corresponding to 50 886 unique DLI pairings. Of the 15 540 ligands in the database, 90% (13 984) form an interaction with one or more Pfam entries. Many of the ligands not forming a DLI are typically interacting with a protein region not covered by a Pfam domain or with another molecule such as nucleic acids. Conversely, 78% (5168) of the families in Pfam with a known 3D structure bind one or more ligands. All of the interaction data contained in iPfam are available as a MySQL database dump available from the FTP site. Downloads of subsets of the data are possible via the web site.

WEBSITE FEATURES

The layout and design of iPfam share many similarities with the Pfam (15) and Dfam (35) web interfaces that we have developed. There are three main entry points to the iPfam data: a protein family, a ligand or a protein structure. At the top of every page is a navigation bar that provides links to the home page, browse pages, FTP site and help sections (Figure 1). Where appropriate in the site, we present an information box that summarizes the content of a section at a glance. This information box, if present, is located below the keyword search, on the top-right side of the page, and contains three icons indicating the number of DDI, number of DLI and number of structures cited in that section.

Figure 1.

Examples of (A) the family and (B) the ligand pages showing the summary section or tab. Both pages present information about the respective entries and include external database links where possible. At the bottom of each page is the Wikipedia article summary.

Figure 1.

Examples of (A) the family and (B) the ligand pages showing the summary section or tab. Both pages present information about the respective entries and include external database links where possible. At the bottom of each page is the Wikipedia article summary.

Interaction data may be retrieved by using the ‘jump to’ box on the homepage to navigate to the domain or ligand page of interest. These pages provide a coarse interaction detail, listing all domains and ligands that interact with the query where appropriate, as well as functional annotations. A page is typically broken down into a series of tabs that present the different data sections available. An example of the summary tab for both a protein family and ligand entry is shown in Figure 1A and B, respectively. In the summary tab for a Pfam family we display the one-line description of the entry and any associated annotation and references provided by Pfam. Beneath this text is a direct link to Pfam. Where possible we provide additional annotation from Wikipedia. The ligand summary tab contains more structured data that describes the ligand, including the ligand name, formula, molecular weight, InChI (IUPAC International Chemical Identifier) and SMILES (Simplified Molecular-Input Line-Entry System) string. For each ligand entry, external database cross-links are provided to both PDB RSCB and PDBe, and where possible to ChEBI (36) and ChEMBL (37). To provide further functional information about the 15 000 different ligands found in the PDB, we have started to link ligands stored in iPfam to the corresponding Wikipedia entries.

FAMILY AND LIGAND WIKIPEDIA ARTICLES

We provide additional functional annotations for protein family entries by using the Wikipedia article mapping provided by Pfam. Linking between iPfam ligand entries and Wikipedia articles was performed by cross-referencing the stored ligand InChI with the InChI found in the Wikipedia chembox or drugbox infoboxes. Such infoboxes in Wikipedia chemical articles contain structured data that can be data-mined. iPfam currently maps to >2500 distinct Wikipedia articles (1484 family articles and 1110 ligand articles). The potential caveats of using Wikipedia data have been discussed previously, particularly with respect to the potential for vandalism (38). To try and minimize the risk of vandalism, we have built an in-house tool that captures snapshot summaries for each article. Each summary contains up to the first two paragraphs of the Wikipedia article lead section and an image, if available. The semi-automated summary generation is verified by a curator, who can modify (i) the length of the summary text (decreasing the number of sentences) and (ii) the selected summary image from the entire set of article images. Once the initial summary is prepared, the summary is stored in our database for use in the web site. Updated Wikipedia articles are retrieved on a daily basis and the summary is compared to the retrieved article to check for differences. Summaries where the corresponding text or images have been altered are flagged for review, with updates only being made after review. Currently, this results in about five summary reviews a day. This substantially reduces the likelihood of malicious vandalism being shown in iPfam by default. The full article can be revealed by selecting the ‘full article’ button or viewed and edited directly in Wikipedia. The use of summaries significantly reduces the amount of Wikipedia content that needs to be checked by a curator and provides a more uniform sized block of annotation to be displayed on the page.

ACCESSING SPECIFIC INTERACTIONS

Adjacent to the family page summary tab, as shown in Figure 1A, are the family and ligand-interactions tabs. The table on each tab lists all detected interactions between the protein family (or domain) and other domains and ligands. These tables list the number of occurrences of each DDI, and additionally provide a breakdown of the occurrences according to whether the DDI is intrachain or interchain. The user can access details of the pair-wise interaction by using the ‘Structures’ and ‘Sequences’ buttons in the table. Selecting either of these buttons takes the user to the next level of detail in the site, altering the context in which the data will be presented. For example, from the ubiquitin family interaction tab (http://ipfam.org/family/PF00240/fam_int), the user can enter the ‘ubuiquitin–CUE interaction’ section of the site to investigate the interaction between the two domains. When using the ‘structure’ button, users are taken to a list of structures containing the ubiquitin–CUE interaction, which tabulates each structural example of the DDI and includes a summary of the bonds that form the interaction. From here, the user can explore each example of the interaction in even more detail. Alternatively, using the ‘sequence’ button from the top-level family interaction tab, the user is taken directly to the sequence interaction graphical display (described below). Similarly, the ligand interaction tab lists ubiquitin interactions with different small molecule ligands, which can be explored in the same fashion as the DDI. The final tab on the family page is the download tab, which provides tab-delimited versions of the data contained in the interaction data tables found in the family page tabs. The top-level ligand pages have a layout similar to the family pages.

VISUALIZING INTERACTIONS

The interaction data contained within iPfam may be investigated at both the residue and the atomic level, using 2D and 3D visualizations. The user may invoke these visualizations by descending through the site as described above, or simply by searching with a PDB identifier.

2D interaction visualization

Many structures in the PDB contain multiple domains, with multiple interactions between the domains. For example, the structure of Aquifex aeolicus transcription factor σ28, bound by the anti-σ FlgM [PDB identifier 1SC5, (39)] provides us with the molecular details of how these two proteins interact. The interaction interface is extensive yet discontinuous, with FlgM binding occluding both the core RNA polymerase-binding interface and the DNA-binding sites of the σ-factor. However, this complex interaction can be simplified as a set of 2D displays for each DDI, as shown in Figure 2. This display graphically represents the domain organization of each chain, with the positions of interacting residues indicated on a line below each domain organization graphic. Each vertical bar on this line indicates the position of a residue that is involved in forming the interaction interface, and is colored according to the most energetically significant bond found at that position. Placing the mouse cursor over the vertical bar causes the interacting residues in the other sequence to be highlighted, by increasing the line height. The details of the interaction(s) between the subject residue and residue(s) from the other domain are displayed in a box to the right of the graphic (as shown in Figure 2). Similarly, with DLIs the sequence containing the interacting domain is represented graphically, with bond-forming residues shown below. The ligand is not represented in this view, with the interactions between domain residues and ligand listed in the ‘Interaction Details’ box. All of the residue interaction details may be downloaded using the adjacent download tab.

Figure 2.

2D representation of a DDI between Sigam70_r2 and FlgM, as determined from the structure 1SC5. The alternative class σ-factor, σ28 is composed of three separate domains [colored green (Sigma70_r2, PF04542), orange (Sigma70_r3, PF04539) and dark red (Sigma70_r4, PF04545)], named after the conserved regions in the primary housekeeping transcription factor, σ70 [(39) and references therein]. In this case, residue 40 of the Sigma70_r2 domain has been selected, with the interacting residues in FlgM (colored blue, Pfam accession: PF04316) being displayed by making the colored vertical lines large. The details of the bonds are also shown in the adjacent ‘Interaction Details’ display.

Figure 2.

2D representation of a DDI between Sigam70_r2 and FlgM, as determined from the structure 1SC5. The alternative class σ-factor, σ28 is composed of three separate domains [colored green (Sigma70_r2, PF04542), orange (Sigma70_r3, PF04539) and dark red (Sigma70_r4, PF04545)], named after the conserved regions in the primary housekeeping transcription factor, σ70 [(39) and references therein]. In this case, residue 40 of the Sigma70_r2 domain has been selected, with the interacting residues in FlgM (colored blue, Pfam accession: PF04316) being displayed by making the colored vertical lines large. The details of the bonds are also shown in the adjacent ‘Interaction Details’ display.

3D visualization interface

The atomic resolution interaction data in iPfam is the most fine-grained data, and is best visualized using the 3D structure that it is calculated from. One of the early design decisions made with the iPfam web site and structure visualization was to adopt the JavaScript version of Jmol (40), JSmol, in preference to the applet version. Applets require Java to be installed locally and to be correctly configured within the browser, whereas JSmol uses the browser to render the structure. The drawback of JSmol is that it uses HTML5 and canvas, which is not supported by older browsers, particularly Internet Explorer version 8.0 and earlier. Given that these older browsers make up <10% of the likely usage (and this fraction will diminish over time), we felt that supporting only JSmol was the appropriate decision given limited development resources. As outlined below, using the JavaScript version has allowed a more elaborate integration of data and structure visualization.

The iPfam structure page offers visualization of all structures that were available at the time of the release, whether an interaction had been found or not. For each structure, the protein chain or chains are rendered as white ribbon cartoons, depicting the secondary structure elements. The positions of Pfam domains are indicated on the structure by coloring of the ribbon, with each domain in the structure assigned a unique color. Ligands, if present, are shown as a ball-and-stick representation and colored according to the CPK scheme. Figure 3 shows an example of a structure page using the PDB entry 1SC5 (39).

Figure 3.

Structure of the Aquifex aeolicus transcription factor σ28, bound by the anti-σ FlgM (PDB identifier 1SC5). (A) The interaction between σ28 and FlgM is multipartite, with FlgM wrapping around the σ-factor, preventing normal function. The Pfam domains in the structure are colored as in Figure 2. The interaction between FlgM and Sigma70_r2 has been selected from the annotation display table, causing the interaction to be displayed in the structure. (B) A partial view of the nested table listing all of the interactions bonds between the selected domains. The ‘Toggle VDW’ button in the header and checkboxes can be used to modulate which bonds are displayed in the structure. (C) The JSmol structure view in (A) has been modified to show just the interacting domains, using the ‘Interactors’ button from the toolbar. (D) The structural display has been further restricted to show just the interacting residues using the ‘Interface’ button.

Figure 3.

Structure of the Aquifex aeolicus transcription factor σ28, bound by the anti-σ FlgM (PDB identifier 1SC5). (A) The interaction between σ28 and FlgM is multipartite, with FlgM wrapping around the σ-factor, preventing normal function. The Pfam domains in the structure are colored as in Figure 2. The interaction between FlgM and Sigma70_r2 has been selected from the annotation display table, causing the interaction to be displayed in the structure. (B) A partial view of the nested table listing all of the interactions bonds between the selected domains. The ‘Toggle VDW’ button in the header and checkboxes can be used to modulate which bonds are displayed in the structure. (C) The JSmol structure view in (A) has been modified to show just the interacting domains, using the ‘Interactors’ button from the toolbar. (D) The structural display has been further restricted to show just the interacting residues using the ‘Interface’ button.

Factors such as the size, the number of the domains and the presence of multiple copies of the same domain all complicate visualizing a domain of interest in a structure. To simplify identification of each domain in a structure, the user can switch to ‘Family’ annotations using the radio button found in the annotation selection block (Figure 3A), which causes the lists of the Pfam domains in the structure to be shown in the annotation table below. Placing the mouse cursor over the domain of interest in the list of interactions provided in the annotation display table causes the corresponding domain in the structure to be highlighted. Similar highlighting is also available for ligands, but rather than changing the color, enlarges the ball and stick representation of the ligand, making it easy to locate.

To view the details of an interaction, the user first chooses the desired interaction annotation (DDI or DLI) using the radio buttons in the annotation selection block, followed by activating the display of the specific annotation by clicking on the ‘>’ symbol in the table of annotations, as shown in Figure 3A. This click triggers the following set of events: (i) reveals the list of bonds (hidden by a subsequent click in Figure 3A, but shown in Figure 3B), (ii) renders the bonds on the structure and (iii) activates the tool bar underneath the JSmol pane. This toolbar contains three buttons that control how much structure information is displayed. The ‘All’ button shows the entire structure, Figure 3A. To provide a less obscured investigation of each domain interaction, it is possible to hide the parts of the structure that do not belong to the interacting domains by clicking on the ‘Interactors’ buttons, Figure 3C. The ‘Interface’ button restricts the display still further, so that just the interacting residues are visible, Figure 3D. In all three views, the nested table showing the list of bonds can be used to modulate the display of each bond, by clicking on the checkboxes (Figure 3B). All van der Waals bonds can be toggled off using the button in the table header (labeled ‘Toggle VDW’, Figure 3B), leaving just the higher order bonds visible.

Figure 4 illustrates how an interaction interface is rendered in structure, using the DLI between Zanamivir, a drug, and the Human sialidase Neu2 protein (PDB identifier 2F0Z, unpublished). Numerous interactions are made between the BNR_2 domain (PF13088), and the ligand within the cleft of the domain (insert in Figure 4). The two panels in Figure 4 show just the interaction interface, using the toolbar to restrict the display as described above. The ligand, Zanamivir, is shown as a ball-and-stick and highlighted in the right panel by a semi-transparent overlay on the figure. Using the display controls provided in the page, the rendering can be quickly modified to enable the visualization of just the higher order bonds (dashed lines), in this case between the amino acid side chains and the bound molecule.

Figure 4.

The binding interface between the neuraminidase inhibitor, Zanamivir and the Human sialidase Neu2 containing the BNR_2 (PF13088) domain (PDB identifier 2F0Z). The complete DLI is shown in the insert, bottom left. The left panel shows the interface between the domain and ligand. The ligand is rendered as a ball-and-stick representation. Interacting amino acids from BNR_2 domain are rendered using the cartoon representation, with interacting amino acid side chains as CPK colored, ball-and-sticks. Residues that form van der Waals interactions are drawn with their surrounding van der Waals radii indicated by a dotted halo around the entire amino acid. These van der Waals may be toggled off (see Figure 3B), to display just the higher order bonds, as shown in the right panel. For illustrative purposes only, the Zanamivir ligand is shown with a semi-transparent grey overly. The blue, dashed lines between the ligand and side chain indicate the positions of the electrostatic bonds. The dashed pink lines similarly indicate the positions of hydrogen bonds.

Figure 4.

The binding interface between the neuraminidase inhibitor, Zanamivir and the Human sialidase Neu2 containing the BNR_2 (PF13088) domain (PDB identifier 2F0Z). The complete DLI is shown in the insert, bottom left. The left panel shows the interface between the domain and ligand. The ligand is rendered as a ball-and-stick representation. Interacting amino acids from BNR_2 domain are rendered using the cartoon representation, with interacting amino acid side chains as CPK colored, ball-and-sticks. Residues that form van der Waals interactions are drawn with their surrounding van der Waals radii indicated by a dotted halo around the entire amino acid. These van der Waals may be toggled off (see Figure 3B), to display just the higher order bonds, as shown in the right panel. For illustrative purposes only, the Zanamivir ligand is shown with a semi-transparent grey overly. The blue, dashed lines between the ligand and side chain indicate the positions of the electrostatic bonds. The dashed pink lines similarly indicate the positions of hydrogen bonds.

DISCOVERING INTERACTION DATA

In addition to the browse pages and the ‘jump to’ sections found on the front page, we provide two additional search interfaces to the database. The first is via a keyword search that matches a query string against the textual data associated with the Pfam families, ligands, PDB entries and linked Wikipedia articles. The second search interface allows iPfam to be queried via a protein sequence.

The protein sequence search tool does not predict interactions in the sense that it does not compare the query sequence to the sequences from the structures that were used to determine the interaction. Instead, this provides a convenient means of querying for a set of interactions based on the Pfam domains found on the query sequence. To perform this search, Pfam domains are identified in the query sequence using the HMMER web service (33). The iPfam server then performs a lookup of the previously observed DDI or DLI for each identified domain. Both the Pfam and iPfam results are returned to the browser and displayed as shown in Figure 5. The top section of the results, Figure 5, shows the position of the domains in both graphical and tabular forms. Under the Pfam results are the iPfam interactions for the domains found on the query sequence. As the lists of interacting partners are potentially very large, we have included the ability to filter the results using a simple text-filtering tool, which is applied to all table cells. Such a search can provide a platform for exploring the interaction data in iPfam in more detail and be used to guide docking experiments.

Figure 5.

The results from searching iPfam with the Deoxyuridine 5′-triphosphate nucleotidohydrolase from yeast (UniProt accession P33317). The top section shows the Pfam match to the dUTPase model (PF00692). Below this is a list of known DDIs and DLIs found in iPfam for dUTPase. The text field located in the header provides a simple way of filtering through the results.

Figure 5.

The results from searching iPfam with the Deoxyuridine 5′-triphosphate nucleotidohydrolase from yeast (UniProt accession P33317). The top section shows the Pfam match to the dUTPase model (PF00692). Below this is a list of known DDIs and DLIs found in iPfam for dUTPase. The text field located in the header provides a simple way of filtering through the results.

FUTURE DIRECTIONS

The separation of iPfam and Pfam not only facilitates the independent development and updating of the two resources, but also allows a clearer segregation between protein family sequence annotations and domain interaction annotations. Both databases will develop data exchange mechanisms that will allow the display of the interaction data in the context of family multiple sequence alignments. This will allow interfaces to be compared and analyzed with respect to family sequence conservation. As part of the ongoing developments in iPfam, we will investigate the inclusion of domain–nucleic acid interactions, another important class of interaction not currently represented in the database, to provide a comprehensive and detailed set of protein interaction annotations.

A source of both false-positive and false-negative interactions in iPfam is that the interactions are currently calculated only on the asymmetric unit found in the PDB file. These may or may not reflect the true biological assembly of the protein and may contain crystal-packing artifacts (41). Having re-established iPfam, we will be investigating the use of the biological rotation matrices supplied in the PDB files, to provide additional arrangements of the proteins and hence interactions. One way of minimizing the number of crystal contact artifacts, or false-positive interactions, is to look at the properties of the interaction interface (42–46).

We do not currently impose any threshold criteria on the number of bonds or the amount of surface area that needs to be found at an interface, because true DDI interfaces can be very small. For example, in the Human BCR-ABL tyrosine kinase the SH3 domain interacts with a small region of the SH2 domain that is essential for orienting peptide interactions [PDB identifier 2ABL, (47)]. This interface consists entirely of van der Waals bonds, formed between just five residues from the SH3 domain and four residues from the SH2 domain (interface size of ∼125 Å). Furthermore, as we treat each structural DDI in isolation, a small, weak, yet biologically meaningful interaction may occur between two interchain domains, which would on in its own be insufficient to pass any criteria. However, a second more substantial interchain interaction may be mediated by another DDI, primarily facilitating the interaction between the two chains.

We acknowledge that the lack of an interface threshold could be a source of some false-positive interactions, and we will incorporate interface area metrics in the future to provide users with as much discriminatory information as possible about the interface.

We also do not try to discriminate between different small molecules in the DLI calculations. The presence of a co-crystallized ligand may be functionally important or simply an artifact due to its presence in the crystallization buffer. The aim of the iPfam database is to harvest as much of the high-resolution interaction information as possible from the few known structures, and to present this data in an accessible way that allows the user to make a judgment regarding the biological relevance of an interaction.

CONCLUSION

The iPfam database is constructed from two continuously updating sources, PDB and Pfam, both of which are well established in their respective fields of 3D structure data and protein domain data. The simple, rich user interface in iPfam, combined with the popularity of the source databases, makes interaction data available to a wide audience that includes both bench biologists, looking at their favorite sequence, and computational biologists, requiring comprehensive interaction data for their own data mining purposes. Bringing both DDI and DLI interaction data into the context of protein families, via the unified framework in iPfam, will help researchers to better understand the relationship between ligand and domain binding at a molecular level. There is an increasing body of evidence showing that DDI and DLI interfaces often overlap, allowing the knowledge of one class of interaction to be used to derive hypotheses about both classes of interaction (48–50). Combining knowledge of protein families, large-scale protein-interaction datasets and detailed molecular interaction data will not only improve our understand of biological systems, but also forms an important basis for the rational design of drugs that can modulate the constituent pathways (49,50,51).

ACKNOWLEDGEMENTS

We would like to thank Travis Wheeler for help in preparing the manuscript and Goran Ceric and his team for Janelia Farm’s high performance computing resources.

FUNDING

Howard Hughes Medical Institute Janelia Farm Research Campus (to R.D.F., B.L.M. and J.C.); European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI) funding (to A.B.). Funding for open access charge: HHMI Janelia Farm Research Campus.

Conflict of interest statement. None declared.

REFERENCES

1
Ideker
T
Systems biology 101—what you need to know
Nat. Biotechnol.
 , 
2004
, vol. 
22
 (pg. 
473
-
475
)
2
Chuang
H-Y
Hofree
M
Ideker
T
A decade of systems biology
Annu. Rev. Cell Dev. Biol.
 , 
2010
, vol. 
26
 (pg. 
721
-
744
)
3
Schwikowski
B
Uetz
P
Fields
S
A network of protein-protein interactions in yeast
Nat. Biotechnol.
 , 
2000
, vol. 
18
 (pg. 
1257
-
1261
)
4
Rain
JC
Selig
L
De Reuse
H
Battaglia
V
Reverdy
C
Simon
S
Lenzen
G
Petel
F
Wojcik
J
Schächter
V
, et al.  . 
The protein-protein interaction map of Helicobacter pylori
Nature
 , 
2001
, vol. 
409
 (pg. 
211
-
215
)
5
Stelzl
U
Worm
U
Lalowski
M
Haenig
C
Brembeck
FH
Goehler
H
Stroedicke
M
Zenkner
M
Schoenherr
A
Koeppen
S
, et al.  . 
A human protein-protein interaction network: a resource for annotating the proteome
Cell
 , 
2005
, vol. 
122
 (pg. 
957
-
968
)
6
Rual
J-F
Venkatesan
K
Hao
T
Hirozane-Kishikawa
T
Dricot
A
Li
N
Berriz
GF
Gibbons
FD
Dreze
M
Ayivi-Guedehoussou
N
, et al.  . 
Towards a proteome-scale map of the human protein–protein interaction network
Nature
 , 
2005
, vol. 
437
 (pg. 
1173
-
1178
)
7
Yu
H
Braun
P
Yildirim
MA
Lemmens
I
Venkatesan
K
Sahalie
J
Hirozane-Kishikawa
T
Gebreab
F
Li
N
Simonis
N
, et al.  . 
High-quality binary protein interaction map of the yeast interactome network
Science
 , 
2008
, vol. 
322
 (pg. 
104
-
110
)
8
Hu
P
Janga
SC
Babu
M
Díaz-Mejía
JJ
Butland
G
Yang
W
Pogoutse
O
Guo
X
Phanse
S
Wong
P
, et al.  . 
Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins
PLoS Biol.
 , 
2009
, vol. 
7
 pg. 
e96
 
9
Rajagopala
SV
Casjens
S
Uetz
P
The protein interaction map of bacteriophage lambda
BMC Microbiol.
 , 
2011
, vol. 
11
 pg. 
213
 
10
Stein
A
Mosca
R
Aloy
P
Three-dimensional modeling of protein interactions and complexes is going ‘omics
Curr. Opin. Struct. Biol.
 , 
2011
, vol. 
21
 (pg. 
200
-
208
)
11
Bernstein
FC
Koetzle
TF
Williams
GJ
Meyer
EF
Brice
MD
Rodgers
JR
Kennard
O
Shimanouchi
T
Tasumi
M
The Protein Data Bank: a computer-based archival file for macromolecular structures
J. Mol. Biol.
 , 
1977
, vol. 
112
 (pg. 
535
-
542
)
12
Nordhoff
S
Cerezo-Gálvez
S
Feurer
A
Hill
O
Matassa
VG
Metz
G
Rummey
C
Thiemann
M
Edwards
PJ
The reversed binding of beta-phenethylamine inhibitors of DPP-IV: X-ray structures and properties of novel fragment and elaborated inhibitors
Bioorg. Med. Chem. Lett.
 , 
2006
, vol. 
16
 (pg. 
1744
-
1748
)
13
Champness
JN
Bennett
MS
Wien
F
Visse
R
Summers
WC
Herdewijn
P
de Clerq
E
Ostrowski
T
Jarvest
RL
Sanderson
MR
Exploring the active site of herpes simplex virus type-1 thymidine kinase by X-ray crystallography of complexes with aciclovir and other ligands
Proteins
 , 
1998
, vol. 
32
 (pg. 
350
-
361
)
14
Seeliger
MA
Nagar
B
Frank
F
Cao
X
Henderson
MN
Kuriyan
J
c-Src binds to the cancer drug imatinib with an inactive Abl/c-Kit conformation and a distributed thermodynamic penalty
Structure
 , 
2007
, vol. 
15
 (pg. 
299
-
311
)
15
Punta
M
Coggill
PC
Eberhardt
RY
Mistry
J
Tate
J
Boursnell
C
Pang
N
Forslund
K
Ceric
G
Clements
J
, et al.  . 
The Pfam protein families database
Nucleic Acids Res.
 , 
2012
, vol. 
40
 (pg. 
D290
-
D301
)
16
Finn
RD
Marshall
M
Bateman
A
iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions
Bioinformatics
 , 
2005
, vol. 
21
 (pg. 
410
-
412
)
17
Finn
RD
Tate
J
Mistry
J
Coggill
PC
Sammut
SJ
Hotz
H-R
Ceric
G
Forslund
K
Eddy
SR
Sonnhammer
ELL
, et al.  . 
The Pfam protein families database
Nucleic Acids Res.
 , 
2008
, vol. 
36
 (pg. 
D281
-
D288
)
18
Michalsky
E
Dunkel
M
Goede
A
Preissner
R
SuperLigands - a database of ligand structures derived from the Protein Data Bank
BMC Bioinformatics
 , 
2005
, vol. 
6
 pg. 
122
 
19
Bashton
M
Nobeli
I
Thornton
JM
PROCOGNATE: a cognate ligand domain mapping for enzymes
Nucleic Acids Res.
 , 
2008
, vol. 
36
 (pg. 
D618
-
D622
)
20
Kruger
FA
Rostom
R
Overington
JP
Mapping small molecule binding data to structural domains
BMC Bioinformatics
 , 
2012
, vol. 
13
 
Suppl. 17
pg. 
S11
 
21
Gana
R
Rao
S
Huang
H
Wu
C
Vasudevan
S
Structural and functional studies of S-adenosyl-L-methionine binding proteins: a ligand-centric approach
BMC Struct. Biol.
 , 
2013
, vol. 
13
 pg. 
6
 
22
Schreyer
AM
Blundell
TL
CREDO: a structural interactomics database for drug discovery
Database
 , 
2013
, vol. 
2013
 pg. 
bat049
 
23
Gong
S
Yoon
G
Jang
I
Bolser
D
Dafas
P
Schroeder
M
Choi
H
Cho
Y
Han
K
Lee
S
, et al.  . 
PSIbase: a database of Protein Structural Interactome map (PSIMAP)
Bioinformatics
 , 
2005
, vol. 
21
 (pg. 
2541
-
2543
)
24
Davis
FP
Sali
A
PIBASE: a comprehensive database of structurally defined protein interfaces
Bioinformatics
 , 
2005
, vol. 
21
 (pg. 
1901
-
1907
)
25
Ogmen
U
Keskin
O
Aytuna
AS
Nussinov
R
Gursoy
A
PRISM: protein interactions by structural matching
Nucleic Acids Res.
 , 
2005
, vol. 
33
 (pg. 
W331
-
W336
)
26
Winter
C
Henschel
A
Kim
WK
Schroeder
M
SCOPPI: a structural classification of protein–protein interfaces
Nucleic Acids Res.
 , 
2006
, vol. 
34
 (pg. 
D310
-
D314
)
27
Stein
A
Céol
A
Aloy
P
3did: identification and classification of domain-based interactions of known three-dimensional structure
Nucleic Acids Res.
 , 
2011
, vol. 
39
 (pg. 
D718
-
D723
)
28
Xu
Q
Dunbrack
RL
The protein common interface database (ProtCID)–a comprehensive database of interactions of homologous proteins in multiple crystal forms
Nucleic Acids Res.
 , 
2011
, vol. 
39
 (pg. 
D761
-
D770
)
29
Yellaboina
S
Tasneem
A
Zaykin
DV
Raghavachari
B
Jothi
R
DOMINE: a comprehensive collection of known and predicted domain-domain interactions
Nucleic Acids Res.
 , 
2011
, vol. 
39
 (pg. 
D730
-
D735
)
30
Teyra
J
Samsonov
SA
Schreiber
S
Pisabarro
MT
SCOWLP update: 3D classification of protein-protein, -peptide, -saccharide and -nucleic acid interactions, and structure-based binding inferences across folds
BMC Bioinformatics
 , 
2011
, vol. 
12
 pg. 
398
 
31
Velankar
S
Dana
JM
Jacobsen
J
van Ginkel
G
Gane
PJ
Luo
J
Oldfield
TJ
O’Donovan
C
Martin
M-J
Kleywegt
GJ
SIFTS: Structure Integration with Function, Taxonomy and Sequences resource
Nucleic Acids Res.
 , 
2013
, vol. 
41
 (pg. 
D483
-
D489
)
32
Rose
PW
Bi
C
Bluhm
WF
Christie
CH
Dimitropoulos
D
Dutta
S
Green
RK
Goodsell
DS
Prlić
A
Quesada
M
, et al.  . 
The RCSB Protein Data Bank: new resources for research and education
Nucleic Acids Res.
 , 
2013
, vol. 
41
 (pg. 
D475
-
D482
)
33
Finn
RD
Clements
J
Eddy
SR
HMMER web server: interactive sequence similarity searching
Nucleic Acids Res.
 , 
2011
, vol. 
39
 (pg. 
W29
-
W37
)
34
Creighton
TE
Proteins
 , 
2013
New York
W H Freeman & Company
35
Wheeler
TJ
Clements
J
Eddy
SR
Hubley
R
Jones
TA
Jurka
J
Smit
AFA
Finn
RD
Dfam: a database of repetitive DNA based on profile hidden Markov models
Nucleic Acids Res.
 , 
2013
, vol. 
41
 (pg. 
D70
-
D82
)
36
Hastings
J
de Matos
P
Dekker
A
Ennis
M
Harsha
B
Kale
N
Muthukrishnan
V
Owen
G
Turner
S
Williams
M
, et al.  . 
The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013
Nucleic Acids Res.
 , 
2013
, vol. 
41
 (pg. 
D456
-
D463
)
37
Willighagen
EL
Waagmeester
A
Spjuth
O
Ansell
P
Williams
AJ
Tkachenko
V
Hastings
J
Chen
B
Wild
DJ
The ChEMBL database as linked open data
J. Cheminform.
 , 
2013
, vol. 
5
 pg. 
23
 
38
Finn
RD
Gardner
PP
Bateman
A
Making your database available through Wikipedia: the pros and cons
Nucleic Acids Res.
 , 
2012
, vol. 
40
 (pg. 
D9
-
D12
)
39
Sorenson
MK
Ray
SS
Darst
SA
Crystal structure of the flagellar sigma/anti-sigma complex sigma(28)/FlgM reveals an intact sigma factor in an inactive conformation
Mol. Cell
 , 
2004
, vol. 
14
 (pg. 
127
-
138
)
40
Herráez
A
Biomolecules in the computer: Jmol to the rescue
Biochem. Mol. Biol. Educ.
 , 
2006
, vol. 
34
 (pg. 
255
-
261
)
41
Krissinel
E
Henrick
K
Inference of macromolecular assemblies from crystalline state
J. Mol. Biol.
 , 
2007
, vol. 
372
 (pg. 
774
-
797
)
42
Jones
S
Marin
A
Thornton
JM
Protein domain interfaces: characterization and comparison with oligomeric protein interfaces
Protein Eng.
 , 
2000
, vol. 
13
 (pg. 
77
-
82
)
43
Ponstingl
H
Henrick
K
Thornton
JM
Discriminating between homodimeric and monomeric proteins in the crystalline state
Proteins
 , 
2000
, vol. 
41
 (pg. 
47
-
57
)
44
Bahadur
RP
Chakrabarti
P
Rodier
F
Janin
J
A dissection of specific and non-specific protein-protein interfaces
J. Mol. Biol.
 , 
2004
, vol. 
336
 (pg. 
943
-
955
)
45
Dafas
P
Bolser
D
Gomoluch
J
Park
J
Schroeder
M
Using convex hulls to extract interaction interfaces from known structures
Bioinformatics
 , 
2004
, vol. 
20
 (pg. 
1486
-
1490
)
46
Dey
S
Pal
A
Chakrabarti
P
Janin
J
The subunit interfaces of weakly associated homodimeric proteins
J. Mol. Biol.
 , 
2010
, vol. 
398
 (pg. 
146
-
160
)
47
Nam
HJ
Haser
WG
Roberts
TM
Frederick
CA
Intramolecular interactions of the regulatory domains of the Bcr-Abl kinase reveal a novel control mechanism
Structure
 , 
1996
, vol. 
4
 (pg. 
1105
-
1114
)
48
Davis
FP
Sali
A
The overlap of small molecule and protein binding sites within families of protein structures
PLoS Comput Biol
 , 
2010
, vol. 
6
 pg. 
e1000668
 
49
Jubb
H
Higueruelo
AP
Winter
A
Blundell
TL
Structural biology and drug discovery for protein-protein interactions
Trends Pharmacol. Sci.
 , 
2012
, vol. 
33
 (pg. 
241
-
248
)
50
Winter
C
Henschel
A
Tuukkanen
A
Schroeder
M
Journal of Structural Biology
J. Struct. Biol.
 , 
2012
, vol. 
179
 (pg. 
347
-
358
)
51
Berg
T
Small-molecule inhibitors of protein–protein interactions
Curr. Opin. Drug. Discov. Devel.
 , 
2008
, vol. 
11
 (pg. 
666
-
674
)
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Comments

0 Comments