Abstract

The genomic era has been characterised by vast amounts of data from diverse sources, creating a need for new tools to extract biologically meaningful information. Bioinformatics is, for the most part, responding to that need. The sparseness of the genomic data associated with diseases, however, creates a new challenge. Understanding the complex interplay between genes and proteins requires integration of data from a wide variety of sources, e.g. gene expression, genetic linkage, protein interaction and protein structure. Thus, computational tools have become critical for the integration, representation and visualization of heterogeneous biomedical data. Furthermore, several bioinformatics methods have been developed to formulate predictions about the functional role of genes and proteins, including their role in diseases. After an introduction to the complex interplay between proteins and genetic diseases, this review explores recent approaches to the understanding of the mechanisms of disease at the molecular level. Finally, because most known mechanisms leading to disease involve some form of protein interaction, this review focuses on the recent methodologies for understanding diseases through their underlying protein interactions. Recent contributions from genetics, protein structure and protein interaction network analyses to the understanding of diseases are discussed here.

INTRODUCTION

‘Life is a relationship between molecules, not a property of any one molecule. So is therefore disease, which endangers life’, wrote Zuckerkandl and Pauling (1962) in their chapter on ‘Molecular disease, evolution and genic heterogeneity’ [1]. Over 40 years later, we are still far from unraveling the molecular mechanisms of most diseases and are still pondering the role of molecular interactions in healthy and diseased organisms. Indeed, proteins do not function in isolation but rather within the cell, interacting mostly with other proteins but also with other molecules such as DNA, RNA and small molecules. Thus, studies of proteins and their interactions are essential to understand their role within the cell. Here, the term ‘protein interaction’ covers a wide range of events, such as transient and stable complexes, as well as physical and functional interactions.

The focus of this review is protein interactions and their role in understanding diseases. The main topic is divided into three fields of research, addressed in three sections. The first section reviews the association of genes or proteins and their interacting partners with a particular disease. The second addresses the structural analysis of disease-related proteins, protein complexes and their mutants. The third section covers the analysis of the global properties of protein interaction networks, in particular those related to diseases. The methods and hypotheses presented here were formulated for general application to any kind of disease. Where it helps to illustrate an idea or application, a particular disease is singled out and briefly described. Readers are referred to the cited work for more details.

This review offers a computational perspective on a broad emerging field that considers the role of protein interactions in the etiology of diseases and the generation of new hypotheses derived from this knowledge.

Proteins and genetic diseases

This section provides a brief introduction to phenotype–genotype association studies and an overview of recent computational methodologies that prioritise disease-related genes. Gene-phenotype association and protein interaction studies are intimately related. Uncovering the mechanism by which genes (and their interactions) cause disease reveals information about the interplay between their corresponding protein products. Conversely, protein interaction studies play a major role in the prediction of new gene-phenotype relationships.

From genes to phenotype

Progress in genetic studies towards the association of phenotype with genotype has led to the identification of an increasing number of genes that contribute to human disease. Mendelian traits or diseases, named after Gregor Mendel, are those inherited and controlled by a single gene. This gene can be isolated based on its position in the chromosome by a process known as positional cloning [2]. Some examples of human disease-related genes identified by positional cloning are the genes associated with cystic fibrosis [3, 4], Huntington disease (HD) [5] and breast cancer susceptibility [6, 7]. The first step of positional cloning is linkage analysis, in which the gene is mapped using a group of DNA polymorphisms from families that segregate the disease phenotype [8]. Once the gene that predisposes to a disease is identified, its protein products and mutations can be studied to clarify the nature of the disease process. Even in simple Mendelian diseases, however, the correlation between the mutations in the genome of the patient and the symptoms might not be clear [9]. Several reasons have been suggested for this apparent lack of correlation between genotype and phenotype, as illustrated in Figure 1 [10–12]. Among them are pleiotropy (the ability of some genes to produce multiple phenotypes), environmental factors and the influence of other genes. Genes can influence each other in several ways: they can interact synergistically, one can mask the phenotypic effect of the other (a phenomenon known as epistasis), or one gene can modify another (having a small quantitative effect on its expression). For instance, cystic fibrosis and Becker muscular dystrophy, previously considered classical examples of a Mendelian pattern of inheritance, are believed to be caused by a mutation of one gene modified by other genes [13, 14]. These observations have led to the evolving concept of oligogenic diseases, which require the interaction of a few genes and present inheritance patterns somewhere between monogenic and polygenic (reviewed in [15, 16]). These and other studies have demonstrated that even simple Mendelian diseases can lead to complex genotype-phenotype associations [12].

Figure 1:

From Mendelian to complex diseases: (A) the mutation of a gene is the main cause of the phenotypic trait or disease. Gene pleiotropy, gene modifiers and environmental factors all influence the final phenotype (see glossary of terms in Table 3). (B) Most Mendelian or single-gene diseases are determined by mutations at a single locus, mostly by those that alter the coding region of the protein. Other factors discussed in (A) also affect Mendelian diseases. (C) Oligogenic diseases require the interaction of a few genes and exhibit inheritance patterns somewhere between monogenic and polygenic. In the illustration, two genes and their associated factors (represented by the hexagonal shapes) interact to produce a digenic disease. (D) Complex diseases or traits are affected by a multitude of genes (represented as black ovals) and several factors, such as environment (represented by white rectangles). Genes (and/or environmental factors) can affect other genes by enhancing (black arrows) or inhibiting (gray arrows) their action (for simplicity, arrows are drawn for only two genes).

To add to the complexity, most common diseases such as cancer and metabolic, psychiatric and cardiovascular disorders (e.g. diabetes, schizophrenia and hypertension) are believed to be caused by several genes (multigenic) and affected by several factors, including environmental ones (e.g. diet, infection by bacteria) [17]. Despite an increasing understanding of multigenic inheritance, the study of these complex diseases remains challenging [18].

From genes to protein complexes and back

One of the main challenges scientists face today is deciphering the molecular details that lead to diseases. Even when the genetic basis of a disease is well understood, not much is known about the molecular mechanisms leading to the disorders. For oligogenic diseases, synergistic contribution of genes from several loci could explain disruptions in their products, in particular when these proteins are directly or indirectly interacting. Two models, namely the dosage [19] and the poison [20] model, have been used to explain the molecular mechanisms of the disruption (reviewed for oligogenic diseases in [16]).

The dosage model explains disruptions of two proteins within a complex. Mutations in one protein alone weaken the interaction but do not affect the phenotype. Only when both proteins are mutated does the complex fail to form and the phenotype change. For instance, mutations that affect ligand-receptor interactions could be explained with such a model.

In the poison model, mutations in one of the proteins disrupt the complex, but enough unchanged complexes remain available to maintain the function. Addition of another mutated subunit further decreases the already reduced number of normal complexes, resulting in phenotype changes. The two molecular models described above could also be used to explain indirect interactions between proteins (i.e. proteins that do not physically interact but participate in the same functional pathway).
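The contrast between the two models can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the multiplicative combination of binding strengths, the 0.5 threshold and the function names are hypothetical choices, not part of the cited models [19, 20].

```python
def dosage_complex_forms(strength_a, strength_b, threshold=0.5):
    """Dosage model (toy version): each mutation weakens the interaction.

    strength_a and strength_b are relative binding strengths in [0, 1]
    (1.0 = wild type). The complex is assumed to form while the combined
    strength stays at or above an arbitrary, hypothetical threshold.
    """
    return strength_a * strength_b >= threshold


def poison_normal_fraction(mutant_fractions):
    """Poison model (toy version): one mutant subunit spoils the complex.

    mutant_fractions holds, for each subunit type, the fraction of copies
    that are mutated. A complex is normal only if every subunit it
    contains is normal, so the fractions of normal copies are multiplied.
    """
    fraction = 1.0
    for mutated in mutant_fractions:
        fraction *= 1.0 - mutated
    return fraction


# Dosage: a single weakened partner is tolerated, two are not.
print(dosage_complex_forms(0.7, 1.0))  # one mutated protein
print(dosage_complex_forms(0.7, 0.7))  # both proteins mutated

# Poison: each additional mutated subunit type halves the normal complexes.
print(poison_normal_fraction([0.5]))
print(poison_normal_fraction([0.5, 0.5]))
```

In both toy versions the phenotype appears only once a second locus is hit, which is the behaviour the models were introduced to explain for oligogenic diseases.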

The increasing knowledge about protein networks can be used towards identifying new genes and genetic mechanisms behind diseases. For instance, if the gene products (proteins) have any functional interaction, one could trace these proteins back to their respective genes and identify the genes responsible for the disease. Identifying genes associated with complex diseases from all possible candidates generated from genome-wide genetic linkage studies would involve searching through hundreds of genes. Several computational approaches to prioritise genes related to diseases have been developed to aid linkage analysis and association studies. Some of these methods rely on sequence and functional differences between disease-causing proteins and others not related to any disease. For example, sequence-based properties such as length, conservation across species and number of paralogs have been used to create disease classifiers [21–24].

Other methods tend to integrate several sources of data, such as gene expression, gene ontology (GO) annotation, and disease phenotype annotation from the Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM) [25–36]. The performance of seven of these computational methods [21, 23, 25, 30–32, 36] was recently reviewed by Tiffin et al. [37] and applied to the analysis of candidate genes for type 2 diabetes (T2D) and obesity. The authors identified a set of primary candidates with nine T2D genes and five obesity genes (selected by six out of seven methods). They also generated a secondary set of 94 and 116 gene candidates for T2D and obesity, respectively (found by five out of seven methods), of which 58 were common to both diseases. The study illustrates how these seven independent methodologies can be used to identify genes related to complex diseases, and it closes with an interesting discussion on the integration of results from different methodologies.
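The idea of keeping only candidates selected by a minimum number of methods can be sketched as a simple vote count. This is a minimal illustration, not the procedure used in [37]; the method names and gene labels are placeholders.

```python
from collections import Counter


def consensus_candidates(predictions, min_votes):
    """Return genes proposed by at least `min_votes` methods.

    `predictions` maps a method name to the list of genes it selected.
    Each method contributes at most one vote per gene.
    """
    votes = Counter(
        gene for genes in predictions.values() for gene in set(genes)
    )
    return sorted(gene for gene, count in votes.items() if count >= min_votes)


# Hypothetical output of three prioritisation methods.
predictions = {
    "method_1": ["G1", "G2"],
    "method_2": ["G1"],
    "method_3": ["G1", "G2", "G3"],
}

print(consensus_candidates(predictions, min_votes=3))  # strict, primary set
print(consensus_candidates(predictions, min_votes=2))  # relaxed, secondary set
```

Lowering `min_votes` trades specificity for coverage, which mirrors the distinction between the primary and secondary candidate sets described above.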

Several of these methods explicitly incorporate protein interaction data. Oti et al. [38] specifically analysed the effect of incorporating protein–protein interactions into the prediction of disease-causing gene candidates. Their results suggest that the inclusion of interaction data results in an approximately 10-fold improvement in the identification of candidate genes. These methods are limited by the quality and sparseness of the experimental protein interaction data. Advances in experimental and computational approaches towards the accurate identification of interactions will therefore improve these methods.

The problem of gene-phenotype association is complicated further by the fact that not only can mutations in multiple genes cause one disease, but mutations in the same gene can also cause multiple syndromes. This provides an explanation for the continuum of syndromes and their overlap (illustrated in Figure 2) [39]. For instance, the overlap of human malformation syndromes leads to the concept of ‘syndrome families’ [40]. It is possible that these diseases arise from the disruption of strongly related genes (i.e. in the same protein complex or pathway). For example, Fanconi anemia and Usher syndrome type 1 both represent diseases produced by mutations in several genes involved in the same complex [41, 42]. Sam et al. [43] have developed a method to compare diseases based on their shared network of protein interactions. Upon manual examination of the significantly correlated disease pairs found by this method, the authors confirmed that, for several of them, the correlation had previously been reported in the literature. For instance, Cockayne syndrome and Xeroderma Pigmentosum are predicted to be correlated through their protein–protein interaction networks, as expected from the literature [44].
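One simple way to quantify how much two diseases share at the network level is to extend each disease's protein set with its direct interactors and measure the overlap. The sketch below uses a Jaccard index for this; it is an assumption made for illustration and not the statistic used by Sam et al. [43]. Protein labels are placeholders.

```python
def disease_neighbourhood(disease_proteins, interactions):
    """Disease proteins plus their direct interaction partners."""
    neighbourhood = set(disease_proteins)
    for a, b in interactions:
        if a in disease_proteins:
            neighbourhood.add(b)
        if b in disease_proteins:
            neighbourhood.add(a)
    return neighbourhood


def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical undirected interactions and disease gene products.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
d1 = disease_neighbourhood({"P1"}, interactions)  # {"P1", "P2"}
d2 = disease_neighbourhood({"P3"}, interactions)  # {"P2", "P3"}
d3 = disease_neighbourhood({"P4"}, interactions)  # {"P4", "P5"}

print(jaccard(d1, d2))  # share the interactor P2
print(jaccard(d1, d3))  # no shared proteins
```

Here the two diseases linked through P2 have no disease protein in common, yet their neighbourhoods overlap, which is the kind of signal a shared ‘disease module’ would produce.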

Figure 2:

The interaction networks for two diseases are represented by two connected graphs (D1 and D2) and the overlapping Venn diagrams. The diseases share a set of related proteins, namely a ‘disease module’ (enclosed by a rectangle). The two hypothetical diseases would likely share a set of phenotypic features. The networks depicted are from an arbitrary set of proteins and were obtained using Cytoscape [137].

A recently published method, developed by Lage et al. [45], illustrates the use of knowledge regarding protein interactions to prioritise genes within a linkage interval (these intervals were obtained from genetic linkage analysis data extracted from the OMIM and GeneCards databases). The patient phenotype associated with this interval is compared to the phenotype of all disease-related proteins interacting with each of the candidate proteins (gene products). A Bayesian predictor based on the pairwise score between phenotypes (obtained using text mining techniques) is used to rank all gene candidates extracted from OMIM's 870 linkage intervals linked to disease. This method, which capitalises on the fact that interacting proteins are often responsible for similar phenotypes, produced several novel putative disease-causing genes. This approach, and related ones, are limited by the reliability of protein networks and the sparseness of protein-disease association data, and would greatly benefit from a more accurate description of phenotypes (as described below).
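The principle behind this kind of prioritisation can be sketched in a few lines. The example below is a deliberately simplified stand-in: it replaces the text-mined phenotype score and the Bayesian predictor of [45] with a plain term-overlap score, and all gene, protein and phenotype labels are hypothetical.

```python
def phenotype_similarity(terms_a, terms_b):
    """Overlap between two sets of phenotype terms (Jaccard index)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def rank_candidates(candidates, partners, partner_phenotypes, patient_terms):
    """Rank candidate genes by the best phenotype match among the
    disease-related proteins that interact with each candidate."""
    scores = {}
    for gene in candidates:
        similarities = [
            phenotype_similarity(patient_terms, partner_phenotypes[p])
            for p in partners.get(gene, ())
            if p in partner_phenotypes
        ]
        scores[gene] = max(similarities, default=0.0)
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Hypothetical linkage interval with two candidate genes.
candidates = ["C1", "C2"]
partners = {"C1": ["X"], "C2": ["Y"]}
partner_phenotypes = {
    "X": {"deafness"},
    "Y": {"deafness", "blindness"},
}
patient_terms = {"deafness", "blindness"}

print(rank_candidates(candidates, partners, partner_phenotypes, patient_terms))
```

A candidate with no annotated interaction partner receives a score of zero, which makes the dependence on network coverage and phenotype annotation explicit.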

Another major challenge is the integration and organization of phenotypic databases. The NIH recently acknowledged this need by launching whole genome association studies. The NCBI database dbGaP [46] provides open access to summary data and controlled access to individual data for several genotype association studies. A recent review by Lussier et al. [47] points to the challenges faced by phenomics, the emerging field of high-throughput approaches for studying genotype–phenotype relationships. Advances in this area will depend on robust taxonomies for phenotypes and on the accuracy of their clinical description [48, 49].

CONTRIBUTIONS OF STRUCTURAL ANALYSIS TO THE UNDERSTANDING OF DISEASES

In many cases, a clear understanding of the malfunction that ultimately causes one or several diseases can only be achieved when the protein interactions are known at the molecular level. The three-dimensional structure of the protein interaction complex, whether available or modeled, can provide such detail. Furthermore, understanding the binding at this level is critical for the rational design of new therapeutic agents targeted to disrupt interactions that cause disorders. The following section examines the contribution of structural analysis to the understanding of diseases, and provides an overview of several disease-related resources.

Protein structure, protein complexes and disease

To complement classical structural biology, the structural genomics (SG) projects aim to solve X-ray or NMR structures in a high-throughput manner. The goal of the SG initiative is to provide three-dimensional structural models for all proteins encoded by complete genomes [50]. Most experimentally derived structures, however, might not be directly related to any human disease [51]. Computational homology studies are therefore required to obtain models for the human or pathogen proteins relevant to diseases. For instance, of the over 40 000 proteins with known structure deposited in the Protein Data Bank (PDB) [52], only a few hundred are known to be related to diseases. Several computational approaches have been implemented to predict function from protein sequence and structure information (see reviews in [53, 54]). However, experimental techniques are still needed to validate the functions of these proteins.

Studies of inherited or somatic non-synonymous mutations constitute the main source for the analysis of the etiology of diseases at the molecular level. A distinction should be made between the rare mutations responsible for functional disruptions that lead to disease and the large number of common variations in the human genome derived from high-throughput single nucleotide polymorphism (SNP) analysis experiments [55]. The majority of non-synonymous SNPs (nsSNPs), in particular those present in a large number of individuals, are probably not associated with any disease. Rare variants (found in a very small percentage of the population), on the other hand, tend to occur at structurally and functionally relevant sites. This suggests that structural information can be valuable for understanding the effect of mutations and nsSNPs [56]. Several computational methods based on stability, evolutionary and structural information have been developed to predict the impact of a mutation on protein function. Resources related to these methods are listed in Table 1 (see the review by Mooney for more details [57]). The main drawback of these methods is their low accuracy, which has been shown to improve with the addition of structural information [58, 59]. Even if the disruption of the function is correctly predicted, none of these methods offers insight into how the mutation affects the function.

Table 2 provides a list of several publicly available genomic databases containing disease information. This list is by no means exhaustive. For instance, for well-studied diseases such as cancer, there are several disease-specific resources available that might or might not be included in the data sources listed below (e.g. GeneCards, described below, includes information from CGAP, the National Cancer Institute's Cancer Genome Anatomy Project). The OMIM database [60], manually curated and updated daily, is one of the largest catalogs of human genes and disorders. As part of the NCBI Entrez database, OMIM is freely available and contains over 11 000 genes with known sequence and over 6000 phenotypes. It should be noted that only a few hundred of the genes with known sequences currently annotated in OMIM have known phenotypes. Automatic approaches for linking genotype with phenotype information have the potential to overcome the data scarcity problem inherent in manual efforts. To this effect, approaches such as PhenoGO [61], which uses natural language processing in combination with GO data, have been developed to create a collection of over 500 000 phenotype-GO associations, including approximately 33 000 genes from 10 species. Similarly, Gene2Disease automatically assigns priorities to genes related to a disease, and provides a list of candidates based on PubMed MeSH terms and GO. Another resource, GeneCards [62], provides a suite of tools that integrate information from over 70 sources including OMIM, constituting a single location to retrieve available information for over 24 000 genes, including relationships to diseases when available. The PhenomicDB [63] database uses orthology relations to provide multi-species genotype–phenotype mappings across human and several model organisms. The Orthodisease database provides ortholog clusters for more than 3000 disease genes across 26 eukaryotic organisms. Swissprot is a database of protein sequences that includes disease annotations for about 2600 of its 270 000 entries (16 600 are for human proteins). Finally, PharmGKB [64] is a catalog of over 300 genes and 400 diseases (with genes involved in drug response), providing a single platform to study relationships between drugs, diseases and genes. Most of these databases are freely available (GeneCards is limited to non-profit institutions), and their interfaces vary in flexibility and convenience. Almost all of them can be easily searched using related words in the query (disease or gene). In addition, the use of standard vocabularies and ontologies within all these databases needs to expand beyond GO, so that descriptions of disease phenotypes, cytological changes and molecular mechanisms can be well defined and standardised for better discoverability, correlation and mining. In general, while these databases provide an excellent resource, only a small proportion of the genomic data known to be involved in an inherited disease have both known gene sequence and known phenotype. Despite the scarcity of structural data related to disease, Moult and collaborators have shown that, for a set of genes associated with monogenic diseases, the loss of protein stability is a major factor contributing to disease [65].

Table 1:

Resources for SNP validation

Resource | URL | Description
SNPs3D [132] | http://www.snps3d.org/ | Based on structure and sequence analysis
MutDB [133] | http://mutdb.org/ | Provides structural and functional annotation
Align GVGD [134] | http://agvgd.iarc.fr/agvgd_input.php | Uses biochemical information
PolyPhen [56] | http://genetics.bwh.harvard.edu/pph/ | Uses straightforward physical and comparative considerations
SIFT [135] | blocks.fhcrc.org/sift/SIFT.html | Based on sequence homology and the physical properties of amino acids

Table 2:

Databases with disease annotation

Database | URL | Description
OMIM [60] | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM | A catalog of human genes and genetic disorders
Genecards [62] | http://www.genecards.org | A compendium of genes, proteins and diseases
Swissprot | http://www.ebi.ac.uk/swissprot | A database of protein sequences with disease annotation
PhenomicDB [63] | http://www.phenomicDB.de | Phenotype-genotype database integrating data from multiple organisms
Gene2Disease | http://www.ogic.ca/projects/g2d_2 | A database of candidate genes for mapped inherited human diseases
Orthodisease [136] | http://orthodisease.cgb.ki.se | Eukaryotic Ortholog Groups for Disease Genes
PhenoGo [61] | http://www.PhenoGO.org | Computed database that provides phenotypic context to existing associations between gene products and gene ontology (GO) for multiple organisms
PharmaGKB [64] | http://www.pharmgkb.org | Pharmacogenetics research database

Table 3:

Glossary of terms

Term | Definition
Single nucleotide polymorphism (SNP) | Sites in the human genome where individuals differ in their DNA sequence, often by a single base, usually with population frequencies greater than 1%.
Non-synonymous SNP (nsSNP) | SNP that produces an amino acid substitution.
Coding SNP (cSNP) | SNP located within the coding regions of genes.
Mendelian disease | Inherited disease with an inheritance pattern based on a single affected gene. The most common cause of Mendelian disease is a cSNP.
Complex disease | Inherited disease caused by a number of factors, which therefore does not follow simple Mendelian rules of inheritance.
Epistasis | When the action of one gene is modified by one or several genes.
Pleiotropic gene | A gene that affects more than one phenotype.
Moonlighting enzymes | A case of a pleiotropic gene that produces metabolic enzymes with additional functional activities.
Network | Series of points or nodes interconnected by edges; edges can have direction or different weights.
Diameter | The longest of the shortest paths between any two nodes, reflecting the ability of two nodes to interact with each other.
Small-world networks | Highly clustered networks with relatively short distance between any two nodes.
Scale-free networks | Networks in which the majority of the nodes have few links and a few of them (hubs) are highly connected.

Structural data, when used in combination with information about mutations responsible for disease, could be essential in unveiling the molecular mechanisms leading to diseases. A study by Thornton and collaborators showed that the patterns of mutations of residues associated with human inherited diseases (from the OMIM database) differ from those of the large number of nsSNPs (from the dbSNP database) [66]. Their results showed that mutations that lead to major changes in hydrophobicity were more frequent in the OMIM data than in dbSNP. In addition, they found that disease-related mutations are more likely to be buried in the protein structure than would be expected for average protein residues. These results are consistent with several other studies, including Vitkup et al. [67] and a more recent study by Ye et al. [68]. Conversely, common nsSNPs are less likely to be in protein cores than expected on average, a feature that could be useful when predicting the functional impact of an nsSNP (discussed in the previous subsection).
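To make the notion of a ‘major change in hydrophobicity’ concrete, the sketch below scores a substitution by the difference in hydropathy between the mutant and wild-type residues. The choice of the Kyte-Doolittle scale and the cutoff of 3.0 are assumptions made for illustration; the measure used in [66] may differ.

```python
# Kyte-Doolittle hydropathy index, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def hydropathy_change(wild_type, mutant):
    """Hydropathy of the mutant residue minus that of the wild type."""
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


def is_major_change(wild_type, mutant, cutoff=3.0):
    """Flag substitutions whose absolute hydropathy change reaches a
    hypothetical cutoff (the value 3.0 is arbitrary)."""
    return abs(hydropathy_change(wild_type, mutant)) >= cutoff


# Leucine to arginine swaps a hydrophobic residue for a charged one,
# whereas isoleucine to valine is a conservative substitution.
print(is_major_change("L", "R"))
print(is_major_change("I", "V"))
```

A score of this kind captures only one of the properties discussed here; burial in the structure requires solvent accessibility computed from coordinates and is not covered by the sketch.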

On the other hand, the protein core represents only a fraction of all residues. Consequently, many of the disease-related mutations lie in solvent-accessible sites, suggesting that the analysis of these mutations might also shed light on the mechanisms of the disease. For instance, Thornton and collaborators estimated that more than half of the disease-related mutations analysed in their study occur at solvent-accessible sites [66]. In their analysis, Ye et al. found that disease-related mutations located on the protein surface tend to be clustered, forming surface patches, while SNPs are uniformly distributed [68]. This could explain the role of mutations in disease, since mutations in the binding site would likely disrupt the protein interaction and function. To reach this conclusion, the authors compared the location and distribution of disease-related mutations with those of nsSNPs on a set of protein domains obtained through homology modeling of disease-related proteins. The authors verified that, for a smaller subset of experimentally determined structures, disease-related mutations are located mostly on the binding interfaces of proteins.

Protein structural analysis has helped to elucidate the molecular basis of several diseases. One example is the disruption of a protein interaction in Von Hippel-Lindau (VHL) syndrome. A common mutation from tyrosine to histidine at residue 98 (Tyr98His), which is part of the binding site, disrupts the binding of the VHL protein to the hypoxia-inducible factor (HIF). As a result, the VHL protein no longer mediates the degradation of HIF, leading to the expression of angiogenic growth factors and local proliferation of blood vessels [69, 70]. Another example, given its central role in cancer, is the extensive study of mutations of the p53 tumor suppressor protein. Structural analysis of mutations in p53 could facilitate the dissection of their functional role, in particular their effect on DNA binding, which seems to be key in human cancers [71, 72]. For instance, a mutation in the DNA-binding region (Arg273His) has been associated with Li-Fraumeni syndrome and low p53 DNA binding. On the other hand, the mutation of arginine 175 to histidine (Arg175His) affects a residue that is important for the stabilization of the protein, which might in turn regulate binding to DNA.

In some cases, interactions between two proteins might involve order-disorder transitions in partially disordered regions of the interacting proteins during the binding process. These unstructured or disordered regions have been found to be involved in many disease mechanisms [73–75] (see [76] for a review on intrinsically disordered proteins). For instance, the tumor suppressor BRCA1 has been shown to contain intrinsically disordered regions through which it binds to several proteins [49]. Similarly, the surface proteins of some bacterial pathogens contain intrinsically disordered regions. Host-pathogen protein interactions constitute an excellent system for drug designers to target, and their structural analysis can guide that effort. The NMR determination of the structure of one of these surface proteins, namely the streptococcal fibronectin-binding protein (FnBP), in complex with human fibronectin provided the mechanistic details of how the bacteria target the host cell (see review [77]).

DISEASES AND PROTEIN INTERACTION NETWORKS

This section explores the study of protein networks, with a focus on protein–protein interactions, and their impact on understanding diseases. First, it provides an overview of the experimental and computational approaches that have been used to reconstruct the network of human protein interactions (or human interactome). It then lists the basic concepts that define the general properties of a network and introduces recent contributions to biology from this theoretical perspective. Finally, recent approaches to create disease-related protein interaction networks are discussed. Examples of experimental and computational methodologies for network reconstruction are provided.

From one to thousands of interactions

In the past, experimental techniques could reveal only a handful of protein–protein interactions at a time. For instance, genetic, biochemical and biophysical techniques mostly study individual interactions [78]. Recent high-throughput experimental analyses have dramatically increased the amount of interaction data generated, making possible the reconstruction of whole-genome protein networks (see [79–84] and reviews in [85, 86]). These genome-wide analyses rely on the development of computational approaches to understand and visualise the data. Bioinformatics tools can also generate predictions of new functional roles of proteins from existing genomic data. Therefore, bioinformatics has a dual role in the context of protein interactions and diseases: the prediction of putative protein interactions and of new gene-disease associations (see previous section), and the development of a framework to integrate, represent and visualise experimental data.

Computational techniques to predict protein interactions have been developed in parallel with experimental advances. These approaches rely on the observation that interacting proteins are more likely to be present in the same set of organisms [87, 88], to have a conserved gene order [89, 90], or to be fused in some organism [91, 92]. These methods have been successfully used to predict protein interactions but still have many limitations (see reviews [93–96]). The assumption that interacting proteins co-evolve to preserve their function has led to methods that predict interacting partners from similarities between the evolutionary histories of interacting protein families [97–104]. These methods are widely applicable and only require the protein sequence as input. However, the signal from functional co-evolution can be difficult to detect, resulting in low prediction accuracy. To address this problem, the technique was recently improved by subtracting the signal from speciation events of unrelated sequences [105, 106] and by removing high-entropy regions (i.e. regions poorly conserved across species) of the sequences [107].
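As a rough illustration of the co-evolution idea described above, the sketch below scores two protein families by correlating their pairwise evolutionary distances across a shared set of species, in the spirit of the distance-matrix comparisons cited in [97–104]. The function names, the nested-dictionary layout of the distance matrices and the toy distance values are assumptions made for this example. It is not the implementation of any cited method, and it omits the corrections for speciation signal and high-entropy regions mentioned in the text.

```python
from itertools import combinations
from math import sqrt


def pairwise_distances(dist, species):
    """Flatten a symmetric distance matrix into one value per species pair."""
    return [dist[a][b] for a, b in combinations(species, 2)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coevolution_score(dist_a, dist_b, species):
    """Correlate the evolutionary distances of two families over shared species."""
    return pearson(pairwise_distances(dist_a, species),
                   pairwise_distances(dist_b, species))


def matrix_from_pairs(species, values):
    """Toy helper (hypothetical): build a nested-dict matrix from per-pair values."""
    dist = {s: {} for s in species}
    for (a, b), value in zip(combinations(species, 2), values):
        dist[a][b] = dist[b][a] = value
    return dist


# Made-up distances for four species; family_b is a rescaled copy of family_a,
# family_c is unrelated to both.
species = ["sp1", "sp2", "sp3", "sp4"]
family_a = matrix_from_pairs(species, [0.1, 0.4, 0.5, 0.4, 0.5, 0.2])
family_b = matrix_from_pairs(species, [0.25, 0.85, 1.05, 0.85, 1.05, 0.45])
family_c = matrix_from_pairs(species, [0.5, 0.1, 0.3, 0.2, 0.4, 0.1])

print(coevolution_score(family_a, family_b, species))  # close to 1: similar histories
print(coevolution_score(family_a, family_c, species))  # much lower: dissimilar histories
```

A high score only suggests similar evolutionary histories. Because all families share the underlying species tree, uncorrected scores of this kind tend to be inflated, which is the problem the corrections in [105–107] are meant to address.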

Proteins meet graph theory

Protein–protein interaction data obtained from high-throughput experimental approaches can be represented as a graph [108, 109]. Proteins constitute the nodes of this graph, and interactions between the proteins are represented as lines (edges) connecting the nodes. Biological networks have been found to be comparable to communication and social networks. Protein–protein interaction and communication networks share several commonalities, such as scale-free and small-world properties (see definitions in Table 3) [110]. Scale-free networks are fairly robust against random errors but are highly vulnerable to perturbations in highly connected nodes [111]. Certain properties of the protein network could be used to differentiate disease from non-disease proteins. Following this idea, Xu and Li [112] devised a classifier that uses several topological features of the protein interaction network to predict genes related to disease. The classifier was trained on a set of non-disease genes and a set of disease genes (from OMIM) and applied to a set of over 5000 human genes. As a result, 970 disease genes were identified, 792 of which were already listed in OMIM. Some of the 178 newly predicted disease gene candidates were validated by biological experiments.
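To make the graph representation concrete, the following sketch stores an interaction network as a dictionary of neighbor sets and computes three simple topological features for a protein: its degree, the fraction of its neighbors that are known disease proteins, and its clustering coefficient. The choice of features, the function names and the toy network are assumptions for illustration only. They are not claimed to be the feature set used in [112].

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected interaction graph as {protein: set of neighbors}."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions in this sketch
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree(graph, node):
    """Number of interaction partners of a protein."""
    return len(graph.get(node, ()))


def disease_neighbor_fraction(graph, node, disease_proteins):
    """Fraction of a protein's partners that are known disease proteins."""
    neighbors = graph.get(node, set())
    if not neighbors:
        return 0.0
    return len(neighbors & disease_proteins) / len(neighbors)


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbors that also interact with each other."""
    neighbors = list(graph.get(node, ()))
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in graph[neighbors[i]])
    return 2 * links / (k * (k - 1))


def topological_features(graph, node, disease_proteins):
    """Feature tuple that could be fed to a classifier (illustrative only)."""
    return (degree(graph, node),
            disease_neighbor_fraction(graph, node, disease_proteins),
            clustering_coefficient(graph, node))


# Toy network with made-up protein names; P2 and P3 play the role of known disease proteins.
graph = build_graph([("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
                     ("P2", "P3"), ("P4", "P5")])
disease = {"P2", "P3"}

for protein in sorted(graph):
    print(protein, topological_features(graph, protein, disease))
```

Feature tuples of this kind, computed for labeled disease and non-disease genes, are the sort of input a topology-based classifier would be trained on.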

Protein interaction networks could be used to improve functional annotation, since the function of some proteins could be inferred from their role in pathways or protein complexes [113, 114]. Likewise, information about key nodes in disease-related networks could be used in drug discovery. Drug target identification constitutes a good example of the potential of integrating structural data with high-throughput data [115]. The structural details of the binding or allosteric sites could be used to design molecules that affect protein function. In addition, the reconstruction of the different protein networks in which the potential target is involved (signaling, metabolic, regulatory, etc.) is needed to predict the overall impact of the disruption. If, for example, the target is highly connected (a hub), its inhibition may affect many activities that are essential for the proper function of the cell, making it unsuitable as a drug target. Less connected nodes that mainly affect the pathway leading to disease, on the other hand, could constitute vulnerable points of the disease-related network and are therefore better drug target candidates. Ultimately, a more complex systems biology approach that integrates and mathematically models the gene, protein and pathway responses would be needed to fully characterise the effects of the system disruptions caused by the drug.
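The difference between inhibiting a hub and inhibiting a peripheral node can be sketched with a simple connectivity measure. The example below removes one protein at a time from a toy network and reports how much the largest connected component shrinks. This is a deliberately crude proxy for "overall impact", chosen for this illustration and not taken from the studies cited above. The network, the function names and the measure itself are assumptions.

```python
from collections import deque


def largest_component_size(graph, removed=frozenset()):
    """Size of the largest connected component, skipping removed nodes.

    `graph` maps every node to the set of its neighbors.
    """
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def removal_impact(graph, node):
    """Drop in largest-component size caused by removing a single node."""
    return largest_component_size(graph) - largest_component_size(graph, {node})


# Toy network: H is a hub, E is a peripheral node (names are made up).
network = {
    "H": {"A", "B", "C", "D"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H"},
    "D": {"H", "E"},
    "E": {"D"},
}

for protein in sorted(network):
    print(protein, removal_impact(network, protein))
```

In this toy network, removing the hub H fragments the graph, whereas removing E only removes E itself. A realistic assessment would also need the pathway context and dynamics that the text calls for.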

Reconstructing the interaction networks of proteins and their mutants involved in a disease might be the key to understanding the differences between healthy and diseased organisms. Recent work by Goehler et al. [116] on Huntington's disease (HD) illustrates the potential of such approaches. HD is an autosomal dominant neurodegenerative disease. Currently, there is no pharmacological treatment to prevent the progression of this rare inherited disorder [117]. HD is caused by the repeat expansion of the trinucleotide CAG in the Huntingtin (Htt) gene and is one of several polyglutamine (or polyQ) diseases. This expansion causes aggregation of the mutant Htt in insoluble neuronal inclusion bodies, which consequently leads to neuronal degeneration. Goehler et al. [116] reported an experimental strategy to generate the protein–protein interaction network of all proteins related to HD, revealing many new interactions and permitting the functional annotation of several uncharacterised proteins. Most importantly, they discovered an interaction of Htt with GIT1, a GTPase-activating protein that seems to be required for Htt aggregation. Upon further validation, GIT1 could constitute an excellent target for therapeutic strategies [118, 119].

Towards a similar goal, Lim and collaborators [120] developed the network of interactions among proteins related to ataxias and disorders of Purkinje cell degeneration. They found that most of the proteins related to ataxias interact directly or indirectly with each other. A more recent study corroborates these findings across all of the disease proteins from OMIM [121]. Thus, proteins related to a disease are more likely to interact with proteins already known to cause similar diseases. This principle motivates several of the computational gene prioritization studies presented in the previous section. Chen et al. [122] presented a computational approach to test and confirm this principle for the subnetwork of interacting proteins associated with Alzheimer's disease (AD) (see [123] for more details about AD and other diseases associated with aggregates, namely amyloid fibrils, that result from protein misfolding). They devised a computational method to enrich the AD subnetwork based on a heuristic score. The score prioritises proteins with high specificity (favoring the addition of proteins that are not promiscuously connected) and with high confidence in their interaction data (this weighting addresses the problem of unreliable interaction data).

In an attempt to derive common features among cancer proteins, Jonsson and Bates [124] performed a systematic computational study of a subset of proteins related to cancer. The authors found that the network topology of cancer-related proteins is quite different from that of proteins not involved in the disease; in particular, cancer proteins are highly connected with other cancer-related proteins. In addition, a study of the protein network of herpesvirus performed by Uetz et al. [125] indicates that viral networks differ significantly from cellular networks, which raises the hypothesis that other intracellular pathogens might also have distinguishing topologies. In a recent study, Goh et al. [126] explored the properties of the human disease network for all known associations between disease phenotypes and genes. The authors found that genes that are essential in early development tend to encode highly connected proteins (hub proteins). Surprisingly, their results suggest that the vast majority of disease-related genes are non-essential and show no tendency to encode hub proteins.
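The subnetwork enrichment described above for AD can be sketched with a toy scoring function. The text states only that the heuristic score of Chen et al. [122] favors specificity and high-confidence interactions; the actual formula is not given here. The score below (summed confidence of a candidate's interactions with seed proteins, divided by the candidate's total number of interactions) is therefore an assumption invented for this illustration, as are all names and values.

```python
def expansion_scores(interactions, seeds):
    """Score non-seed proteins for addition to a disease subnetwork.

    interactions: iterable of (protein_a, protein_b, confidence in [0, 1]).
    seeds: set of proteins already associated with the disease.
    Hypothetical score = (sum of confidences of links to seeds) / degree,
    so promiscuous proteins and low-confidence links are penalised.
    """
    degree = {}
    seed_confidence = {}
    for a, b, confidence in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        for protein, partner in ((a, b), (b, a)):
            degree[protein] = degree.get(protein, 0) + 1
            if partner in seeds and protein not in seeds:
                seed_confidence[protein] = (
                    seed_confidence.get(protein, 0.0) + confidence)
    return {p: total / degree[p] for p, total in seed_confidence.items()}


def rank_candidates(interactions, seeds):
    """Candidates ordered from highest to lowest score (ties broken by name)."""
    scores = expansion_scores(interactions, seeds)
    return sorted(scores, key=lambda p: (-scores[p], p))


# Made-up data: S1 and S2 are seed (disease) proteins, X, Y, Z are candidates.
seeds = {"S1", "S2"}
interactions = [
    ("X", "S1", 0.9), ("X", "S2", 0.8),                # specific, well supported
    ("Y", "S1", 0.9), ("Y", "N1", 0.5),
    ("Y", "N2", 0.5), ("Y", "N3", 0.5),                # promiscuous
    ("Z", "S2", 0.2),                                  # specific, poorly supported
]

print(rank_candidates(interactions, seeds))
```

With these toy values, the specific and well-supported candidate X ranks above the promiscuous candidate Y and the poorly supported candidate Z, which is the qualitative behavior the text attributes to the score.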

CONCLUSIONS AND FINAL REMARKS

Protein interactions are involved in metabolic, signaling, immune and gene-regulatory networks. A better understanding of protein interactions, either with other proteins or with DNA, RNA, membranes or small molecules, could reveal the molecular mechanisms of the processes leading to diseases. Mutations in the protein interaction interface (or related sites, e.g. active sites, allosteric binding sites) could evidently disrupt the protein interaction. For example, the etiology of several of the diseases mentioned in this review lies in the disruption of a protein–DNA interaction (e.g. p53 in cancer). For other diseases, the main cause is the disruption of protein stability or folding, which destroys one or several protein–protein interactions (e.g. VHL) or creates new undesired ones (e.g. unfolded proteins tend to aggregate, as in AD and HD). Clearly, pathogen-host protein interactions are central to the way bacteria or viruses hijack the host immune system.

The study of the phenotypic commonalities of several disorders points to common modules that are responsible for inherited diseases. Evidence of these modules, namely protein complexes or functionally related proteins, emerges from several studies of protein and gene interactions reviewed here. However, several other factors could influence the final disease outcome. These confounders challenge the concept of modularity of genetic diseases (see [39]). Understanding the role of these factors, as illustrated in Figure 1, is a challenge central to the etiology of both Mendelian and complex diseases.

The processes leading to diseases are extremely complex, and so are the proteins and interactions involved in them. Most of the methodologies reviewed here used a simplistic ‘static’ view of proteins and their networks. In reality, proteins are continuously being synthesised from and degraded into amino acids. The kinetics of processes and network dynamics need to be considered to achieve a complete understanding of how the disruptions of proteins and their interactions lead to disease. Finally, it is important to also consider the context-specific (tissue, disease stage and response) functions of protein interactions.

It is clear, then, that a gap still exists between the identification of the disease-associated protein interaction network and the complete understanding of the disease mechanism. The gap is filled, unfortunately, with more questions than answers. The approaches reviewed here generated a considerable amount of valuable data, but also the need for further validation. To that end, a number of studies have used data from simpler organisms, such as worm (Caenorhabditis elegans), fly (Drosophila melanogaster) or yeast (Saccharomyces cerevisiae). There are limitations, however, to the transfer of interaction annotation across species, in particular between distantly related organisms [127]. A study of the overlap between the interaction networks of fly, worm, yeast and human showed that only a few interactions are conserved among these organisms [121]. Despite these limitations, modeling different aspects of a disease in simpler organisms has proven to be extremely useful. For instance, several aspects of the polyQ diseases (see previous section) were modeled using worm, fly and yeast [128–131].

We are still far from the goal of understanding the etiology of most diseases. Further advances in relevant experimental technologies (e.g. genetic linkage, protein interaction, protein structure, gene expression), along with computational tools to organise, visualise and integrate these data, will provide a step forward in that direction. In particular, the completion of the human protein interactome will provide data that could enhance several of the methodologies reviewed here. In addition, a systematic experimental genome-wide study of protein interactions between host and pathogen, which is not yet available in the literature, could provide insight into the mechanisms of pathogenicity of bacteria, viruses or parasites. Furthermore, valuable information about disease and protein interactions is buried within millions of biomedical records. Text mining approaches are therefore essential to recover such information. Indeed, several of the databases and methods to prioritise disease-related genes discussed in previous sections (e.g. Lage et al. [45]) have successfully incorporated text mining techniques.

Ideally, since network and structural approaches are complementary, the combination of network studies with a more detailed structural analysis has the potential to be an excellent framework for the study of disease mechanisms and rational design of drugs. In future, this strategy, and others discussed here, should be integrated into multidisciplinary disease-specific projects that provide a better understanding of a particular disease and help identify disease modules (if any) that are common to related disorders.

Key Points

  • Disruption of an existing protein interaction (by changing the stability of the protein and/or inhibiting the ability to bind to other molecules), production of new undesirable interactions (through mutations that result in misfolding of the protein and aggregation) and disruption of protein-DNA interactions (by affecting gene regulation) are the causes of many diseases.

  • Mutations in one gene could affect other genes. If their protein products interact, the resulting disruption of this interaction could lead to a disease.

  • Analysis of disease-related protein networks confirms that proteins involved in a disease tend to interact with other proteins involved in the same disease.

  • A gene could have a role in several diseases; thus, many diseases could share interaction subnetworks.

Funding

Support for this work was provided by the Intramural Research Program of the National Institutes of Health, National Library of Medicine.

Acknowledgments

Thanks to all the scientists (included or not in this review) who contributed their excellent work to the field of research reviewed here. Many thanks to Sonia Leach, Leonardo Marino, Eric Neumann, Anna Panchenko, Pedja Ravidojac, Willy Valdivia, Adam Godzik and anonymous reviewers for their helpful comments on the manuscript, to the NIH Fellows Editorial Board for their editorial work, and to Robert Yates for his help in the graphic design of the pictures.

References

1. Zuckerkandl E, Pauling L. Molecular disease, evolution, and genic heterogeneity. In: Kasha M, Pullman B (eds). Horizons in Biochemistry, 1962, New York: Academic Press (pg. 189)
2. Botstein D, White RL, Skolnick M, et al. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet, 1980, vol. 32 (pg. 314-31)
3. Kerem B, Rommens JM, Buchanan JA, et al. Identification of the cystic fibrosis gene: genetic analysis. Science, 1989, vol. 245 (pg. 1073-80)
4. Riordan JR, Rommens JM, Kerem B, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science, 1989, vol. 245 (pg. 1066-73)
5. Gusella JF, Wexler NS, Conneally PM, et al. A polymorphic DNA marker genetically linked to Huntington's disease. Nature, 1983, vol. 306 (pg. 234-8)
6. Miki Y, Swensen J, Shattuck-Eidens D, et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science, 1994, vol. 266 (pg. 66-71)
7. Wooster R, Bignell G, Lancaster J, et al. Identification of the breast cancer susceptibility gene BRCA2. Nature, 1995, vol. 378 (pg. 789-92)
8. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet, 2003, vol. 33 Suppl (pg. 228-237)
9. Scriver CR, Waters PJ. Monogenic traits are not simple: lessons from phenylketonuria. Trends Genet, 1999, vol. 15 (pg. 267-72)
10. Sriram G, Martinez JA, McCabe ERB, et al. Single-gene disorders: what role could moonlighting enzymes play? Am J Hum Genet, 2005, vol. 76 (pg. 911-24)
11. Dipple KM, McCabe ER. Modifier genes convert “simple” Mendelian disorders to complex traits. Mol Genet Metab, 2000, vol. 71 (pg. 43-50)
12. Dipple KM, McCabe ER. Phenotypes of patients with “simple” Mendelian disorders are complex traits: thresholds, modifiers, and systems dynamics. Am J Hum Genet, 2000, vol. 66 (pg. 1729-35)
13. Groman JD, Meyer ME, Wilmott RW, et al. Variant cystic fibrosis phenotypes in the absence of CFTR mutations. N Engl J Med, 2002, vol. 347 (pg. 401-7)
14. Sun H, Smallwood PM, Nathans J. Biochemical defects in ABCR protein variants associated with human retinopathies. Nat Genet, 2000, vol. 26 (pg. 242-6)
15. Agarwal S, Moorchung N. Modifier genes and oligogenic disease. J Nippon Med Sch, 2005, vol. 72 (pg. 326-34)
16. Badano JL, Katsanis N. Beyond Mendel: an evolving view of human genetic disease transmission. Nat Rev Genet, 2002, vol. 3 (pg. 779-89)
17. Van Heyningen V, Yeyati PL. Mechanisms of non-Mendelian inheritance in genetic disease. Hum Mol Genet, 2004, vol. 13 (2) (pg. R225-33)
18. Mayeux R. Mapping the new frontier: complex genetic disorders. J Clin Invest, 2005, vol. 115 (pg. 1404-7)
19. Fuller MT. Interacting genes identify interacting proteins involved in microtubule function in Drosophila. Cell Motil Cytoskeleton, 1989, vol. 14 (pg. 128-35)
20. Stearns T, Botstein D. Unlinked noncomplementation: isolation of new conditional-lethal mutations in each of the tubulin genes of Saccharomyces cerevisiae. Genetics, 1988, vol. 119 (pg. 249-60)
21. Adie EA, Adams RR, Evans KL, et al. Speeding disease gene discovery by sequence based candidate prioritization. BMC Bioinformatics, 2005, vol. 6 (pg. 55)
22. Furney SJ, Higgins DG, Ouzounis CA, et al. Structural and functional properties of genes involved in human cancer. BMC Genomics, 2006, vol. 7 (pg. 3)
23. Lopez-Bigas N, Ouzounis CA. Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res, 2004, vol. 32 (pg. 3108-14)
24. Tu Z, Wang L, Xu M, et al. Further understanding human disease genes by comparing with housekeeping genes and other genes. BMC Genomics, 2006, vol. 7 (pg. 31)
25. Adie EA, Adams RR, Evans KL, et al. SUSPECTS: enabling fast and effective prioritization of positional candidates. Bioinformatics, 2006, vol. 22 (pg. 773-4)
26. Franke L, Bakel H, Fokkens L, et al. Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet, 2006, vol. 78 (pg. 1011-25)
27. George RA, Liu JY, Feng LL, et al. Analysis of protein sequence and interaction data for candidate disease gene prediction. Nucleic Acids Res, 2006, vol. 34 (pg. e130)
28. Ma X, Lee H, Wang L, et al. CGI: a new approach for prioritizing genes by combining gene expression and protein-protein interaction data. Bioinformatics, 2007, vol. 23 (pg. 215-21)
29. Perez-Iratxeta C, Bork P, Andrade MA. Association of genes to genetically inherited diseases using data mining. Nat Genet, 2002, vol. 31 (pg. 316-9)
30. Perez-Iratxeta C, Wjst M, Bork P, et al. G2D: a tool for mining genes associated with disease. BMC Genet, 2005, vol. 6 (pg. 45)
31. Tiffin N, Kelso JF, Powell AR, et al. Integration of text- and data-mining using ontologies successfully selects disease gene candidates. Nucleic Acids Res, 2005, vol. 33 (pg. 1544-52)
32. Turner FS, Clutterbuck DR, Semple CAM. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol, 2003, vol. 4 (pg. R75)
33. Rossi S, Masotti D, Nardini C, et al. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res, 2006, vol. 34 (pg. W285-92)
34. Aerts S, Lambrechts D, Maity S, et al. Gene prioritization through genomic data fusion. Nat Biotechnol, 2006, vol. 24 (pg. 537-44)
35. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA, 2005, vol. 102 (pg. 15545-50)
36. van Driel MA, Cuelenaere K, Kemmeren PP, et al. GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases. Nucleic Acids Res, 2005, vol. 33 (pg. W758-61)
37. Tiffin N, Adie E, Turner F, et al. Computational disease gene identification: a concert of methods prioritizes type 2 diabetes and obesity candidate genes. Nucleic Acids Res, 2006, vol. 34 (pg. 3067-81)
38. Oti M, Snel B, Huynen MA, et al. Predicting disease genes using protein-protein interactions. J Med Genet, 2006, vol. 43 (pg. 691-8)
39. Oti M, Brunner H. The modular nature of genetic diseases. Clin Genet, 2007, vol. 71 (pg. 1-11)
40. Pinsky L. The polythetic (phenotypic community) system of classifying human malformation syndromes. Birth Defects Orig Artic Ser, 1977, vol. 13 (pg. 13-30)
41. Garcia-Higuera I, Taniguchi T, Ganesan S, et al. Interaction of the Fanconi anemia proteins and BRCA1 in a common pathway. Mol Cell, 2001, vol. 7 (pg. 249-62)
42. Mace G, Bogliolo M, Guervilly JH, et al. 3R coordination by Fanconi anemia proteins. Biochimie, 2005, vol. 87 (pg. 647-58)
43. Sam L, Liu Y, Jianrong L, et al. Discovery of protein interaction networks shared by diseases. Pac Symp Biocomput, 2007, vol. 12 (pg. 76-87)
44. Spivak G. The many faces of Cockayne syndrome. Proc Natl Acad Sci USA, 2004, vol. 101 (pg. 15273-4)
45. Lage K, Karlberg EO, Størling ZM, et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nat Biotechnol, 2007, vol. 25 (pg. 309-16)
46. dbGAP. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap (July 4, 2007, date last accessed).
47. Lussier YA, Liu Y. Computational approaches to phenotyping: high-throughput phenomics. Proc Am Thorac Soc, 2007, vol. 4 (pg. 18-25)
48. Scriver CR. After the genome–the phenome? J Inherit Metab Dis, 2004, vol. 27 (pg. 305-17)
49. Butte AJ, Kohane IS. Creation and implications of a phenome-genome network. Nat Biotechnol, 2006, vol. 24 (pg. 55-62)
50. Brenner SE. A tour of structural genomics. Nat Rev Genet, 2001, vol. 2 (pg. 801-9)
51. Todd AE, Marsden RL, Thornton JM, et al. Progress of structural genomics initiatives: an analysis of solved target structures. J Mol Biol, 2005, vol. 348 (pg. 1235-60)
52. Berman HM, Westbrook J, Feng Z, et al. The protein data bank. Nucleic Acids Res, 2000, vol. 28 (pg. 235-42)
53. Bartlett GJ, Todd AE, Thornton JM. Inferring protein function from structure. Methods Biochem Anal, 2003, vol. 44 (pg. 387-407)
54. Ofran Y, Punta M, Schneider R, et al. Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov Today, 2005, vol. 10 (pg. 1475-82)
55. The International HapMap Consortium. A haplotype map of the human genome. Nature, 2005, vol. 437 (pg. 1299-320)
56. Sunyaev S, Ramensky V, Bork P. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet, 2000, vol. 16 (pg. 198-200)
57. Mooney S. Bioinformatics approaches and resources for single nucleotide polymorphism functional analysis. Brief Bioinform, 2005, vol. 6 (pg. 44-56)
58. Ng PC, Henikoff S. Accounting for human polymorphisms predicted to affect protein function. Genome Res, 2002, vol. 12 (pg. 436-46)
59. Saunders CT, Baker D. Evaluation of structural and evolutionary contributions to deleterious mutation prediction. J Mol Biol, 2002, vol. 322 (pg. 891-901)
60. Hamosh A, Scott AF, Amberger JS, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res, 2005, vol. 33 (pg. D514-7)
61. Lussier Y, Borlawsky T, Rappaport D, et al. PhenoGO: assigning phenotypic context to gene ontology annotations with natural language processing. Pac Symp Biocomput, 2006, vol. 11 (pg. 64-75)
62. Safran M, Chalifa-Caspi V, Shmueli O, et al. Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res, 2003, vol. 31 (pg. 142-6)
63. Kahraman A, Avramov A, Nashev LG, et al. PhenomicDB: a multi-species genotype/phenotype database for comparative phenomics. Bioinformatics, 2005, vol. 21 (pg. 418-20)
64. Altman RB. PharmGKB: a logical home for knowledge relating genotype to drug response phenotype. Nat Genet, 2007, vol. 39 (pg. 426)
65. Yue P, Li Z, Moult J. Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol, 2005, vol. 353 (pg. 459-73)
66. Steward RE, MacArthur MW, Laskowski RA, et al. Molecular basis of inherited diseases: a structural perspective. Trends Genet, 2003, vol. 19 (pg. 505-13)
67. Vitkup D, Sander C, Church GM. The amino-acid mutational spectrum of human genetic disease. Genome Biol, 2003, vol. 4 (pg. R72)
68. Ye Y, Li Z, Godzik A. Modeling and analyzing three-dimensional structures of human disease proteins. Pac Symp Biocomput, 2006 (pg. 439-50)
69. Brauch H, Kishida T, Glavac D, et al. Von Hippel-Lindau (VHL) disease with pheochromocytoma in the Black Forest region of Germany: evidence for a founder effect. Hum Genet, 1995, vol. 95 (pg. 551-6)
70. Ohh M, Park CW, Ivan M, et al. Ubiquitination of hypoxia-inducible factor requires direct binding to the beta-domain of the von Hippel-Lindau protein. Nat Cell Biol, 2000, vol. 2 (pg. 423-7)
71. Martin AC, Facchiano AM, Cuff AL, et al. Integrating mutation data and structural analysis of the TP53 tumor-suppressor protein. Hum Mutat, 2002, vol. 19 (pg. 149-64)
72. Cho Y, Gorina S, Jeffrey PD, et al. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science, 1994, vol. 265 (pg. 346-55)
73. Ryan DP, Matthews JM. Protein-protein interactions in human disease. Curr Opin Struct Biol, 2005, vol. 15 (pg. 441-6)
74. Cheng Y, LeGall T, Oldfield CJ, et al. Abundance of intrinsic disorder in protein associated with cardiovascular disease. Biochemistry, 2006, vol. 45 (pg. 10448-60)
75. Iakoucheva LM, Brown CJ, Lawson JD, et al. Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol, 2002, vol. 323 (pg. 573-4)
76. Dunker AK, Lawson JD, Brown CJ, et al. Intrinsically disordered protein. J Mol Graph Model, 2001, vol. 19 (pg. 26-59)
77. Schwarz-Linek U, Hook M, Potts JR. Fibronectin-binding proteins of gram-positive cocci. Microbes Infect, 2006, vol. 8 (pg. 2291-8)
78. Fu H (ed). Protein-Protein Interactions: Methods and Applications (Methods in Molecular Biology), 2004, Totowa, New Jersey: Humana Press (pg. 544)
79. Ito T, Chiba T, Ozawa R, et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA, 2001, vol. 98 (pg. 4569-74)
80. Uetz P, Giot L, Cagney G, et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 2000, vol. 403 (pg. 623-7)
81. Gavin AC, Bosche M, Krause R, et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 2002, vol. 415 (pg. 141-7)
82. Ho Y, Gruhler A, Heilbut A, et al. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 2002, vol. 415 (pg. 180-3)
83. Gavin AC, Aloy P, Grandi P, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature, 2006, vol. 440 (pg. 631-6)
84. Krogan NJ, Cagney G, Yu H, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 2006, vol. 440 (pg. 637-43)
85. Titz B, Schlesner M, Uetz P. What do we learn from high-throughput protein interaction data? Expert Rev Proteomics, 2004, vol. 1 (pg. 111-21)
86. Shoemaker BA, Panchenko AR. Deciphering protein–protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol, 2007, vol. 3 (pg. e42)
87. Huynen MA, Bork P. Measuring genome evolution. Proc Natl Acad Sci USA, 1998, vol. 95 (pg. 5849-56)
88. Pellegrini M, Marcotte EM, Thompson MJ, et al. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA, 1999, vol. 96 (pg. 4285-8)
89. Dandekar T, Snel B, Huynen M, et al. Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci, 1998, vol. 23 (pg. 324-8)
90. Overbeek R, Fonstein M, D'Souza M, et al. Use of contiguity on the chromosome to predict functional coupling. In Silico Biol, 1999, vol. 1 (pg. 93-108)
91. Marcotte EM, Pellegrini M, Ng HL, et al. Detecting protein function and protein-protein interactions from genome sequences. Science, 1999, vol. 285 (pg. 751-3)
92. Enright AJ, Iliopoulos I, Kyrpides NC, et al. Protein interaction maps for complete genomes based on gene fusion events. Nature, 1999, vol. 402 (pg. 86-90)
93. Shoemaker BA, Panchenko AR. Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol, 2007, vol. 3 (pg. e42)
94. Valencia A, Pazos F. Computational methods for the prediction of protein interactions. Curr Opin Struct Biol, 2002, vol. 12 (pg. 368-73)
95. Rost B, Liu J, Nair R, et al. Automatic prediction of protein function. Cell Mol Life Sci, 2003, vol. 60 (pg. 2637-50)
96. Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol, 2007, vol. 3 (pg. e43)
97. Gertz J, Elfond G, Shustrova A, et al. Inferring protein interactions from phylogenetic distance matrices. Bioinformatics, 2003, vol. 19 (pg. 2039-45)
98. Goh CS, Bogan AA, Joachimiak M, et al. Co-evolution of proteins with their interaction partners. J Mol Biol, 2000, vol. 299 (pg. 283-93)
99. Goh CS, Cohen FE. Co-evolutionary analysis reveals insights into protein-protein interactions. J Mol Biol, 2002, vol. 324 (pg. 177-92)
100. Jothi R, Kann MG, Przytycka TM. Predicting protein-protein interaction by searching evolutionary tree automorphism space. Bioinformatics, 2005, vol. 21 Suppl 1 (pg. i241-50)
Pazos
F
Helmer-Citterich
M
Ausiello
G
, et al.  . 
Correlated mutations contain information about protein-protein interaction
J Mol Biol
 , 
1997
, vol. 
271
 (pg. 
511
-
23
)
Pazos F, Valencia A. In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins, 2002, vol. 47 (pg. 219-27)
Ramani AK, Marcotte EM. Exploiting the co-evolution of interacting proteins to discover interaction specificity. J Mol Biol, 2003, vol. 327 (pg. 273-84)
Jothi R, Cherukuri PF, Tasneem A, et al. Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol, 2006, vol. 362 (pg. 861-75)
Pazos F, Ranea JA, Juan D, et al. Assessing protein co-evolution in the context of the tree of life assists in the prediction of the interactome. J Mol Biol, 2005, vol. 352 (pg. 1002-15)
Sato T, Yamanishi Y, Kanehisa M, et al. The inference of protein-protein interactions by co-evolutionary analysis is improved by excluding the information about the phylogenetic relationships. Bioinformatics, 2005, vol. 21 (pg. 3482-9)
Kann MG, Jothi R, Cherukuri PF, et al. Predicting protein domain interactions from coevolution of conserved regions. Proteins, 2007, vol. 67 (pg. 811-20)
Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet, 2004, vol. 5 (pg. 101-13)
Grindrod P, Kibble M. Review of uses of network and graph theory concepts within proteomics. Expert Rev Proteomics, 2004, vol. 1 (pg. 229-38)
Yook SH, Oltvai ZN, Barabasi AL. Functional and topological characterization of protein interaction networks. Proteomics, 2004, vol. 4 (pg. 928-42)
Albert R, Jeong H, Barabasi AL. Error and attack tolerance of complex networks. Nature, 2000, vol. 406 (pg. 378-82)
Xu J, Li Y. Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics, 2006, vol. 22 (pg. 2800-5)
Huynen MA, Snel B, von Mering C, et al. Function prediction and protein networks. Curr Opin Cell Biol, 2003, vol. 15 (pg. 191-8)
Droit A, Poirier GG, Hunter JM. Experimental and bioinformatic approaches for interrogating protein-protein interactions to determine protein function. J Mol Endocrinol, 2005, vol. 34 (pg. 263-80)
Jiang Z, Zhou Y. Using bioinformatics for drug target identification from the genome. Am J Pharmacogenomics, 2005, vol. 5 (pg. 387-96)
Goehler H, Lalowski M, Stelzl U, et al. A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington's disease. Mol Cell, 2004, vol. 15 (pg. 853-65)
Herbst M, Wanker EE. Therapeutic approaches to polyglutamine diseases: combating protein misfolding and aggregation. Curr Pharm Des, 2006, vol. 12 (pg. 2543-55)
Duennwald ML, Jagadish S, Giorgini F, et al. A network of protein interactions determines polyglutamine toxicity. Proc Natl Acad Sci USA, 2006, vol. 103 (pg. 11051-6)
Giorgini F, Muchowski PJ. Connecting the dots in Huntington's disease with protein interaction networks. Genome Biol, 2005, vol. 6 pg. 210
Lim J, Hao T, Shaw C, et al. A protein-protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration. Cell, 2006, vol. 125 (pg. 801-14)
Gandhi TK, Zhong J, Mathivanan S, et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat Genet, 2006, vol. 38 (pg. 285-93)
Chen JY, Shen C, Sivachenko AY. Mining Alzheimer disease relevant proteins from integrated protein interactome data. Pac Symp Biocomput, 2006, vol. 11 (pg. 367-78)
Chiti F, Dobson CM. Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem, 2006, vol. 75 (pg. 333-66)
Jonsson PF, Bates PA. Global topological features of cancer proteins in the human interactome. Bioinformatics, 2006, vol. 22 (pg. 2291-7)
Uetz P, Dong YA, Zeretzke C, et al. Herpesviral protein networks and their interaction with the human proteome. Science, 2006, vol. 311 (pg. 239-42)
Goh KI, Cusick ME, Valle D, et al. The human disease network. Proc Natl Acad Sci USA, 2007, vol. 104 (pg. 8685-90)
Mika S, Rost B. Protein-protein interactions more conserved within species than across species. PLoS Comput Biol, 2006, vol. 2 pg. e79
Willingham S, Outeiro TF, DeVit MJ, et al. Yeast genes that enhance the toxicity of a mutant huntingtin fragment or alpha-synuclein. Science, 2003, vol. 302 (pg. 1769-72)
Giorgini F, Guidetti P, Nguyen Q, et al. A genomic screen in yeast implicates kynurenine 3-monooxygenase as a therapeutic target for Huntington disease. Nat Genet, 2005, vol. 37 (pg. 526-31)
Kazemi-Esfarjani P, Benzer S. Genetic suppression of polyglutamine toxicity in Drosophila. Science, 2000, vol. 287 (pg. 1837-40)
Nollen EA, Garcia SM, van Haaften G, et al. Genome-wide RNA interference screen identifies previously undescribed regulators of polyglutamine aggregation. Proc Natl Acad Sci USA, 2004, vol. 101 (pg. 6403-8)
Yue P, Melamud E, Moult J. SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics, 2006, vol. 7 pg. 166
Dantzer J, Moad C, Heiland R, et al. MutDB services: interactive structural analysis of mutation data. Nucleic Acids Res, 2005, vol. 33 (pg. W311-4)
Mathe E, Olivier M, Kato S, et al. Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nucleic Acids Res, 2006, vol. 34 (pg. 1317-25)
Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res, 2003, vol. 31 (pg. 3812-4)
O'Brien KP, Westerlund I, Sonnhammer EL. OrthoDisease: a database of human disease orthologs. Hum Mutat, 2004, vol. 24 (pg. 112-9)
Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res, 2003, vol. 13 (pg. 2498-504)