Optimizing a global alignment of protein interaction networks

Motivation: The global alignment of protein interaction networks is a widely studied problem. It is an important first step in understanding the relationship between proteins in different species and in identifying functional orthologs. Furthermore, it can provide useful insights into the species’ evolution.

Results: We propose a novel algorithm, PISwap, for optimizing global pairwise alignments of protein interaction networks, based on a local optimization heuristic that has previously demonstrated its effectiveness for a variety of other intractable problems. PISwap can begin with different types of network alignment approaches and then iteratively adjust the initial alignments by incorporating network topology information, trading it off against sequence information. In practice, our algorithm efficiently refines other well-studied alignment techniques with almost no additional time cost. We also show the robustness of the algorithm to noise in protein interaction data. In addition, the flexible nature of this algorithm makes it suitable for different applications of network alignment. This algorithm can yield interesting insights into the evolutionary dynamics of related species.

Availability: Our software is freely available for non-commercial purposes from our Web site, http://piswap.csail.mit.edu/.

Contact: bab@csail.mit.edu or csliao@ie.nthu.edu.tw

Supplementary information: Supplementary data are available at Bioinformatics online.

Algorithm 1: Given a weighted bipartite graph $G = (X \cup Y, E)$ with parameters $\alpha$ and $c$, find the optimum mapping $M^*$.

Running-time Analysis
Theorem 1. Given a weighted bipartite graph $G = (X \cup Y, E)$ with parameters $\alpha$ and $c$, the running time of Algorithm 1 is pseudo-polynomially bounded in the worst case.
Proof. It is readily seen that a maximum-weight mapping $M^*$ has cardinality $|M^*| \le \min\{|X|, |Y|\}$. The first step of Algorithm 1, which obtains a maximum-weight mapping $M^*$ by the Hungarian algorithm, takes $O(|M^*|^3)$ time.
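The first step only needs a maximum-weight one-to-one mapping computed from the similarity values. As an illustration, the following is a minimal, stdlib-only sketch of the textbook potential-based Hungarian algorithm, which runs in $O(n^2 m)$ time for an $n \times m$ matrix with $n \le m$. It assumes a dense similarity matrix whose zero entries mean "no edge"; the function name `max_weight_mapping` is hypothetical and this is not PISwap's own code. In practice, SciPy's `scipy.optimize.linear_sum_assignment(..., maximize=True)` solves the same assignment problem.

```python
import math


def max_weight_mapping(w):
    """Maximum-weight one-to-one mapping for a dense |X| x |Y| matrix w of
    nonnegative similarities (0 = no edge). Returns {x_index: y_index},
    keeping only positive-weight pairs. Hypothetical helper (sketch)."""
    n, m = len(w), len(w[0])
    if n > m:  # the routine below needs rows <= columns, so solve the transpose
        t = [[w[i][j] for i in range(n)] for j in range(m)]
        return {x: y for y, x in max_weight_mapping(t).items()}
    INF = math.inf
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials (1-based)
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]  # cost = -similarity
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by delta
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # augment along the stored path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1)
            if p[j] and w[p[j] - 1][j - 1] > 0}
```

For example, `max_weight_mapping([[3, 1, 0], [2, 4, 0], [0, 5, 6]])` maps each row to the column with the same index, for a total weight of 13.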
Let $\Delta$ denote the maximum degree of a vertex in $G_X$ and $G_Y$, i.e., the largest number of neighbors a vertex in $X \cup Y$ can have. Let $B$ denote the largest similarity value for two sequences, i.e., $B = \max_{x \in X, y \in Y} s(x, y)$. In Step 2, we compute the topology similarity $t(e)$ and the weight $w(e)$ for each edge $e = (x, y) \in M^*$. Since we consider all pairwise combinations of the neighbors of $x$ and the neighbors of $y$, this requires $O(|M^*| \cdot \Delta^2)$ time.
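To make the $O(\Delta^2)$ per-edge cost of Step 2 concrete, here is a hedged sketch. It assumes, since this section does not define them, that $t(e)$ counts the neighbor pairs $(x', y')$ with $x'$ adjacent to $x$, $y'$ adjacent to $y$, and $(x', y') \in M^*$ (conserved interactions), and that $w(e) = \alpha\, t(e) + (1 - \alpha)\, s(x, y)$. Both definitions and the helper names `topology_scores` and `edge_weights` are hypothetical.

```python
def topology_scores(mapping, adj_x, adj_y):
    """Assumed t(e) for each e = (x, y) in the mapping: the number of
    neighbour pairs (x2, y2), x2 in N(x), y2 in N(y), with x2 mapped to y2.
    Scans all |N(x)| * |N(y)| pairs, i.e. O(Delta^2) work per edge."""
    t = {}
    for x, y in mapping.items():
        t[(x, y)] = sum(1 for x2 in adj_x.get(x, ())
                        for y2 in adj_y.get(y, ())
                        if mapping.get(x2) == y2)
    return t


def edge_weights(mapping, t, s, alpha):
    """Hypothetical trade-off: w(e) = alpha * t(e) + (1 - alpha) * s(x, y)."""
    return {(x, y): alpha * t[(x, y)] + (1 - alpha) * s[(x, y)]
            for x, y in mapping.items()}
```

On two three-protein paths a-b-c and 1-2-3 aligned in order, this gives $t = 1, 2, 1$ for the three mapped pairs, because the middle pair conserves both of its interactions.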
In Step 3, we find the candidate set $S$. We first compute the subsets $\mathrm{prefer}_Y(x)$ and $\mathrm{prefer}_X(y)$ for each vertex in $X \cup Y$. For any constant $c$, finding the $c$ highest-weighted neighbors in $Y$ (respectively $X$) of a vertex $x \in X$ (respectively $y \in Y$) takes $O(|Y|)$ (respectively $O(|X|)$) time, so this part is bounded by $O(|X| \cdot |Y|)$. We then find, for every edge $e \in M^*$, all the edges $e' \in M^*$ satisfying the properties ( ). For every edge $e = (x, y) \in M^*$, there are at most $c^2$ edges in $M^*$ with one endpoint in $\mathrm{prefer}_X(y)$ and the other endpoint in $\mathrm{prefer}_Y(x)$ that $e$ can be swapped with. For every edge $e \in M^*$, each of the $c^2$ possible weight differences $\mathrm{swap}(e, e')$ and the $\binom{c^2}{2}$ differences $\mathrm{swap}(e, e', e'')$ can be computed in constant time from the topology similarity and sequence similarity. Hence Step 3 takes $O(c^4 |M^*|)$ time.
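The candidate search of Step 3 can be sketched as follows, restricted to 2-swaps. This is a minimal illustration under stated assumptions, not PISwap's implementation: `w` is a hypothetical dictionary of fixed edge weights over all bipartite edges (including every mapped edge), and swapping $e = (x, y)$ with $e' = (x', y')$ is assumed to replace them by $(x, y')$ and $(x', y)$, so that $\mathrm{swap}(e, e') = w(x, y') + w(x', y) - w(x, y) - w(x', y')$. For each $e$, only the at most $c^2$ pairs $x' \in \mathrm{prefer}_X(y)$, $y' \in \mathrm{prefer}_Y(x)$ are examined.

```python
import heapq


def prefer_sets(w, c):
    """Return (prefer_Y, prefer_X): the c highest-weighted neighbours of each
    x in X and each y in Y, where w maps bipartite edges (x, y) -> weight."""
    nbrs_x, nbrs_y = {}, {}
    for x, y in w:
        nbrs_x.setdefault(x, []).append(y)
        nbrs_y.setdefault(y, []).append(x)
    # nlargest scans each neighbour list once: O(|Y| log c) per x at most
    prefer_Y = {x: set(heapq.nlargest(c, ys, key=lambda y: w[(x, y)]))
                for x, ys in nbrs_x.items()}
    prefer_X = {y: set(heapq.nlargest(c, xs, key=lambda x: w[(x, y)]))
                for y, xs in nbrs_y.items()}
    return prefer_Y, prefer_X


def two_swap_gains(mapping, w, prefer_Y, prefer_X):
    """For every e = (x, y) in the mapping, check the <= c^2 pairs (x2, y2)
    with x2 in prefer_X[y] and y2 in prefer_Y[x]; if e' = (x2, y2) is also
    mapped, record swap(e, e') = w(x,y2) + w(x2,y) - w(x,y) - w(x2,y2)."""
    gains = {}
    for x, y in mapping.items():
        for x2 in prefer_X.get(y, ()):
            for y2 in prefer_Y.get(x, ()):
                if x2 != x and mapping.get(x2) == y2:
                    g = w[(x, y2)] + w[(x2, y)] - w[(x, y)] - w[(x2, y2)]
                    gains[((x, y), (x2, y2))] = g
    return gains
```

With $c = 1$ and weights $w(a,1)=1$, $w(a,2)=5$, $w(b,1)=4$, $w(b,2)=1$, the mapping $\{a \to 1, b \to 2\}$ yields a gain of 7 for swapping its two edges. The 3-swaps $\mathrm{swap}(e, e', e'')$ would pair up the same candidates, giving the $\binom{c^2}{2}$ term.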
Step 4 is an iteration; we first consider the time complexity of a single iteration. The maximum value of $\mathrm{swap}(e, e')$ and $\mathrm{swap}(e, e', e'')$ can be found in constant time by using a priority queue. The swap operation also takes constant time. For the two newly inserted edges of $M^*$, we verify whether they satisfy the properties ( ) in $O(c^4)$ time, as above. The last step, updating the values of $t(e)$, $w(e)$, $\mathrm{swap}(e, e')$, and $\mathrm{swap}(e, e', e'')$, takes $O(\Delta^2)$ time, since only the edges $e \in M^*$ with one endpoint in … Finally, consider the number of iterations of the while loop. The total sequence score is an integer between $0$ and $|M^*| \cdot B$, and similarly, the total topology score is an integer between $0$ and $|M^*|^2$; the consecutive values of $w(M^*)$ form a strictly increasing sequence whose length is bounded … Step 4 is the dominating one, Algorithm 1 runs … We note that the running time of the k-Opt technique is likewise pseudo-polynomially bounded, by $O(\max\{c^{2(k-1)}, \Delta^2\} \cdot B \cdot |M^*|^3)$.

Table S1 shows that the EC ratios improve significantly after applying the 2-Opt and 3-Opt heuristics to each pair of the older PPI networks, whereas the FC values of the initial mappings and of those refined by PISwap do not differ substantially. This mirrors the situation observed with the more recent PPI networks in the main text.
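Returning to Step 4 of the running-time analysis, the loop can be sketched as a greedy local search driven by a max-priority queue. This sketch rests on simplifying assumptions: only 2-swaps are applied, and the weights `w` are held fixed, whereas in PISwap $t(e)$ and hence $w(e)$ change after each swap, which is what the $O(\Delta^2)$ update step pays for. Stale heap entries are discarded lazily, and the name `refine` is hypothetical.

```python
import heapq


def refine(mapping, w, c):
    """Hypothetical greedy 2-swap refinement in the spirit of Step 4.
    mapping: dict x -> y (current M*); w: dict (x, y) -> fixed weight over
    all candidate edges, assumed to contain every mapped edge.
    Returns (refined mapping, number of swaps applied)."""
    mapping = dict(mapping)
    nbrs_x, nbrs_y = {}, {}
    for x, y in w:
        nbrs_x.setdefault(x, []).append(y)
        nbrs_y.setdefault(y, []).append(x)
    # prefer sets: c highest-weighted neighbours of each vertex (as in Step 3)
    pref_Y = {x: set(heapq.nlargest(c, ys, key=lambda y: w[(x, y)]))
              for x, ys in nbrs_x.items()}
    pref_X = {y: set(heapq.nlargest(c, xs, key=lambda x: w[(x, y)]))
              for y, xs in nbrs_y.items()}

    def push_improving(heap, x, y):
        # examine the <= c^2 partners e' = (x2, y2) of e = (x, y)
        for x2 in pref_X.get(y, ()):
            for y2 in pref_Y.get(x, ()):
                if x2 != x and mapping.get(x2) == y2:
                    g = w[(x, y2)] + w[(x2, y)] - w[(x, y)] - w[(x2, y2)]
                    if g > 0:  # heapq is a min-heap, so store -gain
                        heapq.heappush(heap, (-g, x, y, x2, y2))

    swaps = 0
    while True:
        heap = []
        for x, y in list(mapping.items()):
            push_improving(heap, x, y)
        if not heap:  # no improving 2-swap remains: local optimum
            return mapping, swaps
        while heap:
            _, x, y, x2, y2 = heapq.heappop(heap)
            if mapping.get(x) != y or mapping.get(x2) != y2:
                continue  # stale entry: one of its edges was already swapped out
            mapping[x], mapping[x2] = y2, y  # apply the best remaining swap
            swaps += 1
            push_improving(heap, x, y2)  # check the two newly inserted edges
            push_improving(heap, x2, y)
```

Each applied swap strictly increases the total weight, so the loop terminates. If, as the proof states, the total sequence score is an integer in $[0, |M^*| \cdot B]$ and the total topology score an integer in $[0, |M^*|^2]$, and assuming $w(M^*)$ is determined by these two totals, then $w(M^*)$ can take at most $(|M^*| \cdot B + 1)(|M^*|^2 + 1) = O(B \cdot |M^*|^3)$ distinct values, which is one way to bound the number of improving swaps.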
In particular, as mentioned in the main text, the refinement performed by PISwap can be thought of as a topological improvement that compensates for a sequence-based alignment and discovers functional orthologs not found by sequence-only approaches. For example, in our worm-fly experiments (CE vs. DM), PISwap predicted the worm protein C16C2.1 as a functional ortholog of the fly protein CG2189; both are homeodomain proteins involved in DNA binding, yet they were not mapped by a sequence-only approach such as HomoloGene and are not in IsoBase.

Table S1. Evaluation of alignments based on the initial mappings produced by the Hungarian algorithm; CE = C. elegans, DM = D. melanogaster, SC = S. cerevisiae, HS = H. sapiens, and MM = M. musculus.

Figure S1 shows the results of using PISwap to refine the initial mappings produced by GRAAL, IsoRank, and PATH on the older PPI networks. Just as with the more recent PPI networks, PISwap significantly improves the EC score of these initial alignments. As in the main text, the refinement affects the three alignment tools differently: the refined EC ratios are nearly double those of the initial mappings obtained by GRAAL, the EC ratios for IsoRank increase by 15% to 30% after refinement, and those for PATH increase dramatically, by a factor of 3 to 16.

Fig. S1. Evaluation of the refinement of the initial mappings obtained by GRAAL, IsoRank, and PATH; the blue and red bars represent the results before and after refinement by PISwap, respectively.
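The EC ratios reported above are not defined in this section. A common definition in the network-alignment literature (used, for example, with GRAAL) is edge correctness: the fraction of interactions in $G_X$ whose endpoints are mapped onto an interaction in $G_Y$. The sketch below assumes that definition; the function name `edge_correctness` is hypothetical.

```python
def edge_correctness(mapping, edges_x, edges_y):
    """Assumed EC: fraction of interactions (x1, x2) of G_X whose images
    (mapping[x1], mapping[x2]) are interactions of G_Y (undirected)."""
    ey = {frozenset(e) for e in edges_y}  # undirected edge set of G_Y
    conserved = sum(1 for x1, x2 in edges_x
                    if x1 in mapping and x2 in mapping
                    and frozenset((mapping[x1], mapping[x2])) in ey)
    return conserved / len(edges_x)
```

For a triangle a-b-c aligned in order onto the path 1-2-3, two of the three interactions are conserved, so EC is 2/3.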