colorcode - Color Coding Speeding up Network Searches 858L...


Color Coding: Speeding up Network Searches
858L

Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks
Scott, Ideker, Karp, Sharan. RECOMB 2005.
• Color Coding: Alon et al, 1995.

Searching for High Scoring Paths

Weighted network G: G might be an alignment graph, a PPI network, a metabolic network, etc.
For each edge (u, v):
p(u,v) = probability this edge exists
w(u,v) = -log p(u,v)
P = simple path
Weight(P) = sum of the w(u,v) values along its edges
Length(P) = number of nodes in P
Because w = -log p, a low-weight path is a path whose edges are jointly most likely to exist.

Goal: Low-weight, simple, length-k paths

Given: Graph G, a subset of nodes I, and a node v.
Find: The lowest-weight path P that:
(1) starts at some vertex in I,
(2) ends at v,
(3) is of length k and is simple (doesn't use any vertex twice).
Set I lets us specify, e.g., that the path should start at a surface receptor protein.

Is this Problem Hard?

Given: Graph G, a subset of nodes I, and a node v.
Find: The lowest-weight, simple, length-k path between I and v.
Yes. It's NP-hard. Why?
Reduce Hamiltonian Cycle (HC) to it: to solve an HC instance <G_H> on n ≥ 3 vertices, let G = G_H, pick any vertex v, let I be the set of neighbors of v, and let k = n. A simple length-n path from a neighbor u of v to v visits every vertex, and the edge (v, u) closes it into a Hamiltonian cycle.
Without the simple condition or the length-k condition, the problem is easy.

Dynamic Programming Algorithm

For v ∈ S, where S is a set of ≤ k vertices:
W(v, S) := minimum weight of a simple path that starts at I, visits each vertex in S, ends at v, and is of length |S|.
W(v, S) := ∞ if no such path exists.
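A minimal Python sketch of the table W(v, S), assuming an undirected network given as (u, v, p) triples. The names build_graph and exact_best_path and the toy probabilities are illustrative assumptions, not part of the slides. The sketch fills the table forward, extending every stored path by one new vertex per round, so the sets S are visited in order of increasing size.

```python
import math

def build_graph(prob_edges):
    """Turn (u, v, p) triples into an undirected adjacency map with w = -log p."""
    adj = {}
    for u, v, p in prob_edges:
        w = -math.log(p)
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def exact_best_path(adj, start_set, target, k):
    """Minimum weight of a simple k-node path from start_set to target (inf if none)."""
    # W maps (v, S) to the lowest weight of a simple path that starts in
    # start_set, visits exactly the vertices in frozenset S, and ends at v.
    W = {(s, frozenset([s])): 0.0 for s in start_set}
    for _ in range(k - 1):              # each round adds one node to every path
        grown = {}
        for (u, S), cost in W.items():
            for v, w in adj.get(u, {}).items():
                if v in S:              # revisiting v would break simplicity
                    continue
                key = (v, S | {v})
                if cost + w < grown.get(key, math.inf):
                    grown[key] = cost + w
        W = grown
    return min((cost for (v, _), cost in W.items() if v == target),
               default=math.inf)

# Hypothetical toy network; the edge probabilities are made up for illustration.
toy = build_graph([("a", "b", 0.9), ("b", "d", 0.9),
                   ("a", "c", 0.5), ("c", "d", 0.5), ("a", "d", 0.99)])
best = exact_best_path(toy, {"a"}, "d", 3)   # path a-b-d beats a-c-d
```

The table holds one entry per (end vertex, vertex set) pair, which is why this exact approach is only practical for very small k.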
Base case:

$$W(v, \{v\}) = \begin{cases} 0 & \text{if } v \in I \\ \infty & \text{if } v \notin I \end{cases}$$

Recurrence (the path reaches u after visiting S − {v}, then takes the edge (u, v)):

$$W(v, S) = \min_{u \in S - \{v\}} \left[ W(u, S - \{v\}) + w(u, v) \right]$$

The set S − {v} is smaller than S, so we can compute W(•, •) in order of increasing size of S.

Ok, So:

$$\mathrm{OPT}(I, v) = \min_{S : |S| = k} W(v, S)$$

Note how "simple" this algorithm is: try all possible sets of k nodes, compute their optimal order, and return the best set.

What's the running time?
Number of sets we will consider = all possible subsets of nodes of size ≤ k:

$$\sum_{i=0}^{k} \binom{n}{i} = O(n^k)$$

For each set, computing the min takes at most O(k) steps.
Therefore: Running time = $O(k n^k)$.

Color Coding

• $O(k n^k)$ is too slow for any interesting k.
• Can we do better?
• Idea: rather than keeping track of all of S, we'll keep track of less information about which nodes we've already visited.
• This will introduce a problem: we may miss the optimum path...

Color Coding

Main Step: Randomly color each node with a color from {1,2,...,k}. Let c(u) be the color of node u.
Define: a path is "colorful" if it contains exactly 1 vertex of each color.
Note: any colorful path is simple.
So, we consider this modified problem:
Given: Graph G, a subset of nodes I, and a node v.
Find: The lowest-weight, colorful, length-k path between I and v.

Color Coding DP Algorithm

For c(v) ∈ C, where C is a set of ≤ k colors:
$\bar{W}(v, C)$ := minimum weight of a path that starts at I, visits a vertex of each color in C, ends at v, and is of length |C|.
$\bar{W}(v, C)$ := ∞ if no such path exists.

$$\bar{W}(v, C) = \min_{u : c(u) \in C - \{c(v)\}} \left[ \bar{W}(u, C - \{c(v)\}) + w(u, v) \right]$$

"C" keeps track of the remaining allowed colors: the part of the path that ends at u may only use colors from C − {c(v)}.

Intuition for faster run time: we must consider only $2^k$ possible sets "C" instead of $O(n^k)$ sets "S":

$$\sum_{i=0}^{k} \binom{k}{i} = 2^k$$
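A minimal Python sketch of one color-coding trial, assuming the network is stored as an adjacency map adj[u][v] = w(u, v). The function names, the toy graph, and its weights are illustrative assumptions rather than anything given in the slides. The table is keyed by (end vertex, color set) instead of (end vertex, vertex set), which is the whole point of the method.

```python
import math
import random

def random_coloring(nodes, k, rng=random):
    """Give each node an independent, uniformly random color from {1, ..., k}."""
    return {v: rng.randint(1, k) for v in nodes}

def colorful_best_path(adj, start_set, target, k, color):
    """One trial: minimum weight of a colorful k-node path from start_set to target."""
    # W maps (v, C) to the lowest weight of a path that starts in start_set,
    # ends at v, and uses exactly the colors in frozenset C, each color once.
    W = {(s, frozenset([color[s]])): 0.0 for s in start_set}
    for _ in range(k - 1):
        grown = {}
        for (u, C), cost in W.items():
            for v, w in adj.get(u, {}).items():
                if color[v] in C:       # color already used, so not colorful
                    continue
                key = (v, C | {color[v]})
                if cost + w < grown.get(key, math.inf):
                    grown[key] = cost + w
        W = grown
    return min((cost for (v, _), cost in W.items() if v == target),
               default=math.inf)

# Hypothetical toy graph; the optimal 3-node path from a to d is a-b-d (weight 2).
adj = {"a": {"b": 1.0, "c": 5.0}, "b": {"a": 1.0, "d": 1.0},
       "c": {"a": 5.0, "d": 5.0}, "d": {"b": 1.0, "c": 5.0}}

lucky = {"a": 1, "b": 2, "c": 2, "d": 3}     # a-b-d is colorful
unlucky = {"a": 1, "b": 3, "c": 2, "d": 3}   # b and d share a color

hit = colorful_best_path(adj, {"a"}, "d", 3, lucky)     # 2.0, the optimum
miss = colorful_best_path(adj, {"a"}, "d", 3, unlucky)  # 10.0, via a-c-d only
```

The unlucky coloring shows the price of the speedup: a single trial can return a path that is worse than the true optimum, or no path at all.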
Alternative View of Color Coding Algorithm

Let I be the given starting node set.
Let colorings(u, j) be the set of valid path colorings (color sets) for paths with j nodes, i.e. j − 1 edges, from I to u.

For all u in I: colorings(u, 1) = { {c(u)} }
For j = 1, ..., k:
    For every edge (u, w):
        For every C in colorings(u, j):
            If c(w) not in C:
                Add C ∪ {c(w)} to colorings(w, j+1).

[Figure: example colored graph with an edge (u, w) and the color sets {1,2,8}, {1,5,8}, and {1,2,5,8}.]

Running time:

$$\sum_{j=0}^{k} \binom{k}{j} \, j \, |E| = O(2^k k |E|)$$

So:
We had an algorithm that was ≈ $O(n^k)$.
We converted it into an ≈ $O(2^k)$ algorithm, but with probability ε we'll miss the optimal answer.

[Plot: running-time growth for n = 100 as k ranges from 2 to 10; the vertical axis runs from 10^4 to 10^20.]

What if the optimal path is not colorful?

Have to repeat this procedure enough times so that the probability that this happens is low.
k! ways to make a path colorful.
$k^k$ ways to color a path.
Pr[Path is colorful] = $k!/k^k \ge e^{-k}$ (since $k! \ge (k/e)^k$).
Pr[OPT is colorful] ≥ $e^{-k}$.
Pr[OPT is not colorful] ≤ $1 - e^{-k}$.
Repeat the algorithm $-e^k \ln \varepsilon = e^k \ln(1/\varepsilon)$ times.
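A short Python sketch to put numbers on these quantities. The function names are illustrative assumptions. It evaluates the exact success probability k!/k^k, compares it with the e^{-k} lower bound, and computes the repetition count e^k ln(1/ε) rounded up to an integer.

```python
import math

def prob_colorful(k):
    """Exact probability that a fixed k-node path is colorful under a random coloring."""
    return math.factorial(k) / k ** k

def trials_needed(k, eps):
    """Repetition count e^k * ln(1/eps), rounded up to a whole number of trials."""
    return math.ceil(math.exp(k) * math.log(1.0 / eps))

def failure_bound(k, trials):
    """Probability that a fixed k-node path is colorful in none of the trials."""
    return (1.0 - prob_colorful(k)) ** trials

for k in (6, 8, 10):
    t = trials_needed(k, 0.01)
    print(k, prob_colorful(k), math.exp(-k), t, failure_bound(k, t))
```

Because k!/k^k is strictly larger than e^{-k}, the actual failure probability after this many trials is well below ε; the e^{-k} bound is convenient rather than tight.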
Each run misses OPT with probability at most $1 - e^{-k}$, so after $-e^k \ln \varepsilon$ independent repetitions:

$$\Pr[\text{OPT is never colorful}] \le \left(1 - e^{-k}\right)^{-e^k \ln \varepsilon} = \left[\left(1 + \frac{1}{-e^k}\right)^{-e^k}\right]^{\ln \varepsilon} \le e^{\ln \varepsilon} = \varepsilon$$

The last step uses $(1 - 1/m)^{-m} \ge e$ with $m = e^k$, together with $\ln \varepsilon < 0$.

[Plot: $k!/k^k$ and $e^{-k}$ for k = 6 to 10.]

Running Times

Yeast Network with ~4,500 nodes and ~14,500 edges:

Pheromone Response Pathway

[Figure, three panels: (a) Known pathway; (b) Best length-9 pathway between STE3 and STE12; (c) Collection of all low-weight paths between STE3 and STE12.]
Fig. 2. The pheromone response signaling pathway in yeast. (a) The main chain of the known pathway, adapted from [13]. (b) The best path of the same length (9) in the network. (c) The assembly of all light-weight paths starting at STE3 and ending ...

Color Coding Summary

• Turned a slow, $O(n^k)$ algorithm into a less-slow $O(2^k)$ algorithm that is correct with high probability.
• Used on yeast to identify signaling pathways.
• Directly extends to finding good-scoring pathways in the alignment graph of PathBLAST.

This note was uploaded on 01/13/2012 for the course CMSC 423 taught by Professor Staff during the Fall '07 term at Maryland.
