netalign - Network Alignment 858L Terms & Questions...



Network Alignment
858L

Terms & Questions
[Figure: Interolog. Interacting proteins A and B, with homology links (h) between Species 1 and Species 2.]
Are there conserved pathways?
What is the minimum set of pathways required for life?
Can we compare networks to develop an evolutionary distance?

Aligning Networks
Combining Sequence and Network Topology
❖ Let G1 = (V1, E1), G2, ..., Gk be graphs, each giving noisy experimental estimates of the interactions between proteins in organisms 1, ..., k.
❖ If Gi = (Vi, Ei), we also have a function sim(u,v) : Vi × Vj → R that gives the sequence similarity between u and v.
[Figure: graphs G1 and G2.]

Conservation ⇒ Functional Importance
❖ If a structure has withstood millions of years of the randomizing process of mutations, then it likely has an important function.
❖ "Structure" = DNA sequence, protein sequence, protein shape, network topology.
❖ So: the appearance of similar topology in two widely separated organisms indicates a real, fundamental set of interactions. Also, by comparing graphs we can transfer knowledge about one organism to another.

Local alignment:
1. Which nodes are dissimilar [low sim(u,v)] but have similar neighbors / neighborhoods? (e.g. Bandyopadhyay et al.) Functional orthologs: proteins that play the same role, but may look very different.
2. Which edges are real and important, e.g. form a conserved pathway in the cell?

Global alignment: Singh et al., 2007 propose:

Maximum common subgraph: Find the largest graph H that is isomorphic to subgraphs of two given graphs G1 and G2.
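To make "isomorphic to a subgraph" concrete, here is a minimal brute-force sketch in Python. It tries every one-to-one map f from the nodes of a small graph H into a larger graph G and returns the first map under which edges and non-edges agree, i.e. (u,v) ∈ E_H ⇔ (f(u), f(v)) ∈ E_G. The function name `find_embedding` and the toy graphs are illustrative assumptions, not from the slides. Graphs are assumed undirected, and the search is exponential, so it only suits tiny examples.

```python
from itertools import permutations


def find_embedding(h_nodes, h_edges, g_nodes, g_edges):
    """Return a one-to-one map f: H -> G that preserves edges and non-edges,
    or None if no such map exists. Brute force over all injective maps."""
    h_nodes = list(h_nodes)
    h_set = {frozenset(e) for e in h_edges}  # undirected edges of H
    g_set = {frozenset(e) for e in g_edges}  # undirected edges of G
    for image in permutations(g_nodes, len(h_nodes)):
        f = dict(zip(h_nodes, image))
        # Check every unordered pair of H nodes: edge in H <=> edge in G.
        consistent = all(
            (frozenset((u, v)) in h_set)
            == (frozenset((f[u], f[v])) in g_set)
            for i, u in enumerate(h_nodes)
            for v in h_nodes[i + 1:]
        )
        if consistent:
            return f
    return None


# Hypothetical toy graphs: a triangle H, a 4-node path, and a triangle with a tail.
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
path4 = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
tailed = ([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)])

print(find_embedding(*triangle, *path4))   # no triangle inside a path
print(find_embedding(*triangle, *tailed))  # some map onto nodes 1, 2, 3
```

A maximum common subgraph search would wrap a check like this in a search over candidate graphs H, which is why exact methods do not scale to whole interaction networks.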
❖ Graph Isomorphism: Given graphs G1 = (V1,E1), G2 = (V2,E2), each with n nodes, decide whether there is a one-to-one and onto function f : V1 → V2 such that (u,v) ∈ E1 ⇔ (f(u), f(v)) ∈ E2. Not known to be NP-hard.
❖ Subgraph Isomorphism: Given graphs G1 = (V1,E1), G2 = (V2,E2), where G1 has k nodes and G2 has n > k nodes, decide whether there is a one-to-one function f : V1 → V2 such that (u,v) ∈ E1 ⇔ (f(u), f(v)) ∈ E2. NP-complete.

PathBLAST: (Kelley et al, 2003)

PathBLAST Alignment Graph
Nodes correspond to homologous pairs (A, a), where A is from one species and a is from the other.
Edges come in 3 types:
• Direct. The A-B and a-b interactions are both present.
• Gap. Edge A-B is present, and a & b are separated by 2 hops.
• Mismatch. A & B are separated by 2 hops, and so are a & b.
(Kelley et al, 2003)

PathBLAST Scoring Function
P is a path in the alignment graph.
$$ S(P) := \sum_{v \in P} \log \frac{p(v)}{p_{random}} + \sum_{e \in P} \log \frac{q(e)}{q_{random}} $$
A sum over logs corresponds to a product over scores.
p(v) = Pr[Homology | Ev] = probability that the proteins in v are really homologs. p(Ev | H) is estimated from Ev distributions in COG.
$$ q(e) = \prod_{i \in e} \Pr[i] $$
q(e) = product over the interaction edges "contained" within the alignment edge.
[Table: Pr[interaction], with entries 0.9, 0.3, 0.1 and row labels 1, 2, ≥3.]
p_random and q_random are the average values of p(v) and q(e) in the graph.

PathBLAST Search Procedure
If G is a directed acyclic graph (DAG), then it's easy to find a high-scoring path via dynamic programming.
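As a concrete illustration of the scoring function S(P) defined above, the following Python sketch computes the score of one path from its node and edge probabilities. The function names `edge_prob` and `path_score`, and every number in the example, are hypothetical and chosen only for illustration; in particular, p_random and q_random are passed in directly rather than averaged over a real alignment graph.

```python
import math


def edge_prob(interaction_probs):
    """q(e): product of Pr[i] over the interactions i contained in alignment edge e."""
    return math.prod(interaction_probs)


def path_score(node_probs, edge_probs, p_random, q_random):
    """S(P) = sum over v of log(p(v)/p_random) + sum over e of log(q(e)/q_random)."""
    node_term = sum(math.log(p / p_random) for p in node_probs)
    edge_term = sum(math.log(q / q_random) for q in edge_probs)
    return node_term + edge_term


# Hypothetical path with three alignment nodes and two alignment edges.
p_values = [0.8, 0.4, 0.4]                                  # assumed p(v) values
q_values = [edge_prob([0.9, 0.3]), edge_prob([0.9, 0.3])]   # assumed Pr[i] values
score = path_score(p_values, q_values, p_random=0.4, q_random=0.27)
print(score)
```

Nodes and edges that are no better than the graph average contribute roughly zero, so a positive score means the path is more probable than a typical path of the same size.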
S(v, L) = score of the max-scoring path of length L that ends at v:
$$ S(v, L) = \max_{u \in pred(v)} \left[ S(u, L-1) + \log \frac{p(v)}{p_{random}} + \log \frac{q(u \to v)}{q_{random}} \right] $$
Because G is not a DAG, they randomly create a large number of DAGs by removing edges as follows:
1. Randomly rank vertices.
2. Direct edges from low to high rank.
Run the dynamic program on the random DAGs and take the highest-scoring path.
A given path survives only if its vertices are ranked in increasing or decreasing order, so there is a 2/L! chance that a path will be preserved. So repeat 5L! times.

H. pylori & S. cerevisiae
Find several (50) high-scoring paths. Then, remove those edges & vertices and repeat.
Overlay the identified paths. Revealed 5 conserved pathways.
Contains proteins from both DNA polymerase and Proteasome => evidence that they interact.
(Kelley et al, 2003)

Some Notes
• Goal: use a well-studied organism (yeast) to learn about a less-studied organism (H. pylori).
• There were only 7 directly shared edges between yeast & H. pylori (you would expect 2.5 shared edges).
• Within conserved pathways, proteins often were not paired with the protein with the most similar sequence.
• Gap & mismatch edges were essential!
22% of the proteins in the previous figure did not pair with their best sequence match.
Single pathways in bacteria often correspond to multiple pathways in yeast. (Yeast is suspected of having undergone multiple whole-genome duplications.)
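The PathBLAST search procedure described earlier (rank vertices at random, direct edges from low to high rank, run the dynamic program, repeat) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `best_path` is hypothetical, L is taken to be the number of vertices on the path (consistent with the 2/L! argument), the base case S(v, 1) = node score is an assumption the slides do not state, and the scores in the toy graph are invented.

```python
import math
import random


def best_path(nodes, edges, node_score, edge_score, L, trials, seed=0):
    """Randomized search for a high-scoring path with L vertices.

    node_score[v] stands for log(p(v)/p_random); edge_score[frozenset((u, v))]
    stands for log(q(e)/q_random). Returns (score, path) or (-inf, None).
    """
    rng = random.Random(seed)
    best = (float("-inf"), None)
    for _ in range(trials):
        # 1. Randomly rank vertices.
        order = list(nodes)
        rng.shuffle(order)
        rank = {v: i for i, v in enumerate(order)}
        # 2. Direct each edge from the lower-ranked to the higher-ranked endpoint.
        preds = {v: [] for v in nodes}
        for u, v in edges:
            lo, hi = (u, v) if rank[u] < rank[v] else (v, u)
            preds[hi].append(lo)
        # S[v][l] = (score, path) of the best path with l vertices ending at v.
        # Assumed base case: a one-vertex path scores node_score[v].
        S = {v: {1: (node_score[v], (v,))} for v in nodes}
        for v in order:  # increasing rank, so predecessors are already done
            for l in range(2, L + 1):
                cands = [
                    (S[u][l - 1][0] + node_score[v]
                     + edge_score[frozenset((u, v))],
                     S[u][l - 1][1] + (v,))
                    for u in preds[v] if (l - 1) in S[u]
                ]
                if cands:
                    S[v][l] = max(cands, key=lambda c: c[0])
            if L in S[v] and S[v][L][0] > best[0]:
                best = S[v][L]
    return best


# Hypothetical toy alignment graph: a chain a-b-c-d plus a poorly scoring edge a-e.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
toy_node_score = {v: 1.0 for v in toy_nodes}
toy_edge_score = {frozenset(e): 1.0 for e in toy_edges}
toy_edge_score[frozenset(("a", "e"))] = -5.0

L = 3
result = best_path(toy_nodes, toy_edges, toy_node_score, toy_edge_score,
                   L, trials=5 * math.factorial(L))
print(result)
```

Every path in a DAG is automatically simple, which is what lets the dynamic program ignore repeated vertices; the price is that each random orientation keeps only a small fraction of the paths, hence the many repetitions.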
Yeast Paralogous Pathways
Proteins were not allowed to pair with themselves or their network neighbors.
MAPK kinase signaling cascades were very common.
All 120 yeast kinases share ~30% sequence similarity. Hence: p(v) is similar for all pairs v of two kinases. Probably need to improve the kinase matching procedure.
(Kelley et al, 2003)

Searching
Can use local alignment to search: align a small query network to the large network.
(Kelley et al, 2003)

PathBLAST Summary
• Local graph alignment.
• Takes into account sequence similarity & topological patterns.
• Allows gaps and mismatches of length 1.
• Scoring function ~ probability of the path existing.
• Algorithm: fast, reasonable, but definitely a heuristic.
• Searching & local alignment are very related.
...