Network Alignment
858L

Terms & Questions
[Figure: Interolog = proteins A and B interact in species 1, and their homologs (h) in species 2 also interact.]
Are there conserved pathways?
What is the minimum set of pathways required for life?
Can we compare networks to develop an evolutionary distance?

Aligning Networks
Combining Sequence and Network Topology
❖ Let G1 = (V1, E1), G2, ..., Gk be graphs, each giving a noisy experimental estimate of the interactions between proteins in organisms 1, ..., k.
❖ For graphs Gi = (Vi, Ei) and Gj = (Vj, Ej), we also have a function
sim(u, v) : Vi × Vj → ℝ
that gives the sequence similarity between u and v.
[Figure: two networks, G1 and G2.]

Conservation ⇒ Functional Importance
❖ If a structure has withstood millions of years of the randomizing process of mutation, then it likely has an important function.
❖ "Structure" = DNA sequence, protein sequence, protein shape, network topology.
❖ So: the appearance of similar topology in two widely separated organisms indicates a real, fundamental set of interactions. Also, by comparing graphs we can transfer knowledge about one organism to another.

Local alignment:
1. Which nodes are dissimilar [low sim(u, v)] but have similar neighbors/neighborhoods? (e.g., Bandyopadhyay et al.) These are functional orthologs: proteins that play the same role but may look very different.
2. Which edges are real and important, e.g., form a conserved pathway in the cell?

Global alignment:
Singh et al., 2007 propose:
Maximum common subgraph: Find the largest graph H that is isomorphic to subgraphs of two given graphs G1 and G2.
❖ Graph Isomorphism: Given graphs G1 = (V1, E1) and G2 = (V2, E2), each with n nodes, decide whether there is a one-to-one and onto function
f : V1 → V2
such that (u, v) ∈ E1 ⇔ (f(u), f(v)) ∈ E2.
❖ Subgraph Isomorphism: Given graphs G1 = (V1, E1) and G2 = (V2, E2), where G1 has k nodes and G2 has n > k nodes, decide whether there is a one-to-one function
f : V1 → V2
such that (u, v) ∈ E1 ⇔ (f(u), f(v)) ∈ E2.
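A brute-force sketch of both decision problems, written directly from the definitions above. It assumes simple undirected graphs stored as a node list plus a set of frozenset edges (a representation chosen only for illustration), and the helper names (edge_set, find_mapping, graph_isomorphic, subgraph_isomorphic) are hypothetical. It tries every one-to-one mapping, so it takes exponential time and only suits tiny graphs. Because the definitions use ⇔, the subgraph test checks induced subgraphs.

```python
from itertools import permutations


def edge_set(*pairs):
    """Build an undirected edge set; each edge is a frozenset {u, v}."""
    return {frozenset(p) for p in pairs}


def find_mapping(v1, e1, v2, e2):
    """Return a one-to-one f: V1 -> V2 (as a dict) such that
    {u, v} in E1 <=> {f(u), f(v)} in E2 for all distinct u, v; else None."""
    if len(v1) > len(v2):
        return None
    # permutations(v2, k) yields every injective assignment of G1's k nodes.
    for image in permutations(v2, len(v1)):
        f = dict(zip(v1, image))
        if all((frozenset((a, b)) in e1) == (frozenset((f[a], f[b])) in e2)
               for i, a in enumerate(v1) for b in v1[i + 1:]):
            return f
    return None


def graph_isomorphic(v1, e1, v2, e2):
    # Equal node counts: a one-to-one f is then automatically onto.
    return len(v1) == len(v2) and find_mapping(v1, e1, v2, e2) is not None


def subgraph_isomorphic(v1, e1, v2, e2):
    # G1 has k nodes, G2 has n > k nodes; f only needs to be one-to-one.
    return len(v1) < len(v2) and find_mapping(v1, e1, v2, e2) is not None


# Small examples: a 3-node path, a triangle, and a 4-cycle.
path3 = (["x", "y", "z"], edge_set(("x", "y"), ("y", "z")))
tri = (["a", "b", "c"], edge_set(("a", "b"), ("b", "c"), ("a", "c")))
cycle4 = ([1, 2, 3, 4], edge_set((1, 2), (2, 3), (3, 4), (4, 1)))

print(graph_isomorphic(*path3, *tri))        # False: 2 edges vs 3 edges
print(subgraph_isomorphic(*path3, *cycle4))  # True: x, y, z -> 1, 2, 3
print(subgraph_isomorphic(*tri, *cycle4))    # False: a 4-cycle has no triangle
```

Exhaustive search is used only because it mirrors the definitions literally; no polynomial-time algorithm is known for either problem.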
Graph isomorphism is not known to be NP-hard; subgraph isomorphism is NP-complete.

PathBLAST: (Kelley et al, 2003)

PathBLAST Alignment Graph
Nodes correspond to homologous pairs (A, a), where A is from one species and a is from the other.
Edges come in 3 types:
• Direct. Interactions AB and ab are both present.
• Gap. Edge AB is present, and a & b are separated by 2 hops.
• Mismatch. A & B are separated by 2 hops, and so are a & b.
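A sketch of building the alignment graph from the three edge rules above. Assumptions not taken from the slides: each network is an undirected adjacency dict, the homologous pairs (A, a) are supplied as a list, "separated by 2 hops" means not adjacent but sharing a common neighbor, and a gap may occur in either species. All function names and the toy proteins are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency dict from a list of (u, v) interactions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def hops(adj, x, y):
    """1 if x-y interact, 2 if not adjacent but sharing a neighbor, else None."""
    if x == y:
        return None  # assumption: the same protein used twice gives no edge
    nbrs_x, nbrs_y = adj.get(x, set()), adj.get(y, set())
    if y in nbrs_x:
        return 1
    if nbrs_x & nbrs_y:
        return 2
    return None


def alignment_graph(adj1, adj2, pairs):
    """pairs: homologous (A, a) nodes. Returns {(node1, node2): edge type}."""
    edges = {}
    for (A, a), (B, b) in combinations(pairs, 2):
        d1, d2 = hops(adj1, A, B), hops(adj2, a, b)
        if d1 == 1 and d2 == 1:
            kind = "direct"    # AB and ab both present
        elif {d1, d2} == {1, 2}:
            kind = "gap"       # one pair interacts, the other is 2 hops apart
        elif d1 == 2 and d2 == 2:
            kind = "mismatch"  # both pairs separated by 2 hops
        else:
            continue           # no alignment edge between these two nodes
        edges[((A, a), (B, b))] = kind
    return edges


# Toy example: species 1 is the chain A-B-C-D; species 2 has
# a-b, b-x, x-c, c-d, x-d, where x has no homolog in species 1.
adj1 = adjacency([("A", "B"), ("B", "C"), ("C", "D")])
adj2 = adjacency([("a", "b"), ("b", "x"), ("x", "c"), ("c", "d"), ("x", "d")])
pairs = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
for (n1, n2), kind in alignment_graph(adj1, adj2, pairs).items():
    print(n1, n2, kind)
```

On this toy input the sketch reports direct edges for A–B/a–b and C–D/c–d, a gap edge for B–C (b and c are linked only through x), and a mismatch edge for B–D (both pairs share a neighbor: C and x).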
PathBLAST Scoring Function
[Figure: Pr[interaction]]
For a path P in the alignment graph:
$$S(P) := \sum_{v \in P} \log \frac{p(v)}{p_{\text{random}}} + \sum_{e \in P} \log \frac{q(e)}{q_{\text{random}}}$$
A sum over logs = a product over scores.
• $p(v) = \Pr[\text{Homology} \mid E_v]$ = probability that the proteins in v are really homologs; $p(E_v \mid H)$ is estimated from $E_v$ distributions in COG.
• $q(e) = \prod_{i \in e} \Pr[i]$ = product of the interaction probabilities of the interaction edges "contained" within alignment edge e.
• $p_{\text{random}}$ and $q_{\text{random}}$ are the average values of p(v) and q(e) in the graph.

PathBLAST Search Procedure
If G is a directed acyclic graph (DAG), then it's easy to find a high-scoring path via dynamic programming. Let S(v, L) be the score of the max-scoring path of length L that ends at v:
$$S(v, L) = \max_{u \in \mathrm{pred}(v)} \left[ S(u, L-1) + \log \frac{p(v)}{p_{\text{random}}} + \log \frac{q(u \to v)}{q_{\text{random}}} \right]$$
Because G is not a DAG, they randomly create a large number of DAGs by keeping only one direction of each edge, as follows:
1. Randomly rank the vertices.
2. Direct edges from low to high rank.
Run the dynamic program on the random DAGs and take the highest-scoring path.
A given path on L vertices is preserved only if the ranks increase along it in one of its two directions: a 2/L! chance.
So repeat 5L! times.

H. pylori & S. cerevisiae
Find several (50) high-scoring paths.
Then, remove those edges & vertices and repeat.
Overlay the identified paths.
[Figure: overlay of the conserved paths identified between H. pylori and S. cerevisiae.]
Revealed 5 conserved pathways.
One path contains proteins from both DNA polymerase and the proteasome => evidence that they interact.
Some Notes
• Goal: use a well-studied organism (yeast) to learn about a less-studied organism (H. pylori).
• There were only 7 directly shared edges between yeast & H. pylori (you would expect 2.5 shared edges by chance).
• Within conserved pathways, proteins often were not paired with the protein with the most similar sequence: 22% of the proteins in the previous figure did not pair with their best sequence match.
• Gap & mismatch edges were essential!
• Single pathways in bacteria often correspond to multiple pathways in yeast. (Yeast is suspected of having undergone multiple whole-genome duplications.)

Yeast Paralogous Pathways
Proteins were not allowed to pair with themselves or their network neighbors.
MAPK kinase signaling cascades were very common.
All 120 yeast kinases share ~30% sequence similarity.
Hence: p(v) is similar for all pairs v of two kinases.
Probably need to improve the kinase-matching procedure.

Searching
Can use local alignment to search: align a small query network to the large network.

PathBLAST Summary
• Local graph alignment.
• Takes into account sequence similarity & topological patterns.
• Allows gaps and mismatches of length 1.
• Scoring function ~ probability of the path existing.
• Algorithm: fast, reasonable, but definitely a heuristic.
• Searching & local alignment are very related.
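To make the scoring function S(P) and the randomized-DAG search concrete, here is a small sketch under stated assumptions: the alignment graph is given as a node → p(v) dict and an undirected edge → q(e) dict (edges as frozensets, no self-loops), p_random and q_random are the plain averages of those values, L counts vertices, and the DP base case S(v, 1) is just the node term (the slides do not state one). It illustrates the procedure on these slides; it is not Kelley et al.'s implementation, and all function names and toy values are hypothetical.

```python
import math
import random


def log_odds_weights(p, q):
    """Node terms log(p(v)/p_random) and edge terms log(q(e)/q_random).
    Assumption: p_random and q_random are plain averages of the given values."""
    p_rand = sum(p.values()) / len(p)
    q_rand = sum(q.values()) / len(q)
    node_w = {v: math.log(pv / p_rand) for v, pv in p.items()}
    edge_w = {e: math.log(qe / q_rand) for e, qe in q.items()}
    return node_w, edge_w


def path_score(path, node_w, edge_w):
    """S(P): node log-odds plus edge log-odds along the path."""
    return (sum(node_w[v] for v in path)
            + sum(edge_w[frozenset(e)] for e in zip(path, path[1:])))


def best_path_in_dag(order, nbrs, node_w, edge_w, L):
    """DP for the best L-vertex path when every edge points from an earlier
    to a later vertex in `order`. Returns (score, path)."""
    rank = {v: i for i, v in enumerate(order)}
    best = {}  # (v, l) -> (best score of an l-vertex path ending at v, predecessor)
    for v in order:  # topological order: predecessors are already filled in
        best[(v, 1)] = (node_w[v], None)  # assumed base case S(v, 1)
        for l in range(2, L + 1):
            cands = [(best[(u, l - 1)][0] + node_w[v] + edge_w[frozenset((u, v))], u)
                     for u in nbrs[v] if rank[u] < rank[v] and (u, l - 1) in best]
            if cands:
                best[(v, l)] = max(cands, key=lambda c: c[0])
    ends = [(s, v) for (v, l), (s, _) in best.items() if l == L]
    if not ends:
        return -math.inf, []
    score, v = max(ends, key=lambda c: c[0])
    path, l = [], L
    while v is not None:  # follow predecessor links back to the start
        path.append(v)
        v = best[(v, l)][1]
        l -= 1
    return score, path[::-1]


def pathblast_search(p, q, L, trials=None, seed=0):
    """Rank vertices at random, direct edges low -> high, run the DP,
    and keep the best path found over all trials."""
    node_w, edge_w = log_odds_weights(p, q)
    nbrs = {v: set() for v in p}
    for e in q:
        u, v = tuple(e)
        nbrs[u].add(v)
        nbrs[v].add(u)
    if trials is None:
        trials = 5 * math.factorial(L)  # the slides' 5L! repetitions
    rng = random.Random(seed)
    best = (-math.inf, [])
    for _ in range(trials):
        order = list(p)
        rng.shuffle(order)  # a random ranking defines one random DAG
        cand = best_path_in_dag(order, nbrs, node_w, edge_w, L)
        if cand[0] > best[0]:
            best = cand
    return best


# Toy alignment graph: a chain A-B-C-D of alignment nodes.
p = {"A": 0.9, "B": 0.9, "C": 0.9, "D": 0.1}
q = {frozenset(("A", "B")): 0.5, frozenset(("B", "C")): 0.5, frozenset(("C", "D")): 0.5}
score, path = pathblast_search(p, q, L=3)
print(path, round(score, 3))
```

The default of 5·L! trials follows the slides: each trial preserves a fixed L-vertex path with probability 2/L!, so the chance of missing it in every trial is at most (1 − 2/L!)^(5·L!) ≤ e^(−10) ≈ 4.5 × 10^(−5). Because the result depends on the random rankings, the path may come back in either direction (A, B, C or C, B, A).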
 Fall '07