ECS 124: Theory and practice of bioinformatics
Lecture 4: Dynamic Programming and Local sequence alignment
Instructor: Ilias Tagkopoulos
Office: Kemper 3063 and GBSF 5313
UC Davis, 4/13/2010

Last time: number of alignments, complexity, and global alignment

Sequence length    # of alignments
10                 8,097,452
20                 ~2.6054e+14
100                ~2.0537e+75

Alignment Graph
We can calculate the maximum score (minimum distance) from (0,0) to any node (i,j). What about this node? Do we have to calculate the total path? Can you find a way to speed up the computation?
Computing the distance from neighbor nodes only: how many operations? What is the complexity? A case of dynamic programming.
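One way to make the counts in the table concrete, assuming each alignment of two length-n sequences corresponds to one path through the alignment graph from (0,0) to (n,n) made of right, down, or diagonal steps: the number of such paths can itself be computed by dynamic programming, because every node is entered only from its three neighbors. This is a sketch under that counting assumption; the function name `count_alignments` is hypothetical.

```python
# Count paths from (0,0) to (m,n) in an alignment graph whose steps are
# right (gap in S1), down (gap in S2) or diagonal (match/mismatch).
# Assumption: this is how alignments are counted (central Delannoy numbers).

def count_alignments(m: int, n: int) -> int:
    # paths[i][j] = number of paths from (0,0) to node (i,j)
    paths = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                paths[i][j] = 1  # only one way along the first row/column
            else:
                # every path into (i,j) arrives from one of three neighbors
                paths[i][j] = (paths[i - 1][j]
                               + paths[i][j - 1]
                               + paths[i - 1][j - 1])
    return paths[m][n]


for n in (10, 20, 100):
    print(n, f"{count_alignments(n, n):.4e}")
```

Under this model, n = 10 gives 8,097,453, one more than the figure in the table, so the slide may use a slightly different convention; n = 20 gives 260,543,813,797,441, about 2.6054e+14, matching the table. Note that the counting itself only needs the three neighbors of each node, which is exactly the idea reused for scoring.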
Dynamic programming
Dynamic programming is related to "divide and conquer": you break the problem down into smaller subproblems and compute them efficiently, solving each subproblem only once and reusing its result.
Two conditions for dynamic programming:
- Existence of overlapping subproblems.
- Existence of optimal substructure.
Overlapping subproblems: the space of distinct subproblems is small, i.e., a naive approach would do the same computation multiple times.
Optimal substructure: the optimal global solution is a combination of optimal local (i.e., subproblem) solutions.
Dynamic programming: two categories
Top-down approach: saving previous results at run time (what is called "memoization"). Similar to mapping, caching, lookup tables, etc.
Bottom-up approach: reusing results from previous steps (smaller subproblems), computing them in order from the smallest upward, e.g., computing the factorial or the Fibonacci series iteratively.

Example: Fibonacci number
Abundant in nature.
F(n) = F(n-1) + F(n-2), with F(0) = 0, F(1) = 1.
F(n) is related to the golden ratio: the ratio F(n+1)/F(n) approaches φ = (1 + √5)/2 ≈ 1.618 as n grows.
The Fibonacci series is recursive. Pseudocode to calculate it:
function F(n):
    if n = 0 return 0
    if n = 1 return 1
    return F(n - 1) + F(n - 2)
Complexity: O(?)

Example: Fibonacci number, top-down approach
What if we save previously calculated values?
array map
map[0] := 0; map[1] := 1
function F(n):
    if map[n] does not exist:
        map[n] := F(n - 1) + F(n - 2)
    return map[n]
Complexity: from exponential, O(2^n), to linear, O(n). What about space? Does the order play a role? E.g., F(2), F(3) vs. F(3), F(2)?
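To see the drop from exponential to linear work, the sketch below counts the calls made by the plain recursion and by the memoized (top-down) version. It is a minimal Python rendering of the two pseudocode versions above; the names `fib_naive`, `fib_memo`, and the `counter` argument are illustrative assumptions, and `memo` plays the role of the slide's `map`.

```python
# Compare the number of calls made by naive recursion and by memoization.
# Both follow F(n) = F(n-1) + F(n-2), F(0) = 0, F(1) = 1.

def fib_naive(n: int, counter: list[int]) -> int:
    """Plain recursion; counter[0] counts every call."""
    counter[0] += 1
    if n < 2:
        return n  # F(0) = 0, F(1) = 1
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


def fib_memo(n: int, counter: list[int]) -> int:
    """Top-down DP: each F(k) is computed once and stored in memo."""
    memo = {0: 0, 1: 1}  # base cases, like map[0] and map[1]

    def f(k: int) -> int:
        counter[0] += 1
        if k not in memo:
            memo[k] = f(k - 1) + f(k - 2)
        return memo[k]

    return f(n)


for n in (10, 20, 25):
    naive_calls, memo_calls = [0], [0]
    value = fib_naive(n, naive_calls)
    assert value == fib_memo(n, memo_calls)
    print(f"F({n}) = {value}: naive calls = {naive_calls[0]}, "
          f"memoized calls = {memo_calls[0]}")
```

For n = 10 the naive version makes 177 calls and the memoized one 19 (2n - 1). The naive count grows roughly like φ^n, so O(2^n) is a simple upper bound. The memoized version uses O(n) extra space for the table and the recursion stack; in Python, very large n would also hit the default recursion limit (1000).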
Example: Fibonacci number, bottom-up approach
Calculate smaller values first: start from the end of the recursion, i.e., the base cases F(0) and F(1).
function F(n):
    var preF := 0, curF := 1
    if n = 0 return 0
    if n = 1 return 1
    repeat n - 1 times:
        var newF := preF + curF
        preF := curF
        curF := newF
    return curF
Complexity: time goes from exponential, O(2^n), to linear, O(n). What about space? O(1).
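The bottom-up pseudocode above translates almost line for line into Python. This is a sketch; the name `fib_bottom_up` is an assumption, and Python's tuple assignment replaces the temporary `newF`.

```python
def fib_bottom_up(n: int) -> int:
    """Bottom-up DP: start from the base cases, keep only the last two values."""
    if n == 0:
        return 0
    pre_f, cur_f = 0, 1  # F(0), F(1)
    for _ in range(n - 1):  # "repeat n - 1 times"
        pre_f, cur_f = cur_f, pre_f + cur_f
    return cur_f


print([fib_bottom_up(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Only two integers are kept at any time, which is where the O(1) space comes from (ignoring the growth of the integers themselves), and there is no recursion, so large n poses no stack problem.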
Going back to the Alignment Graph
Needleman–Wunsch algorithm
Step 1: Calculate the score for each node in the first row and first column.
Step 2: Work iteratively, row by row or column by column, and calculate the score of all nodes.
Step 3: Traceback: starting from the end, follow the path back to find the optimal alignment.
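The three steps can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: it assumes a linear gap penalty d that is subtracted, and a hypothetical similarity function `sim` (+1 for a match, -1 for a mismatch) standing in for the Sim() matrix used in the lecture. Ties during traceback are broken by preferring the diagonal, then up, then left.

```python
# Needleman-Wunsch global alignment with a linear gap penalty d (subtracted).
# Hypothetical scoring for illustration: +1 match, -1 mismatch, d = 1.

def sim(a: str, b: str) -> int:
    """Illustrative similarity function standing in for Sim()."""
    return 1 if a == b else -1


def needleman_wunsch(s1: str, s2: str, d: int = 1) -> tuple[int, str, str]:
    m, n = len(s1), len(s2)
    F = [[0] * (n + 1) for _ in range(m + 1)]

    # Step 1: first column and first row (a prefix aligned against gaps only)
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d

    # Step 2: fill row by row; each cell needs only its three neighbors
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j] - d,  # s1[i-1] against a gap
                          F[i][j - 1] - d,  # s2[j-1] against a gap
                          F[i - 1][j - 1] + sim(s1[i - 1], s2[j - 1]))

    # Step 3: traceback from the bottom-right cell back to (0, 0)
    a1, a2 = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sim(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(a1)), "".join(reversed(a2))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Swapping in a real substitution matrix only requires replacing `sim` with a table lookup; the fill and traceback logic stay the same.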
So why do you have to backtrack? Can't we just keep track of the maximum value as we go forward?

Summary: Global alignment and Dynamic Programming
Dynamic programming: breaking problems into subproblems and saving/reusing previous results.
Two conditions for dynamic programming: existence of overlapping subproblems; existence of optimal substructure.
Two ways: top-down (saving previous results) and bottom-up (reusing results, starting from the end).
Order of computation matters.
Sequence alignment with dynamic programming: going from exponential to quadratic computational complexity.
Global alignment: Needleman–Wunsch algorithm
A case of dynamic programming. Uses a similarity matrix Sim(), e.g.:
A A G C T 10 4 5 5 1 G 4 5 2 2 0 C 5 2 8 2 T 1 0 2 2 8

Global alignment: Needleman–Wunsch
Iterative procedure; assume d is the gap (space) penalty:
$$F(i,j) = \max\{\, F(i-1,j) - d,\; F(i,j-1) - d,\; F(i-1,j-1) + \mathrm{Sim}(S1_i, S2_j) \,\}$$

What if you want to partially align the sequences?

Local alignment: Smith–Waterman
What is local alignment? Given two strings S1 and S2, a local alignment is an alignment of a substring of S1 with a substring of S2.
Why might it be needed?
- Searching: looking for short elements (genes, promoters, binding pockets) in large sequences (genomes, proteins).
- Conservation: looking for the parts of sequences that are conserved across multiple organisms during evolution.
- Biological justification: transposons, genome rearrangements, etc.

Local alignment: Smith–Waterman
$$F(i,j) = \max\{\, 0,\; F(i-1,j) - d,\; F(i,j-1) - d,\; F(i-1,j-1) + \mathrm{Sim}(S1_i, S2_j) \,\}$$
Where would you start? At the highest-scoring cell. Then go back until you reach the first element with a score of zero.

Global vs. Local alignment
Global alignment (Needleman–Wunsch)
- Initialization: $F(0,j) = -j\,d$, $F(i,0) = -i\,d$
- Iteration: $F(i,j) = \max\{F(i-1,j) - d,\; F(i,j-1) - d,\; F(i-1,j-1) + \mathrm{Sim}(S1_i, S2_j)\}$
- Starting position: top left
- Ending position: bottom right

Local alignment (Smith–Waterman)
- Initialization: $F(0,j) = F(i,0) = 0$
- Iteration: $F(i,j) = \max\{0,\; F(i-1,j) - d,\; F(i,j-1) - d,\; F(i-1,j-1) + \mathrm{Sim}(S1_i, S2_j)\}$
- Starting position: anywhere
- Ending position: anywhere
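The local-alignment column above can be sketched in Python: zero initialization, the extra 0 in the maximum, traceback starting at the highest-scoring cell and stopping at the first zero. This is an illustrative sketch only; the scores (+2 match, -1 mismatch, gap penalty d = 1) and the name `smith_waterman` are assumptions, not values from the lecture.

```python
# Smith-Waterman local alignment with a linear gap penalty d (subtracted).
# Hypothetical scoring for illustration: +2 match, -1 mismatch, d = 1.

def smith_waterman(s1: str, s2: str, match: int = 2, mismatch: int = -1,
                   d: int = 1) -> tuple[int, str, str]:
    m, n = len(s1), len(s2)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # F(0,j) = F(i,0) = 0
    best, best_pos = 0, (0, 0)

    def sim(a: str, b: str) -> int:
        return match if a == b else mismatch

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d,
                          F[i - 1][j - 1] + sim(s1[i - 1], s2[j - 1]))
            if F[i][j] > best:  # remember the highest-scoring cell
                best, best_pos = F[i][j], (i, j)

    # Traceback: start at the highest cell, stop at the first zero cell
    a1, a2 = [], []
    i, j = best_pos
    while F[i][j] > 0:  # row 0 and column 0 are zero, so i, j >= 1 here
        if F[i][j] == F[i - 1][j - 1] + sim(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```

The flanking mismatches ("TT" vs "GG", "AA" vs "CC") are simply not part of the result: the 0 in the maximum lets the alignment start fresh anywhere, and the traceback ends where the score returns to zero.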
Computational complexity, for sequences of lengths m and n:
- Global alignment: time O(m*n), space O(m*n)
- Local alignment: time O(m*n), space O(m*n)
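If only the optimal global score is needed (not the alignment itself), the O(m*n) table is unnecessary: row i of F depends only on row i - 1. The sketch below keeps two rows, so time stays O(m*n) while space drops to O(n). The scoring defaults (+1 match, -1 mismatch, d = 1) and the name `nw_score_two_rows` are assumptions for illustration. Hirschberg's algorithm builds on this linear-space scoring, combined with divide and conquer, to recover the alignment too.

```python
def nw_score_two_rows(s1: str, s2: str, match: int = 1, mismatch: int = -1,
                      d: int = 1) -> int:
    """Needleman-Wunsch score only, keeping two rows of F (O(len(s2)) space)."""
    prev = [-j * d for j in range(len(s2) + 1)]  # row 0: F(0,j) = -j*d
    for i in range(1, len(s1) + 1):
        cur = [-i * d] + [0] * len(s2)  # F(i,0) = -i*d
        for j in range(1, len(s2) + 1):
            s = match if s1[i - 1] == s2[j - 1] else mismatch
            cur[j] = max(prev[j] - d, cur[j - 1] - d, prev[j - 1] + s)
        prev = cur  # row i-1 is no longer needed
    return prev[-1]


print(nw_score_two_rows("ACGT", "AGT"))  # 2
```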
Note: Hirschberg's algorithm performs better in space, O(max(m,n)), at the cost of extra computation (time remains O(m*n)).

BLAST: Basic Local Alignment Search Tool
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman. J. Mol. Biol., 1990. ~25,000 citations.
Algorithm:
1. Split the query into overlapping words of length W.
2. For each word, find the neighborhood words that score at least the threshold T.
3. Look up in a table where these neighborhood words occur: these hits are the seeds S.
4. Extend the seeds S until the score drops off by more than X below the best score seen.
Inexact matching.
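The four steps can be illustrated with a toy sketch on DNA. This is not NCBI BLAST's implementation: the alphabet, the ±1 scores, W = 4, T = 2, X = 2, and all function names are assumptions chosen for readability. Real BLAST extends seeds in both directions (only rightward extension is shown here), uses substitution matrices such as BLOSUM62 for proteins, and also assesses statistical significance.

```python
from itertools import product

# Toy BLAST-style seeding on DNA; all parameters are hypothetical.
ALPHABET = "ACGT"
MATCH, MISMATCH = 1, -1


def score(a: str, b: str) -> int:
    """Ungapped score of two equal-length words."""
    return sum(MATCH if x == y else MISMATCH for x, y in zip(a, b))


def query_words(query: str, w: int) -> list[tuple[int, str]]:
    """Step 1: overlapping words of length w, with their query offsets."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def neighborhood(word: str, t: int) -> list[str]:
    """Step 2: every word over ALPHABET scoring at least t against `word`."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=len(word)))
    return [c for c in candidates if score(word, c) >= t]


def index_database(db: str, w: int) -> dict[str, list[int]]:
    """Lookup table: word -> positions where it occurs in db."""
    table: dict[str, list[int]] = {}
    for j in range(len(db) - w + 1):
        table.setdefault(db[j:j + w], []).append(j)
    return table


def find_seeds(query: str, db: str, w: int = 4, t: int = 2) -> list[tuple[int, int]]:
    """Step 3: (query offset, db offset) pairs where a neighborhood word occurs."""
    table = index_database(db, w)
    seeds = []
    for qi, word in query_words(query, w):
        for nb in neighborhood(word, t):
            seeds.extend((qi, dj) for dj in table.get(nb, []))
    return seeds


def extend_right(query: str, db: str, qi: int, dj: int, w: int, x: int) -> tuple[int, int]:
    """Step 4 (rightward only): ungapped extension that stops once the running
    score falls more than x below the best score seen so far.
    Returns (best score, length of the best-scoring segment)."""
    best = running = score(query[qi:qi + w], db[dj:dj + w])
    best_len = k = w
    while qi + k < len(query) and dj + k < len(db):
        running += MATCH if query[qi + k] == db[dj + k] else MISMATCH
        k += 1
        if running > best:
            best, best_len = running, k
        elif best - running > x:
            break
    return best, best_len


query, db = "ACGTA", "TTACGTATT"
for qi, dj in find_seeds(query, db):
    print(qi, dj, extend_right(query, db, qi, dj, w=4, x=2))
```

With these scores, T = 2 and W = 4 admit words with at most one mismatch, so each query word has 13 neighbors; this is the "inexact matching" that lets seeds hit similar, not only identical, words.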
End of Lecture 4
This note was uploaded on 05/30/2010 for the course ECS 124 taught by Professor Tagkopoulos during the Spring '10 term at UC Davis.