Space-Efficient Alignment CMSC 858S

Space Usage
O(n^2) is pretty low space usage, but for a 10 Gb genome you’d need a huge amount of memory. Can we use less? Yes: Hirschberg’s algorithm.
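To make "a huge amount of memory" concrete, here is a rough back-of-the-envelope estimate. It assumes, purely for illustration, that both sequences have length n = 10^10 and that each matrix cell takes 4 bytes; neither figure comes from the slides.

```latex
\begin{align*}
\text{full matrix: } & n^2 \cdot 4\ \text{bytes} = (10^{10})^2 \cdot 4\ \text{bytes} = 4 \times 10^{20}\ \text{bytes} \approx 400\ \text{exabytes} \\
\text{linear space (two rows): } & 2\,(n+1) \cdot 4\ \text{bytes} \approx 8 \times 10^{10}\ \text{bytes} = 80\ \text{gigabytes}
\end{align*}
```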
Remember the meaning of a cell
[Figure: dynamic-programming matrix for strings x and y, with one cell highlighted.] The highlighted cell represents the best alignment between prefix x[1..5] and prefix y[1..5].

Linear Space for Alignment Scores
If you are only interested in the cost or score of an alignment, you need only O(n) space. How?
When filling in an entry (gray box), we only look at the current and previous rows. We only need to keep those two rows in memory.
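The two-row idea can be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name `alignment_cost` and the cost model (gap cost 1, mismatch cost 1, match cost 0, minimizing total cost) are assumptions chosen for the example.

```python
def alignment_cost(x, y, gap=1, mismatch=1):
    """Cost of an optimal global alignment of x and y, keeping only two rows.

    Assumed cost model (not from the slides): match = 0, mismatch and gap
    costs are parameters, and lower total cost is better.
    """
    # Row 0: aligning the empty prefix of x with y[:j] costs j gaps.
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        # First entry of row i: aligning x[:i] with the empty string.
        curr = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            diag = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else mismatch)
            up = prev[j] + gap        # x[i-1] aligned to a gap
            left = curr[j - 1] + gap  # y[j-1] aligned to a gap
            curr[j] = min(diag, up, left)
        prev = curr  # the older row is no longer needed
    return prev[len(y)]
```

Only `prev` and `curr` are alive at any time, so memory is proportional to the length of `y` while the running time is still proportional to the product of the two lengths.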

We can do more...
Given 2 strings X and Y, we can, in linear space and O(nm) time, compute the cost of aligning...
  - every prefix of X with Y
  - X with every prefix of Y
  - a particular prefix of X with every prefix of Y
  - a particular suffix of X with every suffix of Y
How can we do that?
Best Alignment Between Prefix of X and Y
[Figure: dynamic-programming matrix for x and y whose boundary row and column are initialized to multiples of the gap cost g (0, 1g, 2g, ...). The highlighted cell holds the score of an optimal alignment between Y and a prefix of X.]
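One way to collect the cost of aligning every prefix of X with all of Y is sketched below. It assumes a layout in which each column corresponds to a prefix of x and each position within a column to a prefix of y; the slides' figure may use a different orientation. The name `prefix_of_x_vs_y` and the cost model (gap 1, mismatch 1, match 0) are hypothetical choices for the example.

```python
def prefix_of_x_vs_y(x, y, gap=1, mismatch=1):
    """Return costs where costs[i] is the cost of aligning x[:i] with all of y.

    Assumed layout: one column per prefix of x, one entry per prefix of y.
    Only the previous and current columns are stored.
    """
    # Column 0: the empty prefix of x against y[:j] costs j gaps.
    col = [j * gap for j in range(len(y) + 1)]
    costs = [col[-1]]
    for i in range(1, len(x) + 1):
        new = [i * gap] + [0] * len(y)  # boundary entry is i * g
        for j in range(1, len(y) + 1):
            diag = col[j - 1] + (0 if x[i - 1] == y[j - 1] else mismatch)
            new[j] = min(diag, col[j] + gap, new[j - 1] + gap)
        col = new
        # Last entry of column i: x[:i] aligned with the whole of y.
        costs.append(col[-1])
    return costs
```

The output list has one entry per prefix of x, so the total space stays linear in the combined length of the two strings.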

Fill in the matrix by columns...
[Figure: the same matrix, with one column highlighted.] What is this column?
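One possible reading of the highlighted column, under the assumption that columns index prefixes of x: it holds the cost of aligning one particular prefix of x with every prefix of y. The sketch below computes such a column, and obtains the suffix version by reversing both strings, which works here because the assumed cost model (gap 1, mismatch 1, match 0) does not depend on position. The function names are hypothetical.

```python
def final_column(x, y, gap=1, mismatch=1):
    """Return col where col[j] is the cost of aligning all of x with y[:j]."""
    col = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        new = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            diag = col[j - 1] + (0 if x[i - 1] == y[j - 1] else mismatch)
            new[j] = min(diag, col[j] + gap, new[j - 1] + gap)
        col = new
    return col


def prefix_column(x, y, i, gap=1, mismatch=1):
    """col[j] = cost of aligning the prefix x[:i] with the prefix y[:j]."""
    return final_column(x[:i], y, gap, mismatch)


def suffix_column(x, y, i, gap=1, mismatch=1):
    """col[j] = cost of aligning the suffix x[i:] with the suffix y[j:].

    Reversing both strings turns suffixes into prefixes; reversing the
    resulting column maps "first k characters of reversed y" back to y[j:].
    """
    rev = final_column(x[i:][::-1], y[::-1], gap, mismatch)
    return rev[::-1]
```

For example, `prefix_column("abc", "ab", 2)` gives the costs of aligning "ab" with "", "a", and "ab".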

