CS 170 Algorithms, Fall 2014, David Wagner, Soln 9

There are a variety of approaches you could use to solve this problem. Here are two.

1 Greedy merging

Main idea. One way to solve this problem is to use a greedy approach. We repeatedly apply the following process: find $s, t \in S$ that have the largest overlap, merge $s, t$ to get a new string $u$, remove $s, t$ from $S$, and add $u$ to $S$. After at most $n-1$ iterations, the set $S$ will contain only a single string; output that string.

Here's a refinement that's helpful: before beginning the above process, we preprocess the short reads to remove any short read that is a substring of any other short read.

We get the following pseudocode:

1. For each $s, t \in S$ with $s \ne t$:
2.     If $s$ is a substring of $t$, remove $s$ from $S$.
3. While $|S| > 1$:
4.     Find $s, t \in S$ that maximize Overlap$(s, t)$, subject to $s \ne t$.
5.     Merge $s, t$ to get a new string $u$.
6.     Add $u$ to $S$. Remove $s, t$ from $S$.
7. Output the one string in $S$.

Efficient implementation. The challenging part is to provide an efficient implementation for this algorithm. If we re-calculate the overlap between all pairs in each iteration of the while loop, the running time will be $O(n^3 k^2)$, which is inefficient. We can optimize this by avoiding re-doing work unnecessarily.

One good improvement is to avoid re-computing the overlap between two strings that haven't changed. We can initially compute the overlaps between all pairs of short reads in a pre-processing step. Then, each time we merge two strings, say $s, t \in S$, it is possible to compute the overlap between the new string $u$ and every other $x \in S$ efficiently. In particular, Overlap$(u, x)$ = Overlap$(t, x)$ and Overlap$(x, u)$ = Overlap$(x, s)$, assuming no element of $S$ is a substring of any other. See § 3 for the details. There is no need to recompute the overlap between $x, y$ for any other pair of strings $x, y$ that weren't involved in the merger.

This suggests storing the overlap information in some data structure that makes it easy to find the pair with largest overlap, and updating the data structure each time we merge two strings. One reasonable data structure is a priority queue that stores all pairs of elements of $S$, prioritized by their overlap.
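As a rough illustration of the pseudocode and the overlap-reuse rule above, here is a minimal Python sketch. It keeps a plain dictionary of pairwise overlaps rather than a priority queue, so finding the best pair is still a linear scan over the table. The function names (`overlap`, `remove_substrings`, `greedy_merge`), the integer ids, and the brute-force overlap computation are assumptions made for this example, not part of the original solution.

```python
from itertools import permutations


def overlap(s: str, t: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def remove_substrings(reads):
    """Drop duplicates and any read that is a substring of another read."""
    unique = list(dict.fromkeys(reads))  # de-duplicate, keeping order
    return [s for s in unique if not any(s != t and s in t for t in unique)]


def greedy_merge(reads):
    """Greedy merging with a cached table of pairwise overlaps."""
    strings = dict(enumerate(remove_substrings(reads)))  # id -> string
    if not strings:
        return ""
    # Pre-processing step: overlaps between all ordered pairs of reads.
    ov = {(i, j): overlap(strings[i], strings[j])
          for i, j in permutations(strings, 2)}
    next_id = len(strings)
    while len(strings) > 1:
        # Pair (s, t) = (strings[i], strings[j]) with the largest overlap.
        (i, j), k = max(ov.items(), key=lambda item: item[1])
        u = strings[i] + strings[j][k:]
        # Reuse rule from the text (valid when no string is a substring
        # of another): Overlap(u, x) = Overlap(t, x) and
        # Overlap(x, u) = Overlap(x, s).
        updated = {}
        for x in strings:
            if x not in (i, j):
                updated[(next_id, x)] = ov[(j, x)]
                updated[(x, next_id)] = ov[(x, i)]
        # Forget every entry that mentions s or t, then add the entries for u.
        ov = {p: v for p, v in ov.items() if i not in p and j not in p}
        ov.update(updated)
        del strings[i], strings[j]
        strings[next_id] = u
        next_id += 1
    return next(iter(strings.values()))
```

For instance, `greedy_merge(["abcd", "bc", "cdef"])` first discards `"bc"` (a substring of `"abcd"`) and then merges the remaining two reads on their overlap `"cd"`, giving `"abcdef"`.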
The priority queue contains $O(n^2)$ elements, so if we use a binary heap, each operation on the priority queue takes $O(\lg(n^2)) = O(\lg n)$ time. We can reduce the space consumption with a little more cleverness: for each string $s \in S$, we remember the string $t$
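The priority queue over all pairs described above can be sketched with Python's `heapq` module, which implements a binary min-heap; overlaps are negated so that the largest overlap is popped first. This is only one possible realization, not the solution's own implementation. It assumes the reads are already distinct and substring-free, and it handles pairs that mention an already-merged string by skipping them when they are popped (lazy deletion), a choice made for this example. The function name `greedy_merge_heap` and the integer ids are likewise hypothetical.

```python
import heapq


def overlap(s: str, t: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def greedy_merge_heap(reads):
    """Greedy merging driven by a binary heap of (negated overlap, i, j).

    Assumes the reads are distinct and none is a substring of another.
    """
    strings = dict(enumerate(reads))  # id -> string still in S
    ov = {}                           # (i, j) -> Overlap(strings i, j)
    heap = []
    for i in strings:
        for j in strings:
            if i != j:
                ov[i, j] = overlap(strings[i], strings[j])
                heap.append((-ov[i, j], i, j))
    heapq.heapify(heap)
    next_id = len(strings)
    while len(strings) > 1:
        neg_k, i, j = heapq.heappop(heap)
        if i not in strings or j not in strings:
            continue  # stale entry: one of the two strings was merged away
        k = -neg_k
        u = strings[i] + strings[j][k:]
        del strings[i], strings[j]
        for x in strings:
            # Reuse rule from the text: Overlap(u, x) = Overlap(t, x)
            # and Overlap(x, u) = Overlap(x, s); no recomputation needed.
            ov[next_id, x] = ov[j, x]
            ov[x, next_id] = ov[x, i]
            heapq.heappush(heap, (-ov[next_id, x], next_id, x))
            heapq.heappush(heap, (-ov[x, next_id], x, next_id))
        strings[next_id] = u
        next_id += 1
    return next(iter(strings.values())) if strings else ""
```

Each merge pushes two entries per surviving string, so the heap holds $O(n^2)$ entries in total and every push or pop costs $O(\lg n)$, matching the bound stated above. For simplicity this sketch never deletes old entries from the `ov` table.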