CS 170, Algorithms, Fall 2014, David Wagner, Soln 9
1 Greedy merging

Main idea. One way to solve this problem is to use a greedy approach. We repeatedly apply the following process: find s, t ∈ S that have the largest overlap, merge s, t to get a new string u, remove s, t from S, and add u to S. After at most n-1 iterations, the set S will contain only a single string; output that string.

Here's a refinement that's helpful: before beginning the above process, we preprocess the short reads to remove any short read that is a substring of any other short read.

We get the following pseudocode:

1. For each s, t ∈ S with s ≠ t:
2.     If s is a substring of t, remove s from S.
3. While |S| > 1:
4.     Find s, t ∈ S that maximize Overlap(s, t), subject to s ≠ t.
5.     Merge s, t to get a new string u.
6.     Add u to S. Remove s, t from S.
7. Output the one string in S.

Efficient implementation. The challenging part is to provide an efficient implementation for this algorithm. If we re-calculate the overlap between all pairs in each iteration of the while loop, the running time will be O(n^3 k^2), which is inefficient. We can optimize this by avoiding re-doing work unnecessarily.

One good improvement is to avoid re-computing the overlap between two strings that haven't changed. We can initially compute the overlaps between all pairs of short reads in a pre-processing step. Then, each time we merge two strings, say s, t ∈ S, it is possible to compute the overlap between the new string u and every other x ∈ S efficiently. In particular, Overlap(u, x) = Overlap(t, x) and Overlap(x, u) = Overlap(x, s), assuming no element of S is a substring of any other. See § 3 for the details. There is no need to recompute the overlap between x, y for any other pair of strings x, y that weren't involved in the merger.

This suggests storing the overlap information in some data structure that makes it easy to find the pair with largest overlap, and updating the data structure each time we merge two strings. One reasonable data structure is a priority queue that stores all pairs of elements of S, prioritized by their overlap.
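As a rough illustration of the pseudocode and the overlap-reuse identities above, here is a minimal Python sketch. It is not the handout's implementation: the function names `overlap` and `greedy_merge`, the integer ids, and the dictionary `ov` are assumptions made for this example, and the sketch finds the best pair by scanning a dictionary instead of using a priority queue, which keeps the code short but is slower.

```python
from itertools import permutations


def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def greedy_merge(reads):
    """Greedily merge reads into one string (illustrative sketch only)."""
    if not reads:
        raise ValueError("need at least one read")
    # Steps 1-2: drop duplicates and any read contained in another read.
    unique = sorted(set(reads))
    kept = [r for r in unique if not any(r != o and r in o for o in unique)]
    strings = dict(enumerate(kept))  # hypothetical id -> current string
    # Pre-processing: compute the overlap of every ordered pair once.
    ov = {(i, j): overlap(strings[i], strings[j])
          for i, j in permutations(strings, 2)}
    next_id = len(strings)
    while len(strings) > 1:
        # Step 4: pair (s, t) with the largest overlap (linear scan here).
        i, j = max(ov, key=ov.get)
        # Step 5: merge s and t, writing the shared part only once.
        u = strings[i] + strings[j][ov[i, j]:]
        # Reuse cached values instead of recomputing overlaps with u.
        for x in strings:
            if x not in (i, j):
                ov[next_id, x] = ov[j, x]  # Overlap(u, x) = Overlap(t, x)
                ov[x, next_id] = ov[x, i]  # Overlap(x, u) = Overlap(x, s)
        # Step 6: remove s and t (and their pairs), add u.
        ov = {p: v for p, v in ov.items() if i not in p and j not in p}
        del strings[i], strings[j]
        strings[next_id] = u
        next_id += 1
    return next(iter(strings.values()))
```

For example, `greedy_merge(["abcd", "cdef", "efgh", "bc"])` first discards "bc" (a substring of "abcd"), then merges "abcd" with "cdef", and finally merges the result with "efgh" using the cached overlap of "cdef" and "efgh".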
The priority queue contains O(n^2) elements, so if we use a binary heap, each operation on the priority queue takes O(lg(n^2)) = O(lg n) time.

We can reduce the space consumption with a little more cleverness: for each string s ∈ S, we remember the string t ∈