6.896 Sublinear Time Algorithms
March 8, 2007
Lecture 10
Lecturer: Ronitt Rubinfeld
Scribe: Khanh Do Ba

1 Recap

1.1 Algorithm and notation

LZ77($w$)
    $t \leftarrow 1$
    repeat
        find longest substring $w_t \dots w_{t+\ell-1}$ s.t. $\exists$ index $p < t$ with $w_p \dots w_{p+\ell-1} = w_t \dots w_{t+\ell-1}$
        if none then
            next symbol $= w_t$
            $t \leftarrow t + 1$
        else
            next symbol $= (p, \ell)$
            $t \leftarrow t + \ell$
    until $t > n$ ($= |w|$)

Recall the following notation.

$n_\ell(w)$ = # compressed segments of length $\ell$ in $w$, not counting alphabet symbols and the last compressed segment
$C_{LZ}(w)$ = # symbols in the compressed string (# pairs + # alphabet symbols)
$d_\ell(w)$ = # distinct substrings of length $\ell$

1.2 Main theorem

Recall the main theorem, which forms the basis of our LZApprox algorithm.

Theorem 1 For any integer $\ell_0 > 1$, let $m = m(\ell_0) = \max_{\ell \in [\ell_0]} \frac{d_\ell(w)}{\ell}$. Then

$$m \le C_{LZ}(w) \le 4m \log \ell_0 + \frac{n}{\ell_0}.$$

Last lecture, we already proved the first inequality. We also claimed that

(*) For every $\ell \in \{1, \dots, \ell_0/2\}$ (wlog, assume $\ell_0$ is even),

$$\sum_{k=1}^{\ell} n_k \le 2(m+1) \sum_{k=1}^{\ell} \frac{1}{k}.$$

(**) For every $\ell \ge 1$,

$$\sum_{k=\ell+1}^{n} n_k \le \frac{n}{\ell+1}.$$

and proved (**), as well as that (*) and (**) together imply the second inequality. Today, it remains to prove (*).

2 Proof of (*) and Theorem 1

Proof We reexpress the summation as

$$\sum_{k=1}^{\ell} k\, n_k = A + B,$$

where

$A$ = # locations within $\{\ell, \dots, n-\ell\}$ that lie in compressed substrings of length $\le \ell$, and
$B$ = # locations within $[n] \setminus \{\ell, \dots, n-\ell\}$ that lie in compressed substrings of length $\le \ell$.

Clearly, $B \le 2\ell$. Let $C$ = # distinct length-$2\ell$ substrings, so that $C \le 2m\ell$ by the definition of...
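
To make the quantities recalled in Section 1 concrete, here is a minimal Python sketch of the LZ77 parsing together with $C_{LZ}(w)$, $d_\ell(w)$ and $m(\ell_0)$. It is a brute-force illustration, not the sublinear LZApprox algorithm. The function names (`lz77_parse`, `lz77_decode`, `c_lz`, `d`, `m`) are hypothetical, and positions are 0-based here whereas the notes index from 1.

```python
def lz77_parse(w):
    """Greedy LZ77 parsing following the pseudocode in Section 1.1.

    Returns a list whose entries are either a single alphabet symbol or a
    pair (p, length) pointing at an earlier start position p < t (0-based).
    """
    n = len(w)
    out = []
    t = 0
    while t < n:
        best_len, best_p = 0, -1
        for p in range(t):
            # Extend the match starting at p; it may overlap position t.
            length = 0
            while t + length < n and w[p + length] == w[t + length]:
                length += 1
            if length > best_len:
                best_len, best_p = length, p
        if best_len == 0:
            out.append(w[t])            # no earlier occurrence: emit the symbol
            t += 1
        else:
            out.append((best_p, best_len))
            t += best_len
    return out


def lz77_decode(symbols):
    """Invert lz77_parse (used only as a sanity check)."""
    chars = []
    for s in symbols:
        if isinstance(s, tuple):
            p, length = s
            for k in range(length):
                chars.append(chars[p + k])
        else:
            chars.append(s)
    return "".join(chars)


def c_lz(w):
    """C_LZ(w): number of pairs plus number of alphabet symbols."""
    return len(lz77_parse(w))


def d(w, ell):
    """d_ell(w): number of distinct substrings of length ell."""
    return len({w[i:i + ell] for i in range(len(w) - ell + 1)})


def m(w, ell0):
    """m(ell0) = max over ell in {1, ..., ell0} of d_ell(w) / ell."""
    return max(d(w, ell) / ell for ell in range(1, ell0 + 1))


if __name__ == "__main__":
    w = "abab"
    print(lz77_parse(w))   # ['a', 'b', (0, 2)]
    print(c_lz(w))         # 3
    print(m(w, 2))         # 2.0
```

For `w = "abab"` the parsing emits two alphabet symbols and one pair, so $C_{LZ}(w) = 3$, while $d_1(w) = d_2(w) = 2$ gives $m(2) = 2$, consistent with the first inequality $m \le C_{LZ}(w)$ of Theorem 1.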