6.896 Sublinear Time Algorithms
March 8, 2007
Lecture 10
Lecturer: Ronitt Rubinfeld
Scribe: Khanh Do Ba

1 Recap

1.1 Algorithm and notation

LZ77($w$)
    $t \leftarrow 1$
    repeat
        find longest substring $w_t \ldots w_{t+\ell-1}$ s.t. $\exists$ index $p < t$ with $w_p \ldots w_{p+\ell-1} = w_t \ldots w_{t+\ell-1}$
        if none then
            next symbol $= w_t$
            $t \leftarrow t + 1$
        else
            next symbol $= (p, \ell)$
            $t \leftarrow t + \ell$
    until $t > n$ $(= |w|)$

Recall the following notation.

$n_\ell(w)$ = # compressed segments of length $\ell$ in $w$, not counting alphabet symbols and the last compressed segment
$C_{LZ}(w)$ = # symbols in the compressed string (# pairs + # alphabet symbols)
$d_\ell(w)$ = # distinct substrings of length $\ell$

1.2 Main theorem

Recall the main theorem, which forms the basis of our LZApprox algorithm.

Theorem 1 For any integer $\ell_0 > 1$, let $m = m(\ell_0) = \max_{\ell \in [\ell_0]} \frac{d_\ell(w)}{\ell}$. Then

$$m \le C_{LZ}(w) \le 4 m \log \ell_0 + \frac{n}{\ell_0}.$$

Last lecture, we already proved the first inequality. We also claimed that

(*) For every $\ell \in \{1, \ldots, \ell_0/2\}$ (wlog, assume $\ell_0$ is even),
$$\sum_{k=1}^{\ell} n_k \le 2(m+1) \sum_{k=1}^{\ell} \frac{1}{k}.$$

(**) For every $\ell \ge 1$,
$$\sum_{k=\ell+1}^{n} n_k \le \frac{n}{\ell+1}.$$

and proved (**), as well as that (*) and (**) together imply the second inequality. Today, it remains to prove (*).

2 Proof of (*) and Theorem 1

Proof We reexpress the summation as
$$\sum_{k=1}^{\ell} k \, n_k = A + B,$$
where

$A$ = # locations within $\{\ell, \ldots, n-\ell\}$ that lie in compressed substrings of length $\le \ell$, and
$B$ = # locations within $[n] \setminus \{\ell, \ldots, n-\ell\}$ that lie in compressed substrings of length $\le \ell$.

Clearly, $B \le 2\ell$. Let $C$ = # distinct length-$2\ell$ substrings, so that $C \le 2m\ell$ by the definition of...
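As a concrete companion to the LZ77 pseudocode and the notation of Section 1.1, the following is a minimal Python sketch, not the notes' own implementation. The function names (`lz77_parse`, `c_lz`, `distinct_substrings`, `m_value`) are hypothetical, the brute-force search is chosen only for readability, and ties between equally long matches are broken by taking the smallest index $p$, which the pseudocode leaves unspecified. Pairs are reported with 1-based $p$ to match the notes.

```python
def lz77_parse(w):
    """Parse w following the LZ77 pseudocode of Section 1.1.

    Returns a list whose entries are either a single alphabet symbol
    or a pair (p, length) with p 1-based, as in the notes.
    Assumption: among longest matches, the smallest p is chosen.
    """
    n = len(w)
    out = []
    t = 0  # 0-based position; corresponds to t + 1 in the notes
    while t < n:
        best_p, best_len = -1, 0
        for p in range(t):  # every earlier start index p < t
            length = 0
            # The match may run past position t (overlap is allowed,
            # since the pseudocode only requires p < t).
            while t + length < n and w[p + length] == w[t + length]:
                length += 1
            if length > best_len:
                best_p, best_len = p, length
        if best_len == 0:
            out.append(w[t])        # no earlier occurrence: emit the symbol
            t += 1
        else:
            out.append((best_p + 1, best_len))  # emit (p, length), 1-based p
            t += best_len
    return out


def c_lz(w):
    """C_LZ(w): number of symbols (pairs plus alphabet symbols) in the parse."""
    return len(lz77_parse(w))


def distinct_substrings(w, length):
    """d_length(w): number of distinct substrings of the given length."""
    return len({w[i:i + length] for i in range(len(w) - length + 1)})


def m_value(w, l0):
    """m(l0): maximum over 1 <= l <= l0 of d_l(w) / l."""
    return max(distinct_substrings(w, l) / l for l in range(1, l0 + 1))


if __name__ == "__main__":
    for s in ["aaaa", "abab", "abracadabra"]:
        print(s, lz77_parse(s), c_lz(s), m_value(s, 2))
```

On small inputs this makes the first inequality of Theorem 1, $m \le C_{LZ}(w)$, easy to check by hand: for `"abab"` the parse has three symbols, while $d_1 = 2$ and $d_2 = 2$ give $m(2) = 2$.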
This note was uploaded on 04/02/2010 for the course CS 6.896 taught by Professor Ronittrubinfeld during the Fall '04 term at MIT.