Adaptive Huffman and arithmetic methods are universal in the sense
that the encoder can adapt to the statistics of the source. But
adaptation is computationally expensive, particularly when a kth-order
Markov approximation is needed for some k > 2. As we know, the kth-order
approximation approaches the source entropy rate as k → ∞. For example,
to build a second-order Markov approximation of English text, we would need
to estimate the probabilities of all possible triplets: about 35³ = 42,875
of them, taking an alphabet of roughly 35 symbols (a-z, punctuation such as
'(', ')', '.', etc.). This is impractical. Arithmetic coding is inherently
adaptive, but it is slow and works best on binary files.
The dictionary-based methods, such as the LZ family of encoders, do not use any
statistical model, nor do they use variable-size prefix codes. Yet they are universal,
adaptive, reasonably fast, and use modest amounts of storage and computational
resources. Variants of the LZ algorithm form the basis of Unix compress, gzip, pkzip,
Stacker, and the compression used by modems operating at more than 14.4 kbps.
Dictionary Models
The dictionary model allows several consecutive symbols, called
phrases and stored in a dictionary, to be encoded as an address in
the dictionary. Usually an adaptive model is used, in which the
dictionary is built from the previously encoded text: as the text is
compressed, previously encountered substrings are added to the
dictionary. Almost all adaptive dictionary models originate from the
original papers by Ziv and Lempel, which led to several families of
LZ coding techniques. Here we will present a couple of those techniques.
LZ77 Algorithms
The prior text constitutes the codebook or dictionary. Rather than
keeping an explicit dictionary, the text decoded up to the current
time can be used as the dictionary. The figure below shows the
characters 'abaabab' just decoded, with the decoder looking at the
triplet (5,3,b): the number 5 denotes how far back to look into the
already-decoded text stream, the number 3 gives the length of the
matched phrase (beginning at the first character of the yet-unencoded
part of the text), and the character 'b' gives the next character
from the input. This yields 'aabb' as the next phrase added.
Encoded output:  (0,0,a) (0,0,b) (2,1,a) (3,2,b) (5,3,b) (10,1,a)
Decoded output:  a b a a b a b
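The decoding step described above can be sketched in a few lines of Python (a minimal sketch; the function name lz77_decode and the list-based output buffer are illustrative choices, not part of the notes):

```python
def lz77_decode(triplets):
    """Decode a stream of LZ77 triplets (offset, length, char).

    offset: how far back in the already-decoded text the match starts;
    length: how many characters to copy forward;
    char:   the literal character that follows the match.
    """
    out = []
    for offset, length, char in triplets:
        start = len(out) - offset
        for i in range(length):
            out.append(out[start + i])  # the buffer grows, so overlapping copies work
        out.append(char)
    return "".join(out)

# The triplet stream from the figure above:
code = [(0, 0, "a"), (0, 0, "b"), (2, 1, "a"),
        (3, 2, "b"), (5, 3, "b"), (10, 1, "a")]
print(lz77_decode(code))  # prints "abaababaabbba"
```

Running it on the triplet stream from the figure first reproduces 'abaabab', then (5,3,b) appends the phrase 'aabb', and (10,1,a) appends 'ba'.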
LZ77 Algorithm with Finite Buffers
Two buffers of finite size W, called the search (left) and lookahead (right) buffers,
are connected as a shift register. The text to be encoded is shifted in from right to left,
initially placing W symbols in the right buffer and filling the left buffer with the
first character of the text. The information transmitted is (p, L, S), and the buffer is
then shifted L+1 places to the left. In practice, rather than transmitting the position p
itself, the backward offset into the search buffer is transmitted. The process is repeated
until the text is fully encoded.
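A greedy encoder with a finite search window can be sketched as follows (an illustrative sketch, assuming array indexing rather than the shift-register mechanics described above; the window and lookahead sizes are arbitrary choices, not from the notes):

```python
def lz77_encode(text, window=16, lookahead=8):
    """Greedy LZ77 encoding with a finite search buffer of size `window`.

    At each step, find the longest match for the upcoming text that starts
    within the last `window` characters, emit (offset, length, literal),
    and advance length + 1 positions, mirroring the L+1 shift of the buffer.
    """
    i, triplets = 0, []
    while i < len(text):
        start = max(0, i - window)
        best_off, best_len = 0, 0
        for off in range(1, i - start + 1):
            length = 0
            # a match may run into the lookahead, i.e. overlap position i
            while (length < lookahead - 1 and i + length < len(text) - 1
                   and text[i + length - off] == text[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        triplets.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return triplets

# Round-trip check: reconstruct the text directly from the triplets.
s = "abracadabra"
dec = []
for off, ln, ch in lz77_encode(s):
    for _ in range(ln):
        dec.append(dec[len(dec) - off])
    dec.append(ch)
assert "".join(dec) == s
```

Note that the triplets carry the backward offset, as the notes describe; real encoders also bound how long the inner matching loop can take, since the naive search above costs O(W) per position.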
Spring '11, Mukherjee
Keywords: sliding window, search buffer, lookahead buffer