Adaptive Huffman and arithmetic methods are universal in the sense
that the encoder can adapt to the statistics of the source. But
adaptation is computationally expensive, particularly when a k-th order
Markov approximation is needed for some k > 2. As we know, the k-th order
approximation approaches the source entropy rate as k → ∞. For example, for
English text, a second-order Markov approximation requires estimating
the probability of all possible triplets: about 35³ = 42,875 of them, for an
alphabet of roughly 35 symbols (a–z, space, punctuation, etc.), which is
impractical. Arithmetic codes are inherently adaptive, but they are
slow and work well mainly for binary files.
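To make the cost of higher-order modeling concrete, the number of contexts an order-k model must track grows exponentially in k. A minimal sketch, using the 35-symbol alphabet size from the text (the function name is an illustrative choice):

```python
# Alphabet of roughly 35 symbols: a-z, space, punctuation, etc.
ALPHABET_SIZE = 35

def num_contexts(k):
    """Count of distinct (k+1)-symbol strings whose frequencies
    an order-k Markov model needs to estimate."""
    return ALPHABET_SIZE ** (k + 1)

# Second order (k = 2): 35**3 = 42,875 triplets, as computed above.
# Third order (k = 3): 35**4 = 1,500,625 quadruplets -- far too many
# counts to estimate reliably from a typical file.
```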
Dictionary-based methods such as the LZ family of encoders use no
statistical model, nor do they use variable-size prefix codes. Yet they are
universal, adaptive, reasonably fast, and use a modest amount of storage and
computational resources. Variants of the LZ algorithm form the basis of Unix
compress, gzip, pkzip, Stacker, and the compression used by modems operating
above 14.4 kbps.
Dictionary Models
The dictionary model allows several consecutive symbols, called
phrases, stored in a dictionary, to be encoded as an address in the
dictionary. Usually an adaptive model is used, where the dictionary is
built from previously encoded text. As the text is compressed,
previously encountered substrings are added to the dictionary. Almost
all adaptive dictionary models originate from the original papers by
Ziv and Lempel, which led to several families of LZ coding techniques.
Here we will present a couple of those techniques.
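As a concrete illustration of the dictionary idea before looking at the individual techniques, here is a minimal sketch of an LZ78-style encoder and decoder in Python. The function names and the (index, character) output format are illustrative choices, not a standard file format:

```python
def lz78_encode(text):
    # The dictionary maps phrases to indices; index 0 is the empty phrase.
    dictionary = {"": 0}
    output = []          # list of (phrase index, next character) pairs
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                               # extend current phrase
        else:
            output.append((dictionary[phrase], ch))    # emit address + new char
            dictionary[phrase + ch] = len(dictionary)  # grow the dictionary
            phrase = ""
    if phrase:                                         # flush a trailing phrase
        output.append((dictionary[phrase], ""))
    return output

def lz78_decode(pairs):
    # The decoder rebuilds the same dictionary from the pairs alone.
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)
```

For example, `lz78_encode("aaab")` produces `[(0, 'a'), (1, 'a'), (0, 'b')]`: the second pair encodes the two-symbol phrase "aa" as a single dictionary address plus one character. Note that no symbol probabilities are estimated anywhere, which is exactly what distinguishes the dictionary approach from Huffman and arithmetic coding.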
A