

Module 5, Lecture 3
Data Compression: Dictionary Methods
G.L. Heileman

Dictionary Methods

The compression methods we have considered so far make use of a probability model associated with the source in order to compress the data produced by the source. In this lecture we consider dictionary techniques that do not make explicit use of a probabilistic model. Rather, these methods exploit redundancies in the data produced by a source by directly observing the data itself, and then building a dictionary of frequently occurring patterns. If one of these patterns is later encountered, it can be encoded by referencing an index into the dictionary. If a pattern that is not in the dictionary is encountered, it can be encoded using some other, less efficient method.

Thus, this is very similar to the AEP compression technique, where we divide the input into two classes: (1) frequently occurring patterns, and (2) infrequently occurring patterns. Recall one of the limitations of AEP compression: the typical set A_ε^(n) must be completely determined a priori. This problem is circumvented by dictionary methods.
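The two-class scheme above can be sketched in a few lines. This is a minimal illustration, not the lecture's algorithm: the dictionary contents, block representation, and 5-bit raw fallback are made-up parameters. Each block is emitted as a flag bit (in the dictionary or not) followed by either a dictionary index or the raw, less efficient encoding.

```python
# Minimal sketch of the two-class dictionary scheme: a flag bit selects
# between a dictionary index and a raw encoding. Illustrative only; the
# dictionary and 5-bit raw symbol width are assumed, not from the lecture.

def encode_block(block, dictionary, raw_bits_per_symbol=5):
    """Encode one block as (flag, payload): flag 1 = dictionary index,
    flag 0 = raw (less efficient) fallback encoding."""
    if block in dictionary:
        # Index width is determined by the dictionary size.
        index_bits = max(1, (len(dictionary) - 1).bit_length())
        return (1, format(dictionary[block], f"0{index_bits}b"))
    # Fallback: emit every symbol at its full fixed width.
    raw = "".join(format(s, f"0{raw_bits_per_symbol}b") for s in block)
    return (0, raw)

# Example: a tiny dictionary holding two frequent 4-symbol patterns.
d = {(0, 1, 2, 3): 0, (4, 4, 4, 4): 1}
assert encode_block((0, 1, 2, 3), d) == (1, "0")            # dictionary hit
assert encode_block((9, 9, 9, 9), d) == (0, "01001" * 4)    # raw fallback
```

Note how the flag bit is exactly the 1-bit set-membership cost the next slide accounts for in its expected-length calculation.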
Dictionary Methods

Ex: Consider a 32-symbol alphabet, and assume we encode blocks of 4 symbols. If each of the 32^4 = 2^20 = 1,048,576 4-symbol blocks were equally likely, then 20 bits would be required to encode each block, i.e., 5 bits/symbol.

Now consider putting the 256 most frequently occurring 4-symbol patterns in a dictionary. 1 bit must be used to distinguish between the two sets (just as in AEP compression). Let p denote the probability that a 4-symbol block in the data occurs in the dictionary. Then the expected codeword length is given by:

    (9p + 21(1 - p)) / 4 bits/symbol.

For p = 0.0833... (i.e., p = 1/12), the previous equation evaluates to 5 bits/symbol. This is the "breakeven point" as compared to the uniform encoding considered above.
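The arithmetic in the example can be checked directly. A dictionary hit costs 8 index bits plus the 1-bit flag (9 bits); a miss costs the 20 raw bits plus the flag (21 bits). Exact rational arithmetic confirms that p = 1/12 is the breakeven point against 5 bits/symbol:

```python
# Verify the breakeven calculation from the lecture example:
# 32-symbol alphabet, 4-symbol blocks, 256-entry dictionary.
from fractions import Fraction

BLOCK_BITS = 20   # log2(32**4): cost of a raw 4-symbol block
DICT_BITS = 8     # log2(256): index into the dictionary
FLAG = 1          # 1 bit marks dictionary hit vs. miss

def bits_per_symbol(p):
    """Expected codeword length in bits/symbol when a block falls in
    the dictionary with probability p: (9p + 21(1-p)) / 4."""
    hit = DICT_BITS + FLAG       # 9 bits
    miss = BLOCK_BITS + FLAG     # 21 bits
    return (hit * p + miss * (1 - p)) / 4

# Breakeven: 9p + 21(1-p) = 20  =>  12p = 1  =>  p = 1/12 = 0.0833...
assert bits_per_symbol(Fraction(1, 12)) == 5
```

With p = 0 (no block ever hits the dictionary) the scheme costs 21/4 = 5.25 bits/symbol, slightly worse than the uniform code, which is exactly the price of the flag bit.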

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
Dictionary Methods

Obviously, we would like to make the dictionary fairly small, and the probability of encountering a pattern in the dictionary very high. This latter point requires that we have good knowledge about the structure of the data being produced by the source.

There are two basic approaches to dictionary construction:

1. Static dictionaries – Involves constructing the dictionary before considering the actual data. This will be most successful if we have considerable a priori knowledge about the source.
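A static dictionary of the kind described in approach 1 might be built offline from representative training data, for instance by counting block frequencies. This is a hedged sketch under assumed parameters (the training string, block size, and dictionary size are all illustrative), not a method prescribed by the lecture:

```python
# Sketch: build a static dictionary offline from training data by taking
# the most frequent fixed-size blocks. All parameters here are
# illustrative assumptions, not values from the lecture.
from collections import Counter

def build_static_dictionary(training, block_size=4, dict_size=256):
    """Map the dict_size most frequent non-overlapping block_size-symbol
    patterns in the training data to dictionary indices 0..dict_size-1."""
    blocks = [tuple(training[i:i + block_size])
              for i in range(0, len(training) - block_size + 1, block_size)]
    most_common = Counter(blocks).most_common(dict_size)
    return {pattern: idx for idx, (pattern, _) in enumerate(most_common)}

# Example: "ab" dominates the 2-symbol blocks of this training string,
# so it receives the lowest (most frequent) index.
d = build_static_dictionary("abababababcdcd", block_size=2, dict_size=2)
assert d[("a", "b")] == 0
```

The weakness the slide points to is visible here: the dictionary is frozen before any actual data is seen, so it compresses well only if the real source resembles the training data.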