Module 5, Lecture 3
Data Compression: Dictionary Methods
G.L. Heileman
Dictionary Methods

The compression methods we have considered so far make use of a probability model associated with the source in order to compress the data produced by the source. In this lecture we will consider dictionary techniques that do not make explicit use of a probabilistic model. Rather, these methods exploit redundancies in the data produced by a source by directly observing the data itself, and then creating a dictionary of frequently occurring patterns. If one of these patterns is later encountered, it can be encoded by referencing an index in the dictionary. If a pattern that is not in the dictionary is encountered, it can be encoded using some other, less efficient method. Thus, this is very similar to the AEP compression technique, where we divide the input into two classes: (1) frequently occurring patterns, and (2) infrequently occurring patterns. Recall one of the limitations of AEP compression: the typical set A_ε^(n) must be completely determined a priori. This problem is circumvented by dictionary methods.
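The two-class scheme above (dictionary index for frequent patterns, a longer literal code for everything else) can be sketched as follows. This is only an illustrative sketch: the function names, the 'D'/'L' tags, and the sample dictionary are our assumptions, not part of the lecture.

```python
def encode_blocks(blocks, dictionary):
    """Encode each block as ('D', index) if it appears in the dictionary,
    or ('L', block) as a raw literal otherwise."""
    index = {pattern: i for i, pattern in enumerate(dictionary)}
    out = []
    for block in blocks:
        if block in index:
            out.append(('D', index[block]))   # short code: dictionary index
        else:
            out.append(('L', block))          # longer code: raw literal
    return out

def decode_blocks(codes, dictionary):
    """Invert encode_blocks: look up indices, pass literals through."""
    return [dictionary[x] if tag == 'D' else x for tag, x in codes]

# Illustrative 4-symbol patterns chosen as the "frequent" set.
dictionary = ["the ", "and ", "tion", "ing "]
blocks = ["the ", "cat ", "and ", "dog "]
codes = encode_blocks(blocks, dictionary)
assert decode_blocks(codes, dictionary) == blocks
```

In a real coder the 'D'/'L' tag would be the single flag bit, and the index and literal would be fixed-width bit fields rather than Python objects.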
Dictionary Methods

Ex: Consider a 32-symbol alphabet, and assume we encode blocks of 4 symbols. If each of the 32^4 = 2^20 = 1,048,576 4-symbol blocks were equally likely, then 20 bits would be required to encode each block, i.e., 5 bits/symbol.

Consider putting the 256 most frequently occurring 4-symbol patterns in a dictionary. One bit must be used to distinguish between the two sets (just like in AEP compression), so a block in the dictionary costs 1 + 8 = 9 bits, while any other block costs 1 + 20 = 21 bits. Let p denote the probability that a 4-symbol block in the data occurs in the dictionary. Then the expected codeword length is given by:

    (9p + 21(1 - p))/4 bits/symbol.

For p = 1/12 = 0.0833..., the previous equation evaluates to 5 bits/symbol. This is the "breakeven point" as compared to the uniform distribution considered above.
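The breakeven arithmetic can be checked numerically. The function name here is ours; the bit counts follow the example (1 flag bit + 8-bit index = 9 bits for a dictionary hit, 1 flag bit + 20-bit raw block = 21 bits for a miss):

```python
from fractions import Fraction

def bits_per_symbol(p):
    """Expected code length per symbol for 4-symbol blocks, where a
    dictionary hit costs 9 bits and a miss costs 21 bits."""
    return (9 * p + 21 * (1 - p)) / 4

# Solving (9p + 21(1 - p))/4 = 5 gives 21 - 12p = 20, i.e., p = 1/12.
p_breakeven = Fraction(1, 12)
assert bits_per_symbol(p_breakeven) == 5   # matches 5 bits/symbol uniform coding
```

Any p above 1/12 pushes the expected rate below 5 bits/symbol, which is when the dictionary starts to pay off.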
Dictionary Methods

Obviously, we would like to make the dictionary fairly small, and the probability of encountering a pattern in the dictionary very high. This latter point requires that we have good knowledge about the structure of the data being produced by the source. There are two basic approaches to dictionary construction:

1. Static dictionaries – Involves constructing the dictionary before considering the actual data. This will be most successful if we have considerable a priori knowledge about the source.
2. Adaptive dictionaries – Involves constructing, and updating, the dictionary from the data itself as it is processed (the approach taken by the LZ family of algorithms).
This note is from the course ECE 549, taught by Professor G.L. Heileman during the Spring 2010 term at the University of New Brunswick.

Page1 / 16

Module5_3 - Module 5, Lecture 3 Data Compression:...

This preview shows document pages 1 - 5. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online