# Module 5, Lecture 2: Data Compression (Huffman and Arithmetic Coding)


Data Compression: Huffman and Arithmetic Coding. G.L. Heileman, Module 5, Lecture 2.

## Huffman Coding

For a given RV $X$, assuming knowledge of only $p_1, \ldots, p_{|\mathcal{X}|}$, Huffman coding will produce a prefix code $C_{Huf}$ that is optimal in terms of expected codeword length $L(C_{Huf})$. That is, no other coding technique using only $p_1, \ldots, p_{|\mathcal{X}|}$ can produce a prefix code $C$ such that $L(C) < L(C_{Huf})$. Huffman derived his algorithm from two observations about optimal prefix codes:

1. In an optimal code, symbols with higher probability have shorter codewords than symbols with lower probability (obvious).
2. In an optimal code, the two symbols with lowest probability have codewords of the same length (not obvious).
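As a concrete check of what "optimal in expected codeword length" means, $L(C) = \sum_i p_i\, l_i$ can be computed for a small distribution. The probabilities and both codes below are illustrative assumptions, not taken from the lecture:

```python
# Hypothetical distribution and two prefix codes for it (illustration only).
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# A Huffman-style code: shorter codewords for more probable symbols.
huffman_code = {"a": "0", "b": "10", "c": "110", "d": "111"}
# A naive fixed-length code for comparison.
fixed_code = {"a": "00", "b": "01", "c": "10", "d": "11"}

def expected_length(p, code):
    """Expected codeword length L(C) = sum_i p_i * l_i."""
    return sum(p[s] * len(code[s]) for s in p)

print(expected_length(p, huffman_code))  # 1.75 bits/symbol
print(expected_length(p, fixed_code))    # 2.0 bits/symbol
```

For this (dyadic) distribution the Huffman-style code achieves 1.75 bits/symbol, beating the 2-bit fixed-length code; no prefix code can do better here.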
## Huffman Coding

To see why observation (2) is true: suppose we have an optimal code $C$ in which the two least probable symbols do not have codewords of the same length. Since these are the two least probable symbols, by observation (1) no other codeword can be longer than these codewords. Assume the longer codeword is $k$ bits longer than the shorter one. By simply dropping the last $k$ bits of the longer codeword, we obtain a new prefix code (the shortened codeword does not become the prefix of any other codeword) in which the two symbols with lowest probability have the same length. The new prefix code has a smaller expected codeword length than $C$. This contradicts the assumption that $C$ is optimal, and establishes observation (2).
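The truncation argument can be checked on a concrete instance. The code below (a hypothetical example, not from the lecture) gives the two least probable symbols different lengths, then drops the last $k = 2$ bits of the longer codeword and verifies that the result is still prefix-free with a smaller expected length:

```python
# Hypothetical prefix code in which the two least probable symbols
# (c and d) have codewords of different lengths, so it cannot be optimal.
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "11100"}

def is_prefix_free(code):
    """True iff no codeword is a proper prefix of another."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)

def expected_length(p, code):
    return sum(p[s] * len(code[s]) for s in p)

# Drop the last k = 2 bits of the longer codeword, as in the argument above.
shortened = dict(code, d=code["d"][:-2])

print(is_prefix_free(code), is_prefix_free(shortened))      # True True
print(expected_length(p, code), expected_length(p, shortened))  # 2.0 1.75
```

Both codes are prefix-free, but the truncated code's expected length drops from 2.0 to 1.75 bits/symbol, exhibiting the contradiction the proof relies on.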

## Huffman Coding

Lemma. Given RV $X \sim p(x)$ with alphabet $\mathcal{X}$, let $x, y \in \mathcal{X}$ be the two characters with the smallest probabilities. Then there exists an optimal prefix code $C$ for $X$ in which $C(x)$ and $C(y)$ are siblings with maximal depth in the code tree.

If $x$ and $y$ are siblings in a code tree then, since this is a prefix code, their codewords differ only in their last bit. Observations (1) and (2), along with this lemma, are the basis for the Huffman coding procedure.
## Huffman Coding

Proof of Lemma: Let $T$ be a tree representing an optimal prefix code, in which $C(a)$ and $C(b)$, $a, b \in \mathcal{X}$, are the deepest sibling leaves. Assume $p(a) \le p(b)$ and $p(x) \le p(y)$, and that $C(x)$ and $C(y)$ do not have maximal depth in $T$.

(Figure: the tree $T$ with deepest sibling leaves $a, b$ and shallower leaves $x, y$, and the tree $T'$ obtained after the exchange.)

Exchange $C(x)$ and $C(a)$ to produce a new tree $T'$. Because $T$ is optimal, $L(T) = \sum_i p_i\, l_T(i) = L^\star$, and

$$
L^\star - L(T') = \sum_i p_i\, l_T(i) - \sum_i p_i\, l_{T'}(i)
= p(x)\, l_T(x) + p(a)\, l_T(a) - p(x)\, l_{T'}(x) - p(a)\, l_{T'}(a).
$$
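The step the slide carries out implicitly: the exchange swaps the two depths, so $l_{T'}(x) = l_T(a)$ and $l_{T'}(a) = l_T(x)$, and substituting gives

```latex
L^\star - L(T')
  = p(x)\, l_T(x) + p(a)\, l_T(a) - p(x)\, l_T(a) - p(a)\, l_T(x)
  = \bigl(p(a) - p(x)\bigr)\bigl(l_T(a) - l_T(x)\bigr) \ge 0,
```

since $p(x) \le p(a)$ ($x$ is a least probable symbol) and $l_T(x) \le l_T(a)$ ($a$ is a deepest leaf).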

## Huffman Coding

Proof of Lemma (cont'd): Thus $L(T') \le L^\star$, with equality when $p(a) = p(x)$ or $l_T(a) = l_T(x)$. Since $L^\star$ is optimal, it must be the case that $L(T') = L^\star$, and therefore $T'$ is also an optimal code. Similarly, exchanging the codewords for $y$ and $b$ cannot increase the expected length of the code in a new tree $T''$. Thus $T''$ is an optimal tree in which $C(x)$ and $C(y)$ appear as sibling leaves of maximal depth. $\square$
## Huffman Coding

Given RV $X \sim p(x)$.
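The rest of this slide is not included in the preview, but the procedure the lemma justifies — repeatedly merging the two least probable symbols — can be sketched as follows. This is a minimal illustration under that reading, not the lecture's own code, and the distribution at the bottom is hypothetical:

```python
import heapq
from itertools import count

def huffman_code(p):
    """Build a Huffman prefix code for a distribution p: symbol -> probability."""
    tiebreak = count()  # breaks ties so the heap never compares dicts
    # Each heap entry: (subtree probability, tiebreak, {symbol: partial codeword})
    heap = [(prob, next(tiebreak), {sym: ""}) for sym, prob in p.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, per observation (2) and the lemma.
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prepend a distinguishing bit: the merged subtrees become siblings.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical distribution for illustration.
code = huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(code)  # codeword lengths: a -> 1, b -> 2, c and d -> 3
```

Each merge assigns the two cheapest subtrees as siblings, so the two least probable symbols end up as maximal-depth siblings, exactly the structure the lemma says some optimal code must have.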
