09_HuffmanCoding

CSC 30155 Wednesday 13/10/10 Dr. Daniel Hughes

Structure for Today
- Review of Lossless Compression (15 mins)
- The Huffman Coding Algorithm (45 mins)
- Exam Questions (30 mins)
CSC 30155 Lossless Compression Dr. Daniel Hughes

Supporting Reading
- Optional reading:
  - Kleinberg et al., Algorithm Design, Pearson Education, Chapter 4: Huffman Codes and Data Compression (4.8).
  - Cormen et al., Introduction to Algorithms, MIT Press, 2001, Chapter 16: Huffman Codes (16.3).
Encoding Symbols Using Bits
- Computers operate on sequences of bits; put another way, they have an alphabet with only two possible characters.
- Humans use richer alphabets:
  - The English alphabet has 26 characters plus various special symbols.
  - The simplified Chinese alphabet has 3000+ characters.

Fixed Length Encoding
- Fixed length encoding assigns each character a unique number encoded in bits.
- Each character is thus encoded as a bit sequence of the same fixed length.
- Decoding is easy: with a five-bit code (enough for the 26 English letters, since 2^5 = 32), read five bits at a time and match each group with the appropriate character.
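A minimal sketch of the five-bit scheme described above, written in Python. The code table (A = 0, B = 1, ..., Z = 25) and the function names `encode_fixed` and `decode_fixed` are assumptions made for illustration; the slides do not specify a particular table.

```python
import string

BITS = 5  # 2**5 = 32 codes, enough for the 26 English letters

# Hypothetical code table: A -> 0, B -> 1, ..., Z -> 25
ALPHABET = string.ascii_uppercase


def encode_fixed(text):
    """Encode each letter as its index written as a 5-bit binary number."""
    return "".join(format(ALPHABET.index(ch), f"0{BITS}b") for ch in text)


def decode_fixed(bits):
    """Split the bit string into 5-bit groups and look each one up."""
    chunks = [bits[i:i + BITS] for i in range(0, len(bits), BITS)]
    return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)


encoded = encode_fixed("CAB")
print(encoded)                # 000100000000001
print(decode_fixed(encoded))  # CAB
```

Every letter costs exactly five bits here, whether it is a common E or a rare Z.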
ASCII Character Encoding
It is clear that no attention is paid to the relative frequency of characters.

Characteristics of Human Alphabets
- Our alphabets may have a lot of characters, but we do not use them all equally.
- Some English characters, such as E, are used much more frequently than others, such as Z.
- Let's take a look at this:
Frequency of English Characters
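Relative frequencies like these can be measured from any sample of text. The following Python sketch counts letters in a string; the function name `letter_frequencies` and the sample sentence are assumptions for illustration, and a short sample will not reproduce the frequencies of English as a whole.

```python
from collections import Counter


def letter_frequencies(text):
    """Return each letter's share of all A-Z letters, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.most_common()}


# Hypothetical sample text; any larger corpus can be substituted.
sample = "Some English characters are used much more frequently than others"
for letter, share in letter_frequencies(sample).items():
    print(f"{letter}: {share:.3f}")
```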

Morse Code
- Frequent characters are encoded using smaller codes.
- But Morse Code is ambiguous without delimiters:
  - F could also be (E + A + E).
  - D could also be (T + I).
- To address this, Morse uses pauses between symbols, but this requires a 3 character alphabet, while we only have a 2 character alphabet in binary.
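The ambiguity above can be checked directly. This Python sketch uses a six-letter subset of the Morse table; the helper names `to_morse` and `from_morse` are assumptions for illustration, and the pause is modelled as a space character, which is the third symbol the slide refers to.

```python
# Subset of International Morse code, enough for the two examples above.
MORSE = {"E": ".", "T": "-", "A": ".-", "I": "..", "D": "-..", "F": "..-."}
DECODE = {code: letter for letter, code in MORSE.items()}


def to_morse(text, pause=" "):
    """Encode text, separating letters with the given pause symbol."""
    return pause.join(MORSE[ch] for ch in text)


def from_morse(signal, pause=" "):
    """Decode a signal whose letters are separated by the pause symbol."""
    return "".join(DECODE[code] for code in signal.split(pause))


# Without pauses, different messages collapse to the same signal.
print(to_morse("F", pause=""), to_morse("EAE", pause=""))  # ..-. ..-.
print(to_morse("D", pause=""), to_morse("TI", pause=""))   # -.. -..

# With pauses, the signals differ and decoding is unambiguous.
print(from_morse(to_morse("EAE")))  # EAE
print(from_morse(to_morse("F")))    # F
```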
Exploiting Character Frequencies
- We can exploit the different typical frequencies of characters by assigning frequent characters a smaller code.
- However, this is a problem during decoding, because there is no obvious way to separate encoded variable length characters:
  - e.g. is the text “10101000” the characters ‘101’ and ‘01000’, or ‘10101’ and ‘000’?
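To make the decoding problem concrete, the Python sketch below lists every way a bit string can be split into codewords. It assumes a code containing exactly the four codewords mentioned in the example; the function name `all_parses` is hypothetical.

```python
def all_parses(bits, codewords):
    """Return every way to split bits into a sequence of codewords."""
    if not bits:
        return [[]]  # one parse of the empty string: no codewords
    parses = []
    for word in codewords:
        if bits.startswith(word):
            for rest in all_parses(bits[len(word):], codewords):
                parses.append([word] + rest)
    return parses


# Assumed code: just the four codewords from the example above.
CODEWORDS = ["101", "01000", "10101", "000"]

print(all_parses("10101000", CODEWORDS))
# [['101', '01000'], ['10101', '000']]
```

Two valid parses mean the decoder cannot tell which message was sent. The root cause is that ‘101’ is a prefix of ‘10101’.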

Prefix Codes (Encoding)
- A prefix code is a code in which no codeword is a prefix of any other codeword.
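The definition above can be tested mechanically, and it is what makes delimiter-free decoding possible. The Python sketch below uses a hypothetical four-symbol code table (it is not taken from the slides) and illustrative function names `is_prefix_free`, `encode` and `decode`.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    words = list(codewords)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def encode(text, code):
    """Concatenate the codeword of each symbol, with no delimiters."""
    return "".join(code[sym] for sym in text)


def decode(bits, code):
    """Read bits left to right, emitting a symbol whenever a codeword completes."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


# Hypothetical prefix code for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}

print(is_prefix_free(CODE.values()))       # True
print(is_prefix_free(["101", "10101"]))    # False
print(decode(encode("badcab", CODE), CODE))  # badcab
```

Because no codeword is a prefix of another, the first codeword the decoder completes is always the right one, so no lookahead or separator is needed.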