09_HuffmanCoding - Wednesday 13/10/10 Dr. Daniel Hughes

CSC 30155 Wednesday 13/10/10 Dr. Daniel Hughes daniel.hughes@xjtlu.edu.cn
Structure for Today
• Review of Lossless Compression (15 mins)
• The Huffman Coding Algorithm (45 mins)
• Exam Questions (30 mins)
Lossless Compression
Supporting Reading
• Optional reading:
  • Kleinberg and Tardos, Algorithm Design, Pearson Education, Chapter 4, Section 4.8: Huffman Codes and Data Compression.
  • Cormen et al., Introduction to Algorithms, MIT Press, 2001, Chapter 16, Section 16.3: Huffman Codes.
Encoding Symbols Using Bits
• Computers operate on sequences of bits; put another way, they have an alphabet of just two characters.
• Humans use richer alphabets:
  • The English alphabet has 26 characters plus various special symbols.
  • Simplified Chinese uses 3000+ characters.
Fixed Length Encoding
• Fixed length encoding assigns each character a unique number encoded in bits.
• Each character is thus encoded as a bit sequence of the same fixed length.
• Decoding is easy: read a fixed number of bits (five bits suffice for 26 letters, since 2^5 = 32) and match them with the appropriate character.
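A minimal sketch of fixed length encoding, assuming an alphabet of the 26 uppercase English letters numbered A = 0 to Z = 25. The names `ALPHABET`, `WIDTH`, `encode_fixed` and `decode_fixed` are illustrative and do not come from the slides.

```python
import math
import string

# Assumption: the alphabet is the 26 uppercase letters, numbered A=0 .. Z=25.
ALPHABET = string.ascii_uppercase
# Smallest number of bits that can give every letter a distinct number.
WIDTH = math.ceil(math.log2(len(ALPHABET)))


def encode_fixed(text):
    """Encode each character as its index, written as WIDTH binary digits."""
    return "".join(format(ALPHABET.index(ch), f"0{WIDTH}b") for ch in text)


def decode_fixed(bits):
    """Read WIDTH bits at a time and map each chunk back to a character."""
    chunks = (bits[i:i + WIDTH] for i in range(0, len(bits), WIDTH))
    return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)
```

With this numbering, `encode_fixed("CAB")` yields `"000100000000001"` (C = 00010, A = 00000, B = 00001), and `decode_fixed` recovers the text by cutting the string every five bits. Every character costs five bits regardless of how often it occurs.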
ASCII Character Encoding
• ASCII gives every character a code of the same length, so no attention is paid to the relative frequency of characters.
Characteristics of Human Alphabets
• Our alphabets may have a lot of characters, but we do not use them all equally.
• Some English characters, such as E, are used much more frequently than others, such as Z.
• Let's take a look at this.
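One way to observe this unevenness on any sample text is to count letters. The sketch below uses a hypothetical helper, `letter_frequencies`, and considers only the letters A to Z; the results depend entirely on the text supplied.

```python
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter A-Z, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    # most_common() orders letters from most to least frequent.
    return {ch: n / total for ch, n in counts.most_common()}
```

For the short sample `"Sleepless bees"`, E accounts for 5 of the 13 letters, while most letters of the alphabet do not occur at all.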
Frequency of English Characters
Morse Code
• Frequent characters are encoded using shorter codes.
• But Morse Code is ambiguous without delimiters:
  • F (..-.) could also be E + A + E (. .- .).
  • D (-..) could also be T + I (- ..).
• To address this, Morse uses pauses between symbols, but this requires a 3-character alphabet (dot, dash, pause), while binary gives us only a 2-character alphabet.
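The ambiguity can be checked directly. This sketch contains only the six Morse letters mentioned above and a hypothetical helper, `morse_no_pauses`, that concatenates codes without any pause between letters.

```python
# Standard International Morse codes for the letters used in the example.
MORSE = {"E": ".", "T": "-", "A": ".-", "I": "..", "D": "-..", "F": "..-."}


def morse_no_pauses(text):
    """Concatenate the Morse codes of the letters with no delimiter."""
    return "".join(MORSE[ch] for ch in text)


# Without pauses, different messages produce identical signals.
assert morse_no_pauses("F") == morse_no_pauses("EAE")
assert morse_no_pauses("D") == morse_no_pauses("TI")
```

Both assertions hold, so a receiver that sees only dots and dashes cannot tell F from EAE, or D from TI.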
Exploiting Character Frequencies
• We can exploit the different typical frequencies of characters by assigning frequent characters a shorter code.
• However, this causes a problem during decoding, because there is no obvious way to separate variable-length encoded characters:
  • e.g. is the bit string “10101000” the characters ‘101’ and ‘01000’, or ‘10101’ and ‘000’?
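The decoding problem can be made concrete by listing every way a bit string splits into codewords. The sketch below assumes, purely for illustration, that the four bit patterns from the example are the codewords; `segmentations` is a hypothetical helper, and codewords are assumed to be non-empty.

```python
def segmentations(bits, codewords):
    """Return every way to split `bits` into a sequence of codewords."""
    if not bits:
        return [[]]  # exactly one way to split the empty string
    results = []
    for word in codewords:
        if bits.startswith(word):
            # Fix this codeword, then split the remainder recursively.
            for rest in segmentations(bits[len(word):], codewords):
                results.append([word] + rest)
    return results


# Assumed codeword set, taken from the bit patterns in the example above.
CODEWORDS = ["101", "01000", "10101", "000"]
parses = segmentations("10101000", CODEWORDS)
```

Here `parses` contains two splits, `['101', '01000']` and `['10101', '000']`. The cause is that `101` is a prefix of `10101`, so the decoder cannot know where the first codeword ends.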
Prefix Codes (Encoding)
• A prefix code is a code in which no codeword is a prefix of any other codeword.
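A minimal sketch of the prefix property and why it helps decoding. The four-symbol code below is a made-up example, not the table from the slide, and `is_prefix_free`, `encode` and `decode_prefix` are illustrative names.

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    return not any(
        a != b and b.startswith(a) for a in codewords for b in codewords
    )


def encode(text, code):
    """Concatenate the codeword of each symbol, with no delimiters."""
    return "".join(code[sym] for sym in text)


def decode_prefix(bits, code):
    """Decode left to right, emitting a symbol as soon as a codeword matches."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # unambiguous only because the code is prefix-free
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


# Hypothetical prefix code for a four-symbol alphabet.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
```

With this code, `encode("abcd", CODE)` gives `"010110111"` and `decode_prefix` recovers `"abcd"` without any delimiters. By contrast, `is_prefix_free(["101", "10101"])` is false, which is the situation that made the earlier bit string ambiguous.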

This note was uploaded on 05/22/2011 for the course CSC 30155 taught by Professor Garyli during the Spring '11 term at University of Liverpool.