22-huffman - CSE 143 Lecture 22 Huffman slides created by...

CSE 143
Lecture 22
Huffman
slides created by Ethan Apter
http://www.cs.washington.edu/143/

Huffman Tree
• For your next assignment, you’ll create a “Huffman tree”
• Huffman trees are used for file compression
  – file compression: making files smaller
  – for example, WinZip makes zip files
• Huffman trees allow us to implement a relatively simple form of file compression
  – Huffman trees are essentially just binary trees
  – it’s not as good as WinZip, but it’s a whole lot easier!
• Specifically, we’re going to compress text files

ASCII
• Characters in a text file are all encoded by bits
  – bit: the smallest piece of information on a computer (“zero” or “one”)
  – your computer automatically converts the bits into the characters you expect to see
• Normally, all characters are encoded by the same number of bits
  – this makes it easy to find the boundaries between characters
• One character encoding is the American Standard Code for Information Interchange
  – better known as ASCII

ASCII Table
(table image not reproduced)

ASCII
• The original version of ASCII had 128 characters
  – this fit perfectly into 7 bits (2^7 = 128)
• But the standard data size is 8 bits (i.e. a byte)
  – original ASCII used the 8th bit as a “parity” (odd or even) bit
  – ...which didn’t work out very well
• Eventually, 128 characters wasn’t enough
  – Extended ASCII has 256 characters
    • this fits perfectly into 8 bits (2^8 = 256)
    • can represent 00000000 to 11111111 (binary)
    • can represent 0 to 255 (decimal)

Extended ASCII Table
(table image not reproduced)
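
The fixed-width idea can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the helper names to_bits and from_bits are made up here. It shows that 7 and 8 bits give 128 and 256 values, and that equal-width codes make character boundaries trivial to find (just cut every 8 bits).

```python
def to_bits(text, width=8):
    """Return each character's numeric code as a fixed-width binary string."""
    return [format(ord(ch), "0{}b".format(width)) for ch in text]


def from_bits(bits, width=8):
    """Split a bit string into equal-width chunks and turn each back into a character."""
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(chr(int(chunk, 2)) for chunk in chunks)


print(2 ** 7, 2 ** 8)                   # values representable in 7 and in 8 bits
print(ord("A"))                         # numeric code of 'A'
print(to_bits("Hi"))                    # one 8-bit group per character
print(from_bits("".join(to_bits("Hi"))))  # boundaries recovered by cutting every 8 bits
```

Because every character uses exactly 8 bits here, a file of n characters always takes 8n bits, no matter which characters it contains.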

Text Files
• In simple text files, each byte (8 bits) represents a single character
• If we want to compress the file, we have to do better
  – otherwise, we won’t improve the old file
• What if different characters are represented by different numbers of bits?
  – characters that appear frequently will require fewer bits
  – characters that appear infrequently will require more bits
• The Huffman algorithm finds an ideal variable-length way of encoding the characters for a specific file

Huffman Algorithm
• The Huffman algorithm creates a Huffman tree
• This tree represents the variable-length character encoding
• In a Huffman tree, the left and right children each represent a single bit of information
  – going left is a bit of value zero
  – going right is a bit of value one
• But how do we create the Huffman tree?

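
To make the "left is zero, right is one" rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the course's implementation: the Node class, the codes and decode helpers, and the tiny three-letter tree are all hypothetical, and the tree is built by hand rather than by the Huffman algorithm.

```python
class Node:
    """A binary tree node; leaves hold a character, branches hold two children."""

    def __init__(self, char=None, left=None, right=None):
        self.char = char
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def codes(node, prefix=""):
    """Collect each leaf's code: append '0' when going left, '1' when going right."""
    if node.is_leaf():
        return {node.char: prefix}
    table = {}
    table.update(codes(node.left, prefix + "0"))
    table.update(codes(node.right, prefix + "1"))
    return table


def decode(root, bits):
    """Follow the bits from the root; each time a leaf is reached, emit it and restart."""
    out = []
    node = root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.is_leaf():
            out.append(node.char)
            node = root
    return "".join(out)


# Hand-built toy tree (an assumption for illustration): 'a' -> 0, 'b' -> 10, 'c' -> 11
root = Node(left=Node("a"), right=Node(left=Node("b"), right=Node("c")))
table = codes(root)
message = "abaca"
encoded = "".join(table[ch] for ch in message)

print(table)
print(encoded, "uses", len(encoded), "bits; one byte per character would use", 8 * len(message))
print(decode(root, encoded))
```

In this toy tree the frequent letter 'a' sits closest to the root and gets the shortest code, which is the effect the slides describe: frequent characters need fewer bits, infrequent ones need more.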
Creating a Huffman Tree
• First, we have to know how frequently each character occurs in the file
• Then, we construct a leaf node for every character that occurs at least once (i.e. has non-zero frequency)
  – we don’t care about letters that never occur, because we won’t
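
The two steps named on this slide, counting frequencies and then making one leaf per character that actually occurs, might look like the following Python sketch. The function names and the (character, count) tuple used as a "leaf" are assumptions for illustration, and the code assumes every character code is below 256 (Extended ASCII). It stops at the leaves and does not build the tree itself.

```python
def count_frequencies(text):
    """Return a list of 256 counts, indexed by character code."""
    counts = [0] * 256
    for ch in text:
        counts[ord(ch)] += 1  # assumes every character code is below 256
    return counts


def make_leaves(counts):
    """Create one (character, frequency) leaf per character with non-zero frequency."""
    return [(chr(code), n) for code, n in enumerate(counts) if n > 0]


counts = count_frequencies("abracadabra")
leaves = make_leaves(counts)
print(leaves)
print(len(leaves), "leaves out of", len(counts), "possible characters")
```

Characters with a count of zero are skipped, so the number of leaves depends only on which characters the file actually contains.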