spec8 - CSE 143, Winter 2010 Programming Assignment #8:...

CSE 143, Winter 2010
Programming Assignment #8: Huffman Coding (20 points)
Due Thursday, March 11, 2010, 11:30 PM

No submissions for this assignment will be accepted after Sunday, March 14, at 11:30pm.

This program focuses on binary trees, priority queues, and input/output. Turn in files named HuffmanTree.java, HuffmanNode.java, secretmessage.huf, and secretmessage.huf.counts from the Homework section of the web site. You will need support files HuffMain.java, Bit*.java, and input files from the course web page.

Huffman Coding:

Huffman coding is an algorithm devised by David A. Huffman of MIT in 1952 for compressing text data to make a file occupy a smaller number of bytes. This relatively simple compression algorithm is powerful enough that variations of it are still used today in computer networks, fax machines, modems, HDTV, and other areas.

Normally text data is stored in a standard format of 8 bits per character, commonly using an encoding called ASCII that maps every character to a binary integer value from 0-255. The idea of Huffman coding is to abandon the rigid 8-bits-per-character requirement and use different-length binary encodings for different characters. The advantage of doing this is that if a character occurs frequently in the file, such as the letter 'e', it could be given a shorter encoding (fewer bits), making the file smaller. The tradeoff is that some characters may need to use encodings that are longer than 8 bits, but this is reserved for characters that occur infrequently, so the extra cost is worth it.

The table below compares ASCII values of various characters to possible Huffman encodings for the text of Shakespeare's Hamlet. Frequent characters such as space and 'e' have short encodings, while rarer ones like 'z' have longer ones.
Character   ASCII value   ASCII (binary)   Huffman (binary)
' '         32            00100000         10
'a'         97            01100001         0001
'b'         98            01100010         0111010
'c'         99            01100011         001100
'e'         101           01100101         1100
'z'         122           01111010         00100011010

The steps involved in Huffman coding a given text source file into a destination compressed file are the following:

1. Examine the source file's contents and count the number of occurrences of each character.
2. Place each character and its frequency (count of occurrences) into a sorted "priority" queue.
3. Convert the contents of this priority queue into a binary tree with a particular structure.
4. Traverse the tree to discover the binary encodings of each character.
5. Re-examine the source file's contents, and for each character, output the encoded binary version of that character to the destination file.

Encoding a File:

For example, suppose we have a file named example.txt with the following contents:

ab ab cab

In the original file, this text occupies 10 bytes (80 bits) of data. The 10th is a special "end-of-file" (EOF) byte.
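
Steps 1 through 4 of the list above can be sketched in Java on the example.txt contents. This is a minimal illustration under stated assumptions, not the HuffmanTree/HuffmanNode design the assignment asks for: the names HuffmanSketch, Node, countChars, buildTree, and collectCodes are hypothetical, the EOF marker is assumed to be the pseudo-character 256, and the usual rule of repeatedly merging the two lowest-count nodes is assumed for step 3. Step 5 (writing individual bits to a file) is left out.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Illustrative sketch only; all names here are hypothetical.
class HuffmanSketch {
    static final int EOF = 256; // assumed pseudo-character marking end of file

    static class Node implements Comparable<Node> {
        final int ch;           // character code, or -1 for an internal node
        final int count;        // occurrences of ch, or sum of the children's counts
        final Node left, right;

        Node(int ch, int count, Node left, Node right) {
            this.ch = ch;
            this.count = count;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }

        // Lower counts come out of the priority queue first.
        public int compareTo(Node other) {
            return Integer.compare(count, other.count);
        }
    }

    // Step 1: count each character, plus a single EOF marker.
    static Map<Integer, Integer> countChars(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) {
            counts.merge((int) c, 1, Integer::sum);
        }
        counts.put(EOF, 1);
        return counts;
    }

    // Steps 2 and 3: queue one leaf per character, then merge the two
    // lowest-count nodes until a single root remains.
    static Node buildTree(Map<Integer, Integer> counts) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (queue.size() > 1) {
            Node first = queue.remove();
            Node second = queue.remove();
            queue.add(new Node(-1, first.count + second.count, first, second));
        }
        return queue.remove();
    }

    // Step 4: walk the tree, appending 0 for a left branch and 1 for a right branch.
    static void collectCodes(Node node, String prefix, Map<Integer, String> codes) {
        if (node.isLeaf()) {
            codes.put(node.ch, prefix);
        } else {
            collectCodes(node.left, prefix + "0", codes);
            collectCodes(node.right, prefix + "1", codes);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counts = countChars("ab ab cab");
        Map<Integer, String> codes = new TreeMap<>();
        collectCodes(buildTree(counts), "", codes);

        int totalBits = 0;
        for (Map.Entry<Integer, String> e : codes.entrySet()) {
            totalBits += counts.get(e.getKey()) * e.getValue().length();
            String label = e.getKey() == EOF ? "EOF" : "'" + (char) e.getKey().intValue() + "'";
            System.out.println(label + " -> " + e.getValue());
        }
        System.out.println("encoded size: " + totalBits + " bits");
    }
}
```

For "ab ab cab" this sketch gives 'a', 'b', and the space 2-bit codes and gives 'c' and EOF 3-bit codes, for an encoded size of 22 bits against the original 80. The exact 0/1 patterns may differ between implementations, since they depend on how ties between equal counts are broken and on which child is placed on the left.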