CS301-Lec26 handout - CS301 Data Structures Lecture No. 26...

CS301 – Data Structures
Lecture No. 26
___________________________________________________________________

Reading Material
Data Structures and Algorithm Analysis in C++, Chapter 4 (4.4.2)

Summary
Huffman Encoding
Mathematical Properties of Binary Trees

Huffman Encoding

We will continue our discussion of Huffman encoding in this lecture. In the previous lecture, we talked about how the binary tree data structure is built. Huffman encoding is used in data compression, a technique commonly employed when transferring data. Suppose there is a word document (text file) that we want to send over a network. If the file is, say, one MB in size, it will take a long time to send. If compression reduces the size by half, the network transmission time is also halved. With this example in mind, it is easy to see why Huffman encoding is used to compress a text file.

We know that Huffman code is a method for the compression of standard text documents. It makes use of a binary tree to develop codes of varying lengths for the letters used in the original message. Huffman code is also a part of the JPEG image compression scheme. David Huffman introduced this algorithm in 1952 as part of a course assignment at MIT.

In the previous lecture, we started discussing a simple example to understand Huffman encoding. In that example, we were encoding the 32-character phrase "traversing threaded binary trees". If this phrase were sent as a message over a network using standard 8-bit ASCII codes, we would have to send 8*32 = 256 bits. However, the Huffman algorithm can cut the size of the message down to 116 bits.

Huffman encoding involves the following steps:

1. List all the letters used, including the "space" character, along with the frequency with which they occur in the message.
2. Consider each of these (character, frequency) pairs as nodes; these are actually leaf nodes, as we will see later.
3. Pick the two nodes with the lowest frequency. If there is a tie, pick randomly among those with equal frequencies.
4. Make a new node out of these two, with the two picked nodes as its children.
5. Assign this new node the sum of the frequencies of its children.
6. Continue combining the two nodes of lowest frequency until only one node, the root, is left.
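The six steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own code: the names build_huffman_tree and assign_codes are hypothetical, a min-heap is assumed as the way to find the two lowest-frequency nodes, ties are broken by insertion order rather than randomly, and the convention of labelling left edges 0 and right edges 1 is an assumption.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(message):
    """Return the root of a Huffman tree for `message`.

    Each node is a tuple (frequency, tiebreak, symbol, left, right);
    leaves carry a symbol, internal nodes have symbol None.
    """
    freq = Counter(message)               # step 1: letter frequencies
    tiebreak = count()                    # assumption: ties resolved by creation order
    # step 2: every (character, frequency) pair becomes a leaf node
    heap = [(f, next(tiebreak), ch, None, None)
            for ch, f in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:                  # step 6: repeat until only the root is left
        a = heapq.heappop(heap)           # step 3: two lowest-frequency nodes
        b = heapq.heappop(heap)
        # steps 4 and 5: new parent whose frequency is the sum of its children
        heapq.heappush(heap, (a[0] + b[0], next(tiebreak), None, a, b))
    return heap[0]


def assign_codes(node, prefix="", table=None):
    """Walk the tree, assuming 0 for a left edge and 1 for a right edge."""
    if table is None:
        table = {}
    _, _, symbol, left, right = node
    if symbol is not None:
        table[symbol] = prefix or "0"     # a one-symbol message still needs one bit
    else:
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table


message = "traversing threaded binary trees"
root = build_huffman_tree(message)
codes = assign_codes(root)
encoded_bits = sum(len(codes[ch]) for ch in message)
print(len(message) * 8, encoded_bits)     # prints: 256 116
```

The individual codes depend on how ties are broken, so a different tie-breaking rule may yield a different tree, but the total encoded length for this phrase stays at 116 bits.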
In the first step, we make a list of all the letters (characters), including the space and end-of-line characters, and count the number of occurrences of each. For example, we determine how many times the letter ‘a’ is found in the file, how many times ‘b’ occurs, and so on. This gives us the number of occurrences (i.e. the frequency) of each letter in the text file. In step 2, we treat each pair (i.e. a letter and its frequency) as a node. We will