CS301 – Data Structures
Lecture No. 26
___________________________________________________________________
Data Structures
Lecture No. 26
Reading Material
Data Structures and Algorithm Analysis in C++
Chapter 4, Section 4.4.2
Summary
• Huffman Encoding
• Mathematical Properties of Binary Trees
Huffman Encoding
We will continue our discussion of Huffman encoding in this lecture. In the
previous lecture, we reached the point where the binary tree data structure was
being built. Huffman encoding is used in data compression, a technique that is
typically employed when transferring data. Suppose there is a Word document (text file)
that we want to send over a network. If the file is, say, one MB in size, sending it
will take a considerable amount of time. However, if compression reduces the size by
half, the network transmission time is also roughly halved. With this example in
mind, it will be quite easy to understand how Huffman encoding compresses a text
file.
We know that Huffman coding is a method for the compression of standard text
documents. It makes use of a binary tree to develop codes of varying lengths for the
letters used in the original message. Huffman coding is also a part of the JPEG image
compression scheme. David Huffman devised this algorithm as part of a course
assignment at MIT and published it in 1952.
In the previous lecture, we started discussing a simple example to understand
Huffman encoding. In that example, we were encoding the 32-character phrase
"traversing threaded binary trees". If this phrase were sent as a message over a
network using standard 8-bit ASCII codes, we would have to send 8 × 32 = 256 bits.
However, the Huffman algorithm can cut the size of the message down to 116 bits.
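The arithmetic above can be checked with a few lines of Python. This is only a small sketch: the 116-bit figure is copied from the lecture text rather than computed, and the variable names are our own.

```python
# Size of the example phrase with fixed-length 8-bit codes, compared with
# the 116 bits quoted in the lecture for its Huffman encoding.
phrase = "traversing threaded binary trees"

BITS_PER_ASCII_CHAR = 8
HUFFMAN_BITS = 116  # figure taken from the lecture text, not computed here

ascii_bits = BITS_PER_ASCII_CHAR * len(phrase)
saving = 1 - HUFFMAN_BITS / ascii_bits

print(len(phrase))      # number of characters, spaces included
print(ascii_bits)       # bits needed with 8-bit ASCII
print(f"{saving:.1%}")  # fraction of bits saved by the Huffman encoding
```

For this phrase, the Huffman-encoded message is a little under half the size of the ASCII version.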
Huffman encoding involves the following steps:
1. List all the letters used, including the "space" character, along with the
frequency with which they occur in the message.
2. Consider each of these (character, frequency) pairs as a node; these are actually
leaf nodes, as we will see later.
3. Pick the two nodes with the lowest frequency. If there is a tie, pick randomly
amongst those with equal frequencies.
4. Make a new node and attach these two nodes as its children.
5. Assign this new node the sum of the frequencies of its children.
6. Continue combining the two nodes of lowest frequency until only one node,
the root, is left.
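The six steps can be sketched in Python as follows. This is a minimal illustration under a few assumptions of our own, not the implementation used in the course: nodes are plain tuples, a min-heap picks the two lowest-frequency nodes, ties are broken by insertion order instead of randomly, the message is assumed to be non-empty, and a left edge is labelled 0 and a right edge 1 when codes are read off the tree. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(message):
    """Return the root of a Huffman tree for a non-empty `message`.

    A leaf is (frequency, character); an internal node is
    (frequency, left_child, right_child).
    """
    frequencies = Counter(message)                  # step 1
    order = count()  # tie-breaker, so the heap never compares two nodes
    heap = [(freq, next(order), (freq, ch))         # step 2: one leaf per pair
            for ch, freq in sorted(frequencies.items())]
    heapq.heapify(heap)
    while len(heap) > 1:                            # step 6: until one root
        f1, _, left = heapq.heappop(heap)           # step 3: two lowest
        f2, _, right = heapq.heappop(heap)
        parent = (f1 + f2, left, right)             # steps 4 and 5
        heapq.heappush(heap, (f1 + f2, next(order), parent))
    return heap[0][2]


def assign_codes(node, prefix="", codes=None):
    """Walk the tree; a left edge adds '0' and a right edge adds '1'."""
    if codes is None:
        codes = {}
    if len(node) == 2:                   # leaf: (frequency, character)
        codes[node[1]] = prefix or "0"   # one-symbol message edge case
    else:                                # internal: (frequency, left, right)
        assign_codes(node[1], prefix + "0", codes)
        assign_codes(node[2], prefix + "1", codes)
    return codes


phrase = "traversing threaded binary trees"
codes = assign_codes(build_huffman_tree(phrase))
encoded = "".join(codes[ch] for ch in phrase)
print(len(encoded))  # total bits in the encoded message: 116 for this phrase
```

A different tie-breaking rule may produce different codes for individual letters, but the total length of the encoded message stays the same.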
In the first step, we make a list of all letters (characters), including the space and
end-of-line characters, and find the number of occurrences of each. For example,
we ascertain how many times the letter ‘a’ is found in the file, how many times ‘b’
occurs, and so on. Thus we find the number of occurrences (i.e. the frequency) of each
letter in the text file.
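As a small illustration of this first step, the frequencies can be counted with Python's standard `collections.Counter`. The helper name `letter_frequencies` is our own, and the example phrase from the lecture stands in for the contents of a text file.

```python
from collections import Counter


def letter_frequencies(text):
    """Count how often each character occurs, spaces and newlines included."""
    return Counter(text)


phrase = "traversing threaded binary trees"
frequencies = letter_frequencies(phrase)

# Print each character with its frequency, in alphabetical order.
for ch, freq in sorted(frequencies.items()):
    print(repr(ch), freq)
```

For this phrase, the letters ‘e’ and ‘r’ occur five times each and the space occurs three times; the frequencies add up to the 32 characters of the message.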
In step 2, we consider each pair (i.e. a letter and its frequency) as a node. We will
 Spring '10
 Dr.Naveed Malik
 Data Structures
