Chapter 1
Information Sources & Source Coding

Chapter 1 (Part 3)
• Huffman coding
• Lempel-Ziv coding
• Run-length encoding
• Differential coding
• Shannon-Fano coding
• Arithmetic coding

Huffman Coding
• Having been introduced to what prefix codes are in Part 2, you will now learn how to construct a type of prefix code known as the Huffman code.
• The basic idea behind Huffman coding is to encode each symbol with a binary codeword that is roughly equal in length to the amount of information conveyed by the symbol in question. (Why?)
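As a quick numerical illustration of that idea (the probabilities below are arbitrary examples, not part of any source defined in these slides):

```python
# A symbol of probability p conveys log2(1/p) bits of information, so its
# codeword should be roughly that long: rare symbols get long codewords,
# frequent symbols get short ones, which keeps the average length small.
from math import log2

for p in (0.5, 0.25, 0.125, 0.1):
    print(f"p = {p:<6} -> information = {log2(1 / p):.2f} bits")
```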

The Huffman Coding Algorithm
• STEP 1: The Splitting Stage
List the source symbols in order of decreasing probability. The two symbols of lowest probability are assigned a 0 and a 1.
• STEP 2: The Combining Stage
Combine the probabilities of the last two symbols, and reorder the resultant probabilities. The list of symbols is now reduced by one.
• STEP 3: Repeat
Repeat STEP 1 and STEP 2 until only two symbols are left, for which a 0 and a 1 are assigned.
• STEP 4: Encode
The code for each source symbol is found by working backward and tracing the sequence of 0s and 1s assigned to that symbol as well as its successors.
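The following is a minimal sketch of this procedure in Python. It is not part of the lecture material: the function name huffman_code is illustrative, a priority queue stands in for the explicit re-sorting of STEPs 1-2 (the two are equivalent), and which of the two lowest-probability symbols receives the 0 is an arbitrary choice, as discussed under "Variations" below.

```python
import heapq

def huffman_code(probs):
    """Return a dict mapping each symbol to its Huffman codeword."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in probs}
    # Combining stage: repeatedly merge the two least probable entries,
    # prefixing a 0/1 bit to every codeword in each subtree. Prefixing as
    # we merge is the incremental form of STEP 4's "working backward".
    while len(heap) > 1:
        p0, _, syms0 = heapq.heappop(heap)   # lowest probability
        p1, _, syms1 = heapq.heappop(heap)   # second lowest
        for s in syms0:
            codes[s] = "1" + codes[s]        # 1 to the less probable subtree
        for s in syms1:
            codes[s] = "0" + codes[s]        # 0 to the other
        heapq.heappush(heap, (p0 + p1, id(syms0), syms0 + syms1))
    return codes

# The DMS of Example 1.6a below:
print(huffman_code({"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}))
```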

Example 1.6a
• In this example, we demonstrate how a prefix code is constructed for a DMS with alphabet {s0, s1, s2, s3, s4} and corresponding probabilities {0.4, 0.2, 0.2, 0.1, 0.1}.
• Following through the Huffman algorithm, our computation ends after four iterations, resulting in the Huffman tree shown below:

[Figure: Huffman tree for Example 1.6a, with the resulting codewords]
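Since the figure is not reproduced here, the combining stages can be traced by hand. At each stage the two lowest probabilities are combined (here the combined symbol is placed as high as possible among equal probabilities, one of the two conventions discussed later under "Variations in Huffman Coding"):

Stage 0: 0.4, 0.2, 0.2, 0.1, 0.1
Stage 1: 0.4, 0.2, 0.2, 0.2   (0.1 + 0.1 combined)
Stage 2: 0.4, 0.4, 0.2        (0.2 + 0.2 combined)
Stage 3: 0.6, 0.4             (0.4 + 0.2 combined)

Working backward through these stages yields one valid set of codewords: s0 = 00, s1 = 10, s2 = 11, s3 = 010, s4 = 011.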

From Example 1.6a, we may make several observations:
• No two codewords consist of an identical arrangement of bits.
• No codeword is a prefix of another codeword => the Huffman code is a type of prefix code.
• Higher-probability symbols have shorter codewords, and vice versa => the Huffman code is a variable-length code.
• The two least probable codewords have equal length, and differ only in the final digit.
• The average codeword length is very close to the source entropy.
• The average codeword length satisfies H(S) <= L < H(S) + 1.
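The last two observations can be checked with a short computation; the codeword lengths below are assumed from one valid Huffman tree for Example 1.6a (three 2-bit and two 3-bit codewords):

```python
from math import log2

probs   = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = [2, 2, 2, 3, 3]

H = sum(p * log2(1 / p) for p in probs)          # source entropy
L = sum(p * l for p, l in zip(probs, lengths))   # average codeword length

print(f"H(S) = {H:.4f} bits/symbol")   # 2.1219
print(f"L    = {L:.4f} bits/symbol")   # 2.2000, so H(S) <= L < H(S) + 1
```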

Variations in Huffman Coding
• The Huffman tree constructed in Example 1.6a is not unique. In particular, there are two variations to the process that may produce different sets of Huffman codes for the same source:
1) at each splitting stage, there is arbitrariness in the way a 0 and a 1 are assigned to the last two source symbols
2) when the probability of a combined symbol is found to equal another probability in the list, we may proceed by placing the new combined symbol as high as possible or as low as possible
• Whichever way the variations are chosen, however, they are to be consistently adhered to throughout the encoding process.
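The second variation can be seen directly on the source of Example 1.6a. The sketch below is illustrative (the function name huffman_lengths and the placement parameter are not from the lecture): "high" inserts a combined symbol above existing symbols of equal probability, "low" below them.

```python
def huffman_lengths(probs, placement):
    """Return each symbol's codeword length (probs in decreasing order)."""
    items = [(p, [i]) for i, p in enumerate(probs)]  # (probability, symbols)
    lengths = [0] * len(probs)
    while len(items) > 1:
        p0, syms0 = items.pop(-2)   # second-lowest probability
        p1, syms1 = items.pop(-1)   # lowest probability
        for s in syms0 + syms1:
            lengths[s] += 1         # one more bit on the path to the root
        combined = (p0 + p1, syms0 + syms1)
        # Re-insert, keeping decreasing order; ties follow the chosen variant
        pos = 0
        while pos < len(items) and (
            items[pos][0] > combined[0]
            or (placement == "low" and items[pos][0] == combined[0])
        ):
            pos += 1
        items.insert(pos, combined)
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
for variant in ("high", "low"):
    L = huffman_lengths(probs, variant)
    avg = round(sum(p * l for p, l in zip(probs, L)), 2)
    print(variant, L, "average =", avg)
# high -> [2, 2, 2, 3, 3]; low -> [1, 2, 3, 4, 4]. Both average 2.2 bits
# per symbol: the codes differ, but both are valid Huffman codes.
```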