4.8
Huffman Codes
These lecture slides are supplied by Mathijs de Weerd
Data Compression
Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we encode this text in bits?
A. We can encode $2^5 = 32$ different symbols using a fixed length of 5 bits per symbol. This is called fixed-length encoding.
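Below is a minimal Python sketch of fixed-length encoding. It assumes a hypothetical 32-symbol alphabet made of the 26 lowercase letters, space, and the five punctuation marks . , ; ? ! (our own choice). Each symbol gets the 5-bit binary form of its index, so decoding simply cuts the bit string into 5-bit chunks. The names ALPHABET, encode_fixed and decode_fixed are illustrative only.

```python
import string

# Hypothetical 32-symbol alphabet: 26 letters, space, and 5 punctuation marks
# (the punctuation set is an assumption made for this illustration).
ALPHABET = string.ascii_lowercase + " .,;?!"
assert len(ALPHABET) == 32

BITS = 5  # 2**5 = 32 distinct codewords, exactly enough

# Give each symbol the 5-bit binary representation of its index.
CODE = {sym: format(i, f"0{BITS}b") for i, sym in enumerate(ALPHABET)}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode_fixed(text: str) -> str:
    """Concatenate the 5-bit codeword of every symbol."""
    return "".join(CODE[ch] for ch in text)

def decode_fixed(bits: str) -> str:
    """Split the bit string into 5-bit chunks; no separators are needed."""
    return "".join(DECODE[bits[i:i + BITS]] for i in range(0, len(bits), BITS))

msg = "hello, world."
enc = encode_fixed(msg)
print(len(enc))                  # 65 (13 symbols * 5 bits)
print(decode_fixed(enc) == msg)  # True
```

Every symbol costs 5 bits regardless of how often it occurs. That is exactly the waste the next question sets out to exploit.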
Q. Some symbols (e, t, a, o, i, n) are used far more often than others. How can we use this to shorten the encoded text?
A. Encode these frequent symbols with fewer bits, and the others with more bits.
Q. How do we know when the next symbol begins?
A. Use a separation symbol (like the pause in Morse code), or avoid ambiguity altogether by ensuring that no code is a prefix of another one.
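Ex. Let c(a) = 01, c(b) = 010, and c(e) = 1. What is 0101?
It can be read as 01|01 = “aa” or as 010|1 = “be”. The encoding is ambiguous because c(a) is a prefix of c(b).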
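The following sketch makes the ambiguity concrete by enumerating every way to split a bit string into codewords. The helper name all_parses is hypothetical. The search is exponential in the worst case, so it is meant only for tiny examples like this one.

```python
def all_parses(bits: str, code: dict[str, str]) -> list[list[str]]:
    """Return every way to split `bits` into codewords of `code`.

    More than one result means the code is ambiguous for this input.
    """
    if bits == "":
        return [[]]  # exactly one way to parse the empty string
    parses = []
    for sym, word in code.items():
        if bits.startswith(word):
            # Commit to this codeword, then parse the remainder recursively.
            for rest in all_parses(bits[len(word):], code):
                parses.append([sym] + rest)
    return parses

code = {"a": "01", "b": "010", "e": "1"}
print(all_parses("0101", code))  # [['a', 'a'], ['b', 'e']]
```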
Prefix Codes
Definition. A prefix code for a set S is a function c that maps each $x \in S$ to a string of 1s and 0s in such a way that for all $x, y \in S$ with $x \neq y$, c(x) is not a prefix of c(y).
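The definition translates directly into a pairwise check. The sketch below represents a code as a Python dict from symbol to bit string, which is an assumption of this sketch. The function name is_prefix_code is illustrative.

```python
from itertools import permutations

def is_prefix_code(code: dict[str, str]) -> bool:
    """True iff no codeword c(x) is a prefix of c(y) for distinct symbols x, y."""
    # permutations(..., 2) yields every ordered pair of codewords belonging to
    # two different symbols, so both directions of "is a prefix of" are tested.
    # Two symbols sharing one codeword also fail, since a string prefixes itself.
    return not any(cy.startswith(cx) for cx, cy in permutations(code.values(), 2))

print(is_prefix_code({"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}))  # True
print(is_prefix_code({"a": "01", "b": "010", "e": "1"}))  # False: 01 prefixes 010
```

The pairwise check is quadratic in the number of symbols. For large codes, one could instead sort the codewords and compare only neighbours. That works because any string sorting between a codeword and one of its extensions must itself start with that codeword.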
Ex.
c(a) = 11
c(e) = 01
c(k) = 001
c(l) = 10
c(u) = 000
Q. What is the meaning of 1001000001?
A. “leuk”, since 10|01|000|001 decodes to l, e, u, k.
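Because no codeword is a prefix of another, a decoder can read bits left to right and emit a symbol as soon as the buffered bits form a codeword. Here is a sketch using the same dict representation as before. The function name decode_prefix is hypothetical.

```python
def decode_prefix(bits: str, code: dict[str, str]) -> str:
    """Decode a bit string under a prefix code by left-to-right matching.

    Since no codeword is a prefix of another, the first codeword that the
    buffered bits complete is the only possible one: no backtracking needed.
    """
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError(f"trailing bits {buf!r} do not form a codeword")
    return "".join(out)

code = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
print(decode_prefix("1001000001", code))  # leuk
```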
Suppose the symbol frequencies in a text of 1G symbols are known:
$f_a = 0.4,\ f_e = 0.2,\ f_k = 0.2,\ f_l = 0.1,\ f_u = 0.1$
Q. What is the size of the encoded text?
A. $(2 f_a + 2 f_e + 3 f_k + 2 f_l + 3 f_u) \cdot 1\text{G} = (0.8 + 0.4 + 0.6 + 0.2 + 0.3) \cdot 1\text{G} = 2.3\text{G}$ bits
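The size computation can be checked numerically. This sketch assumes "1G" means $10^9$ symbols. The helper average_bits is an illustrative name, and it multiplies each frequency by the length of the corresponding codeword.

```python
def average_bits(code: dict[str, str], freq: dict[str, float]) -> float:
    """Sum over all symbols of frequency times codeword length (in bits)."""
    return sum(freq[x] * len(code[x]) for x in code)

code = {"a": "11", "e": "01", "k": "001", "l": "10", "u": "000"}
freq = {"a": 0.4, "e": 0.2, "k": 0.2, "l": 0.1, "u": 0.1}

n_symbols = 10**9  # assumption: "1G" read as 10^9 symbols
abl = average_bits(code, freq)
print(round(abl, 10))          # 2.3 bits per symbol
print(round(abl * n_symbols))  # 2300000000 bits in total
```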
Optimal Prefix Codes
Definition. The average bits per letter of a prefix code c is the sum, over all symbols $x \in S$, of the frequency $f_x$ times the number of bits $|c(x)|$ in the encoding of x:
$$ABL(c) = \sum_{x \in S} f_x \cdot |c(x)|$$
We would like to find a prefix code that has the lowest possible average bits per letter.
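The slides are titled after Huffman's algorithm, which this excerpt does not reach. What follows is a hedged sketch of the standard greedy construction, not necessarily the formulation these slides use. It repeatedly merges the two least frequent subtrees using Python's heapq, and huffman_code is a hypothetical name. On the example frequencies it produces codeword lengths averaging 2.2 bits per symbol. For comparison, the example code c, with lengths 2, 2, 3, 2, 3, averages 2.3. The exact codewords depend on how ties are broken, but the average does not.

```python
import heapq
from itertools import count

def huffman_code(freq: dict[str, float]) -> dict[str, str]:
    """Build a prefix code by Huffman's greedy merging (sketch).

    Repeatedly merges the two least frequent subtrees; each merge prepends
    one bit ("0" for the left subtree, "1" for the right) to their codewords.
    """
    if len(freq) == 1:  # a lone symbol still needs a one-bit codeword
        (sym,) = freq
        return {sym: "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent subtree
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

freq = {"a": 0.4, "e": 0.2, "k": 0.2, "l": 0.1, "u": 0.1}
code = huffman_code(freq)
abl = sum(freq[x] * len(code[x]) for x in freq)
print(code)
print(round(abl, 10))  # 2.2
```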
 Spring '12
 Ajay