6.896 Sublinear Time Algorithms
March 1, 2007
Lecture 8
Lecturer: Ronitt Rubinfeld
Scribe: Jacob Scott
1 Huffman Coding and Entropy
Consider a string $w = w_1 w_2 \ldots w_m$ on an alphabet $A = a_1 a_2 \ldots a_n$. We will now be considering our data as fixed, as opposed to being generated from a probability distribution as in previous lectures. Thus, we can consider the frequency of each letter in the alphabet, $p = \{p_1, p_2, \ldots, p_n\}$. We can now define a code $C = \{c_1, c_2, \ldots, c_n\}$ such that $c_i$ is the "code word" for $a_i$. The following coding algorithm encodes $w$:
Coding Algorithm
scan left to right → if $w_i = a_j$, write $c_j$
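The scan above can be sketched in a few lines of Python. This is only an illustration: the dictionary `code` and its three code words are hypothetical, chosen to be prefix-free so that the output can be decoded unambiguously.

```python
def encode(w, code):
    """Scan w left to right; for each letter a_j write its code word c_j."""
    return "".join(code[letter] for letter in w)


# Hypothetical prefix-free code over the alphabet {a, b, c} (not from the notes).
code = {"a": "0", "b": "10", "c": "11"}

encoded = encode("abac", code)
print(encoded)  # 010011
```

The four letters of `abac` are written as `0`, `10`, `0`, `11`, so the encoded length depends on which letters occur, which is what motivates choosing the code word lengths carefully.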
Choice of code
We would like to pick variable lengths for the $c_i$'s to minimize
$$L(C) = \sum_i p_i \, |c_i|,$$
which can be considered the expected length of a letter $a_i$ drawn from $p$ and written as $c_i$.
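As a small worked instance of this quantity, the sketch below computes the letter frequencies of a fixed string and the resulting expected code word length. The string `w`, the code table, and the helper names `frequencies` and `expected_length` are assumptions made for illustration.

```python
from collections import Counter


def frequencies(w):
    """Empirical frequency p_i of each letter in the fixed string w."""
    counts = Counter(w)
    return {letter: count / len(w) for letter, count in counts.items()}


def expected_length(p, code):
    """L(C): sum over letters of p_i times the code word length |c_i|."""
    return sum(p[letter] * len(code[letter]) for letter in p)


# Hypothetical string and prefix-free code (not from the notes).
w = "aaaabbcc"
code = {"a": "0", "b": "10", "c": "11"}

p = frequencies(w)                # a: 0.5, b: 0.25, c: 0.25
print(expected_length(p, code))   # 1.5
```

Because $p$ holds the exact frequencies of $w$, the encoded string has $m \cdot L(C)$ bits in total; here that is $8 \cdot 1.5 = 12$.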
Shannon's Source Coding Theorem relates this quantity to the entropy $H(p) = -\sum_i p_i \log p_i$ as follows:
$$L(C) \ge H(p)$$
Huffman codes achieve this bound when for all $i$ there is an integer $j_i$ such that $p_i = 2^{-j_i}$.
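To see the bound in action, here is a minimal sketch of the standard Huffman construction (repeatedly merging the two least likely subtrees), tracking only the code word lengths. The function names and the two test distributions are illustrative assumptions, not taken from the lecture.

```python
import heapq
import math


def huffman_lengths(p):
    """Return the Huffman code word length |c_i| for each probability p_i."""
    if len(p) == 1:
        return [1]
    lengths = [0] * len(p)
    # Heap entries: (probability, unique tie-breaker, leaf indices in subtree).
    heap = [(prob, i, [i]) for i, prob in enumerate(p)]
    heapq.heapify(heap)
    next_id = len(p)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every leaf beneath them.
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1
        heapq.heappush(heap, (p1 + p2, next_id, leaves1 + leaves2))
        next_id += 1
    return lengths


def entropy(p):
    """H(p) in bits, with 0 log 0 taken as 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)


def expected_length(p, lengths):
    """L(C) for the given code word lengths."""
    return sum(x * length for x, length in zip(p, lengths))


dyadic = [0.5, 0.25, 0.125, 0.125]   # every p_i is a power of 1/2
other = [0.4, 0.3, 0.3]              # not of the form 2^{-j}

for p in (dyadic, other):
    lengths = huffman_lengths(p)
    print(lengths, expected_length(p, lengths), entropy(p))
```

For the dyadic distribution the lengths come out as `[1, 2, 3, 3]` and $L(C)$ equals $H(p)$ at 1.75 bits; for the second distribution $L(C)$ is strictly larger than $H(p)$.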
Some examples of distributions and their entropies are:
1. $H(U_n) = \log n$
2. $H(p_1 = 1, p_{i>1} = 0) = 0$
3. If $p_1 = 1/2$ and $p_2 = p_3 = \ldots = p_n$:
$$L(C) \ge H(p) = -\frac{1}{2}\log\frac{1}{2} - (n-1)\,\frac{1}{2(n-1)}\log\frac{1}{2(n-1)} = \log 2 + \frac{1}{2}\log(n-1)$$
This is approximately half of the entropy of the uniform distribution.
4. If $p_i = 1/2^i$: $H(p) = 2$
5. If $p_1 = p_2 = \ldots = p_l = 1/l$ and $p_{l+1} = \ldots = p_n = 0$: $H(p) = \log l$
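The entropies of these example distributions can be checked numerically. The sketch below is an illustration under stated assumptions: the alphabet size `n = 1025`, the support size `l = 16`, and the truncation of the $2^{-i}$ distribution after 59 terms are arbitrary choices, and logarithms are taken base 2.

```python
import math


def entropy(p):
    """H(p) in bits, with 0 log 0 taken as 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)


n = 1025   # arbitrary alphabet size chosen for illustration
l = 16     # arbitrary support size for the last example

uniform = [1 / n] * n
point_mass = [1.0] + [0.0] * (n - 1)
half_heavy = [0.5] + [0.5 / (n - 1)] * (n - 1)
# Truncated version of p_i = 2^{-i}; the dropped tail has mass 2^{-59}.
geometric = [2.0 ** -i for i in range(1, 60)]
uniform_on_l = [1 / l] * l + [0.0] * (n - l)

print(entropy(uniform), math.log2(n))
print(entropy(point_mass))
print(entropy(half_heavy), 1 + 0.5 * math.log2(n - 1))
print(entropy(geometric))
print(entropy(uniform_on_l), math.log2(l))
```

With these parameters the half-heavy distribution has entropy 6 bits against roughly 10 bits for the uniform one; the ratio approaches one half as $n$ grows, since the additive $\log 2$ term becomes negligible.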
2 Distinct Colors
Before moving on to Lempel-Ziv compression, we will explore the following question: how many
distinct letters are there in a string? This problem arises in many areas, for example in the study of
This note was uploaded on 04/02/2010 for the course CS 6.896 taught by Professor Ronittrubinfeld during the Fall '04 term at MIT.