Harvard SEAS
ES250 – Information Theory
Entropy, relative entropy, and mutual information*
1 Entropy
1.1 Entropy of a random variable
Definition The entropy of a discrete random variable $X$ with pmf $p_X(x)$ is
$$H(X) = -\sum_{x} p(x) \log p(x)$$
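As a minimal sketch of the definition, the sum can be evaluated directly for a pmf given as a list of probabilities. The function name `entropy`, the default base 2, and the three example distributions are illustrative choices, not part of the notes; terms with $p(x) = 0$ are skipped under the usual convention $0 \log 0 = 0$.

```python
import math

def entropy(pmf, base=2.0):
    """Entropy of a discrete distribution given as a sequence of probabilities."""
    # Zero-probability outcomes are skipped (convention: 0 log 0 = 0).
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

# Illustrative distributions (assumed for this example).
fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
certain = [1.0, 0.0]

h_fair = entropy(fair_coin)      # 1 bit
h_biased = entropy(biased_coin)  # about 0.469 bits
h_certain = entropy(certain)     # 0 bits: no uncertainty
```

The fair coin is the most uncertain of the three, and the deterministic outcome carries no uncertainty at all.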
The entropy measures the expected uncertainty in $X$. It has the following properties:
• $H(X) \ge 0$: entropy is always nonnegative. $H(X) = 0$ iff $X$ is deterministic.
• Since $H_b(X) = \log_b(a)\, H_a(X)$, changing the base of the logarithm only rescales the entropy by a constant, so we don't need to specify the base.
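The change-of-base property follows in one line from the logarithm identity $\log_b p = (\log_b a)\,\log_a p$, applied term by term inside the sum:

$$H_b(X) = -\sum_{x} p(x) \log_b p(x) = -\sum_{x} p(x)\,(\log_b a)\,\log_a p(x) = (\log_b a)\, H_a(X)$$

For instance, taking $a = 2$ and $b = e$ shows that one bit equals $\ln 2 \approx 0.693$ nats.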
1.2 Joint entropy and conditional entropy
Definition The joint entropy of two random variables $X$ and $Y$ is
$$H(X, Y) \triangleq -E_{p(x,y)}[\log p(X, Y)] = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y)$$
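A small numerical sketch of the joint entropy, assuming the joint pmf is stored as a dictionary keyed by $(x, y)$ pairs. The function name `joint_entropy` and the example table are hypothetical and chosen only so the result is easy to check by hand.

```python
import math

def joint_entropy(joint, base=2.0):
    """H(X, Y) for a joint pmf given as a dict {(x, y): probability}."""
    # The double sum over x and y is a single sum over all (x, y) pairs.
    return -sum(p * math.log(p, base) for p in joint.values() if p > 0)

# Hypothetical joint pmf over X, Y in {0, 1} (assumed for illustration).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

h_xy = joint_entropy(joint)  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
```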
Definition Given a random variable $X$, the conditional entropy of $Y$ (averaged over $X$) is
$$\begin{aligned}
H(Y \mid X) &\triangleq E_{p(x)}[H(Y \mid X = x)] \\
&= \sum_{x \in \mathcal{X}} p(x)\, H(Y \mid X = x) \\
&= -E_{p(x)} E_{p(y \mid x)}[\log p(Y \mid X)] \\
&= -E_{p(x,y)}[\log p(Y \mid X)]
\end{aligned}$$
Note: $H(X \mid Y) \ne H(Y \mid X)$ in general.
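The asymmetry of conditional entropy can be seen on a small example. The sketch below uses the last form of the definition, $H(Y \mid X) = -E_{p(x,y)}[\log p(Y \mid X)]$ with $p(y \mid x) = p(x, y) / p(x)$. The function name `conditional_entropy` and the joint pmf are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

def conditional_entropy(joint, base=2.0):
    """H(Y | X) for a joint pmf {(x, y): p}, conditioning on the first coordinate."""
    # Marginal p(x), obtained by summing the joint pmf over y.
    p_x = defaultdict(float)
    for (x, _), p in joint.items():
        p_x[x] += p
    # -sum over (x, y) of p(x, y) * log p(y | x), with p(y | x) = p(x, y) / p(x).
    return -sum(p * math.log(p / p_x[x], base)
                for (x, _), p in joint.items() if p > 0)

# Hypothetical joint pmf over X, Y in {0, 1} (assumed for illustration).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}
# Swapping the coordinates lets the same function compute H(X | Y).
swapped = {(y, x): p for (x, y), p in joint.items()}

h_y_given_x = conditional_entropy(joint)    # about 0.689 bits
h_x_given_y = conditional_entropy(swapped)  # 0.5 bits
```

Here knowing $Y$ leaves less uncertainty about $X$ than knowing $X$ leaves about $Y$, so the two conditional entropies differ.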
1.3 Chain rule
Joint and conditional entropy provide a natural calculus:
Theorem (Chain rule)
$$H(X, Y) = H(X) + H(Y \mid X)$$
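A short proof sketch, using only the definitions above: since $p(x, y) = p(x)\, p(y \mid x)$, we have $\log p(x, y) = \log p(x) + \log p(y \mid x)$, and taking expectations under $p(x, y)$ gives

$$\begin{aligned}
H(X, Y) &= -E_{p(x,y)}[\log p(X, Y)] \\
&= -E_{p(x,y)}[\log p(X)] - E_{p(x,y)}[\log p(Y \mid X)] \\
&= H(X) + H(Y \mid X)
\end{aligned}$$

The first expectation depends only on $X$, so it reduces to $-E_{p(x)}[\log p(X)] = H(X)$.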
Corollary
$$H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z)$$
* Based on Cover & Thomas, Chapter 2
2 Relative Entropy and Mutual Information
2.1 Entropy and Mutual Information
• Entropy $H(X)$ is the uncertainty ("self-information") of a single random variable.
• Conditional entropy $H(X \mid Y)$ is the entropy of one random variable conditional upon knowledge of another.
• We call the reduction in uncertainty mutual information:
$$I(X; Y) = H(X) - H(X \mid Y)$$
• Eventually we will show that the maximum rate of transmission over a given channel $p(Y \mid X)$, such that the error probability goes to zero, is given by the channel capacity:
$$C = \max_{p(X)} I(X; Y)$$
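To make the maximization over input distributions concrete, here is a rough sketch for one specific channel that the notes do not discuss at this point: a binary symmetric channel that flips each bit with probability `eps`. The channel, the value `eps = 0.1`, the grid search, and all function names are assumptions made for illustration. The sketch uses the equivalent form $I(X; Y) = H(Y) - H(Y \mid X)$; for this channel $H(Y \mid X)$ equals the binary entropy of `eps` whatever the input distribution.

```python
import math

def h2(p):
    """Binary entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(pi, eps):
    """I(X; Y) = H(Y) - H(Y | X) for a binary symmetric channel.

    pi is P(X = 1); eps is the crossover (bit-flip) probability.
    """
    p_y1 = pi * (1 - eps) + (1 - pi) * eps  # P(Y = 1)
    return h2(p_y1) - h2(eps)

eps = 0.1  # assumed crossover probability
# Crude grid search over input distributions P(X = 1) in [0, 1].
grid = [i / 1000 for i in range(1001)]
best_pi = max(grid, key=lambda pi: bsc_mutual_information(pi, eps))
capacity_estimate = bsc_mutual_information(best_pi, eps)
```

For this channel the search settles on the uniform input, and the estimate agrees with the closed form $1 - H_2(\text{eps})$, about 0.531 bits per channel use.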
Theorem (Relationship between mutual information and entropy)
$$\begin{aligned}
I(X; Y) &= H(X) - H(X \mid Y) \\
I(X; Y) &= H(Y) - H(Y \mid X) \\
I(X; Y) &= H(X) + H(Y) - H(X, Y) \\
I(X; Y) &= I(Y; X) && \text{(symmetry)} \\
I(X; X) &= H(X) && \text{("self-information")}
\end{aligned}$$
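The identities in the theorem can be checked numerically on a small joint pmf. This is a sketch under assumed inputs: the helper names `H`, `marginals`, and `conditional_entropy`, and the example table, are hypothetical. The conditional entropies are computed directly from their definition rather than through the chain rule, so the agreement of the three expressions is a genuine check.

```python
import math
from collections import defaultdict

def H(probabilities):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def marginals(joint):
    """Marginal pmfs p(x) and p(y) of a joint pmf {(x, y): p}."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return p_x, p_y

def conditional_entropy(joint, marginal, idx):
    """Entropy of the other coordinate given coordinate idx (with that marginal)."""
    return -sum(p * math.log2(p / marginal[key[idx]])
                for key, p in joint.items() if p > 0)

# Hypothetical joint pmf over X, Y in {0, 1} (assumed for illustration).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}
p_x, p_y = marginals(joint)

h_x, h_y, h_xy = H(p_x.values()), H(p_y.values()), H(joint.values())
h_x_given_y = conditional_entropy(joint, p_y, 1)
h_y_given_x = conditional_entropy(joint, p_x, 0)

i1 = h_x - h_x_given_y   # H(X) - H(X|Y)
i2 = h_y - h_y_given_x   # H(Y) - H(Y|X)
i3 = h_x + h_y - h_xy    # H(X) + H(Y) - H(X,Y)

# I(X; X): the joint pmf of (X, X) sits on the diagonal, so H(X|X) = 0.
diagonal = {(x, x): p for x, p in p_x.items()}
i_self = h_x - conditional_entropy(diagonal, p_x, 1)
```

All three expressions give about 0.311 bits for this table, and `i_self` equals `h_x`.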
2.2 Relative Entropy and Mutual Information
Definition Relative entropy (information divergence or Kullback-Leibler divergence)
$D(p \,\|\, q) \triangleq E_p$ …