[Figure 4.1: Venn diagram to illustrate the relationships of entropy and relative entropy. Regions labeled $H(X \mid Y)$, $I(X;Y)$, and $H(Y \mid X)$, inside circles $H(X)$ and $H(Y)$.]
Problem 2.3
Minimum entropy. What is the minimum value of $H(p_1, \ldots, p_n) = H(p)$ as $p$ ranges over the set of $n$-dimensional probability vectors? Find all $p$'s which achieve this minimum.
Solution 2.3
We wish to find all probability vectors $p = (p_1, p_2, \ldots, p_n)$ which minimize
$$H(p) = -\sum_i p_i \log p_i.$$
Now $-p_i \log p_i \ge 0$, with equality iff $p_i = 0$ or $1$. Hence the only possible probability vectors which minimize $H(p)$ are those with $p_i = 1$ for some $i$ and $p_j = 0$, $j \ne i$. There are $n$ such vectors, i.e., $(1, 0, \ldots, 0)$, $(0, 1, 0, \ldots, 0)$, \ldots, $(0, \ldots, 0, 1)$, and the minimum value of $H(p)$ is 0.
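As a quick sanity check, the sketch below evaluates $H(p)$ numerically (base-2 logarithms, with the standard convention $0 \log 0 = 0$); the helper name `entropy` and the example vectors are my own choices, not part of the problem:

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits, with the convention 0 log 0 = 0."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

# A degenerate vector achieves the minimum H(p) = 0:
print(entropy([1.0, 0.0, 0.0]))  # → 0.0
# Any vector spreading mass over two or more outcomes gives H(p) > 0:
print(entropy([0.5, 0.5]))       # → 1.0
```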
Problem 2.5
Entropy of functions of a random variable. Let $X$ be a discrete random variable. Show that the entropy of a function of $X$ is less than or equal to the entropy of $X$ by justifying the following steps:
$$
\begin{aligned}
H(X, g(X)) &\stackrel{(a)}{=} H(X) + H(g(X) \mid X) && (4.1)\\
&\stackrel{(b)}{=} H(X); && (4.2)\\
H(X, g(X)) &\stackrel{(c)}{=} H(g(X)) + H(X \mid g(X)) && (4.3)\\
&\stackrel{(d)}{\ge} H(g(X)). && (4.4)
\end{aligned}
$$
Thus $H(g(X)) \le H(X)$.
Solution 2.5
Entropy of functions of a random variable.
1. $H(X, g(X)) = H(X) + H(g(X) \mid X)$ by the chain rule for entropies.
2. $H(g(X) \mid X) = 0$, since for any particular value of $X$, $g(X)$ is fixed, and hence $H(g(X) \mid X) = \sum_x p(x) H(g(X) \mid X = x) = \sum_x 0 = 0$.
3. $H(X, g(X)) = H(g(X)) + H(X \mid g(X))$, again by the chain rule.
4. $H(X \mid g(X)) \ge 0$, with equality iff $X$ is a function of $g(X)$, i.e., $g(\cdot)$ is one-to-one. Hence $H(X, g(X)) \ge H(g(X))$.
Combining parts (b) and (d), we obtain $H(X) \ge H(g(X))$.
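The inequality can also be checked numerically. In the sketch below (my own illustration; the uniform pmf on $\{0,1,2,3\}$ and the map $g(x) = x \bmod 2$ are arbitrary choices), the pmf of $g(X)$ is obtained as the pushforward of the pmf of $X$, and the entropies are compared:

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# X uniform on {0, 1, 2, 3}; g(x) = x % 2 is not one-to-one.
p_x = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
g = lambda x: x % 2

# The pmf of g(X) is the pushforward of the pmf of X through g.
p_gx = Counter()
for x, px in p_x.items():
    p_gx[g(x)] += px

h_x = entropy(p_x.values())    # H(X)    = 2 bits
h_gx = entropy(p_gx.values())  # H(g(X)) = 1 bit
assert h_gx <= h_x             # H(g(X)) <= H(X), as proved above
```

Since $g$ merges outcomes, the inequality is strict here; for a one-to-one $g$ the two entropies would coincide.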
Problem 2.6