Harvard SEAS
ES250 – Information Theory
Homework 1
Solution
1. Let p(x, y) be given by

            Y = 0    Y = 1
    X = 0    1/3      1/3
    X = 1     0       1/3
Evaluate the following expressions:
(a) H(X), H(Y)
(b) H(X|Y), H(Y|X)
(c) H(X, Y)
(d) H(Y) − H(Y|X)
(e) I(X; Y)
(f) Draw a Venn diagram for the quantities in (a) through (e)
Solution:
(a) H(X) = (2/3) log(3/2) + (1/3) log 3 = 0.918 bits = H(Y)
(b) H(X|Y) = (1/3) H(X|Y = 0) + (2/3) H(X|Y = 1) = (1/3)(0) + (2/3)(1) = 0.667 bits = H(Y|X)
(c) H(X, Y) = 3 × (1/3) log 3 = log 3 = 1.585 bits
(d) H(Y) − H(Y|X) = 0.918 − 0.667 = 0.251 bits
(e) I(X; Y) = H(Y) − H(Y|X) = 0.251 bits
(f) [Venn diagram: two overlapping circles of total area H(X, Y); the left circle is H(X), the right circle is H(Y), their intersection is I(X; Y), and the non-overlapping parts are H(X|Y) and H(Y|X).]
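As a numerical check on (a) through (e), the quantities can be computed directly from the joint pmf (a sketch in Python; the pmf is the table above):

```python
import math

# Joint pmf p(x, y) from the table: rows X = 0, 1; columns Y = 0, 1.
p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

def H(probs):
    """Entropy in bits of a pmf given as an iterable of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

px = [sum(p[x, y] for y in (0, 1)) for x in (0, 1)]  # marginal of X
py = [sum(p[x, y] for x in (0, 1)) for y in (0, 1)]  # marginal of Y

HX, HY, HXY = H(px), H(py), H(p.values())
HX_given_Y = HXY - HY   # chain rule: H(X|Y) = H(X, Y) - H(Y)
HY_given_X = HXY - HX   # chain rule: H(Y|X) = H(X, Y) - H(X)
I = HY - HY_given_X     # I(X; Y)

print(round(HX, 3), round(HX_given_Y, 3), round(HXY, 3), round(I, 3))
# → 0.918 0.667 1.585 0.252  (the text's 0.251 comes from subtracting the rounded values)
```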
2. Entropy of functions of a random variable

(a) Let X be a discrete random variable. Show that the entropy of a function of X is less than or equal to the entropy of X by justifying the following steps:

    H(X, g(X)) = H(X) + H(g(X)|X)      (a)
               = H(X)                  (b)
    H(X, g(X)) = H(g(X)) + H(X|g(X))   (c)
               ≥ H(g(X))               (d)

Thus H(g(X)) ≤ H(X).

(b) Let Y = X^7, where X is a random variable taking on positive and negative integer values. What is the relationship of H(X) and H(Y)? What if Y = cos(πX/3)?
Solution:
(a) STEP (a): H(X, g(X)) = H(X) + H(g(X)|X), by the chain rule for entropies.
STEP (b): H(g(X)|X) = 0, since for any particular value of X, g(X) is fixed; hence H(g(X)|X) = Σ_x p(x) H(g(X)|X = x) = Σ_x p(x) · 0 = 0.
STEP (c): H(X, g(X)) = H(g(X)) + H(X|g(X)), again by the chain rule.
STEP (d): H(X|g(X)) ≥ 0, with equality iff X is a function of g(X), i.e., g(·) is one-to-one on the support of X. Hence H(X, g(X)) ≥ H(g(X)).

Combining STEPs (b) and (d), we obtain H(X) ≥ H(g(X)).
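The inequality just proved can be sanity-checked numerically. The sketch below uses a made-up pmf and a deliberately non-one-to-one g; both are illustrative choices, not part of the problem:

```python
import math
from collections import defaultdict

def H(probs):
    """Entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

pX = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}  # arbitrary illustrative pmf
g = lambda x: x % 2                    # not one-to-one: merges 0 with 2, and 1 with 3

# pmf of g(X): states merged by g add their probabilities.
pgX = defaultdict(float)
for x, q in pX.items():
    pgX[g(x)] += q

HX, HgX = H(pX.values()), H(pgX.values())
print(round(HX, 3), round(HgX, 3), HgX <= HX)
# → 1.846 0.971 True
```

Merging states can only lose information, which is exactly why H(g(X)) ≤ H(X) in STEP (d).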
(b) By part (a), we know that passing a random variable through a function can only reduce the entropy or leave it unchanged, never increase it. That is, H(g(X)) ≤ H(X) for any function g. The reason is simply that if the function g is not one-to-one, it merges some states, reducing entropy.

The trick for this problem, then, is simply to determine whether or not each mapping is one-to-one. If so, the entropy is unchanged; if not, the entropy necessarily decreases. Note that whether the function is one-to-one is only meaningful on the support of X, i.e., for all x with p(x) > 0.

Y = X^7 is one-to-one, and hence the entropy, which is just a function of the probabilities, does not change: H(X) = H(Y).

Y = cos(πX/3) is not one-to-one (unless the support of X is rather small), since this function maps the entire set of integers onto just four different values: cos(πk/3) ∈ {1, 1/2, −1/2, −1} for integer k. Hence in general H(Y) < H(X).
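Both cases can be checked numerically. The support chosen below (X uniform on −5, …, 5) is an illustrative assumption; any support containing more than a few integers behaves the same way:

```python
import math
from collections import Counter

def H(probs):
    """Entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

support = range(-5, 6)                       # illustrative: X uniform on -5..5
pX = {x: 1 / len(support) for x in support}

def entropy_of_function(pX, g):
    """H(g(X)): probabilities of states merged by g add up."""
    pY = Counter()
    for x, q in pX.items():
        pY[g(x)] += q
    return H(pY.values())

HX = H(pX.values())
H7 = entropy_of_function(pX, lambda x: x ** 7)  # one-to-one on the integers
# round() merges floating-point near-duplicates of the four cosine values
Hc = entropy_of_function(pX, lambda x: round(math.cos(math.pi * x / 3), 9))

print(math.isclose(HX, H7), Hc < HX)
# → True True
```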
Spring 2010