Introduction to Information Theory (67548)
December 21, 2008
Assignment 1: Solution
Lecturer: Prof. Michael Werman
Important note: All logarithms are in base 2 unless specified otherwise.
Problem 1 Joint Distribution and Entropy
1. \[ H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y) = \frac{3}{8}\log 8 + \frac{2}{16}\log 16 + \frac{1}{2}\log 2 = 2.125 \text{ bits}. \]
\[ H(X \mid Y) = -\sum_{x,y} p(x,y)\log p(x \mid y) = 0.625 \text{ bits}. \]
\[ H(Y \mid X=c) = -\sum_{y} p(y \mid c)\log p(y \mid c) = 0 \text{ bits}. \]
It is easy to verify that the marginal distribution of $X$ is $p(a) = 1/4$, $p(b) = 11/16$, $p(c) = 1/16$, and therefore $H(X) \approx 1.12$ bits. We have already calculated that $H(X \mid Y) = 0.625$ bits. As a result,
\[ I(X;Y) = H(X) - H(X \mid Y) \approx 1.12 - 0.625 = 0.495 \text{ bits}. \]
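The arithmetic above can be checked numerically. A minimal sketch (the full joint table is not reproduced here, so we only verify the stated sums and the marginal of $X$):

```python
import math

# H(X,Y): three joint cells of mass 1/8, two of 1/16, and one of 1/2,
# matching the sum in the solution above.
H_XY = 3 * (1/8) * math.log2(8) + 2 * (1/16) * math.log2(16) + (1/2) * math.log2(2)
print(H_XY)  # 2.125

# H(X) from the stated marginal p(a)=1/4, p(b)=11/16, p(c)=1/16.
p_X = [1/4, 11/16, 1/16]
H_X = -sum(p * math.log2(p) for p in p_X)
print(round(H_X, 2))  # 1.12

# I(X;Y) = H(X) - H(X|Y), using H(X|Y) = 0.625 bits from above.
I_XY = H_X - 0.625
print(round(I_XY, 3))  # ~0.497 (0.495 when the rounded value 1.12 is used)
```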
2. For any discrete random variable (or any finite collection of discrete random variables), the distribution that maximizes entropy is the uniform distribution. The distribution that maximizes $H(X,Y)$ is therefore $p(x,y) = 1/9$ for every $x,y \in \{a,b,c\}$, and the entropy is $\log 9 \approx 3.17$ bits.
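This maximum-entropy claim is easy to spot-check numerically; a small sketch (the `entropy` helper is ours, not part of the assignment):

```python
import math
import random

def entropy(p):
    """Shannon entropy in bits; zero-mass outcomes contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)

n = 9  # (X, Y) ranges over {a,b,c} x {a,b,c}
H_uniform = entropy([1/n] * n)
print(round(H_uniform, 2))  # 3.17

# Randomized spot check (not a proof): no other distribution on 9
# outcomes should exceed the uniform entropy.
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    q = [x / sum(w) for x in w]
    assert entropy(q) <= H_uniform
```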
3. It is easy to verify that if $H(X) = 0$, then $X$ must be essentially constant (equal to a single value with probability 1). Assume w.l.o.g. that $X = a$ with probability 1. Under this constraint, the distribution that maximizes $H(X,Y)$ is the one that is uniform in $Y$: namely, $p(a,y) = 1/3$ for every $y \in \{a,b,c\}$ and 0 otherwise. In this case, $H(X,Y) = \log 3 \approx 1.58$ bits. Not surprisingly, this is a lower entropy than in the previous question, where $X$ was not constrained to be constant.
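The constrained maximum can be sketched the same way: with $X = a$ fixed, all the joint mass sits on the three cells $(a,a)$, $(a,b)$, $(a,c)$.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-mass outcomes contribute nothing."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Uniform over the three live cells (a,a), (a,b), (a,c); the other six
# cells of the 3x3 joint table carry zero mass.
joint = [1/3, 1/3, 1/3] + [0.0] * 6
H_XY = entropy(joint)
print(round(H_XY, 2))  # 1.58
```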
4. As explained in the answer to the previous question, $H(X) = H(Y) = 0$ implies that $X$ and $Y$ each take a single fixed value with probability 1, and thus their joint entropy is $H(X,Y) = 0$ bits.
Problem 2 Counterfeit Coins