ECE 1502 — Information Theory
Problem Set 2 solutions¹
February 15, 2006
3.2 An AEP-like limit. $X_1, X_2, \ldots$, i.i.d. $\sim p(x)$. Hence $\log p(X_i)$ are also i.i.d. and
\[
\lim \bigl( p(X_1, X_2, \ldots, X_n) \bigr)^{\frac{1}{n}}
= \lim e^{\frac{1}{n} \log p(X_1, X_2, \ldots, X_n)}
= e^{\lim \frac{1}{n} \sum_i \log p(X_i)}
\stackrel{\text{a.s.}}{=} e^{E[\log p(X)]}
\stackrel{\text{a.s.}}{=} e^{-H(X)}
\]
by the strong law of large numbers (assuming of course that H(X) exists). Note that “a.s.”
means “almost surely”, i.e., with probability 1.
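As a quick numerical sanity check (not part of the original solution), the limit above can be simulated for an i.i.d. Bernoulli source; the parameter $p = 0.3$ and the sample size below are arbitrary illustrative choices:

```python
# Numerical sanity check (illustrative, not part of the original solution):
# for an i.i.d. Bernoulli(0.3) source, (p(X_1, ..., X_n))^(1/n) should be
# close to e^{-H(X)} for large n.
import math
import random

random.seed(1)

p = 0.3                                              # P[X = 1]
H = -(p * math.log(p) + (1 - p) * math.log(1 - p))   # entropy in nats

n = 100_000
# Work in log space to avoid underflow: (1/n) * sum_i log p(X_i)
log_joint = sum(
    math.log(p) if random.random() < p else math.log(1 - p)
    for _ in range(n)
)
estimate = math.exp(log_joint / n)

print(f"(p(X^n))^(1/n) ~ {estimate:.4f},  e^(-H(X)) = {math.exp(-H):.4f}")
```

The two printed values agree to a few decimal places for $n$ this large, as the a.s. convergence suggests.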
Remarks (added by FK): The author of this solution is invoking the strong law of large numbers, which states that, if $X_1, X_2, \ldots$ is an i.i.d. sequence of random variables each with expected value $E[X]$, then
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i = E[X] \quad \text{with probability } 1.
\]
This sense of stochastic convergence is called “almost sure convergence,” i.e., the strong law
of large numbers says that the sample average converges to the mean almost surely.
Recall, on the other hand, the weak law of large numbers, which states that if $X_1, X_2, \ldots$ is an i.i.d. sequence of random variables each with expected value $E[X]$, then, for every $\varepsilon > 0$,
\[
\lim_{n \to \infty} P\left[ \left| \frac{1}{n} \sum_{i=1}^{n} X_i - E[X] \right| < \varepsilon \right] = 1.
\]
This sense of stochastic convergence is called “convergence in probability,” i.e., the weak law
of large numbers says that the sample average converges to the mean in probability.
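Convergence in probability can be illustrated numerically (an assumed setup, not part of the original remarks): take $X_i$ i.i.d. Uniform(0, 1), so $E[X] = 0.5$, and estimate $P[|\frac{1}{n}\sum X_i - 0.5| < \varepsilon]$ for growing $n$. The tolerance and trial counts are arbitrary choices.

```python
# Illustrative sketch of convergence in probability: estimate, by Monte Carlo,
# the probability that the sample mean of n Uniform(0, 1) variables lies
# within eps of the true mean 0.5, for several values of n.
import random

random.seed(2)

eps = 0.05
trials = 2000
results = {}

for n in (10, 100, 1000):
    hits = sum(
        abs(sum(random.random() for _ in range(n)) / n - 0.5) < eps
        for _ in range(trials)
    )
    results[n] = hits / trials
    print(f"n = {n:5d}: P[|sample mean - 0.5| < {eps}] ~ {results[n]:.3f}")
```

The estimated probability climbs toward 1 as $n$ grows, exactly as the weak law states.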
As the names imply, the “strong law” is stronger than the “weak law” in the sense that
almost sure convergence implies convergence in probability.
To understand the distinction
between the two, it is instructive to consider an example of a sequence of random variables
that converges in probability, but not almost surely. The following is one such example.
The Shrinking Bullseye: Let $U_1, U_2, U_3, \ldots$ be a sequence of independent random variables uniformly distributed over the unit disc in $\mathbb{R}^2$. The disc represents a dart board, and $U_i$ represents the location at which a dart, thrown at random, hits the dart board. Suppose that the area of the bullseye in the dart board in the $n$th trial is given by $\pi/n$. (This is the “shrinking bullseye”.) The probability of hitting the bullseye in the $n$th trial is $1/n$. Define a sequence $B_1, B_2, B_3, \ldots$ of binary random variables, where $B_n = 1$ if the bullseye is hit by the dart in the $n$th trial, and $0$ otherwise.
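The behaviour of this sequence can be sketched in a short simulation (the trial count is an arbitrary choice): even though $P[B_n = 1] = 1/n \to 0$, the cumulative hit count keeps growing, since $\sum_n 1/n$ diverges.

```python
# Simulation sketch of the shrinking bullseye: B_n = 1 with probability 1/n,
# independently across trials. The cumulative number of hits grows roughly
# like log N, because sum_{n<=N} 1/n ~ log N.
import math
import random

random.seed(3)

N = 100_000
hit_count = 0
for n in range(1, N + 1):
    if random.random() < 1.0 / n:   # dart lands in the bullseye on trial n
        hit_count += 1

print(f"hits in {N} trials: {hit_count}  (log N ~ {math.log(N):.1f})")
```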
¹ Solutions to problems from the text are supplied courtesy of Joy A. Thomas.
Now for every $\varepsilon < 1$,
\[
P[|B_n| > \varepsilon] = P[B_n = 1] = \frac{1}{n} \to 0 \quad \text{as } n \to \infty.
\]
Thus $B_n$ converges to zero in probability, i.e., hitting the bullseye becomes increasingly rare as $n$ becomes large. On the other hand, the set of realizations in which the bullseye is hit infinitely many times has probability 1. To see this, notice that the probability of completely missing the bullseye in every one of the trials $n, n+1, \ldots, m$ is
\[
\prod_{k=n}^{m} \left( 1 - \frac{1}{k} \right) = \frac{n-1}{m} \to 0 \quad \text{as } m \to \infty,
\]
so, for every $n$, with probability 1 the bullseye is hit at least once among the trials $n, n+1, \ldots$. It follows that the bullseye is hit infinitely many times with probability 1, and hence $B_n$ does not converge to 0 almost surely.