Theorem 4.8. Strong Law of Large Numbers. Let $X_1, X_2, \ldots$ be i.i.d. with $E|X_i| < \infty$ and $EX_i = \mu$. Then $\bar{X}_n \to \mu$ as $n \to \infty$.
A corollary of this result is the frequency interpretation of probability. Let $X_i = 1$ if the event $A$ occurs on the $i$th trial and 0 otherwise, so that $EX_i = P(A)$. Theorem 4.8 implies that $\bar{X}_n$, the fraction of times $A$ occurs in the first $n$ trials, converges to $P(A)$. While Theorem 4.8 is nice, for practical purposes Theorem 4.7 suffices since it says that if $n$ is large the sample mean is close to the true mean with high probability.
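The frequency interpretation can be checked numerically. The following is a minimal simulation sketch, not part of the text: the value 0.3 for $P(A)$, the function name, and the fixed seed are all assumptions chosen for illustration.

```python
import random

def fraction_of_successes(p, n, seed=0):
    """Simulate n independent trials, each a success with probability p,
    and return the fraction of successes (the sample mean of the X_i)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

p_A = 0.3  # hypothetical value of P(A), chosen only for this illustration
for n in (100, 10_000, 100_000):
    print(n, fraction_of_successes(p_A, n))
```

As $n$ grows, the printed fractions should settle near the assumed $P(A)$.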
CHAPTER 4.
LAW OF LARGE NUMBERS
4.5 Central Limit Theorem
At this point we know that if $X_1, X_2, \ldots$ are independent and have the same distribution with mean $EX_i = \mu$ and variance $\mathrm{var}(X_i) = \sigma^2 \in (0, \infty)$ then, see (4.14), the sum $S_n = X_1 + \cdots + X_n$ has mean $ES_n = n\mu$ and $\mathrm{var}(S_n) = n\sigma^2$. Using (1.15) and (4.2) we see that $S_n - n\mu$ has mean 0 and variance $n\sigma^2$, so
$$\frac{S_n - n\mu}{\sigma\sqrt{n}}$$
has mean 0 and variance 1. The remarkable fact, called the central limit theorem, is that as $n \to \infty$ this scaled variable converges to the standard normal distribution.
Theorem 4.9. Suppose $X_1, X_2, \ldots$ are independent and have the same distribution with mean $EX_i = \mu$ and variance $\mathrm{var}(X_i) = \sigma^2 \in (0, \infty)$. Then for all $a < b$
$$P\left(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) \to \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$$
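The convergence in Theorem 4.9 can be illustrated by simulation. The sketch below assumes Uniform(0,1) summands (mean 1/2, variance 1/12), $n = 30$, and 20,000 repetitions. These choices and the function names are illustrative assumptions, not taken from the text. The limiting integral is evaluated with the error function from Python's standard library.

```python
import math
import random

def normal_integral(a, b):
    """Integral of the standard normal density from a to b, via math.erf."""
    return 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))

def clt_probability(n, a, b, trials=20_000, seed=0):
    """Monte Carlo estimate of P(a <= (S_n - n*mu)/(sigma*sqrt(n)) <= b)
    when the X_i are Uniform(0,1) (an assumed example distribution)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        z = (s - n * mu) / (sigma * math.sqrt(n))
        if a <= z <= b:
            hits += 1
    return hits / trials

estimate = clt_probability(n=30, a=-1.0, b=1.0)
limit = normal_integral(-1.0, 1.0)
print(estimate, limit)
```

The two printed numbers should be close, up to Monte Carlo error.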
To apply this result we need to learn how to use the normal table. If we let $\Phi(x) = P(\chi \le x)$ be the normal distribution function then
$$P(a \le \chi \le b) = \Phi(b) - \Phi(a)$$
The values of $\Phi$ for positive values of $x$ are given in the table at the back of the book. By symmetry $\Phi(-x) = P(\chi \le -x) = P(\chi \ge x) = 1 - \Phi(x)$ so
$$P(-x \le \chi \le x) = \Phi(x) - (1 - \Phi(x)) = 2\Phi(x) - 1 \qquad (4.17)$$
To illustrate the use of Theorem ?? we will use a small part of the table

$x$                        0       1       2       3
$\Phi(x)$                  0.500   0.8413  0.9772  0.9986
$P(-x \le \chi \le x)$     0       0.6826  0.9544  0.9972
In words, for the normal distribution the probability of being within one standard deviation of the mean is 68%, within two standard deviations is 95%, and the probability of being more than three standard deviations away is < 0.3%.
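The entries of the small table can be recomputed without the printed normal table. This sketch uses the standard identity $\Phi(x) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(x/\sqrt{2})\bigr)$ together with (4.17). The function names `Phi` and `within` are chosen here for illustration. The last digit may differ slightly from the printed table because of rounding.

```python
import math

def Phi(x):
    """Standard normal distribution function, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def within(x):
    """P(-x <= chi <= x) = 2*Phi(x) - 1, as in (4.17)."""
    return 2.0 * Phi(x) - 1.0

for x in (0, 1, 2, 3):
    print(x, round(Phi(x), 4), round(within(x), 4))
```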
We begin by considering the situation $P(X_i = 1) = P(X_i = -1) = 1/2$, which has $EX_i = 0$ and $\mathrm{var}(X_i) = E(X_i^2) = 1$. Taking $a = -1$ and $b = 1$ and using our little table
$$P(-\sqrt{n} \le S_n \le \sqrt{n}) \approx P(-1 \le \chi \le 1) = 0.6826$$
Thus if $n = 2500 = 50^2$ our net winnings will be in $[-50, 50]$ with probability $\approx 0.6826$. Since 1275 heads and 1225 tails give net winnings of $1275 - 1225 = 50$, this means that the number of heads will be in $[1225, 1275]$ with that probability. Since $1275/2500 = 0.51$, this corresponds to the fraction of heads being in $[0.49, 0.51]$.
Using the other two values in the table, if $n = 2500$ our net winnings will be in $[-100, 100]$ with probability $\approx 0.9544$, and in $[-150, 150]$ with probability $\approx 0.9972$.
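For comparison, the three probabilities for $n = 2500$ can also be computed exactly from the binomial distribution of the number of heads $H$, since net winnings of $2H - n$ lie in $[-w, w]$ exactly when $H \in [(n-w)/2, (n+w)/2]$. The sketch below is an illustration under that setup. The function name is hypothetical, and `math.comb` requires Python 3.8 or later. Because $S_n$ takes only integer values, the exact answers need not agree with the normal approximation in every digit.

```python
import math

def exact_heads_probability(n, lo, hi):
    """P(lo <= H <= hi) for H ~ Binomial(n, 1/2), using exact integer arithmetic."""
    favorable = sum(math.comb(n, k) for k in range(lo, hi + 1))
    return favorable / 2 ** n

n = 2500
exact = {}
for k in (1, 2, 3):
    w = 50 * k            # half-width of the net-winnings interval, k*sqrt(n)
    lo = (n - w) // 2     # smallest number of heads with net winnings >= -w
    hi = (n + w) // 2     # largest number of heads with net winnings <= w
    exact[k] = exact_heads_probability(n, lo, hi)
    print(f"net winnings in [-{w}, {w}]: heads in [{lo}, {hi}], exact = {exact[k]:.4f}")
```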
The last result shows that the bound from Chebyshev's inequality can be very crude. Chebyshev tells us that
$$P(|\chi| \ge 3) \le \frac{\mathrm{var}(\chi)}{3^2} = \frac{1}{9}$$
while the true value is $P(|\chi| \ge 3) = 1 - 0.9972 = 0.0028 \approx 1/357$.
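The gap between Chebyshev's bound and the normal tail can be tabulated for several thresholds. This is a small sketch using the identity $P(|\chi| \ge k) = \operatorname{erfc}(k/\sqrt{2})$. The function name is chosen here for illustration, and the computed tail at $k = 3$ may differ in the last digit from the table-based value quoted in the text.

```python
import math

def normal_two_sided_tail(k):
    """P(|chi| >= k) for a standard normal chi, via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 2, 3):
    chebyshev_bound = 1.0 / k ** 2   # var(chi) / k^2 with var(chi) = 1
    print(k, chebyshev_bound, normal_two_sided_tail(k))
```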
We now turn to a smaller value of $n$.
Example 4.20. Suppose we flip a coin 100 times. What is the probability we get at least 56 heads? Let $X_i = 1$ if the $i$th flip is heads, 0 otherwise, and $S_{100} = X_1 + \cdots + X_{100}$.
$EX_i = 1/2$ and $\mathrm{var}(X_i) = 1/4$ so $ES_{100} = 50$ and $\sigma(S_{100}) = \sqrt{100/4} = 5$. To apply the central limit theorem we write
$$P(S_{100} \ge 56) = P\left(\frac{S_{100} - 50}{5} \ge \frac{6}{5}\right) \approx P(\chi \ge 1.2) = 1 - P(\chi \le 1.2) = 1 - 0.8849 = 0.1151$$
If the question had been “What is the probability of at most 55 heads?” we would have computed
$$P(S_{100} \le 55) = P\left(\frac{S_{100} - 50}{5} \le \frac{5}{5}\right) \approx P(\chi \le 1.0) = 0.8413$$
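The two normal approximations in Example 4.20 can be set beside the exact binomial probabilities. The sketch below is an illustration only: the variable names are hypothetical, $\Phi$ is computed from `math.erf`, and `math.comb` requires Python 3.8 or later. The exact probabilities of "at least 56" and "at most 55" sum to 1, while the two approximations above do not. The sketch only makes that gap visible.

```python
import math

def Phi(x):
    """Standard normal distribution function, computed from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n = 100
mean, sd = 50.0, 5.0   # ES_100 and sigma(S_100) from the example

# Normal approximations, following the computations in the example
approx_at_least_56 = 1.0 - Phi((56 - mean) / sd)
approx_at_most_55 = Phi((55 - mean) / sd)

# Exact values from the Binomial(100, 1/2) distribution
exact_at_least_56 = sum(math.comb(n, k) for k in range(56, n + 1)) / 2 ** n
exact_at_most_55 = 1.0 - exact_at_least_56

print("at least 56:", approx_at_least_56, exact_at_least_56)
print("at most 55: ", approx_at_most_55, exact_at_most_55)
```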
 Spring '13
 Mattingly
 Normal Distribution, Probability theory