right side of (1.56). In other words, limn→∞ FZn(y) = FΦ(y) for each y ∈ R. This is
called convergence in distribution, since it is the sequence of distribution functions that is
converging. The theorem is illustrated by Figure 1.10.
It is known that if |X| has a finite third moment, then for each y ∈ R, FZn(y) converges
with increasing n to FΦ(y) as 1/√n. Without a third moment the convergence still takes
place, but perhaps very slowly.
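To make the 1/√n rate concrete, here is a small simulation sketch (our own illustration, not from the text). It assumes Exponential(1) summands, which have mean 1, standard deviation 1, and a finite third moment; the helper names phi and F_Zn are ours.

```python
import math
import random

def phi(y):
    # Standard normal CDF, computed via the error function
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def F_Zn(y, n, trials=20000, rng=random.Random(1)):
    # Empirical CDF of Zn = (Sn - n*mean) / (sqrt(n)*sigma)
    # for IID Exponential(1) summands (mean 1, sigma 1).
    count = 0
    for _ in range(trials):
        s = sum(-math.log(1.0 - rng.random()) for _ in range(n))
        z = (s - n) / math.sqrt(n)
        if z <= y:
            count += 1
    return count / trials

for n in (4, 20, 50):
    err = abs(F_Zn(0.5, n) - phi(0.5))
    print(f"n={n:3d}  |F_Zn(0.5) - Phi(0.5)| ~ {err:.4f}")
```

The gap |FZn(y) − FΦ(y)| should shrink roughly like 1/√n, up to Monte Carlo noise in the empirical estimate.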
The word central appears in the name central limit theorem because FZn(y) is usually much
better approximated by FΦ(y) when the magnitude of y is relatively small, say y ∈ (−3, 3).
Part of the reason for this is that when y is very negative, FΦ(y) is extremely small (for
example, FΦ(−3) = 0.00135). Approximating FZn(y) by FΦ(y) is then only going to be
meaningful if |FZn(y) − FΦ(y)| is small relative to FΦ(y). For y = −3, say, this requires n
to be huge. The same problem occurs for large y, since 1 − FΦ(y) is extremely small and is
the quantity of interest for large y. Finding how FZn(y) behaves for large n and large |y| is
a part of the theory of large deviations, which will be introduced in Chapter 7.
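The difficulty at y = −3 can be checked exactly, with no simulation, when the underlying variables are Bernoulli(1/2), so that Sn is binomial. This is our own illustrative sketch; the helper names binom_cdf and phi are of our choosing.

```python
import math

def phi(y):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def binom_cdf(k, n, p=0.5):
    # Exact P(Sn <= k) for Sn ~ Binomial(n, p); each term is
    # computed in log space to avoid overflow for large n.
    total = 0.0
    for j in range(0, k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log(1 - p))
        total += math.exp(log_term)
    return total

target = phi(-3.0)   # about 0.00135
for n in (100, 1000, 10000):
    # F_Zn(-3) = P(Sn <= n/2 - 3*sqrt(n)/2) for Bernoulli(1/2) summands
    k = math.floor(n / 2 - 3.0 * math.sqrt(n) / 2.0)
    fz = binom_cdf(k, n)
    print(f"n={n:6d}  F_Zn(-3)={fz:.5f}  Phi(-3)={target:.5f}  "
          f"relative error {abs(fz - target) / target:.2f}")
```

The printed relative errors at y = −3 decrease only slowly with n, in contrast to the much faster agreement near y = 0.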
The central limit theorem (CLT) helps explain why Gaussian random variables play such
a central role in probability theory. In fact, many of the cookbook formulas of elementary
statistics are based on the tacit assumption that the underlying variables are Gaussian, and
the CLT helps explain why these formulas often give reasonable results.
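As one concrete instance of such a cookbook formula, the familiar 95% confidence interval x̄ ± 1.96 s/√n is derived under a Gaussian assumption, yet the CLT makes it roughly valid for decidedly non-Gaussian data. A simulation sketch of ours, assuming Exponential(1) data with true mean 1 (the function name covers is hypothetical):

```python
import math
import random

def covers(n, rng):
    # Draw n Exponential(1) samples and check whether the cookbook
    # 95% interval xbar +/- 1.96*s/sqrt(n) covers the true mean 1.
    xs = [-math.log(1.0 - rng.random()) for _ in range(n)]
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    half = 1.96 * s / math.sqrt(n)
    return xbar - half <= 1.0 <= xbar + half

rng = random.Random(0)
trials = 2000
for n in (10, 100):
    hits = sum(covers(n, rng) for _ in range(trials))
    print(f"n={n:3d}  empirical coverage ~ {hits / trials:.3f}")
```

The empirical coverage is below the nominal 95% for small n and moves toward it as n grows, which is the CLT at work behind the formula.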
One should be careful to avoid reading more into the CLT than it says. For example, the
normalized sum, (Sn − nX̄)/(√n σ), need not have a density that is approximately Gaussian.

[Figure 1.10: The same distribution functions as Figure 1.5, normalized to zero mean and
unit standard deviation, i.e., the distribution functions of Zn = (Sn − E[Sn])/(√n σ) for
n = 4, 20, 50. Note that as n increases, the distribution function of Zn slowly starts to
resemble the normal distribution function.]

In fact, if the underlying variables are discrete, the normalized sum is also, and does not
have a density at all. What is happening is that the normalized sum can have very detailed
ﬁne structure; this does not disappear as n increases, but becomes “integrated out” in the
distribution function. We will not use the CLT extensively here, and will not prove it
(see Feller for a thorough and careful exposition on various forms of the central limit
theorem). Giving a proof from first principles is quite tricky and uses analytical tools that
will not be needed subsequently; many elementary texts on probability give “heuristic”
proofs indicating that the normalized sum has a density that tends to Gaussian (thus
indicating that both the heuristics and the manipulations are wrong).
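The discreteness point above is easy to see directly: for Bernoulli(1/2) summands, Zn is supported on a lattice whose spacing shrinks but never vanishes, so Zn has a PMF and never a density, no matter how large n is. A small sketch of ours:

```python
import math

def lattice_points(n):
    # Sn takes values 0..n for Bernoulli(1/2) summands, so
    # Zn = (Sn - n/2) / (sqrt(n)/2) lives on an evenly spaced lattice.
    return [(k - n / 2) / (math.sqrt(n) / 2) for k in range(n + 1)]

for n in (4, 100, 10000):
    pts = lattice_points(n)
    spacing = pts[1] - pts[0]
    # spacing = 2/sqrt(n): it shrinks with n but is never zero
    print(f"n={n:6d}  lattice spacing of Zn = {spacing:.4f}")
```

The jumps of FZn at the lattice points get smaller as n grows and are "integrated out" in the distribution function, which is why the CDF (not the PMF) converges to the Gaussian CDF.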
The central limit theorem gives us very explicit information on how the distribution function
of the sample average, FSn/n, converges to a unit step at the mean X̄. It says that as n
increases, FSn/n becomes better approximated by the S-shaped Gaussian distribution
function, where with increasing n, the standard deviation of the Gaussian curve decreases
as 1/√n. Thus not only do we know that FSn/n is approaching a unit step, but we know
the shape of FSn/n as it approaches this step. Thus it is reasonable to ask why we
emphasize the weak law of large numbers
when the central limit theorem is so much more explicit. There are three answers. The first
is that sometimes one doesn't care about the added information in (1.56), and that the
added information obscures some issues of interest. The next is that (1.56) is only
meaningful if the variance σ² of X is finite, whereas, as we soon show, (1.51) holds whether
or not the variance is finite. One might think that variables with infinite variance are of
purely academic interest. However, whether or not a variance is finite depends on the far
tails of a distribution, and thus results are considerably more robust when a finite variance
is not required.

The third reason for interest in laws of large numbers is that they hold in many situations
more general than that of sums of IID random variables. These more general laws of
large numbers can usually be interpreted as saying that a time-average (i.e., an average
over time of a sample function of a stochastic process) approaches the expected value of the
process at a given time. Since expected values only exist within a probability model, but
time-averages can be evaluated from a sample function of the actual process being modeled,
the relationship between time-averages and expected values is often the link between a
model and the reality being modeled.
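As a toy illustration of a time-average converging for non-IID samples (our own sketch, not from the text), consider a two-state Markov chain with strong persistence: successive states are far from independent, yet the running average of the state still approaches the stationary expectation of 1/2.

```python
import random

def time_average(steps, rng=random.Random(7)):
    # Two-state chain on {0, 1}; in each step the state flips with
    # probability 0.1, so P(stay) = 0.9 and states are highly correlated.
    # The stationary distribution is (1/2, 1/2) by symmetry.
    state, total = 0, 0
    for _ in range(steps):
        if rng.random() < 0.1:
            state = 1 - state
        total += state
    return total / steps

for steps in (100, 10000, 1000000):
    print(f"steps={steps:7d}  time average = {time_average(steps):.3f}")
```

The time-average is computed from a single sample function of the process, while the value 1/2 it approaches exists only inside the probability model, which is exactly the link described above.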
This note was uploaded on 09/27/2010 for the course EE 229, taught by Professor R. Srikant during the Spring '09 term at the University of Illinois, Urbana-Champaign.