Discrete-time stochastic processes



right side of (1.56). In other words, lim_{n→∞} F_{Z_n}(y) = F_Φ(y) for each y ∈ R. This is called convergence in distribution, since it is the sequence of distribution functions that is converging. The theorem is illustrated by Figure 1.10.

It is known that if |X| has a finite third moment, then for each y ∈ R, F_{Z_n}(y) converges to F_Φ(y) as 1/√n with increasing n. Without a third moment the convergence still takes place, but perhaps very slowly.

The word central appears in the name central limit theorem because F_{Z_n}(y) is usually much better approximated by F_Φ(y) when the magnitude of y is relatively small, say y ∈ (−3, 3). Part of the reason for this is that when y is very negative, F_Φ(y) is extremely small (for example, F_Φ(−3) = 0.00135). Approximating F_{Z_n}(y) by F_Φ(y) is then only going to be meaningful if |F_{Z_n}(y) − F_Φ(y)| is small relative to F_Φ(y). For y = −3, say, this requires n to be huge. The same problem occurs for large y, since 1 − F_Φ(y) is extremely small and is the quantity of interest for large y. Finding how F_{Z_n}(y) behaves for large n and large |y| is part of the theory of large deviations, which will be introduced in Chapter 7.

The central limit theorem (CLT) helps explain why Gaussian random variables play such a central role in probability theory. In fact, many of the cookbook formulas of elementary statistics are based on the tacit assumption that the underlying variables are Gaussian, and the CLT helps explain why these formulas often give reasonable results.

One should be careful to avoid reading more into the CLT than it says. For example, the normalized sum [S_n − nX̄]/(√n σ) need not have a density that is approximately Gaussian.
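The tail behavior just described can be checked numerically. The sketch below is an illustration added here, not part of the text: it takes Bernoulli(1/2) summands, for which F_{Z_n}(y) can be computed exactly from the binomial distribution, and compares the absolute and relative errors against F_Φ(y) near the center (y = 0) and in the tail (y = −3).

```python
import math

def phi(y):
    """Standard normal CDF F_Phi(y), via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2)))

def F_Zn(y, n, p=0.5):
    """Exact CDF of Z_n = (S_n - n*p) / (sigma * sqrt(n)) where S_n is a
    sum of n i.i.d. Bernoulli(p) variables, computed from the binomial pmf."""
    sigma = math.sqrt(p * (1 - p))
    k_max = math.floor(n * p + y * sigma * math.sqrt(n))
    if k_max < 0:
        return 0.0
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min(k_max, n) + 1))

n = 100
for y in (0.0, -3.0):
    f, g = F_Zn(y, n), phi(y)
    print(f"y={y:5.1f}: F_Zn={f:.6f}  Phi={g:.6f}  rel. err={abs(f - g) / g:.1%}")
```

Even for n = 100 the absolute error is small at both points, but the relative error at y = −3 is many times larger than at y = 0, which is exactly why the approximation is called "central."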
[Figure 1.10: The same distribution functions as Figure 1.5, normalized to zero mean and unit standard deviation, i.e., the distribution functions of Z_n = (S_n − E[S_n])/(σ_X √n) for n = 4, 20, 50. Note that as n increases, the distribution function of Z_n slowly starts to resemble the normal distribution function.]

In fact, if the underlying variables are discrete, the normalized sum is also discrete, and does not have a density at all. What is happening is that the normalized sum can have very detailed fine structure; this does not disappear as n increases, but becomes "integrated out" in the distribution function.

We will not use the CLT extensively here, and will not prove it (see Feller [8] for a thorough and careful exposition of various forms of the central limit theorem). Giving a proof from first principles is quite tricky and uses analytical tools that will not be needed subsequently; many elementary texts on probability give "heuristic" proofs indicating that the normalized sum has a density that tends to Gaussian (thus indicating that both the heuristics and the manipulations are wrong).

The central limit theorem gives us very explicit information about how the distribution function of the sample average, F_{S_n/n}, converges to a unit step at X̄. It says that as n increases, F_{S_n/n} becomes better approximated by the S-shaped Gaussian distribution function, where, with increasing n, the standard deviation of the Gaussian curve decreases as 1/√n. Thus not only do we know that F_{S_n/n} is approaching a unit step, but we know the shape of F_{S_n/n} as it approaches this step. It is thus reasonable to ask why we emphasize the weak law of large numbers when the central limit theorem is so much more explicit.
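The "fine structure" of a discrete normalized sum can be made concrete. The following sketch, again an added illustration rather than part of the text, uses Bernoulli(1/2) summands: Z_n is then supported on a lattice, so its distribution function is a staircase, and the largest jump (the largest binomial pmf value) shrinks only slowly as n grows.

```python
import math

def max_jump(n, p=0.5):
    """Largest jump of the staircase CDF of Z_n for Bernoulli(p) summands.
    Z_n is discrete, so its CDF jumps by the binomial pmf at each lattice
    point; the biggest such jump measures the remaining 'fine structure'."""
    return max(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

for n in (4, 20, 50, 200):
    print(f"n={n:4d}: largest CDF jump = {max_jump(n):.4f}")
```

The jumps decay roughly like 1/√n, never vanishing for finite n: the distribution function smooths the discreteness out, but no density ever appears.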
There are three answers. The first is that sometimes one doesn't care about the added information in (1.56), and the added information obscures some issues of interest. The next is that (1.56) is meaningful only if the variance σ² of X is finite, whereas, as we soon show, (1.51) holds whether or not the variance is finite. One might think that variables with infinite variance are of purely academic interest. However, whether or not a variance is finite depends on the far tails of a distribution, and thus results are considerably more robust when a finite variance is not required.

The third reason for interest in laws of large numbers is that they hold in many situations more general than that of sums of IID random variables. These more general laws of large numbers can usually be interpreted as saying that a time-average (i.e., an average over time of a sample function of a stochastic process) approaches the expected value of the process at a given time. Since expected values exist only within a probability model, while time-averages can be evaluated from a sample function of the actual process being modeled, the relationship between time-averages and expected values is often the link between...
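The point about infinite variance can be illustrated with a small simulation, added here as a sketch and not drawn from the text: a Pareto variable with shape α = 1.5 has finite mean α/(α − 1) = 3 but infinite variance, so (1.56) does not apply, yet the sample average still settles near the mean as the weak law (1.51) predicts.

```python
import random

def pareto_sample(alpha, rng):
    """One Pareto(alpha) draw via inverse-transform sampling:
    if U ~ Uniform(0,1), then (1 - U)**(-1/alpha) is Pareto with x_min = 1."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)

rng = random.Random(0)          # fixed seed so the run is reproducible
alpha, true_mean = 1.5, 3.0     # mean = alpha / (alpha - 1); variance is infinite
n = 100_000
sample_avg = sum(pareto_sample(alpha, rng) for _ in range(n)) / n
print(f"sample average after {n} draws: {sample_avg:.3f} (true mean = {true_mean})")
```

With infinite variance the fluctuations die out more slowly than the 1/√n rate of the CLT, so the average wanders near 3 rather than tightening quickly around it, which is the robustness trade-off the paragraph describes.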

This note was uploaded on 09/27/2010 for the course EE 229, taught by Professor R. Srikant during the Spring '09 term at the University of Illinois, Urbana-Champaign.
