Chapter 31

Large Deviations for IID Sequences: The Return of Relative Entropy

Section 31.1 introduces the exponential version of the Markov inequality, which will be our major calculating device, and shows how it naturally leads to both the cumulant generating function and the Legendre transform, which we should suspect (correctly) of being the large deviations rate function. We also see the reappearance of relative entropy, as the Legendre transform of the cumulant generating functional of distributions.

Section 31.2 proves the large deviations principle for the empirical mean of IID sequences in finite-dimensional Euclidean spaces (Cramér's Theorem).

Section 31.3 proves the large deviations principle for the empirical distribution of IID sequences in Polish spaces (Sanov's Theorem), using Cramér's Theorem for a well-chosen collection of bounded continuous functions on the Polish space, and the tools of Section 30.2. Here the rate function is the relative entropy.

Section 31.4 proves that even the infinite-dimensional empirical process distribution of an IID sequence in a Polish space obeys the LDP, with the rate function given by the relative entropy rate.

The usual approach in large deviations theory is to establish an LDP for some comparatively tractable basic case through explicit calculations, and then to use the machinery of Section 30.2 to extend it to LDPs for more complicated cases. This chapter applies that strategy to IID sequences.

31.1 Cumulant Generating Functions and Relative Entropy

Suppose the only inequality we knew in probability theory was Markov's inequality, P(X ≥ a) ≤ E[X]/a when X ≥ 0. How might we extract an exponential probability bound from it? Well, for any real-valued variable, e^{tX} is positive, so for any t > 0 we can say that

    P(X ≥ a) = P(e^{tX} ≥ e^{ta}) ≤ E[e^{tX}] / e^{ta}

E[e^{tX}] is of course the moment generating function of X.
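As a small numerical illustration (not from the text), the exponential Markov bound can be optimized over t and compared against the exact tail. The sketch below assumes X is standard normal, so the moment generating function is E[e^{tX}] = exp(t²/2) and the optimized bound inf_{t>0} exp(t²/2 − ta) is attained at t = a; the function names are made up for this example.

```python
import math

# Exponential (Chernoff) version of Markov's inequality, for X ~ N(0, 1),
# whose moment generating function is known in closed form:
#     E[e^{tX}] = exp(t^2 / 2)
# The bound P(X >= a) <= E[e^{tX}] / e^{ta} holds for every t > 0, so we
# may minimize over t; exp(t^2/2 - t a) is minimized at t = a, giving
# the familiar Gaussian tail bound exp(-a^2 / 2).

def chernoff_bound_normal(a, ts=None):
    """Best exponential Markov bound on P(X >= a) over a grid of t > 0."""
    if ts is None:
        ts = [k / 100 for k in range(1, 1000)]  # t in (0, 10)
    return min(math.exp(t * t / 2 - t * a) for t in ts)

def normal_tail(a):
    """Exact P(X >= a) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

a = 2.0
bound = chernoff_bound_normal(a)   # grid attains exp(-a^2/2) = exp(-2)
exact = normal_tail(a)
assert exact <= bound              # the bound is valid, though not tight
```

The bound exp(−a²/2) decays at the right exponential rate even though it overstates the tail by a polynomial factor; large deviations theory cares only about this exponential rate.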
It has the nice property that addition of independent random variables leads to multiplication of their moment generating functions, as E[e^{t(X_1 + X_2)}] = E[e^{tX_1} e^{tX_2}] = E[e^{tX_1}] E[e^{tX_2}] if X_1 and X_2 are independent. If X_1, X_2, ... are IID, then we can get a deviation bound for their sample mean X̄_n through the moment generating function:

    P(X̄_n ≥ a) = P(Σ_{i=1}^n X_i ≥ na)
    P(X̄_n ≥ a) ≤ e^{-nta} (E[e^{tX_1}])^n
    (1/n) log P(X̄_n ≥ a) ≤ -ta + log E[e^{tX_1}]
                          ≤ inf_t [-ta + log E[e^{tX_1}]]
                          = -sup_t [ta - log E[e^{tX_1}]]

This suggests that the functions log E[e^{tX}] and sup_t [ta - log E[e^{tX}]] will be useful to us. Accordingly, we encapsulate them in a pair of definitions.

Definition 423 (Cumulant Generating Function) The cumulant generating function of a random variable X in R^d is a function Λ : R^d → R,

    Λ(t) ≡ log E[e^{t·X}]                                    (31.1)

Definition 424 (Legendre Transform) The Legendre transform of a real-valued function f on R^d is another real-valued function on R^d,

    f*(x) ≡ sup_{t ∈ R^d} [t·x - f(t)]                       (31.2)
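The Legendre transform of the cumulant generating function can be computed numerically by a crude grid supremum. The sketch below (an illustration of mine, not the author's; all names are invented) uses a Bernoulli(p) variable, where Λ(t) = log(1 − p + p e^t), and checks that Λ*(x) agrees with the relative entropy D(x‖p) = x log(x/p) + (1−x) log((1−x)/(1−p)) for x ∈ (0, 1), previewing the chapter's theme that relative entropy is the rate function.

```python
import math

# Numerical Legendre transform f*(x) = sup_t [t x - f(t)] over a finite
# grid of t values, applied to the cumulant generating function of a
# Bernoulli(p) random variable:
#     Lambda(t) = log E[e^{tX}] = log(1 - p + p e^t)
# For x in (0, 1), the transform equals the relative entropy D(x || p).

def legendre(f, x, t_grid):
    """Crude Legendre transform: supremum of t*x - f(t) over a grid."""
    return max(t * x - f(t) for t in t_grid)

def bernoulli_cgf(p):
    """Cumulant generating function of a Bernoulli(p) variable."""
    return lambda t: math.log(1 - p + p * math.exp(t))

def relative_entropy(x, p):
    """D(x || p) for two Bernoulli parameters, in nats."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.3, 0.6
grid = [k / 100 for k in range(-2000, 2001)]  # t in [-20, 20], step 0.01
approx = legendre(bernoulli_cgf(p), x, grid)
exact = relative_entropy(x, p)
assert abs(approx - exact) < 1e-3  # grid supremum matches D(x || p)
```

The agreement is no accident: for the empirical mean of IID Bernoulli(p) variables, Cramér's Theorem gives Λ* as the rate function, and here Λ* is exactly the relative entropy between the observed frequency x and the true parameter p.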
This note was uploaded on 12/20/2011 for the course STAT 36-754 taught by Professor Schalizi during the Spring '06 term at University of Michigan.