Copyright © 2010 by Karl Sigman

1 Rare event simulation and importance sampling

Suppose we wish to use Monte Carlo simulation to estimate a probability $p = P(A)$ when the event $A$ is "rare" (e.g., when $p$ is very small). An example would be $p = P(M_k > b)$ with a very large $b$ for $M_k = \max_{0 \le j \le k} R_j$, the maximum over the first $k$ steps of a random walk. We could naively simulate $n$ (large) iid copies of $A$, denoted by $A_1, A_2, \ldots, A_n$, then set $X_i = I\{A_i\}$ and use the crude estimate

$$p \approx \bar{p}(n) = \frac{1}{n} \sum_{i=1}^{n} X_i. \qquad (1)$$

But this is not a good idea: $\mu \stackrel{\text{def}}{=} E(X_i) = P(A) = p$ and $\sigma^2 \stackrel{\text{def}}{=} Var(X_i) = p(1-p)$, and so, since $p$ is assumed very small, the ratio

$$\frac{\sigma}{\mu} = \frac{\sqrt{p(1-p)}}{p} \sim \frac{1}{\sqrt{p}} \to \infty \quad \text{as } p \downarrow 0;$$

relative to $\mu$, $\sigma$ is of a much larger magnitude. This is very bad since when constructing confidence intervals, $\bar{p}(n) \pm z_{\alpha/2}\, \sigma/\sqrt{n}$, the length of the interval is in units of $\sigma$: if $\sigma$ is much larger than what we are trying to estimate, $\mu$, then the confidence interval will be way too large to be of any use. It would be like saying "I am 95% confident that he weighs 140 pounds plus or minus 500 pounds." To make matters worse, increasing the number $n$ of copies in the Monte Carlo so as to reduce the interval length, while sounding OK, could be impractical since $n$ would end up having to be enormous.

Importance sampling is a technique that gets around this problem by changing the probability distributions of the model so as to make the rare event happen often instead of rarely. To understand the basic idea, suppose we wish to compute $E(h(X)) = \int h(x) f(x)\,dx$ for a continuous random variable $X$ distributed with density $f(x)$. For example, if $h(x) = I\{x > b\}$ for a given large $b$, then $h(X) = I\{X > b\}$ and $E(h(X)) = P(X > b)$.
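As a concrete instance of $h(x) = I\{x > b\}$, the following sketch applies the crude estimate (1) to $P(X > b)$. It assumes, purely for illustration, that $f$ is the standard normal density; the notes do not fix a particular $f$, and the function names `crude_estimate` and `samples_needed` are hypothetical. The second function solves $z\,\sigma/\sqrt{n} \le \epsilon\, p$ for $n$ with $\sigma^2 = p(1-p)$, which makes the "enormous $n$" remark quantitative.

```python
import math
import random
from statistics import NormalDist


def crude_estimate(b, n, seed=0):
    """Crude Monte Carlo estimate of p = P(X > b).

    Assumption (not from the notes): X is standard normal.
    Returns the estimate and an approximate 95% half-width.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > b)
    p_hat = hits / n
    # Half-width z * sigma / sqrt(n), with sigma^2 estimated by p_hat * (1 - p_hat).
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, half_width


def samples_needed(p, rel_half_width=0.1, z=1.96):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= rel_half_width * p."""
    return math.ceil((z / rel_half_width) ** 2 * (1.0 - p) / p)


if __name__ == "__main__":
    b = 4.0
    p_true = 1.0 - NormalDist().cdf(b)
    p_hat, hw = crude_estimate(b, 100_000)
    print("true p:", p_true)
    print("crude estimate:", p_hat, "+/-", hw)
    print("n needed for a 10% relative half-width:", samples_needed(p_true))
```

Since the required $n$ grows like $1/p$, halving the target probability roughly doubles the simulation effort needed for the same relative precision.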
Now let $g(x)$ be any other density such that $f(x) = 0$ whenever $g(x) = 0$, and observe that we can rewrite

$$E(h(X)) = \int h(x) f(x)\,dx = \int \left[ h(x) \frac{f(x)}{g(x)} \right] g(x)\,dx = \tilde{E}\left[ h(X) \frac{f(X)}{g(X)} \right],$$

where $\tilde{E}$ denotes expected value when $g$ is used as the distribution of $X$ (as opposed to the original distribution $f$). In other words: if $X$ has distribution $g$, then the expected value of $h(X) \frac{f(X)}{g(X)}$ is the same as what we originally wanted. The ratio $L(X) = \frac{f(X)}{g(X)}$ is called the likelihood ratio. We can write

$$E(h(X)) = \tilde{E}(h(X) L(X)); \qquad (2)$$

the left-hand side uses distribution $f$ for $X$, while the right-hand side uses distribution $g$ for $X$.

To make this work in our favor, we would want to choose $g$ so that the variance of $h(X) L(X)$ (under $g$) is small relative to its mean.

We can easily generalize this idea to multiple dimensions: suppose $h = h(X_1, \ldots, X_k)$ is real valued, where $(X_1, \ldots, X_k)$ has joint density $f(x_1, \ldots, x_k)$. Then for an alternative joint density $g(x_1, \ldots, x_k)$, we once again can write $E(h(X_1, \ldots, X_k)) = \tilde{E}(h(X_1, \ldots, X_k)\, L(X_1, \ldots, X$ ....
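To illustrate identity (2) in the one-dimensional case, here is a minimal sketch for $h(x) = I\{x > b\}$. It rests on assumptions that are not in the notes: $f$ is taken to be the standard normal density and $g$ the normal density with mean $b$ and variance 1, so that $L(x) = f(x)/g(x) = e^{-bx + b^2/2}$. The function name `importance_estimate` is hypothetical.

```python
import math
import random


def importance_estimate(b, n, seed=0):
    """Importance-sampling estimate of P(X > b) via E(h(X)) = E~(h(X) L(X)).

    Assumptions (not from the notes): f is the N(0, 1) density and
    g is the N(b, 1) density, so L(x) = f(x) / g(x) = exp(-b*x + b*b/2).
    Returns the estimate and an approximate 95% half-width.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = rng.gauss(b, 1.0)  # X sampled from g, under which {X > b} is common
        w = math.exp(-b * x + 0.5 * b * b) if x > b else 0.0  # h(x) * L(x)
        total += w
        total_sq += w * w
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # sample variance of h(X) L(X) under g
    half_width = 1.96 * math.sqrt(var / n)
    return mean, half_width


if __name__ == "__main__":
    b = 4.0
    p_true = 0.5 * math.erfc(b / math.sqrt(2.0))  # exact normal tail probability
    est, hw = importance_estimate(b, 20_000)
    print("true p:", p_true)
    print("importance-sampling estimate:", est, "+/-", hw)
```

Under this choice of $g$ the event $\{X > b\}$ occurs about half the time, and each occurrence is weighted by a small likelihood ratio, which is how the variance of $h(X) L(X)$ is kept small relative to its mean. Other choices of $g$ are possible; this one is only an example.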
 Spring '07
 sigman