Monte Carlo Integration

Advances in computing power have allowed the development of algorithms to calculate numerically what previously could only be done analytically. Two main types of numerical problems arise in statistics:

1. Integration: necessary for finding expected values
2. Optimization: maximizing or minimizing a function, for example to find the MLE we maximize the likelihood

There are various methods for numerical integration, but we'll focus on those using a random sample, known as Monte Carlo methods. We'll talk about optimization later.

We need three main concepts from probability theory: the pdf, cdf, and expected value. To keep the notation consistent, I'll talk only about continuous random variables. The ideas are the same for discrete random variables, but we use summation rather than integration, and we need to be a bit more careful about defining inverses of functions. Throughout, I'll use X to represent a random variable and x to represent a particular value that X might take.

Continuous random variables possess a probability density function or pdf f(x). It satisfies:

1. f(x) ≥ 0 for all x
2. ∫_{-∞}^{∞} f(x) dx = 1
3. P(a < X < b) = ∫_a^b f(x) dx, the area under f from a to b

Typically the pdf depends on some parameters. These are values in the function that can change, giving us a family of different functions.

Example: Exponential pdf
f(x) = λe^{-λx} for x > 0, and 0 otherwise

[Figure: exponential pdfs for λ = 1, 2, 3.]

Question: What is the probability that X is between 1 and 2?

∫_1^2 λe^{-λx} dx = [-e^{-λx}]_1^2 = e^{-λ} - e^{-2λ}

> f <- function(lambda){
+   exp(-lambda) - exp(-2*lambda)
+ }
> f(1:3)
[1] 0.23254416 0.11701964 0.04730832
The cumulative distribution function or cdf is defined as

F(x) = P(X ≤ x) = ∫_{-∞}^x f(t) dt

What is the cdf of the exponential distribution?

F(x) = ∫_{-∞}^x f(t) dt = ∫_0^x λe^{-λt} dt = [-e^{-λt}]_0^x = 1 - e^{-λx}

The expected value of a random variable is defined as
E[X] = ∫_{-∞}^{∞} x f(x) dx

One way to think of this is that we're weighting possible values (x) by how relatively likely they are (f(x)). Using integration by parts, you can show the expected value of an exponential random variable is 1/λ. The expected value of a function g(X) is just

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx
Monte Carlo integration makes use of the approximation

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx ≈ (1/B) Σ_{i=1}^B g(x^{(i)}),

where x^{(1)}, ..., x^{(B)} make up an independent sample from the distribution with pdf f. The bigger B is, the better the approximation will tend to be (by the Law of Large Numbers).

This can be used to easily approximate the mean of any function we're interested in. These integrals might be difficult to do by hand.
> lambda <- 10
> B <- 1000
> xsamp <- rexp(B, rate = lambda)
> mean(xsamp)
[1] 0.1012265
> mean(xsamp^2)
[1] 0.02059624
> mean(sin(xsamp))
[1] 0.1002195

(Note that your results will be slightly different from mine, due to random variability.)

What if there isn't a built-in function (like rexp) for generating a random sample from the distribution? We'll consider three methods for generating a sample and/or approximating the integral.

1. Inverse cdf method
2. Rejection sampling
3. Importance sampling

The methods are listed both in decreasing order of preference as well as decreasing strength of the assumptions needed to use them. So the inverse cdf method is best but requires the strongest assumptions.
If the cdf has an inverse in closed form, there is a simple method for generating random values from the distribution. To carry out the inverse cdf method:

1. Generate n samples from a standard uniform distribution. Call this vector u. In R, u <- runif(n).
2. Take y <- F.inv(u), where F.inv computes the inverse of the cdf of the distribution we want.

We can prove that the cdf of the random values produced in this way is exactly F. Therefore,
P(Y ≤ y) = P(F^{-1}(U) ≤ y)
         = P(F(F^{-1}(U)) ≤ F(y))
         = P(U ≤ F(y))
         = F(y)

where the last step uses the cdf of the standard uniform: P(U ≤ u) = 0 for u < 0, u for 0 ≤ u ≤ 1, and 1 for u > 1. We used the fact that F is nondecreasing in the second line. (Why must F be nondecreasing?)
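To make this concrete, here is a quick sketch (my own example, not from the slides) applying the method to the exponential distribution. Since F(x) = 1 - e^{-λx}, solving u = F(y) gives F^{-1}(u) = -log(1 - u)/λ:

```r
# Inverse cdf method for the exponential distribution (sketch).
# F(x) = 1 - exp(-lambda*x), so F^{-1}(u) = -log(1 - u)/lambda.
lambda <- 2
n <- 10000
u <- runif(n)                # step 1: standard uniforms
y <- -log(1 - u) / lambda    # step 2: apply the inverse cdf
mean(y)                      # should be close to 1/lambda
```

The sample mean should land near 1/λ = 0.5, matching the expected value of an exponential random variable.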
We need to:

1. Find the cdf
2. Find the inverse cdf
3. Write a function to carry out the inverse cdf method.

Example: Triangle distribution with endpoints at a and b and center at c.

f(x) = 2(x - a) / ((b - a)(c - a))   for a ≤ x < c
       2(b - x) / ((b - a)(b - c))   for c ≤ x ≤ b
       0                             otherwise

[Figure: density of a triangle distribution.]

Using the fact that the total area is one, and that the area of a triangle is 1/2 × base × height, we find that

F(x) = 0                                  for x < a
       (x - a)^2 / ((b - a)(c - a))       for a ≤ x < c
       1 - (b - x)^2 / ((b - a)(b - c))   for c ≤ x ≤ b
       1                                  for b < x

Inverting this function, we have
F^{-1}(y) = a + sqrt(y(b - a)(c - a))         for 0 ≤ y < (c - a)/(b - a)
            b - sqrt((1 - y)(b - a)(b - c))   for (c - a)/(b - a) ≤ y ≤ 1

Now we can write our function.

Suppose we want to sample from a distribution with a pdf proportional to f(x). (This comes up a lot in Bayesian statistics.) Suppose also that we can

1. sample from the distribution with pdf g(x).
2. evaluate f(x).
3. find M > 0 such that f(x) ≤ M g(x) for all x.
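Back to the triangle example: a sketch of the promised function, using the inverse cdf derived above (the name rtriangle and the argument order are my own choices):

```r
# Inverse cdf sampler for the triangle distribution (sketch).
# Endpoints a and b, center (mode) c, with a < c < b.
rtriangle <- function(n, a, c, b) {
  y <- runif(n)
  thresh <- (c - a) / (b - a)   # F(c), the split point between branches
  ifelse(y < thresh,
         a + sqrt(y * (b - a) * (c - a)),
         b - sqrt((1 - y) * (b - a) * (b - c)))
}
```

For example, rtriangle(1000, 0, 1, 2) should give values in [0, 2] with mean near (a + b + c)/3 = 1.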
[Figure: f(x) lying under the envelope M g(x); if g is uniform, M g(x) is a constant "envelope function".]
Then to carry out rejection sampling, we

1. Draw x ~ g(x)
2. Accept x with probability f(x)/(M g(x))
3. Repeat 1 and 2 until we have a large enough sample.

In practice, to do 2 we can draw a uniform random variable u and accept x if u ≤ f(x)/(M g(x)).
[Figure: f(x) and the envelope M g(x).]
Note that the closer M g(x) is to f(x), the higher probability there is of acceptance. This leads to a more efficient sampling algorithm. However, it can often be difficult to "tailor" g(x) to f(x).

[Figure: an envelope M g(x) fitting f(x) closely.]

Rejection sampling, like the inverse cdf method, gives an exact sample from the distribution. However, in cases for which either works, the inverse cdf method is generally faster, since it doesn't discard samples. The reason you would use rejection sampling instead of the inverse cdf method is if you can't calculate the inverse of the cdf in closed form.

Example: Truncated normal distribution
f(x; µ, σ², a, b) ∝ f(x; µ, σ²) I{a ≤ x ≤ b}

[Figure: normal pdf truncated to [a, b]; if we use the normal pdf as the envelope, M = 1.]
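A sketch of this rejection sampler (my own code, not from the slides): with g the untruncated normal pdf and M = 1, the acceptance probability f(x)/(M g(x)) is just the indicator I{a ≤ x ≤ b}, so we keep exactly the draws that land in [a, b].

```r
# Rejection sampling from a normal truncated to [a, b] (sketch;
# mu, sigma, a, b defaults are example values of my choosing).
rtnorm_rej <- function(B, mu = 0, sigma = 1, a = -1, b = 2) {
  out <- numeric(0)
  while (length(out) < B) {
    x <- rnorm(B, mean = mu, sd = sigma)  # step 1: draw from g
    out <- c(out, x[x >= a & x <= b])     # step 2: accept iff in [a, b]
  }
  out[1:B]
}
```

Note that the narrower [a, b] is relative to the bulk of the normal, the more draws get discarded, which is exactly the efficiency concern discussed above.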
Recall the basic Monte Carlo approximation:

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx ≈ (1/B) Σ_{i=1}^B g(x^{(i)}),

where x^{(1)}, ..., x^{(B)} make up an independent sample from the distribution with pdf f. What if we can't sample from f directly? Rejection sampling requires us to find an envelope function. Importance sampling doesn't have this restriction.
Suppose we can sample from some other distribution with density function h (called the "importance function") defined over the same range as f. Then

E[g(X)] = ∫_{-∞}^{∞} g(x) f(x) dx
        = ∫_{-∞}^{∞} g(x) [f(x)/h(x)] h(x) dx
        ≈ (1/B) Σ_{i=1}^B g(x^{(i)}) f(x^{(i)})/h(x^{(i)}),

where x^{(1)}, ..., x^{(B)} make up an independent sample from the distribution with pdf h. Each sample gets a weight f(x^{(i)})/h(x^{(i)}).
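A sketch of the approximation (my own example: target f = Beta(2, 2) with pdf 6x(1 - x) on [0, 1], importance function h = uniform on [0, 1], and g(x) = x, so the true value is E[X] = 1/2):

```r
# Importance sampling sketch: estimate E[X] under f(x) = 6x(1-x)
# on [0, 1] using draws from the uniform importance function h = 1.
B <- 10000
xsamp <- runif(B)                  # sample from h
w <- 6 * xsamp * (1 - xsamp)       # weights f(x)/h(x), with h(x) = 1
mean(xsamp * w)                    # estimates E[X] = 1/2
```

The estimate is an ordinary sample mean of g(x) f(x)/h(x) over draws from h, which is exactly the displayed approximation.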
Illustration

[Figure: top panel, densities f(x) and h(x); bottom panel, the weight f/h for each sample.]

A very similar idea is sampling importance resampling (SIR), sometimes also called the weighted bootstrap. Once again, we sample x^{(1)}, ..., x^{(B)} from h.
Then calculate w̃^{(i)} = f(x^{(i)})/h(x^{(i)}) for i = 1, ..., B, and normalize the weights: w^{(i)} = w̃^{(i)} / Σ_{i=1}^B w̃^{(i)}. Now sample B times (with replacement) from the discrete distribution that puts P(X = x^{(i)}) = w^{(i)}. This gives an approximate sample from the distribution with pdf f.
xsamp <- rh(1000)  # Sample from importance function
weights <- f(xsamp)/h(xsamp)
weights <- weights/sum(weights)
final <- sample(xsamp, size = 1000, replace = TRUE, prob = weights)
hist(final, prob = TRUE, xlab = "x", ylab = "f(x)", main = "")
lines(xseq, f(xseq), col = 2)
[Figure: histogram of the resampled values with the target density f overlaid.]
This note was uploaded on 05/04/2011 for the course BUS 103 taught by Professor Opp during the Fall '10 term at Berkeley.