Course: STA 705, Fall 2008
Markov Chain Monte Carlo
STA705 Fall 2003

1 Introduction

Recall that we have been placing computer algorithms into two camps: those based on maximizing a function $f(x)$, aimed at computing maximum likelihood estimators and posterior modes, and those based on computing an expectation $E_F[h(X)]$.

Recall that one common method for computing $E_F[h(X)]$ was to draw an iid sample $X_1, \ldots, X_n \sim F$ and then use the sample average $h_n(X) = (1/n) \sum_{i=1}^{n} h(X_i)$ as an estimator of $E_F[h(X)]$. Rejection sampling was one method for drawing the iid sample. Importance sampling was based on a similar idea, but the sample was instead drawn from a separate distribution $G$ and a weighted average was used.

Markov Chain Monte Carlo is another method for sampling from $F$. Like importance sampling, it comes with a twist. In rejection sampling there is an iid sample from $F$, and straightforward central limit theorem results apply. In importance sampling, the sample is independent but drawn from another distribution $G$, so a weighted average is required to produce reasonable estimates. In Markov Chain Monte Carlo, one constructs a Markov Chain of random variables $X_0, X_1, \ldots$ that has equilibrium distribution $F$. Thus the resulting $X_i$ values are only approximately distributed according to $F$, and furthermore they are dependent. Nonetheless, the values of the chain may be used to compute estimates of $E_F[h(X)]$, complete with variance estimates.
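To make the contrast between the two earlier estimators concrete, here is a minimal Python sketch that computes both estimates of $E_F[h(X)]$. It assumes, purely for illustration, $F = N(0, 1)$ and $h(x) = x^2$ (so the true value is 1), with a hypothetical importance distribution $G = N(0, 2^2)$. The iid sample from $F$ is drawn directly with `random.Random.gauss` rather than by rejection sampling, and the function names are made up for this example.

```python
import math
import random


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def plain_mc(h, sample_f, n, rng):
    """Sample average (1/n) sum h(X_i) with X_1, ..., X_n iid from F."""
    return sum(h(sample_f(rng)) for _ in range(n)) / n


def importance_sampling(h, f_pdf, g_pdf, sample_g, n, rng):
    """Weighted average (1/n) sum h(X_i) f(X_i) / g(X_i) with X_i iid from G."""
    total = 0.0
    for _ in range(n):
        x = sample_g(rng)
        total += h(x) * f_pdf(x) / g_pdf(x)  # weight f(x)/g(x) corrects for sampling from G
    return total / n


def square(x):
    return x * x


if __name__ == "__main__":
    rng = random.Random(705)
    n = 100_000
    # Illustrative choice: F = N(0, 1), sampled directly; E_F[X^2] = 1.
    est_plain = plain_mc(square, lambda r: r.gauss(0.0, 1.0), n, rng)
    # Illustrative choice: G = N(0, 2^2) as the importance distribution.
    est_is = importance_sampling(
        square,
        normal_pdf,
        lambda x: normal_pdf(x, 0.0, 2.0),
        lambda r: r.gauss(0.0, 2.0),
        n,
        rng,
    )
    print("plain MC:", est_plain, " importance sampling:", est_is)
```

Both estimates should land close to 1. The importance sampling version is the unnormalized weighted average with weights $f(X_i)/g(X_i)$; neither estimator involves dependent draws, which is what sets MCMC apart.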
Markov Chain Monte Carlo (MCMC) is best suited for high-dimensional multivariate distributions, where it is difficult to find rejection samplers with high acceptance probabilities or decent approximating distributions for importance sampling.

2 How do you construct a Markov Chain with a specific equilibrium distribution?

First, let's suppose that $F$ is a discrete distribution on $\{1, 2, 3\}$. Let $f_1$, $f_2$, and $f_3$ be the probabilities of each outcome. We would like to construct a Markov Chain $X_0, X_1, \ldots$ with equilibrium distribution $F$. How do we do this?

We have a discrete-time, discrete-state-space Markov Chain, which can be described by a transition matrix $P$. Our state space has 3 elements, so $P$ is a 3 by 3 matrix

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix} \tag{1}$$

where each row of the matrix sums to 1, that is, $p_{i1} + p_{i2} + p_{i3} = 1$ for each $i$. Specifically, each row of $P$ must form a distribution over the state space. Recall that

$$p_{ij} = \Pr(X_{t+1} = j \mid X_t = i, A),$$

where $A$ is any event describing $X_0, \ldots, X_{t-1}$. Thus only the current observation matters in determining the next observation in the chain.

Assuming the Markov Chain is ergodic, it will have an equilibrium distribution. We would like that equilibrium distribution to be $F$, which means $F$ must satisfy the equation

$$(f_1, f_2, f_3) = (f_1, f_2, f_3) P.$$

Equivalently, $f_k = \sum_i p_{ik} f_i$ for each $k$.

Markov Chain Monte Carlo specifies that candidate values may be generated according to the following scheme. For each member $x$ of the state space, let $Q_x$ be a distribution over the state space, and write $q_x(y)$ for the probability that $Q_x$ assigns to $y$. If the state space has 3 elements, there are 3 different $Q$ distributions $Q_1$, $Q_2$, and $Q_3$. Choose a starting value for $X_0$, and generate $X_{t+1}$ from the current value $X_t$ by the following steps:

1. Generate $Y \sim Q_{X_t}$ ($Y$ is called the candidate value).
2. Let $X_{t+1} = Y$ (referred to as accepting $Y$) with probability
$$\min\left\{1, \frac{f_y \, q_y(x_t)}{f_{x_t} \, q_{x_t}(y)}\right\};$$
otherwise let $X_{t+1} = X_t$ (referred to as rejecting $Y$).

Note a key differe...
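A minimal Python sketch of the two-step scheme for the three-state example is given below; this accept/reject rule is the Metropolis-Hastings rule. The target probabilities `f` and the proposal rows `q` are hypothetical values chosen only for illustration, states 1, 2, 3 are stored as indices 0, 1, 2, and the helper names are likewise made up. Besides simulating the chain, the sketch builds the transition matrix $P$ that the scheme implies (off-diagonal entries $p_{xy} = q_x(y)$ times the acceptance probability, with the leftover mass on the diagonal) so that the equilibrium equation $(f_1, f_2, f_3) = (f_1, f_2, f_3)P$ can be checked numerically.

```python
import random


def accept_prob(f, q, x, y):
    """min{1, f_y q_y(x) / (f_x q_x(y))}; assumes f[x] > 0 and q[x][y] > 0."""
    return min(1.0, (f[y] * q[y][x]) / (f[x] * q[x][y]))


def mh_step(f, q, x, rng):
    """Generate X_{t+1} from the current value X_t = x."""
    # Step 1: draw the candidate value Y ~ Q_x.
    y = rng.choices(range(len(f)), weights=q[x])[0]
    # Step 2: accept Y with probability accept_prob, otherwise keep x.
    if rng.random() < accept_prob(f, q, x, y):
        return y
    return x


def run_chain(f, q, x0, n_steps, rng):
    """Return the chain X_0, X_1, ..., X_{n_steps}."""
    chain = [x0]
    for _ in range(n_steps):
        chain.append(mh_step(f, q, chain[-1], rng))
    return chain


def transition_matrix(f, q):
    """Transition matrix P implied by the two-step scheme."""
    k = len(f)
    P = [[0.0] * k for _ in range(k)]
    for x in range(k):
        for y in range(k):
            if y != x and q[x][y] > 0:
                # Move to y only if y is proposed and then accepted.
                P[x][y] = q[x][y] * accept_prob(f, q, x, y)
        # Remaining mass: rejected candidates plus proposals of x itself.
        P[x][x] = 1.0 - sum(P[x])
    return P


if __name__ == "__main__":
    # Hypothetical target (f_1, f_2, f_3) and proposals Q_1, Q_2, Q_3,
    # with states 1, 2, 3 stored as indices 0, 1, 2.
    f = [0.2, 0.3, 0.5]
    q = [[0.1, 0.6, 0.3],
         [0.5, 0.0, 0.5],
         [0.4, 0.4, 0.2]]
    P = transition_matrix(f, q)
    fP = [sum(f[i] * P[i][k] for i in range(3)) for k in range(3)]
    print("P  =", P)
    print("fP =", fP)  # should match f up to floating-point rounding
    chain = run_chain(f, q, x0=0, n_steps=100_000, rng=random.Random(705))
    print("freq =", [chain.count(s) / len(chain) for s in range(3)])
```

The reason $fP = f$ holds is that, for $x \ne y$, $f_x p_{xy} = \min\{f_x q_x(y), f_y q_y(x)\}$, which is symmetric in $x$ and $y$; summing $f_x p_{xy} = f_y p_{yx}$ over $x$ gives $\sum_x f_x p_{xy} = f_y$. The empirical frequencies from the simulated chain are only close to $f$, not equal, because the chain's values are dependent and only approximately distributed according to $F$.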

