ch8 - Monte Carlo Methods with R: Monitoring Convergence

Monte Carlo Methods with R: Monitoring Convergence [1]
Chapter 8: Monitoring Convergence of MCMC Algorithms

"Why does he insist that we must have a diagnosis? Some things are not meant to be known by man." (Susanna Gregory, An Unholy Alliance)

This Chapter
◮ We look at different diagnostics to check the convergence of an MCMC algorithm
◮ To answer the question: "When do we stop our MCMC algorithm?"
◮ We distinguish between two separate notions of convergence:
⊲ Convergence to stationarity
⊲ Convergence of ergodic averages
◮ We also discuss some convergence diagnostics contained in the coda package

Monte Carlo Methods with R: Monitoring Convergence [2]
Monitoring Convergence: Introduction
◮ The MCMC algorithms that we have seen
⊲ Are convergent because the chains they produce are ergodic
◮ Although this is a necessary theoretical validation of the MCMC algorithms
⊲ It is insufficient from the implementation viewpoint
◮ Theoretical guarantees do not tell us
⊲ When to stop these algorithms and produce our estimates with confidence
◮ In practice, certifying convergence is nearly impossible
◮ Several runs of your program are usually required, until
⊲ You are satisfied with the outcome, or
⊲ You run out of time and/or patience

Monte Carlo Methods with R: Monitoring Convergence [3]
Monitoring Convergence: Monitoring What and Why
◮ There are three types of convergence for which assessment may be necessary:
◮ Convergence to the stationary distribution
◮ Convergence of averages
◮ Approximating iid sampling

Monte Carlo Methods with R: Monitoring Convergence [4]
Monitoring Convergence: Convergence to the Stationary Distribution
◮ First requirement for convergence of an MCMC algorithm:
⊲ $x^{(t)} \sim f$, the stationary distribution
⊲ This sounds like a minimal requirement
◮ Assessing that $x^{(t)} \sim f$ is difficult with only a single realization
◮ A slightly less ambitious goal: assess the independence from the starting point $x^{(0)}$, based on several realizations of the chain using the same transition kernel
◮ When running an MCMC algorithm, the important issues are
⊲ The speed of exploration of the support of $f$
⊲ The degree of correlation between the $x^{(t)}$'s

Monte Carlo Methods with R: Monitoring Convergence [5]
Monitoring Convergence: Tools for Assessing Convergence to the Stationary Distribution
◮ A major tool for assessing convergence: compare the performance of several chains
⊲ This means that the slowest chain in the group governs the convergence diagnostic
◮ Multiprocessor machines provide an incentive for running replicate parallel chains
⊲ Can check for convergence by using several chains at once
⊲ May not be much more costly than using a single chain
◮ Looking at a single path of the Markov chain produced by an MCMC algorithm makes it difficult to assess convergence
◮ MCMC algorithms suffer from the major defect that
⊲ "you've only seen where you've been"
◮ The part of the support of $f$ that has not yet been visited is almost impossible to detect
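To illustrate the parallel-chain idea, here is a minimal sketch, not taken from the book: a generic random walk Metropolis sampler (the function rwmetrop, the N(0,1) target, and all tuning values are our own illustrative choices) run from overdispersed starting points, with the replicate chains bundled into a coda mcmc.list.

```r
library(coda)

# Toy target: N(0,1); generic random walk Metropolis sampler
rwmetrop <- function(niter, x0, scale = 1) {
  x <- numeric(niter)
  x[1] <- x0
  for (t in 2:niter) {
    prop <- x[t - 1] + rnorm(1, sd = scale)
    # Accept with probability min(1, f(prop)/f(current))
    if (log(runif(1)) < dnorm(prop, log = TRUE) - dnorm(x[t - 1], log = TRUE))
      x[t] <- prop
    else
      x[t] <- x[t - 1]
  }
  x
}

# Replicate chains with the same kernel, overdispersed starting points
set.seed(1)
chains <- mcmc.list(lapply(c(-10, 0, 10),
                           function(s) mcmc(rwmetrop(5000, s, scale = 2))))
plot(chains)     # overlapping traces suggest the starting points are forgotten
summary(chains)
```

Overlaying the traces of the three chains is exactly the "several chains at once" check: any chain still drifting toward the others flags dependence on the starting point.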
Monte Carlo Methods with R: Monitoring Convergence [6]
Monitoring Convergence: Convergence of Averages
◮ A more important convergence issue is convergence of the empirical average
$$\frac{1}{T} \sum_{t=1}^{T} h\big(x^{(t)}\big) \longrightarrow \mathbb{E}_f[h(X)]$$
◮ Two features distinguish stationary MCMC outcomes from iid ones:
⊲ The probabilistic dependence in the sample
⊲ The mixing behavior of the transition, that is, how fast the chain explores the support of $f$
◮ "Stuck in a mode" might appear to be stationarity
⊲ The missing mass problem again
◮ Also: the CLT might not be available

Monte Carlo Methods with R: Monitoring Convergence [7]
Monitoring Convergence: Approximating iid Sampling
◮ Ideally, the approximation to $f$ provided by MCMC algorithms should
⊲ Extend to the (approximate) production of iid samples from $f$
◮ A practical solution to this issue is to use subsampling (or batch sampling)
⊲ It reduces correlation between the successive points of the Markov chain
◮ Subsampling illustrates this general feature, but it loses in efficiency
◮ Compare two estimators (see the simulation sketch after slide [10]):
⊲ $\delta_1$: uses all of the Markov chain
⊲ $\delta_2$: uses subsamples
◮ It can be shown that $\mathrm{var}(\delta_1) \le \mathrm{var}(\delta_2)$

Monte Carlo Methods with R: Monitoring Convergence [8]
Monitoring Convergence: The coda Package
◮ Plummer et al. have written an R package called coda
◮ It contains many of the tools we will be discussing in this chapter
◮ Download and install it (e.g., with install.packages("coda")), then load it with library(coda)
◮ Transform an MCMC output made of a vector or a matrix into an mcmc object that can be processed by coda, as in
> summary(mcmc(X))
or
> plot(mcmc(X))

Monte Carlo Methods with R: Monitoring Convergence [9]
Monitoring Convergence to Stationarity: Graphical Diagnoses
◮ A first approach to convergence control:
⊲ Draw pictures of the output of simulated chains, componentwise as well as jointly, in order to detect deviant or nonstationary behaviors
◮ coda provides this crude analysis via the plot command
◮ When applied to an mcmc object, plot
⊲ Produces a trace of the chain across iterations
⊲ And a nonparametric estimate of its density, parameter by parameter

Monte Carlo Methods with R: Monitoring Convergence [10]
Monitoring Convergence to Stationarity: Graphical Diagnoses for a Logistic Random Effect Model
Example: Random effect logit model
◮ Observations $y_{ij}$ are modeled conditionally on one covariate $x_{ij}$ as
$$P(y_{ij} = 1 \mid x_{ij}, u_i, \beta) = \frac{\exp\{\beta x_{ij} + u_i\}}{1 + \exp\{\beta x_{ij} + u_i\}}, \qquad i = 1, \dots, n, \; j = 1, \dots, m$$
⊲ $u_i \sim \mathcal{N}(0, \sigma^2)$ is an unobserved random effect
⊲ This is missing data
◮ We fit this with a random walk Metropolis–Hastings algorithm
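Returning to the subsampling comparison of slide [7], the efficiency loss can be checked by simulation. A minimal sketch, reusing the rwmetrop sampler defined earlier (the N(0,1) target, the replication count, and the thinning lag of 10 are arbitrary illustrative choices): estimate E[X] from the full chain and from every tenth draw, and compare the variances of the two estimators over replications.

```r
set.seed(2)
R <- 200
d1 <- d2 <- numeric(R)
for (r in 1:R) {
  x <- rwmetrop(5000, 0, scale = 2)
  d1[r] <- mean(x)                          # delta1: all draws
  d2[r] <- mean(x[seq(10, 5000, by = 10)])  # delta2: every tenth draw
}
c(var(d1), var(d2))   # typically var(d1) <= var(d2), as the theory predicts

# coda's window() applies the same thinning to an mcmc object
thinned <- window(mcmc(rwmetrop(5000, 0, scale = 2)), thin = 10)
```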
Monte Carlo Methods with R: Monitoring Convergence [11]
Monitoring Convergence to Stationarity: Fitting a Logistic Random Effect Model
◮ The complete data likelihood is
$$\prod_{ij} \left( \frac{\exp\{\beta x_{ij} + u_i\}}{1 + \exp\{\beta x_{ij} + u_i\}} \right)^{y_{ij}} \left( \frac{1}{1 + \exp\{\beta x_{ij} + u_i\}} \right)^{1 - y_{ij}}$$
◮ This is the target in our Metropolis–Hastings algorithm
⊲ Simulate random effects $u_i^{(t)} \sim \mathcal{N}\big(u_i^{(t-1)}, \sigma^2\big)$
⊲ Simulate the logit coefficient $\beta^{(t)} \sim \mathcal{N}\big(\beta^{(t-1)}, \tau^2\big)$
⊲ Specify $\sigma^2$ and $\tau^2$
◮ $\sigma^2$ and $\tau^2$ affect mixing

Monte Carlo Methods with R: Monitoring Convergence [12]
Monitoring Convergence to Stationarity: ACF and Coda
◮ [Figures: trace and acf plots, the corresponding R code, and the coda output]

Monte Carlo Methods with R: Monitoring Convergence [13]
Tests of Stationarity: Nonparametric Tests (Kolmogorov–Smirnov)
◮ Other than a graphical check, we can try to test for independence
◮ Standard nonparametric tests of fit, such as Kolmogorov–Smirnov,
⊲ Apply to a single chain, to compare the distributions of its two halves
⊲ Also apply to parallel chains
◮ There needs to be a correction for the Markov correlation
⊲ The correction can be achieved by introducing a batch size
◮ We use
$$K = \frac{1}{M} \sup_{\eta} \left| \sum_{g=1}^{M} \mathbb{I}_{(0,\eta)}\big(x_1^{(gG)}\big) - \sum_{g=1}^{M} \mathbb{I}_{(0,\eta)}\big(x_2^{(gG)}\big) \right|$$
⊲ with $G$ = batch size and $M$ = sample size

Monte Carlo Methods with R: Monitoring Convergence [14]
Tests of Stationarity: Kolmogorov–Smirnov for the Pump Failure Data
Example: Poisson hierarchical model
◮ Consider again the nuclear pump failures
◮ We monitor the subchain $(\beta^{(t)})$ produced by the algorithm
⊲ We monitor one chain split into two halves
⊲ We also monitor two parallel chains
◮ Use the R command ks.test
◮ We will see (next slide) that the results are not clear

Monte Carlo Methods with R: Monitoring Convergence [15]
Monitoring Convergence: Kolmogorov–Smirnov p-values for the Pump Failure Data
◮ [Figure: upper row = split chain; lower row = parallel chains; left to right: batch sizes 10, 100, 200]
◮ The p-values seem too variable to be of much use
◮ And this is a good chain! (fast mixing, low autocorrelation)

Monte Carlo Methods with R: Monitoring Convergence [16]
Monitoring Convergence: Tests Based on Spectral Analysis
◮ There are convergence assessments based on spectral or Fourier analysis
◮ One is due to Geweke, who
⊲ Constructs the equivalent of a t test
⊲ Assesses the equality of the means of the first and last parts of the Markov chain
◮ The test statistic is
$$\sqrt{T} \, \frac{\delta_A - \delta_B}{\sqrt{\dfrac{\sigma_A^2}{\tau_A} + \dfrac{\sigma_B^2}{\tau_B}}}$$
⊲ $\delta_A$ and $\delta_B$ are the means from the first and last parts
⊲ $\sigma_A^2$ and $\sigma_B^2$ are the spectral variance estimates
◮ Implement with geweke.diag and geweke.plot

Monte Carlo Methods with R: Monitoring Convergence [17]
Monitoring Convergence: Geweke Diagnostics for Pump Failure Data
◮ For $\lambda_1$:
⊲ t-statistic = 1.273
⊲ The plot discards successive beginning segments
⊲ The last z-score only uses the last half of the chain
◮ Heidelberger and Welch have a similar test: heidel.diag
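As an illustration of the two tests above, here is a minimal sketch (our own; the toy rwmetrop chain from earlier stands in for the pump-failure subchain, and the batch size G = 100 is an arbitrary choice) of the batched Kolmogorov–Smirnov comparison of the two halves of a chain, followed by coda's implementation of Geweke's diagnostic.

```r
library(coda)
set.seed(3)
x <- rwmetrop(10^4, 0, scale = 2)

# Batched KS test: keep every Gth value to dilute the Markov correlation
G <- 100
half1 <- x[1:5000]
half2 <- x[5001:10^4]
ks.test(half1[seq(G, 5000, by = G)], half2[seq(G, 5000, by = G)])

# z-score comparing the means of the first 10% and last 50% (coda defaults)
geweke.diag(mcmc(x))
```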
Monte Carlo Methods with R: Monitoring Convergence [18]
Monitoring Convergence of Averages: Plotting the Estimator
◮ The initial and most natural diagnostic is to plot the evolution of the estimator
◮ If the curve of the cumulated averages has not stabilized after $T$ iterations,
⊲ The length of the Markov chain must be increased
◮ The principle can be applied to multiple chains as well
⊲ Can use cumsum, plot(mcmc(coda)), and cumuplot(coda)
◮ For $\lambda_1$ from the pump failure data: [figure of cumulated averages]
◮ cumuplot of the second half: [figure]

Monte Carlo Methods with R: Monitoring Convergence [19]
Monitoring Convergence of Averages: Trace Plots and Density Estimates
◮ plot(mcmc(lambda)) produces two graphs:
⊲ A trace plot
⊲ A density estimate
◮ Note: to get the second half of the chain, use temp=lambda[2500:5000] and then plot(mcmc(temp))

Monte Carlo Methods with R: Monitoring Convergence [20]
Monitoring Convergence of Averages: Multiple Estimates
◮ Can use several convergent estimators of $\mathbb{E}_f[h(\theta)]$ based on the same chain
⊲ Monitor until all estimators coincide
◮ Recall the Poisson count data
⊲ Two estimators of $\lambda$: the empirical average and the RB (Rao–Blackwellized) version
⊲ Convergence diagnostic: both estimators converge (50,000 iterations)

Monte Carlo Methods with R: Monitoring Convergence [21]
Monitoring Convergence of Averages: Computing Multiple Estimates
◮ Start with a Gibbs sampler on $\theta \mid \eta$ and $\eta \mid \theta$
◮ Typical estimates of $h(\theta)$:
⊲ The empirical average
$$S_T = \frac{1}{T} \sum_{t=1}^{T} h\big(\theta^{(t)}\big)$$
⊲ The Rao–Blackwellized version
$$S_T^C = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[h(\theta) \mid \eta^{(t)}\big]$$
⊲ Importance sampling
$$S_T^P = \sum_{t=1}^{T} w_t \, h\big(\theta^{(t)}\big), \qquad w_t \propto f\big(\theta^{(t)}\big) \big/ g_t\big(\theta^{(t)}\big)$$
⊲ with $f$ = target, $g$ = candidate

Monte Carlo Methods with R: Monitoring Convergence [22]
Monitoring Convergence of Multiple Estimates: Cauchy Posterior Simulation
◮ The hierarchical model
$$X_i \sim \text{Cauchy}(\theta), \quad i = 1, \dots, 3, \qquad \theta \sim \mathcal{N}(0, \sigma^2)$$
◮ has posterior distribution
$$\pi(\theta \mid x_1, x_2, x_3) \propto e^{-\theta^2 / 2\sigma^2} \prod_{i=1}^{3} \frac{1}{1 + (\theta - x_i)^2}$$
◮ We can use a completion Gibbs sampler:
$$\eta_i \mid \theta, x_i \sim \mathcal{E}xp\!\left( \frac{1 + (\theta - x_i)^2}{2} \right),$$
$$\theta \mid x_1, x_2, x_3, \eta_1, \eta_2, \eta_3 \sim \mathcal{N}\!\left( \frac{\eta_1 x_1 + \eta_2 x_2 + \eta_3 x_3}{\eta_1 + \eta_2 + \eta_3 + \sigma^{-2}}, \; \frac{1}{\eta_1 + \eta_2 + \eta_3 + \sigma^{-2}} \right)$$

Monte Carlo Methods with R: Monitoring Convergence [23]
Monitoring Convergence of Multiple Estimates: Completion Gibbs Sampler
◮ The Gibbs sampler is based on the latent variables $\eta_i$, where
$$\int_0^\infty e^{-\frac{1}{2} \eta_i (1 + (x_i - \theta)^2)} \, d\eta_i = \frac{2}{1 + (x_i - \theta)^2} \propto \frac{1}{1 + (x_i - \theta)^2}$$
◮ with $\eta_i \sim$ Exponential
◮ Monitor with three estimates of $\theta$:
⊲ Empirical average
⊲ Rao–Blackwellized
⊲ Importance sampling

Monte Carlo Methods with R: Monitoring Convergence [24]
Monitoring Convergence of Multiple Estimates: Calculating the Estimates
◮ Empirical average
$$\frac{1}{M} \sum_{j=1}^{M} \hat\theta^{(j)}$$
◮ Rao–Blackwellized version, using
$$\theta \mid \eta_1, \eta_2, \eta_3 \sim \mathcal{N}\!\left( \frac{\sum_i \eta_i x_i}{\sigma^{-2} + \sum_i \eta_i}, \; \frac{1}{\sigma^{-2} + \sum_i \eta_i} \right)$$
◮ Importance sampling with a Cauchy candidate

Monte Carlo Methods with R: Monitoring Convergence [25]
Monitoring Convergence of Multiple Estimates: Monitoring the Estimates
◮ [Figure: cumulative averages of the empirical average, RB, and IS estimates]
⊲ All estimates converged
⊲ IS seems most stable
◮ When applicable, this is a superior diagnostic to a single-chain check
◮ It is intrinsically conservative:
⊲ The speed of convergence is determined by the slowest estimate
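Here is a minimal sketch of this completion Gibbs sampler and its three monitored estimators, implementing the formulas of slides [22] to [24] (our own code, not the book's; the data x, the prior variance sigma2, and the chain length are illustrative choices).

```r
set.seed(4)
x <- c(-1, 0, 4)    # illustrative observations (assumption)
sigma2 <- 10        # prior variance (assumption)
Niter <- 10^4
theta <- rbterm <- numeric(Niter)
theta_cur <- 0
for (t in 1:Niter) {
  # eta_i | theta ~ Exp((1 + (x_i - theta)^2) / 2)
  eta <- rexp(3, rate = (1 + (x - theta_cur)^2) / 2)
  prec <- sum(eta) + 1 / sigma2
  rbterm[t] <- sum(eta * x) / prec   # E[theta | eta], for the RB estimate
  theta_cur <- rnorm(1, mean = rbterm[t], sd = sqrt(1 / prec))
  theta[t] <- theta_cur
}
empavg <- cumsum(theta) / (1:Niter)    # empirical average
rb     <- cumsum(rbterm) / (1:Niter)   # Rao-Blackwellized version

# Importance sampling with a Cauchy candidate: w proportional to pi/g
cand <- rcauchy(Niter)
w <- exp(-cand^2 / (2 * sigma2)) * (1 + cand^2) /
     apply(outer(cand, x, function(th, xi) 1 + (th - xi)^2), 1, prod)
is <- cumsum(w * cand) / cumsum(w)     # self-normalized IS estimate

matplot(cbind(empavg, rb, is), type = "l", lty = 1,
        xlab = "iterations", ylab = "estimates of E[theta | x]")
```

Plotting the three cumulative averages together is the diagnostic itself: one keeps simulating until the three curves coincide.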
Monte Carlo Methods with R: Monitoring Convergence [26]
Monitoring Convergence of Averages: Between and Within Variances
◮ The Gelman–Rubin diagnostic uses multiple chains
◮ It is based on a between–within variance comparison (anova-like)
⊲ Implemented in coda as gelman.diag(coda) and gelman.plot(coda)
◮ For $M$ chains $\{\theta_1^{(t)}\}, \dots, \{\theta_M^{(t)}\}$:
⊲ The between-chain variance is
$$B_T = \frac{1}{M - 1} \sum_{m=1}^{M} (\bar\theta_m - \bar\theta)^2$$
⊲ The within-chain variance is
$$W_T = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{T - 1} \sum_{t=1}^{T} \big(\theta_m^{(t)} - \bar\theta_m\big)^2$$
◮ If the chains have converged, these variances are the same (anova null hypothesis)

Monte Carlo Methods with R: Monitoring Convergence [27]
Monitoring Convergence of Averages: Gelman–Rubin Statistic
◮ $B_T$ and $W_T$ are combined into an F-like statistic
◮ The shrink factor is defined by
$$R_T^2 = \frac{\hat\sigma_T^2 + \frac{B_T}{M}}{W_T} \, \frac{\nu_T + 1}{\nu_T + 3}, \qquad \hat\sigma_T^2 = \frac{T - 1}{T} W_T + B_T$$
⊲ based on an F-distribution approximation
◮ The diagnostic enjoyed wide use because of its simplicity and intuitive connections with anova
◮ $R_T$ does converge to 1 under stationarity
◮ However, its distributional approximation relies on normality
◮ These approximations are at best difficult to satisfy and at worst not valid

Monte Carlo Methods with R: Monitoring Convergence [28]
Monitoring Convergence of Averages: Gelman Plot for Pump Failures
◮ Three chains for $\lambda_1$, with nsim=1000
◮ The plot suggests convergence
◮ gelman.diag gives: point estimate 1.00, upper 97.5% quantile 1.01
◮ R code
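For completeness, a sketch of the corresponding coda calls (our own; the toy rwmetrop chains from earlier stand in for the three chains of $\lambda_1$ from the pump-failure Gibbs sampler):

```r
library(coda)
set.seed(5)
# Three replicate chains from overdispersed starting points
chains <- mcmc.list(lapply(c(-5, 0, 5),
                           function(s) mcmc(rwmetrop(1000, s, scale = 2))))
gelman.diag(chains)   # shrink factor close to 1 suggests convergence
gelman.plot(chains)   # evolution of the shrink factor across iterations
```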