MIT15_097S12_lec15

... apply Jensen's inequality, $\mathbb{E}_{y\sim q(y)}[f(X)] \ge f\big(\mathbb{E}_{y\sim q(y)}[X]\big)$ for convex $f$, here with the convex function $f(x) = -\log x$ and $X = p(y)/q(y)$:

$$
\int q(y)\log\frac{q(y)}{p(y)}\,dy = -\int q(y)\log\frac{p(y)}{q(y)}\,dy \ge -\log\int q(y)\,\frac{p(y)}{q(y)}\,dy = -\log\int p(y)\,dy = -\log 1 = 0,
$$

so

$$
\int q(y)\log\frac{q(y)}{p(y)}\,dy \ge 0,
$$

with equality under the same conditions required for equality in Jensen's inequality: since $-\log$ is strictly convex, equality holds if and only if $X$ is constant, that is (because $p$ and $q$ both integrate to 1, the constant must be 1), $q(y) = p(y)$ for all $y$.

We will use the KL divergence to find the distribution in the likelihood family that is 'closest' to the true generating distribution:

$$
\theta^* = \arg\min_{\theta\in\Theta} D\big(q(\cdot)\,\|\,p(\cdot\mid\theta)\big). \tag{18}
$$

For convenience, we will suppose that the arg min in (18) is unique; the results extend easily to the case where it is not.

The main results of this section are two theorems, the first for discrete parameter spaces and the second for continuous parameter spaces. The intuition is that as long as the prior assigns some probability to $\theta = \theta^*$, then as $m \to \infty$ the whole posterior will be close to $\theta^*$.

Theorem 3 (Posterior consistency in finite parameter space). Let F be...
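
As a small numerical illustration of the two facts in this section (the KL divergence is nonnegative and vanishes when $q = p$, and $\theta^*$ in (18) is the KL-closest member of the likelihood family), here is a Python sketch for discrete distributions. The true distribution `q`, the Binomial(2, θ) likelihood `binomial2`, and the grid `Theta` are hypothetical choices made only for illustration; with them the family is misspecified, so even the minimum divergence is strictly positive.

```python
import math

def kl_divergence(q, p):
    """D(q || p) = sum_y q(y) log(q(y) / p(y)) for discrete distributions.

    q and p are sequences of probabilities over the same finite support.
    Terms with q(y) = 0 contribute 0 (the convention 0 log 0 = 0).
    """
    total = 0.0
    for qy, py in zip(q, p):
        if qy == 0.0:
            continue
        if py == 0.0:
            return math.inf  # q puts mass where p does not
        total += qy * math.log(qy / py)
    return total

def binomial2(theta):
    """Hypothetical likelihood family: Binomial(2, theta) on y in {0, 1, 2}."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

# Hypothetical true generating distribution q on {0, 1, 2}; it is not
# Binomial(2, theta) for any theta, so the family is misspecified.
q = [0.5, 0.1, 0.4]

# Hypothetical finite parameter space: a grid on (0, 1).
Theta = [i / 100 for i in range(1, 100)]

# theta* = argmin over Theta of D(q || p(. | theta)), as in (18).
theta_star = min(Theta, key=lambda t: kl_divergence(q, binomial2(t)))

print("D(q || q) =", kl_divergence(q, q))
print("theta* =", theta_star)
print("D(q || p(.|theta*)) =", kl_divergence(q, binomial2(theta_star)))
```

Because the entropy of $q$ does not depend on $\theta$, minimizing $D(q\,\|\,p(\cdot\mid\theta))$ is the same as maximizing $\mathbb{E}_q[\log p(y\mid\theta)]$; for this family that gives $\theta^* = \mathbb{E}_q[y]/2 = 0.45$, which the grid search recovers.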
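
The intuition stated before Theorem 3 (if the prior puts some probability on $\theta^*$, the posterior piles up near $\theta^*$ as $m \to \infty$) can be checked with a short simulation over a finite parameter space. This is a sketch under hypothetical assumptions: the true distribution `q`, the Binomial(2, θ) likelihood `binomial2`, the three-point parameter space `thetas`, and the uniform `prior` are all illustrative choices. For this `q`, maximizing $\mathbb{E}_q[\log p(y\mid\theta)]$ gives $\theta^* = \mathbb{E}_q[y]/2 = 0.45$, which is in `thetas`, even though no member of the family equals `q`.

```python
import math
import random

def binomial2(theta):
    """Hypothetical likelihood p(y | theta): Binomial(2, theta) on y in {0, 1, 2}."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def posterior(ys, thetas, prior):
    """Posterior over a finite parameter space, computed in log space.

    log pi(theta | y_1..y_m) = log pi(theta) + sum_i log p(y_i | theta) - const
    """
    # Count each outcome so the log-likelihood becomes a weighted sum.
    counts = [ys.count(k) for k in range(3)]
    log_post = []
    for theta, pr in zip(thetas, prior):
        probs = binomial2(theta)
        loglik = sum(c * math.log(pk) for c, pk in zip(counts, probs) if c > 0)
        log_post.append(math.log(pr) + loglik)
    # Normalize with the log-sum-exp trick to avoid underflow for large m.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

q = [0.5, 0.1, 0.4]            # hypothetical true distribution (misspecified)
thetas = [0.2, 0.45, 0.7]      # finite parameter space; 0.45 is theta* for this q
prior = [1 / 3, 1 / 3, 1 / 3]  # every theta, including theta*, has prior mass

rng = random.Random(0)
for m in (10, 100, 2000):
    ys = rng.choices([0, 1, 2], weights=q, k=m)
    post = posterior(ys, thetas, prior)
    print(m, [round(p, 4) for p in post])
```

Working in log space matters here: for $m$ in the thousands the raw likelihoods underflow to zero, while the differences of log-posteriors stay well within floating-point range.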

This note was uploaded on 03/24/2014 for the course MIT 15.097 taught by Professor Cynthia Rudin during the Spring '12 term at MIT.