


...ion)

$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{\int_0^1 p(y \mid \theta')\,p(\theta')\,d\theta'} = \frac{\theta^{m_H+\alpha-1}(1-\theta)^{m-m_H+\beta-1}}{\int_0^1 (\theta')^{m_H+\alpha-1}(1-\theta')^{m-m_H+\beta-1}\,d\theta'}.$$

We can recognize the denominator as a Beta function from the definition in (8):

$$p(\theta \mid y) = \frac{1}{B(m_H+\alpha,\; m-m_H+\beta)}\,\theta^{m_H+\alpha-1}(1-\theta)^{m-m_H+\beta-1},$$

and we recognize this as being a Beta distribution:

$$p(\theta \mid y) \sim \mathrm{Beta}(m_H+\alpha,\; m-m_H+\beta). \tag{14}$$

As with the MAP estimate, we can see the interplay between the data $y_i$ and the prior parameters $\alpha$ and $\beta$ in forming the posterior. As before, the exact choice of $\alpha$ and $\beta$ does not matter asymptotically: the data overwhelm the prior.

If we knew that the Beta distribution is the conjugate prior to the Bernoulli, we could have figured out the same thing faster by recognizing that

$$p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta) \propto \theta^{m_H+\alpha-1}(1-\theta)^{m-m_H+\beta-1}, \tag{15}$$

and then realizing that, by the definition of a conjugate prior, the posterior must be a Beta distribution. There is exactly one Beta distribution that satisfies (15): the one that is normalized correctly, and that is equation (14).

The parameters of the prior distribution ($\alpha$ and $\beta$ in the case of the Beta prior) are called prior hyperparameters. We choose them to best represent our beliefs about the distribution of $\theta$. The parameters of the posterior distribution ($m_H+\alpha$ and $m-m_H+\beta$) are called posterior hyperparameters. Any time a l...
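The conjugate update in (14) can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names (`beta_posterior`, `log_beta_function`, `normalizer_by_quadrature`, `posterior_mean`) and the example counts are hypothetical, chosen only for illustration. It computes the posterior hyperparameters, compares the normalizing integral against the Beta function, and shows how two different priors lead to nearly the same posterior mean once $m$ is large.

```python
import math


def beta_posterior(alpha, beta, m_heads, m):
    """Posterior hyperparameters (m_H + alpha, m - m_H + beta) for a
    Beta(alpha, beta) prior after m Bernoulli trials with m_heads successes."""
    return alpha + m_heads, beta + (m - m_heads)


def log_beta_function(a, b):
    """log B(a, b), using B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def normalizer_by_quadrature(a, b, n=20_000):
    """Midpoint-rule estimate of the integral of t^(a-1) (1-t)^(b-1) over [0, 1].

    Assumes a >= 1 and b >= 1 so the integrand is bounded on [0, 1].
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1.0 - t) ** (b - 1)
    return h * total


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Hypothetical data: 7 heads in 10 flips, with a Beta(2, 3) prior.
    a_post, b_post = beta_posterior(2.0, 3.0, m_heads=7, m=10)
    print("posterior hyperparameters:", a_post, b_post)
    print("integral by quadrature:   ", normalizer_by_quadrature(a_post, b_post))
    print("Beta function B(a, b):    ", math.exp(log_beta_function(a_post, b_post)))

    # Two different priors, same heads ratio, increasing sample size.
    for m in (10, 10_000):
        heads = int(0.7 * m)
        flat = posterior_mean(*beta_posterior(1.0, 1.0, heads, m))
        strong = posterior_mean(*beta_posterior(50.0, 50.0, heads, m))
        print(f"m={m}: mean under Beta(1,1) = {flat:.4f}, under Beta(50,50) = {strong:.4f}")
```

The quadrature step mirrors the denominator of the first expression for $p(\theta \mid y)$; the closed form via `math.lgamma` is what recognizing the Beta function buys. The final loop illustrates the remark that the choice of $\alpha$ and $\beta$ matters less as the number of observations grows.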
