$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{\int_0^1 p(y \mid \theta')\,p(\theta')\,d\theta'} = \frac{\theta^{m_H+\alpha-1}(1-\theta)^{m-m_H+\beta-1}}{\int_0^1 \theta'^{\,m_H+\alpha-1}(1-\theta')^{m-m_H+\beta-1}\,d\theta'}.$$
We can recognize the denominator as a Beta function from its definition:
$$p(\theta \mid y) = \frac{1}{B(m_H+\alpha,\; m-m_H+\beta)}\,\theta^{m_H+\alpha-1}(1-\theta)^{m-m_H+\beta-1},$$
and we recognize this as being a Beta distribution:
$$p(\theta \mid y) \sim \mathrm{Beta}(m_H+\alpha,\; m-m_H+\beta). \tag{14}$$
As with the MAP estimate, we can see the interplay between the data $y_i$ and
the prior parameters $\alpha$ and $\beta$ in forming the posterior. As before, the exact
choice of $\alpha$ and $\beta$ does not matter asymptotically: the data overwhelm the prior.
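
The update in (14) and its asymptotic behaviour can be illustrated with a minimal sketch. The function names, the two priors, and the fixed 30% heads fraction below are illustrative assumptions, not part of the notes; the only formula taken from the text is the posterior parameter pair (m_H + α, m − m_H + β).

```python
def posterior_hyperparameters(m_heads, m, alpha, beta):
    """Return the Beta posterior parameters (m_H + alpha, m - m_H + beta)."""
    if not 0 <= m_heads <= m:
        raise ValueError("m_heads must lie between 0 and m")
    return m_heads + alpha, m - m_heads + beta


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def prior_gap(m_heads, m, prior_1, prior_2):
    """Absolute difference in posterior means under two different priors."""
    mean_1 = beta_mean(*posterior_hyperparameters(m_heads, m, *prior_1))
    mean_2 = beta_mean(*posterior_hyperparameters(m_heads, m, *prior_2))
    return abs(mean_1 - mean_2)


# Two hypothetical priors, chosen only for illustration.
flat_prior = (1.0, 1.0)
skewed_prior = (10.0, 2.0)

# Hold the observed fraction of heads at 30% and let m grow.
gaps = []
for m in (10, 1_000, 100_000):
    m_heads = 3 * m // 10
    gap = prior_gap(m_heads, m, flat_prior, skewed_prior)
    gaps.append(gap)
    print(f"m = {m:>7}: gap between posterior means = {gap:.6f}")
```

With few observations the two priors give visibly different posterior means; as m grows the gap shrinks, which is the sense in which the data overwhelm the prior.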
If we knew that the Beta distribution is the conjugate prior to the Bernoulli,
we could have figured out the same thing faster by recognizing that
$$p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta) \propto \theta^{m_H+\alpha-1}(1-\theta)^{m-m_H+\beta-1}, \tag{15}$$
and then realizing that by the definition of a conjugate prior, the posterior
must be a Beta distribution. There is exactly one Beta distribution that
satisfies (15): the one that is normalized correctly, and that is equation (14).
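
The normalization claim can be checked numerically. The sketch below is only an illustration under assumed values (m = 10, m_H = 3, α = β = 2, all hypothetical): it integrates the unnormalized kernel from (15) over [0, 1] with a midpoint rule and compares the result with the Beta function B(m_H + α, m − m_H + β), computed through the standard identity B(a, b) = Γ(a)Γ(b)/Γ(a + b).

```python
import math


def beta_function(a, b):
    """Beta function via the Gamma-function identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def kernel(theta, a, b):
    """Unnormalized Beta density theta^(a-1) * (1 - theta)^(b-1)."""
    return theta ** (a - 1) * (1 - theta) ** (b - 1)


def integrate_kernel(a, b, n=20_000):
    """Midpoint-rule approximation of the kernel's integral over [0, 1]."""
    h = 1.0 / n
    return h * sum(kernel((i + 0.5) * h, a, b) for i in range(n))


# Hypothetical data and prior, chosen only for illustration.
m, m_heads, alpha, beta = 10, 3, 2.0, 2.0
a, b = m_heads + alpha, m - m_heads + beta  # posterior hyperparameters

numeric = integrate_kernel(a, b)
closed_form = beta_function(a, b)
print(f"numerical integral: {numeric:.10e}")
print(f"Beta function:      {closed_form:.10e}")
```

The two numbers should agree closely, so dividing the kernel by B(m_H + α, m − m_H + β) yields a density that integrates to one.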
The parameters of the prior distribution ($\alpha$ and $\beta$ in the case of the Beta
prior) are called prior hyperparameters. We choose them to best represent
our beliefs about the distribution of $\theta$. The parameters of the posterior
distribution ($m_H + \alpha$ and $m - m_H + \beta$) are called posterior hyperparameters.
Any time a l...
- Spring '12