Instead of finding the θ that maximizes the likelihood function, p(y|θ), we find the θ that
maximizes the posterior, p(θ|y). The distinction is between the θ under which
the data are most likely, and the most likely θ given the data.
We do not have to worry about evaluating the partition function, ∫ p(y|θ')p(θ')dθ',
because it is constant with respect to θ. Again, it is generally more convenient
to work with the logarithm.
θ̂_MAP ∈ arg max_θ p(θ|y) = arg max_θ  p(y|θ)p(θ) / ∫ p(y|θ')p(θ')dθ'
       = arg max_θ p(y|θ)p(θ) = arg max_θ (log p(y|θ) + log p(θ)).          (6)

When the prior is uniform, the MAP estimate is identical to the ML estimate
because log p(θ) is constant.
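To make the uniform-prior remark concrete, here is a minimal sketch comparing the ML and MAP estimates of a coin's heads probability. The Beta(a, b) prior, the function names, and the data below are illustrative assumptions, not part of the notes; for a Beta prior the maximizer of log p(y|θ) + log p(θ) has the closed form used here.

```python
def ml_estimate(flips):
    """ML: the theta maximizing p(y|theta) is the fraction of heads."""
    return sum(flips) / len(flips)

def map_estimate(flips, a=2.0, b=2.0):
    """MAP with a Beta(a, b) prior: maximize log p(y|theta) + log p(theta).
    Closed form for the mode: (heads + a - 1) / (n + a + b - 2)."""
    n, heads = len(flips), sum(flips)
    return (heads + a - 1) / (n + a + b - 2)

flips = [1, 1, 1, 0]                 # 3 heads out of 4 (made-up data)
print(ml_estimate(flips))            # 0.75
print(map_estimate(flips))           # (3+1)/(4+2) ≈ 0.667: prior pulls toward 1/2
print(map_estimate(flips, 1, 1))     # Beta(1,1) is uniform, so this equals the ML estimate
```

The last line illustrates the statement above: with a uniform prior, log p(θ) is constant, so the MAP and ML estimates coincide.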
One might ask what would make a bad choice of prior. We will see later that
reasonable priors are those that do not assign zero probability to
the true value of θ. With such a prior, the MAP estimate is consistent,
which we will discuss in more detail later. Some other properties of the MAP
estimate are illustrated in the next example.
Coin Flip Example Part 3. We again return to the coin ﬂip example.
Suppo...
Spring '12, Cynthia Rudin
