Lecture 21 Notes

(…virginica) petal length

Logistic regression model:

    P(y | x) = σ(ax + b),    σ(z) = 1/(1 + exp(−z))

Posterior:

    P(a, b | {(x_i, y_i)}) = (1/Z) P(a, b) ∏_i σ(a x_i + b)^(y_i) σ(−a x_i − b)^(1 − y_i)

Prior:

    P(a, b) = N(0, I)

Sample from posterior

[Figure: scatter of (a, b) samples drawn from the posterior.]

Predictive distribution
‣ For each θ in the sample, predict P(X) or P(Y | X)
‣ Average the predictions over all θ in the sample
(see the sampling sketch at the end of these notes)

Cheaper approximations — getting cheaper:
‣ Maximum a posteriori (MAP)
‣ Maximum likelihood (MLE)
‣ Conditional MLE / MAP

Instead of the true posterior, just use the single most probable hypothesis.

MAP:

    argmax_θ P(D | θ) P(θ)

Summarize the entire posterior density by its maximum.

MLE:

    argmax_θ P(D | θ)

Like MAP, but ignore the prior term
‣ often the prior is overwhelmed if we have enough data

Conditional MLE / MAP: split D = (x, y), condition on x, and try to explain only y:

    argmax_θ P(y | x, θ)          (conditional MLE)
    argmax_θ P(y | x, θ) P(θ)     (conditional MAP)

Iris example: MAP vs. posterior

[Figures: the MAP estimate compared with the full posterior for the iris data.]

Too certain

This behavior of MAP (or MLE) is typical: we are too sure of ourselves. But it often gets better with more data. Theorem: MAP and MLE are consistent estimates of the true θ, if the "data per parameter" → ∞.

Sequential Decisions M...
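A minimal sketch of the "sample from posterior, then average predictions" steps above. The lecture does not specify a sampler, so a simple random-walk Metropolis sampler is assumed here; the data is synthetic (standing in for petal lengths), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1-D feature (like petal length) and 0/1 labels.
x = np.concatenate([rng.normal(4.5, 0.5, 30), rng.normal(6.0, 0.5, 30)])
y = np.concatenate([np.zeros(30), np.ones(30)])

def log_posterior(theta):
    a, b = theta
    z = a * x + b
    # log sigma(z) = -log(1 + e^(-z)); logaddexp keeps this numerically stable.
    loglik = np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z))
    logprior = -0.5 * theta @ theta  # P(a, b) = N(0, I), constants dropped
    return loglik + logprior

# Random-walk Metropolis over theta = (a, b).
theta = np.zeros(2)
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(0.0, 0.3, size=2)
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)
samples = np.array(samples[5000:])  # discard burn-in

# Predictive distribution: average sigma(a x* + b) over the posterior samples.
x_star = 5.0
p = 1.0 / (1.0 + np.exp(-(samples[:, 0] * x_star + samples[:, 1])))
print("P(y = 1 | x* = 5.0) ~", p.mean())
```

Averaging over the samples is what keeps the predictive distribution honest: points near the decision boundary get probabilities pulled toward 0.5 because different plausible (a, b) pairs disagree about them.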
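And a sketch of the cheaper point estimates on the same synthetic data: MLE minimizes the negative log likelihood, MAP adds the N(0, I) prior term. Using scipy's general-purpose optimizer is an assumption, not the lecture's method.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(4.5, 0.5, 30), rng.normal(6.0, 0.5, 30)])
y = np.concatenate([np.zeros(30), np.ones(30)])

def neg_log_lik(theta):
    a, b = theta
    z = a * x + b
    # -log P(y | x, theta) for logistic regression, in stable form.
    return np.sum(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z))

def neg_log_post(theta):
    # Adds the N(0, I) prior term; minimizing this gives the MAP estimate.
    return neg_log_lik(theta) + 0.5 * theta @ theta

theta_mle = minimize(neg_log_lik, np.zeros(2)).x   # argmax P(D | theta)
theta_map = minimize(neg_log_post, np.zeros(2)).x  # argmax P(D | theta) P(theta)
print("MLE (a, b):", theta_mle)
print("MAP (a, b):", theta_map)
```

Plugging either single estimate into σ(ax + b) illustrates the "too certain" point in the notes: a point estimate throws away the posterior's spread, so its predicted probabilities are sharper than those from averaging over posterior samples.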