Lecture 20 Notes



…bot: Model = map of the world

## Example: the "botanist learning problem"

- Experience = physical measurements of surveyed specimens & expert judgements of their true species
- Model = factor graph relating species to measurements

## Sample data

| sepal length | sepal width | petal length | petal width | species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | Iris setosa |
| 5.6 | 3.0 | 4.5 | 1.5 | Iris versicolor |
| 4.9 | 3.0 | 1.4 | 0.2 | Iris setosa |
| 6.4 | 2.8 | 5.6 | 2.1 | Iris virginica |
| 5.8 | 2.7 | 4.1 | 1.0 | Iris versicolor |

## Factor graph

(figure: a factor graph with factors ϕ0, ϕ1, ϕ2, ϕ3, ϕ4 relating the species and measurement variables X1…X4; a later slide repeats one copy of the graph per sample, all copies sharing the ϕ parameters)

- One of many possible factor graphs
- Values of the ϕs not shown, but they are part of the model

## In general

For our purposes, a model M is exactly a distribution P(X | M) over possible samples.

When is M better than M′? When P(X | M) is more accurate than P(X | M′). Bayes rule encodes this: from a prior P(M) and evidence X, compute the posterior P(M | X):

- P(M | X) = P(X | M) P(M) / P(X)
- Better predictions (higher P(X | M)) yield a higher posterior.

## Conditional model

Split the variables into (X, Y), and suppose we always observe X. There are two ways P(X, Y) and P′(X, Y) can differ:

- P(X) ≠ P′(X), and/or
- P(Y | X) ≠ P′(Y | X)

The first way doesn't matter for decisions, so a conditional model specifies only P(Y | X, M).

## Conditional model example

- Experience = samples of (X, Y)
- X = features of an object
- Y = whether the object is a "framling"
- Model = rule for deciding whether a new object is a framling

Sample data & a possible model:

| tall | pointy | blue | framling |
|---|---|---|---|
| T | T | F | T |
| T | F | F | T |
| F | T | F | F |
| T | T | T | F |
| T | F | F | T |

H = tall ∧ ¬blue

## Hypothesis space

The hypothesis space H = the set of models we are willing to consider, restricted for philosophical or computational reasons: e.g., all factor graphs of a given structure, or all conjunctions of up to two literals. The prior is a distribution over H.

## A simple learning algorithm

Conditional learning: samples (x_i, y_i). Let H be a set of
propositional formulae, H = { H1, H2, … }. A hypothesis H is consistent if H(x_i) = y_i for all i. The version space V = { all consistent H } ⊆ H.

Version space algorithm: predict y = majority vote of H(x) over all H ∈ V.

## Framlings

Using the sample data above, take

H = { conjunctions of up to 2 literals } = { T, F, tall, pointy, blue, ¬tall, ¬pointy, ¬blue, tall ∧ pointy, tall ∧ blue, pointy ∧ blue, ¬tall ∧ pointy, … }

## Analysis

A mistake = making a wrong prediction.

- If some H ∈ H is always right, eventually we'll eliminate all competitors and make no more mistakes.
- If no H ∈ H is always right, eventually V becomes empty (e.g., under label noise or feature noise).

Suppose |H| = N. How many mistakes could we make? Since we predict with the majority of V, after any mistake we eliminate half (or more) of V. We can't do that more than log2(N) times.

## Discussion

In the example, N = 20 and log2(N) ≈ 4.32, yet we made only 2 mistakes.

- A mistake bound limits wrong decisions, as desired.
- But it required strong assumptions: no noise, and the true H contained in H.
- It could be very slow!
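The Bayes-rule comparison of models can be made concrete with a toy example. This sketch is not from the lecture: the two coin models, the uniform prior, and the flip data are all made up for illustration. Each model M is just a value of P(heads | M), and we compute P(M | X) = P(X | M) P(M) / P(X).

```python
# Hedged sketch (not the lecture's code): posterior over two hypothetical
# models of a coin, given observed flips.

models = {"M1": 0.5, "M2": 0.9}   # hypothetical models: P(heads | M)
prior = {"M1": 0.5, "M2": 0.5}    # prior P(M), here uniform

X = ["H", "H", "H", "T", "H"]     # observed evidence (made-up flips)

def likelihood(p_heads, data):
    """P(X | M): product of per-flip probabilities under the model."""
    out = 1.0
    for flip in data:
        out *= p_heads if flip == "H" else 1.0 - p_heads
    return out

# Bayes rule: P(M | X) = P(X | M) P(M) / P(X)
unnorm = {m: likelihood(p, X) * prior[m] for m, p in models.items()}
p_X = sum(unnorm.values())        # P(X) = sum over M of P(X | M) P(M)
posterior = {m: v / p_X for m, v in unnorm.items()}
print(posterior)
```

As the notes say, the model with the higher P(X | M) ends up with the higher posterior; here the heads-biased model M2 wins because the data are mostly heads.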
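The version space algorithm can also be sketched in Python on the framling data. This is a minimal illustration, not the lecture's code: the hypothesis space is the 20 formulae (T, F, 6 literals, 12 two-literal conjunctions over {tall, pointy, blue}), and ties in the majority vote are broken toward False, an arbitrary choice, so the exact mistake count may differ from the 2 reported in the notes, though it must stay within the log2(20) ≈ 4.32 bound.

```python
from itertools import combinations

FEATURES = ["tall", "pointy", "blue"]

def make_hypotheses():
    """Conjunctions of up to 2 literals: a hypothesis is a frozenset of
    (feature, polarity) literals; the empty conjunction is the constant T,
    and None stands for the constant F."""
    lits = [(f, v) for f in FEATURES for v in (True, False)]
    hs = [frozenset(), None]                     # T and F
    hs += [frozenset([l]) for l in lits]         # single literals
    for (f1, v1), (f2, v2) in combinations(lits, 2):
        if f1 != f2:                             # skip tall ∧ ¬tall etc.
            hs.append(frozenset([(f1, v1), (f2, v2)]))
    return hs

def predict_one(h, x):
    if h is None:
        return False
    return all(x[f] == v for f, v in h)          # conjunction of literals

def majority_predict(V, x):
    votes = sum(predict_one(h, x) for h in V)
    return votes * 2 > len(V)                    # ties broken toward False

# Sample data from the notes: (tall, pointy, blue) -> framling
data = [
    ({"tall": True,  "pointy": True,  "blue": False}, True),
    ({"tall": True,  "pointy": False, "blue": False}, True),
    ({"tall": False, "pointy": True,  "blue": False}, False),
    ({"tall": True,  "pointy": True,  "blue": True},  False),
    ({"tall": True,  "pointy": False, "blue": False}, True),
]

V = make_hypotheses()
N = len(V)                                       # 20, so bound is log2(20) ~ 4.32
mistakes = 0
for x, y in data:
    if majority_predict(V, x) != y:
        mistakes += 1
    V = [h for h in V if predict_one(h, x) == y] # keep only consistent hypotheses
print("N =", N, "mistakes =", mistakes, "surviving =", len(V))
```

After all five samples, the only surviving hypothesis is tall ∧ ¬blue, matching the model H given in the notes; each mistake demonstrably wipes out at least half of the current version space.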

## This note was uploaded on 01/24/2014 for the course CS 15-780 taught by Professor Bryant during the Fall '09 term at Carnegie Mellon.
