Model = map of the world

Example
The "botanist learning problem"
‣ Experience = physical measurements of surveyed specimens & expert judgements of their true species
‣ Model = factor graph relating species to measurements

Sample data
sepal length  sepal width  petal length  petal width  species
5.1           3.5          1.4           0.2          Iris setosa
5.6           3.0          4.5           1.5          Iris versicolor
4.9           3.0          1.4           0.2          Iris setosa
6.4           2.8          5.6           2.1          Iris virginica
5.8           2.7          4.1           1.0          Iris versicolor

Factor graph
[Figure: a factor graph with factors ϕ0–ϕ4]
One of many possible factor graphs
Values of the ϕs not shown, but part of model

Factor graph
[Figure: several candidate factor graphs over variables X1–X4, each built from factors ϕ0–ϕ4]

Factor graph
[Figure: the same candidate factor graphs over X1–X4, highlighting that the parameters of each factor (e.g., ϕ1's params) are part of the model]
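To make the factor-graph picture concrete, here is a minimal sketch (a toy model, not the one on the slides) of a factor graph as a list of factors whose product gives an unnormalized joint; the variables X1 and X2, the factor values, and the dict-based representation are all made up for illustration.

```python
import itertools

# Hypothetical toy factor graph over two binary variables:
# X1 = "petal length is long", X2 = "species is Iris virginica".
# Each factor maps an assignment of its variables to a non-negative value.
factors = [
    (("X1",),      lambda a: 2.0 if a["X1"] else 1.0),            # phi0: factor on X1 alone
    (("X1", "X2"), lambda a: 3.0 if a["X1"] == a["X2"] else 0.5),  # phi1: ties measurement to species
]

def unnormalized_p(assignment):
    """Product of all factor values at one full assignment."""
    p = 1.0
    for _, phi in factors:
        p *= phi(assignment)
    return p

# Normalizing the product over all assignments yields the distribution P(X | M).
assignments = [dict(zip(("X1", "X2"), vals))
               for vals in itertools.product([False, True], repeat=2)]
Z = sum(unnormalized_p(a) for a in assignments)
for a in assignments:
    print(a, unnormalized_p(a) / Z)
```

The factor values here are the "values of the ϕs" the slide mentions: they are part of the model, and in the botanist problem they would be learned from the measured specimens.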
In general
For our purposes, a model M is exactly a distribution P(X | M) over possible samples
When is M better than M'? When P(X | M) is more accurate than P(X | M').
Bayes' rule encodes this: from prior P(M) and evidence X, compute posterior P(M | X)
‣ P(M | X) = P(X | M) P(M) / P(X)
‣ better predictions (higher P(X | M)) yield a higher posterior
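A minimal numerical sketch of the bullet above, with two hypothetical models M1 and M2 and made-up likelihood values; it only illustrates that the model assigning higher P(X | M) to the evidence ends up with the higher posterior.

```python
# Hypothetical candidate models, a uniform prior, and the likelihood
# each assigns to the observed evidence X (made-up numbers).
prior      = {"M1": 0.5, "M2": 0.5}
likelihood = {"M1": 0.02, "M2": 0.08}   # P(X | M)

# Bayes' rule: P(M | X) = P(X | M) P(M) / P(X), where P(X) sums over models.
evidence = sum(likelihood[m] * prior[m] for m in prior)
posterior = {m: likelihood[m] * prior[m] / evidence for m in prior}

print(posterior)   # {'M1': 0.2, 'M2': 0.8} -- higher likelihood, higher posterior
```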
Conditional model
Split variables into (X, Y)
Suppose we always observe X
Two ways P(X, Y) and P'(X, Y) can differ:
‣ P(X) ≠ P'(X), and/or
‣ P(Y | X) ≠ P'(Y | X)
First way doesn't matter for decisions
Conditional model: only specifies P(Y | X, M)

Conditional model example
Experience = samples of (X, Y)
X = features of object
Y = whether object is a "framling"
Model = rule for deciding whether a new object is a framling

Sample data & possible model
tall  pointy  blue  framling
T     T       F     T
T     F       F     T
F     T       F     F
T     T       T     F
T     F       F     T

H = tall ∧ ¬blue

Hypothesis space
Hypothesis space H = set of models we are willing to consider
‣ for philosophical or computational reasons
E.g., all factor graphs of a given structure
Or, all conjunctions of up to two literals
Prior is a distribution over H
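As a concrete illustration of the second example, the space of conjunctions of up to two literals over the three framling features can be enumerated directly. This is only a sketch: the dict representation of a hypothesis and the `predict` helper are assumptions of this example, not the lecture's notation.

```python
from itertools import combinations

features = ["tall", "pointy", "blue"]

# Represent a conjunction as {feature: required truth value};
# {} is the always-true hypothesis T, and None stands for the always-false F.
hypotheses = [{}, None]
for feat in features:                                   # single literals
    hypotheses += [{feat: True}, {feat: False}]
for f1, f2 in combinations(features, 2):                # two-literal conjunctions
    for v1 in (True, False):
        for v2 in (True, False):
            hypotheses.append({f1: v1, f2: v2})

def predict(h, x):
    """Does hypothesis h call example x (a dict of feature values) a framling?"""
    if h is None:
        return False
    return all(x[feat] == val for feat, val in h.items())

print(len(hypotheses))   # 2 + 6 + 12 = 20, the N = 20 used later
```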
A simple learning algorithm
Conditional learning: samples (xᵢ, yᵢ)
Let H be a set of propositional formulae
‣ H = { H1, H2, … }
A hypothesis H ∈ H is consistent if H(xᵢ) = yᵢ for all i
Version space V = { all consistent H } ⊆ H
Version space algorithm: predict y = majority vote of H(x) over all H ∈ V
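Continuing the sketch above (it reuses the hypothetical `hypotheses` list and `predict` helper), here is one way to run the version space algorithm on the framling samples. The sample ordering, tie-breaking rule, and mistake counting are choices of this example, so the exact number of mistakes may differ from the lecture's run.

```python
# Framling samples from the table: (features, is-a-framling label).
samples = [
    ({"tall": True,  "pointy": True,  "blue": False}, True),
    ({"tall": True,  "pointy": False, "blue": False}, True),
    ({"tall": False, "pointy": True,  "blue": False}, False),
    ({"tall": True,  "pointy": True,  "blue": True},  False),
    ({"tall": True,  "pointy": False, "blue": False}, True),
]

V = list(hypotheses)      # version space: start with the whole hypothesis space
mistakes = 0
for x, y in samples:
    votes = sum(predict(h, x) for h in V)
    guess = 2 * votes >= len(V)                 # majority vote, ties broken toward True
    if guess != y:
        mistakes += 1
    V = [h for h in V if predict(h, x) == y]    # eliminate inconsistent hypotheses

print(mistakes, len(V))   # mistakes is at most log2(20); tall AND NOT blue survives in V
```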
Framlings
tall  pointy  blue  framling
T     T       F     T
T     F       F     T
F     T       F     F
T     T       T     F
T     F       F     T

H = { conjunctions of up to 2 literals } = { T, F, tall, pointy, blue, ¬tall, ¬pointy, ¬blue, tall ∧ pointy, tall ∧ blue, pointy ∧ blue, ¬tall ∧ pointy, … }

Analysis
Mistake = make wrong prediction
If some H ∈ H is always right, eventually we'll eliminate all competitors and make no more mistakes
If no H ∈ H is always right, eventually V will become empty
‣ e.g., if label noise or feature noise

Analysis
Suppose |H| = N
How many mistakes could we make?
Since we predict with the majority of V, after any mistake we eliminate half (or more) of V
Can't do that more than log2(N) times
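Spelling out the step the slide leaves implicit: if some hypothesis in H is always right, it is never eliminated, so V never empties; and each mistake at least halves V. After k mistakes,

1 ≤ |V| ≤ N / 2^k  ⇒  2^k ≤ N  ⇒  k ≤ log2(N).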
Discussion
In the example, N = 20, log2(N) ≈ 4.32
Made only 2 mistakes
Mistake bound: limits wrong decisions, as desired
But it required strong assumptions (no noise, true H contained in H)
Could be very slow!