Model-Based Learning
§
Idea:
§
Learn the model empirically through experience
§
Solve for values as if the learned model were correct
§
Simple empirical model learning
§
Count outcomes for each s,a
§
Normalize to give an estimate of T(s,a,s')
§
Discover R(s,a,s') when we experience (s,a,s')
§
Solving the MDP with the learned model
§
Value iteration, or policy iteration
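The "solve as if the learned model were correct" step can be sketched as ordinary value iteration run on the estimated tables. This is a minimal sketch, not the lecture's own code: the dictionary layout for T_hat and R_hat, the function names, and the tiny three-state model are all assumptions made for illustration.

```python
def q_value(T, R, V, s, a, gamma):
    """Expected return of action a in state s under the learned model."""
    return sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in T[(s, a)].items())


def value_iteration(T, R, gamma=1.0, tol=1e-9, max_iters=10_000):
    """Assumed layout: T[(s, a)] -> {s_next: prob}, R[(s, a, s_next)] -> reward."""
    actions = {}
    states = set()
    for (s, a), outcomes in T.items():
        actions.setdefault(s, []).append(a)
        states.add(s)
        states.update(outcomes)

    V = {s: 0.0 for s in states}
    for _ in range(max_iters):
        # States with no recorded action are treated as terminal (value 0).
        new_V = {
            s: (max(q_value(T, R, V, s, a, gamma) for a in actions[s])
                if s in actions else 0.0)
            for s in states
        }
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(acts, key=lambda a: q_value(T, R, V, s, a, gamma))
        for s, acts in actions.items()
    }
    return V, policy


# Hypothetical learned model (not from the slides), small enough to check by hand.
T_hat = {
    ("A", "safe"): {"end": 1.0},
    ("A", "risky"): {"end": 0.5, "B": 0.5},
    ("B", "exit"): {"end": 1.0},
}
R_hat = {
    ("A", "safe", "end"): 1.0,
    ("A", "risky", "end"): 10.0,
    ("A", "risky", "B"): 0.0,
    ("B", "exit", "end"): -4.0,
}

V, policy = value_iteration(T_hat, R_hat, gamma=1.0)
print(V["A"], policy["A"])
```

In this toy model the risky action is worth 0.5 * 10 + 0.5 * (0 - 4) = 3, which beats the safe action's 1, so the greedy policy picks "risky" in A. Any error in T_hat or R_hat carries straight through to these values, since the solver treats the estimates as exact.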
[Figure: diagram for a fixed policy π, showing state s, q-state (s, π(s)), transition (s, π(s), s'), and successor state s']
Example: Learn Model in Model-Based Learning
§
Episodes:
[Figure: grid world with axes x and y, exit rewards +100 and -100]
γ = 1
Model estimates from the episodes below:
T(<3,3>, right, <4,3>) = 1 / 3
T(<2,3>, right, <3,3>) = 2 / 2
Episode 1:
(1,1) up -1
(1,2) up -1
(1,2) up -1
(1,3) right -1
(2,3) right -1
(3,3) right -1
(3,2) up -1
(3,3) right -1
(4,3) exit +100
(done)
Episode 2:
(1,1) up -1
(1,2) up -1
(1,3) right -1
(2,3) right -1
(3,3) right -1
(3,2) up -1
(4,2) exit -100
(done)
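The counting and normalizing can be reproduced on the two episodes above with a short script. This is a sketch under stated assumptions: the tuple encoding of each step, the "done" terminal marker, and the names T_hat and R_hat are illustrative choices, and rewards are assumed deterministic, so the last observed reward for each (s,a,s') is kept.

```python
from collections import Counter, defaultdict
from fractions import Fraction

DONE = "done"  # assumed marker for the terminal state reached after "exit"

# Each step is (state, action, reward), transcribed from the two episodes.
episodes = [
    [((1, 1), "up", -1), ((1, 2), "up", -1), ((1, 2), "up", -1),
     ((1, 3), "right", -1), ((2, 3), "right", -1), ((3, 3), "right", -1),
     ((3, 2), "up", -1), ((3, 3), "right", -1), ((4, 3), "exit", 100)],
    [((1, 1), "up", -1), ((1, 2), "up", -1), ((1, 3), "right", -1),
     ((2, 3), "right", -1), ((3, 3), "right", -1), ((3, 2), "up", -1),
     ((4, 2), "exit", -100)],
]

counts = defaultdict(Counter)  # counts[(s, a)][s_next] = number of times observed
R_hat = {}                     # R_hat[(s, a, s_next)] = observed reward

for episode in episodes:
    for i, (s, a, r) in enumerate(episode):
        # The successor is the state of the next step, or DONE after the last step.
        s_next = episode[i + 1][0] if i + 1 < len(episode) else DONE
        counts[(s, a)][s_next] += 1
        R_hat[(s, a, s_next)] = r

# Normalize the counts for each (s, a) into transition probability estimates.
T_hat = {
    sa: {s2: Fraction(n, sum(c.values())) for s2, n in c.items()}
    for sa, c in counts.items()
}

print(T_hat[((3, 3), "right")][(4, 3)])
print(T_hat[((2, 3), "right")][(3, 3)])
```

The action right was taken from <3,3> three times and reached <4,3> once, giving 1/3. It was taken from <2,3> twice and reached <3,3> both times, giving 2/2. With so few samples the estimates are coarse, and any (s,a) pair that was never tried has no estimate at all.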
Model-based vs. Model-free
§
Model-based RL
§
First act in MDP and learn T, R
§
Then value iteration or policy iteration with learned T, R
§
Advantage: efficient use of data
§
Disadvantage: requires building a model for T, R
§
Model-free RL
§
Bypass the need to learn T, R
§