Model-Based Learning
Idea:
Learn the model empirically through experience
Solve for values as if the learned model were correct
Simple empirical model learning
Count outcomes s' for each s, a
Normalize to give an estimate of T(s,a,s')
Discover each R(s,a,s') when we experience (s,a,s')
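In symbols, writing N(s,a,s') for the number of times the transition (s,a,s') was observed (N is notation introduced here, not from the slide), the counting and normalizing step above amounts to the estimate below. The reward line assumes rewards are deterministic, which is what "discover" suggests.

```latex
\hat{T}(s,a,s') = \frac{N(s,a,s')}{\sum_{s''} N(s,a,s'')}, \qquad
\hat{R}(s,a,s') = r \quad \text{(the reward observed on transition } (s,a,s')\text{)}
```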
Solving the MDP with the learned model
Use value iteration or policy iteration
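A minimal sketch of the value-iteration option in Python, assuming the learned model is stored as dictionaries: T maps (s, a) to a distribution {s': probability} and R maps (s, a, s') to a reward. The dictionary layout, the function name value_iteration, the default gamma=0.9, the fixed iteration count, and the toy states "A", "B", "done" are hypothetical choices for illustration. Policy iteration would use the same learned model but is not shown.

```python
from collections import defaultdict

def value_iteration(T, R, gamma=0.9, iterations=100):
    """Value iteration on a learned (estimated) model.

    T: dict (s, a) -> {s_next: estimated probability}
    R: dict (s, a, s_next) -> observed reward
    States with no actions (e.g. a terminal "done") keep value 0.
    """
    actions = defaultdict(list)  # s -> [(a, {s_next: prob}), ...]
    states = set()
    for (s, a), dist in T.items():
        actions[s].append((a, dist))
        states.add(s)
        states.update(dist)
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        # Bellman backup: V(s) = max_a sum_s' T(s,a,s') [R(s,a,s') + gamma V(s')]
        V = {
            s: max(
                (sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in dist.items())
                 for a, dist in actions[s]),
                default=0.0,
            )
            for s in states
        }
    return V

# Hypothetical two-step model: A --go--> B --exit--> done
T = {("A", "go"): {"B": 1.0}, ("B", "exit"): {"done": 1.0}}
R = {("A", "go", "B"): -1.0, ("B", "exit", "done"): 10.0}
V = value_iteration(T, R)
print(V["A"], V["B"])  # 8.0 10.0
```

The solver never checks whether T came from counts or from a known MDP; that is the "solve as if the learned model were correct" step.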
Example: Learning the Model in Model-Based Learning
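(Figure: gridworld with exit rewards +100 and -100)
Learned model (estimated from the episodes below), for example:
T((3,3), right, (4,3)) = 1/3   (right taken 3 times from (3,3), reached (4,3) once)
T((2,3), right, (3,3)) = 2/2   (right taken 2 times from (2,3), reached (3,3) both times)
Episodes: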
(1,1) up -1
(1,2) up -1
(1,2) up -1
(1,3) right -1
(2,3) right -1
(3,3) right -1
(3,2) up -1
(3,3) right -1
(4,3) exit +100
(done)
(1,1) up -1
(1,2) up -1
(1,3) right -1
(2,3) right -1
(3,3) right -1
(3,2) up -1
(4,2) exit -100
(done)
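The counting step can be checked against the two episodes above with a small Python sketch. Assumptions: each slide line (s, a, r) becomes a tuple, the next state s' of a step is the state on the following line, and the slide's "(done)" is represented by a terminal state "done"; learn_model is a hypothetical helper name. Fractions keep the estimates exact, so they print as on the slide (2/2 reduces to 1).

```python
from collections import Counter, defaultdict
from fractions import Fraction

# The two episodes from the slide, as (state, action, reward) steps.
episode1 = [((1, 1), "up", -1), ((1, 2), "up", -1), ((1, 2), "up", -1),
            ((1, 3), "right", -1), ((2, 3), "right", -1), ((3, 3), "right", -1),
            ((3, 2), "up", -1), ((3, 3), "right", -1), ((4, 3), "exit", 100)]
episode2 = [((1, 1), "up", -1), ((1, 2), "up", -1), ((1, 3), "right", -1),
            ((2, 3), "right", -1), ((3, 3), "right", -1), ((3, 2), "up", -1),
            ((4, 2), "exit", -100)]

def learn_model(episodes):
    """Estimate T and R by counting observed transitions (s, a, s')."""
    counts = defaultdict(Counter)  # (s, a) -> Counter of next states s'
    R = {}                         # (s, a, s') -> reward seen on that transition
    for episode in episodes:
        for i, (s, a, r) in enumerate(episode):
            # Next state is the state of the following step; the episode ends in "done".
            s_next = episode[i + 1][0] if i + 1 < len(episode) else "done"
            counts[(s, a)][s_next] += 1
            R[(s, a, s_next)] = r
    # Normalize the counts for each (s, a) into a probability distribution.
    T = {
        sa: {s_next: Fraction(n, sum(c.values())) for s_next, n in c.items()}
        for sa, c in counts.items()
    }
    return T, R

T, R = learn_model([episode1, episode2])
print(T[((3, 3), "right")][(4, 3)])  # 1/3
print(T[((2, 3), "right")][(3, 3)])  # 1
```

The same counts also give T((3,3), right, (3,2)) = 2/3, the "slip" outcome seen twice.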
Model-based vs. Model-free
Model-based RL
First act in the MDP and learn T, R
Then run value iteration or policy iteration with the learned T, R
Advantage: efficient use of data
Disadvantage: requires building a model for T, R
Model-free RL
Bypass the need to learn T, R
