
# SP10 cs188 lecture 12 -- reinforcement learning II (2PP)


CS 188: Artificial Intelligence, Spring 2010
Lecture 12: Reinforcement Learning II
2/25/2010
Pieter Abbeel – UC Berkeley

Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore

## Announcements

- W3 Utilities: due tonight
- P3 Reinforcement Learning (RL):
  - Out tonight, due Thursday next week
  - You will get to apply RL to:
    - Gridworld agent
    - Crawler
    - Pac-man

## Reinforcement Learning

- Still assume a Markov decision process (MDP):
  - A set of states s ∈ S
  - A set of actions (per state) A
  - A model T(s,a,s’)
  - A reward function R(s,a,s’)
- Still looking for a policy π(s)
- New twist: don’t know T or R
  - I.e. don’t know which states are good or what the actions do
  - Must actually try actions and states out to learn

## The Story So Far: MDPs and RL

Things we know how to do:

- If we know the MDP
  - Compute V*, Q*, π* exactly
  - Evaluate a fixed policy π
- If we don’t know the MDP
  - We can estimate the MDP then solve
  - We can estimate V for a fixed policy π
  - We can estimate Q*(s,a) for the optimal policy while executing an exploration policy

Techniques:

- Model-based DPs
  - Value and policy iteration
  - Policy evaluation
- Model-based RL
- Model-free RL:
  - Value learning
  - Q-learning
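The "estimate the MDP then solve" route listed above can be illustrated with a small sketch. It assumes experience arrives as `(s, a, s', r)` tuples and estimates T by normalized counts and R by averaging observed rewards. The function name `estimate_mdp` and the toy samples are hypothetical and not part of the course materials.

```python
from collections import defaultdict

def estimate_mdp(transitions):
    """Estimate T(s,a,s') and R(s,a,s') from observed (s, a, s', r) samples."""
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visit count}
    reward_sums = defaultdict(float)                # (s, a, s') -> summed reward
    for s, a, s_next, r in transitions:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a, s_next)] += r

    T, R = {}, {}
    for (s, a), outcomes in counts.items():
        total = sum(outcomes.values())
        for s_next, n in outcomes.items():
            T[(s, a, s_next)] = n / total                      # empirical frequency
            R[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n  # mean observed reward
    return T, R

# Hypothetical experience gathered by trying actions out.
samples = [
    ("A", "right", "B", 0.0),
    ("A", "right", "B", 0.0),
    ("A", "right", "A", -1.0),
    ("B", "right", "exit", 10.0),
]
T_hat, R_hat = estimate_mdp(samples)
```

Once `T_hat` and `R_hat` are available, the estimated MDP can be handed to a model-based method such as value iteration.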
## Problems with TD Value Learning

- TD value learning is a model-free way to do policy evaluation
- However, if we want to turn values into a (new) policy, we’re sunk: choosing the best action from V requires a one-step look-ahead through T and R, which we don’t know
- Idea: learn Q-values directly
  - Makes action selection model-free too!
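To make "learn Q-values directly" concrete, here is a minimal sketch of the standard tabular Q-learning update. The preview does not show the lecture's own formulation, so the learning rate `alpha`, the discount `gamma`, the helper names, and the toy states are all assumptions chosen for illustration.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.9):
    """Blend one observed sample (s, a, r, s') into the running estimate Q(s, a)."""
    # Value of the best action available in s'; 0.0 if s' is terminal (no actions).
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    sample = r + gamma * best_next
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

def greedy_action(Q, s, actions):
    """Pick an action from Q alone; neither T nor R is consulted."""
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
q_update(Q, "B", "right", 10.0, "exit", [])             # terminal transition
q_update(Q, "A", "right", 0.0, "B", ["left", "right"])  # bootstraps from Q("B", .)
best = greedy_action(Q, "A", ["left", "right"])
```

Note that `greedy_action` reads only the Q table, which is the sense in which action selection becomes model-free.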
