SP10 cs188 lecture 12 -- reinforcement learning II (2PP)

CS 188: Artificial Intelligence, Spring 2010
Lecture 12: Reinforcement Learning II
2/25/2010
Pieter Abbeel – UC Berkeley
Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore

Announcements
- W3 Utilities: due tonight
- P3 Reinforcement Learning (RL):
  - Out tonight, due Thursday next week
  - You will get to apply RL to:
    - Gridworld agent
    - Crawler
    - Pac-man

Reinforcement Learning
- Still assume a Markov decision process (MDP):
  - A set of states s ∈ S
  - A set of actions (per state) A
  - A model T(s,a,s')
  - A reward function R(s,a,s')
- Still looking for a policy π(s)
- New twist: don't know T or R
  - I.e. don't know which states are good or what the actions do
  - Must actually try actions and states out to learn

The Story So Far: MDPs and RL

Things we know how to do:                          Techniques:
If we know the MDP                                 Model-based DPs
  - Compute V*, Q*, π* exactly                       - Value and policy iteration
  - Evaluate a fixed policy π                        - Policy evaluation
If we don't know the MDP
  - We can estimate the MDP then solve               - Model-based RL
  - We can estimate V for a fixed policy π           - Model-free RL: value learning
  - We can estimate Q*(s,a) for the optimal          - Model-free RL: Q-learning
    policy while executing an exploration policy
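
The "estimate the MDP then solve" row can be illustrated with a minimal sketch. It assumes experience arrives as (s, a, s', r) tuples and estimates T and R by simple counting and averaging; the function name `estimate_mdp` and the toy samples are hypothetical, not taken from the lecture or the course projects.

```python
from collections import defaultdict

def estimate_mdp(transitions):
    """Estimate T(s,a,s') and R(s,a,s') from observed (s, a, s_next, r) samples."""
    counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: visit count}
    reward_sums = defaultdict(float)                # (s, a, s_next) -> summed reward
    for s, a, s_next, r in transitions:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a, s_next)] += r

    T, R = {}, {}
    for (s, a), successors in counts.items():
        total = sum(successors.values())
        for s_next, n in successors.items():
            T[(s, a, s_next)] = n / total                        # empirical frequency
            R[(s, a, s_next)] = reward_sums[(s, a, s_next)] / n  # average observed reward
    return T, R

# Hypothetical experience gathered by trying actions out.
samples = [
    ("A", "right", "B", 0.0),
    ("A", "right", "B", 0.0),
    ("A", "right", "A", -1.0),
    ("B", "right", "exit", 10.0),
]
T, R = estimate_mdp(samples)
```

With the estimated T and R in hand, the model-based techniques from the table (value iteration, policy iteration) could then be run as if the MDP were known.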
Problems with TD Value Learning
- TD value learning is a model-free way to do policy evaluation
- However, if we want to turn values into a (new) policy, we're sunk: choosing the best action from V requires a one-step look-ahead through T and R, which we don't know
- Idea: learn Q-values directly
- Makes action selection model-free too!
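
A minimal sketch of learning Q-values directly, using the standard Q-learning update Q(s,a) ← (1 − α) Q(s,a) + α [r + γ max over a' of Q(s',a')]. The learning rate `alpha`, the discount `gamma`, the function names, and the toy states are assumptions made for illustration; they are not the lecture's or the project's code.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, next_actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update from a single observed transition."""
    # Value of the best action in the next state; 0.0 if s_next is terminal.
    best_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    sample = r + gamma * best_next
    # Running average of the old estimate and the new sample.
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample

def greedy_action(Q, s, actions):
    """Pick argmax over a of Q(s,a); note that neither T nor R is needed."""
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(Q, "B", "right", 10.0, "exit", [])             # terminal transition
q_update(Q, "A", "right", 0.0, "B", ["left", "right"])  # bootstraps from Q at B
best = greedy_action(Q, "A", ["left", "right"])
```

Because the policy is read off the Q-table with a plain argmax, action selection here never consults a transition or reward model, which is the point of the slide.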


This note was uploaded on 03/01/2010 for the course COMPUTER S 188 taught by Professor Abbeel during the Spring '10 term at Berkeley.
