SP10 cs188 lecture 11 -- reinforcement learning (2PP)

CS 188: Artificial Intelligence
Spring 2010
Lecture 11: Reinforcement Learning
2/23/2010
Pieter Abbeel – UC Berkeley
Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore

Announcements
- P0 / P1 / W1 / W2 in glookup
  - If you have no entry, etc, email staff list!
  - If you have questions, see one of us or email list.
- W1, W2: can be picked up from 188 return box in 283 Soda
- W3: Utilities, due Thursday.
- Recall: readings for current material
  - Online book: Sutton and Barto http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html
MDPs recap
- Markov decision processes:
  - States S
  - Actions A
  - Transitions P(s'|s,a) (or T(s,a,s'))
  - Rewards R(s,a,s') (and discount γ)
  - Start state s_0
- Solution methods:
  - Value iteration (VI)
  - Policy iteration (PI)
  - Asynchronous value iteration
- Current limitations:
  - Relatively small state spaces
  - Assumes T and R are known

MDP Example: Grid World
- The agent lives in a grid
- Walls block the agent's path
- The agent's actions do not always go as planned:
  - 80% of the time, the action North takes the agent North (if there is no wall there)
  - 10% of the time, North takes the agent West; 10% East
  - If there is a wall in the direction the agent would have been taken, the agent stays put
- Rewards come at the end
- Goal: maximize sum of rewards
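The noisy grid-world dynamics described above can be sketched in a few lines of Python. This is only an illustration, not code from the course: the names `MOVES`, `SIDEWAYS` and `transition_probs` are hypothetical, the slide gives the 80/10/10 split only for North (the sketch assumes every action slips to its two perpendicular directions in the same way), and it further assumes that moving off the edge of the grid behaves like hitting a wall.

```python
# Hypothetical names; only the 80/10/10 noise for North comes from the slide.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

# Assumption: every action slips to its two perpendicular directions.
SIDEWAYS = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}


def transition_probs(state, action, walls, width, height):
    """Return {next_state: probability} for one noisy move from `state`."""
    outcomes = [(action, 0.8)] + [(d, 0.1) for d in SIDEWAYS[action]]
    probs = {}
    for direction, p in outcomes:
        dx, dy = MOVES[direction]
        nxt = (state[0] + dx, state[1] + dy)
        # Assumption: leaving the grid is treated the same as hitting a wall.
        off_grid = not (0 <= nxt[0] < width and 0 <= nxt[1] < height)
        if nxt in walls or off_grid:
            nxt = state  # the agent stays put
        # Several outcomes can land in the same cell, so accumulate.
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


if __name__ == "__main__":
    # 4x3 grid with a single wall at (1, 1); agent in the bottom-left corner.
    print(transition_probs((0, 0), "N", {(1, 1)}, 4, 3))
```

Returning a dictionary of successor states keeps the function usable as the T(s,a,s') table in the recap: T(s,a,s') is simply `transition_probs(s, a, ...).get(s_next, 0.0)`.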
MDP Example: Grid World
MDP = (S, A, T, R, s_0, γ)
- Set of actions A
- Set of states S
- Transition model T
- Initial state s_0
- Discount factor γ

Value Iteration
- Idea:
  - V_i(s): the expected discounted sum of rewards accumulated when starting from state s and acting optimally for a horizon of i
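The preview ends before the slide states the update rule, so the sketch below assumes the standard Bellman backup, V_{i+1}(s) = max_a Σ_{s'} T(s,a,s') [R(s,a,s') + γ V_i(s')], starting from V_0(s) = 0. The function name `value_iteration`, the dictionary layout of `T`, and the two-state example MDP are all hypothetical and chosen only for illustration.

```python
def value_iteration(states, actions, T, gamma, horizon):
    """Compute V_horizon(s) for every state, starting from V_0(s) = 0.

    T maps (state, action) to a list of (next_state, probability, reward)
    triples. A state with no entry in T is treated as terminal (value 0).
    """
    V = {s: 0.0 for s in states}
    for _ in range(horizon):
        new_V = {}
        for s in states:
            # Expected return of each available action under the current V.
            q_values = [
                sum(p * (r + gamma * V[s2]) for s2, p, r in T[(s, a)])
                for a in actions
                if (s, a) in T
            ]
            new_V[s] = max(q_values, default=0.0)
        V = new_V
    return V


if __name__ == "__main__":
    # Hypothetical two-state MDP: "stay" earns 0.5 and remains in "a";
    # "go" earns 1.0 and ends the episode in the terminal state "done".
    T = {
        ("a", "stay"): [("a", 1.0, 0.5)],
        ("a", "go"): [("done", 1.0, 1.0)],
    }
    for i in range(4):
        print(i, value_iteration(["a", "done"], ["stay", "go"], T, 0.9, i))
```

Each pass builds a fresh table from the previous one, so after i passes the table holds exactly V_i as defined above, rather than a mix of old and new values.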

This note was uploaded on 03/01/2010 for the course COMPUTER S 188 taught by Professor Abbeel during the Spring '10 term at University of California, Berkeley.
