# SP10 cs188 lecture 11 -- reinforcement learning (2PP)

CS 188: Artificial Intelligence, Spring 2010

Lecture 11: Reinforcement Learning (2/23/2010)

Pieter Abbeel, UC Berkeley

Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore

## Announcements

- P0 / P1 / W1 / W2 in glookup
  - If you have no entry, etc, email staff list!
  - If you have questions, see one of us or email list.
- W1, W2: can be picked up from 188 return box in 283 Soda
- W3: Utilities, due Thursday.
- Recall: readings for current material
  - Online book: Sutton and Barto http://www.cs.ualberta.ca/~sutton/book/ebook/the-book.html

## MDPs recap

- Markov decision processes:
  - States S
  - Actions A
  - Transitions P(s'|s,a) (or T(s,a,s'))
  - Rewards R(s,a,s') (and discount γ)
  - Start state s₀
- Solution methods:
  - Value iteration (VI)
  - Policy iteration (PI)
  - Asynchronous value iteration
- Current limitations:
  - Relatively small state spaces
  - Assumes T and R are known

## MDP Example: Grid World

- The agent lives in a grid
- Walls block the agent's path
- The agent's actions do not always go as planned:
  - 80% of the time, the action North takes the agent North (if there is no wall there)
  - 10% of the time, North takes the agent West; 10% East
  - If there is a wall in the direction the agent would have been taken, the agent stays put
- Rewards come at the end
- Goal: maximize sum of rewards
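
The noisy movement rule for the grid world can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid layout, the `#` wall marker, and the names `move`, `transition` and `SLIPS` are hypothetical, and the slides give the 80/10/10 split only for North, so applying the same perpendicular slips to the other three actions is an assumption as well.

```python
# Directions as (row delta, column delta); row 0 is the top of the grid.
NORTH, SOUTH, EAST, WEST = (-1, 0), (1, 0), (0, 1), (0, -1)

# Perpendicular "slip" directions for each intended action.
# Only the North case comes from the slides; the rest are assumed by symmetry.
SLIPS = {
    NORTH: (WEST, EAST),
    SOUTH: (EAST, WEST),
    EAST: (NORTH, SOUTH),
    WEST: (SOUTH, NORTH),
}

# Hypothetical 3x4 layout: '#' is a wall, ' ' is a free cell.
GRID = [
    "    ",
    " #  ",
    "    ",
]


def move(grid, state, direction):
    """Deterministic step; the agent stays put if a wall or the edge blocks it."""
    r, c = state[0] + direction[0], state[1] + direction[1]
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
        return (r, c)
    return state


def transition(grid, state, action):
    """Return a dict {next_state: P(next_state | state, action)}."""
    probs = {}
    left, right = SLIPS[action]
    for direction, p in ((action, 0.8), (left, 0.1), (right, 0.1)):
        nxt = move(grid, state, direction)
        # Several outcomes can land on the same cell (e.g. when blocked).
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


dist = transition(GRID, (2, 0), NORTH)  # bottom-left corner, intending North
```

Here the westward slip from the bottom-left corner runs into the edge of the grid, so its 10% is assigned to staying in place, which matches the "agent stays put" rule.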
## MDP Example: Grid World

MDP = (S, A, T, R, s₀, γ)

- Set of actions A
- Set of states S
- Transition model T
- Initial state s₀
- Discount factor γ

## Value Iteration

- Idea:
  - Vᵢ
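
Value iteration is listed above among the solution methods for a known MDP. A minimal sketch of its standard textbook form follows: start from V₀(s) = 0 and repeatedly apply the Bellman backup Vᵢ₊₁(s) = maxₐ Σₛ' T(s,a,s') [R(s,a,s') + γ Vᵢ(s')]. The two-state MDP used to exercise it, and every name in the code (`value_iteration`, `actions`, `T`, `R`, the `"safe"` and `"risky"` actions), is a hypothetical illustration rather than anything taken from the lecture.

```python
def value_iteration(states, actions, T, R, gamma, iterations=100):
    """Run a fixed number of synchronous Bellman backups and return V.

    actions(s) -> list of actions available in s (empty for terminal states)
    T(s, a)    -> dict {next_state: probability}
    R(s, a, s2) -> reward for the transition s --a--> s2
    """
    V = {s: 0.0 for s in states}  # V_0(s) = 0 for every state
    for _ in range(iterations):
        new_V = {}
        for s in states:
            q_values = [
                sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a).items())
                for a in actions(s)
            ]
            # Terminal states have no actions; their value stays 0.
            new_V[s] = max(q_values) if q_values else 0.0
        V = new_V
    return V


# Hypothetical toy MDP: from "start", "safe" ends the episode with reward 1;
# "risky" ends it with reward 4 half the time and otherwise returns to "start".
STATES = ["start", "end"]


def actions(s):
    return ["safe", "risky"] if s == "start" else []


def T(s, a):
    if a == "safe":
        return {"end": 1.0}
    return {"end": 0.5, "start": 0.5}


def R(s, a, s2):
    if a == "safe":
        return 1.0
    return 4.0 if s2 == "end" else 0.0


V = value_iteration(STATES, actions, T, R, gamma=0.5)
```

For this toy problem the fixed point can be checked by hand: choosing "risky" gives V = 0.5·4 + 0.5·(0 + 0.5·V), so V("start") = 8/3, which beats the value of 1 from "safe". Note that the sketch needs T and R as inputs, which is exactly the "assumes T and R are known" limitation from the recap.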
