lecture 13 - CS 188: Artificial Intelligence Spring 2010...

CS 188: Artificial Intelligence
Spring 2010
Lecture 13: Probability
3/2/2010
Pieter Abbeel – UC Berkeley
Many slides adapted from Dan Klein.

Announcements
- Upcoming
  - **new** Tomorrow/Wednesday: probability review session
    - 7:30-9:30pm in 306 Soda
  - P3 due on Thursday (3/4)
  - W4 going out on Thursday, due the following Thursday (3/11)
  - Midterm on the evening of 3/18

Today
- We’re almost done with search and planning!
  - MDPs: policy search wrap-up
- Next, we’ll start studying how to reason with probabilities
  - Diagnosis
  - Tracking objects
  - Speech recognition
  - Robot mapping
  - … lots more!
- Third part of course: machine learning

Policy Search

MDPs recap
- MDP recap: $(S, A, T, R, s_0, \gamma)$
- In small MDPs: can find V(s) and/or Q(s,a)
  - Known T, R: value iteration, policy iteration
  - Unknown T, R: Q-learning
- In large MDPs: cannot enumerate all states

Function Approximation
- Q-learning with linear Q-functions, $Q(s, a) = w_1 f_1(s, a) + w_2 f_2(s, a) + \dots + w_n f_n(s, a)$:
  - For a transition $(s, a, r, s')$: $\text{difference} = \left[r + \gamma \max_{a'} Q(s', a')\right] - Q(s, a)$
  - Exact Q’s: $Q(s, a) \leftarrow Q(s, a) + \alpha\,[\text{difference}]$
  - Approximate Q’s: $w_i \leftarrow w_i + \alpha\,[\text{difference}]\, f_i(s, a)$
- Intuitive interpretation:
  - Adjust weights of active features
  - E.g., if something unexpectedly bad happens, disprefer all states with that state’s features
- Formal justification: online least squares
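A minimal Python sketch of the weight update behind the "Q-learning with linear Q-functions" bullet, assuming each (state, action) pair's features arrive as a dict mapping feature name to value. The helper names `q_value` and `approx_q_update` and the feature names in the demo are hypothetical, not from the slides. Each call is a stochastic gradient step on the squared error $\tfrac{1}{2}(\text{target} - Q(s, a))^2$ with the target held fixed, which is the sense in which the update is online least squares.

```python
from collections import defaultdict
from typing import Dict


def q_value(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Linear Q-function: Q(s, a) = sum_i w_i * f_i(s, a)."""
    return sum(weights[name] * value for name, value in features.items())


def approx_q_update(
    weights: Dict[str, float],
    feats_sa: Dict[str, float],
    reward: float,
    next_feats_by_action: Dict[str, Dict[str, float]],
    alpha: float,
    gamma: float,
) -> float:
    """One approximate Q-learning step for a transition (s, a, r, s').

    feats_sa: features f(s, a) of the action actually taken.
    next_feats_by_action: {a': f(s', a')} for the actions legal in s';
        pass {} when s' is terminal, so the max term is 0.
    Returns the 'difference' (TD error). Updates weights in place.
    """
    q_sa = q_value(weights, feats_sa)
    next_q = max(
        (q_value(weights, f) for f in next_feats_by_action.values()), default=0.0
    )
    difference = (reward + gamma * next_q) - q_sa
    # w_i <- w_i + alpha * difference * f_i(s, a): only active (nonzero) features move.
    for name, value in feats_sa.items():
        weights[name] += alpha * difference * value
    return difference


if __name__ == "__main__":
    weights = defaultdict(float)
    features = {"bias": 1.0, "ghost-nearby": 0.5}  # hypothetical features
    diff = approx_q_update(weights, features, reward=-10.0,
                           next_feats_by_action={}, alpha=0.1, gamma=0.9)
    print(diff, dict(weights))
```

In the demo, an unexpected reward of -10 at a terminal transition pushes both active weights down (to -1.0 and -0.5), so every state sharing those features is dispreferred. Tabular (exact) Q-learning is the special case with one indicator feature per (s, a) pair.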

Policy Search Idea
- Problem: the feature-based policies that work well are often not the ones that approximate V / Q best
- Solution: learn the policy that maximizes rewards rather than the value function that predicts rewards
- This is the idea behind policy search, e.g., the approach that controlled the upside-down helicopter

Policy Search
- Simplest policy search:
  - Start with an initial linear value function or Q-function
  - Nudge each feature weight up and down and see if …
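The slide is cut off mid-sentence, so the acceptance rule in this sketch is an assumption: a nudge to one weight is kept only if a hypothetical `estimate_return` function (for example, the average return of rollouts under the greedy policy for those weights) reports an improvement. This is plain coordinate-wise hill climbing, one simple form of policy search; the function and parameter names are illustrative, not taken from the course code.

```python
from typing import Callable, List


def hill_climb_policy_search(
    weights: List[float],
    estimate_return: Callable[[List[float]], float],
    step: float = 0.1,
    iterations: int = 20,
) -> List[float]:
    """Coordinate-wise hill climbing on the weights of a linear V/Q function.

    The policy is assumed to act greedily with respect to the weighted
    features; weights are judged only by the return that policy earns
    (estimate_return), not by how well they predict values.
    The input list is not modified.
    """
    best = list(weights)
    best_score = estimate_return(best)
    for _ in range(iterations):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):  # nudge weight i up, then down
                candidate = list(best)
                candidate[i] += delta
                score = estimate_return(candidate)
                if score > best_score:  # keep the nudge only if the policy got better
                    best, best_score = candidate, score
                    improved = True
                    break  # move on to the next weight
        if not improved:
            break  # no single nudge helps: local optimum at this step size
    return best


if __name__ == "__main__":
    # Toy stand-in for a rollout-based estimate; its best weights are [1.0, -0.5].
    def toy_return(w: List[float]) -> float:
        return -(w[0] - 1.0) ** 2 - (w[1] + 0.5) ** 2

    print(hill_climb_policy_search([0.0, 0.0], toy_return, step=0.5, iterations=10))
```

The weights are scored only by the return their policy earns, never by how well they predict V or Q, which is the point of the "Policy Search Idea" slide. With a real agent the return estimate is noisy, so each comparison would need enough sampled episodes to be trusted.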

This note was uploaded on 04/21/2010 for the course EECS 188 taught by Professor Cs188 during the Spring '01 term at University of California, Berkeley.
