SP11 cs188 lecture 12 -- probability 6PP

CS 188: Artificial Intelligence
Spring 2011
Lecture 12: Probability
3/2/2011
Pieter Abbeel – UC Berkeley
Many slides adapted from Dan Klein.

Announcements
§ P3 due on Monday (3/7) at 4:59pm
§ W3 going out tonight
§ Midterm Tuesday 3/15 5pm-8pm
  § Closed notes, books, laptops. May use one-page (two-sided) cheat sheet of your own design (group design OK but not recommended).
  § Monday 3/14: no lecture at usual 5:30-7:00pm time
  § Midterm review? Practice midterm?

Today
§ MDPs and Reinforcement Learning
  § Generalization
    § One of the most important concepts in machine learning!
  § Policy search
§ Next, we'll start studying how to reason with probabilities
  § Diagnosis
  § Tracking objects
  § Speech recognition
  § Robot mapping
  § lots more!
§ Third part of course: machine learning

The Story So Far: MDPs and RL
Things we know how to do:
§ We can solve small MDPs exactly, offline
§ We can estimate values V^π(s) directly for a fixed policy π
§ We can estimate Q*(s,a) for the optimal policy while executing an exploration policy
Techniques:
§ Value and policy iteration
§ Temporal difference learning
§ Q-learning
§ Exploratory action selection

Q-Learning
§ In realistic situations, we cannot possibly learn about every single state!
  § Too many states to visit them all in training
  § Too many states to hold the q-tables in memory
§ Instead, we want to generalize:
  § Learn about some small number of training states from experience
  § Generalize that experience to new, similar states
  § This is a fundamental idea in machine learning, and we'll see it over and over again

Example: Pacman
§ Let's say we discover through experience that this state is bad:
§ In naïve q learning, we know nothing about this state or its q states:
§ Or even this one!
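The Pacman example above can be made concrete with a small sketch of tabular Q-learning. Everything specific here is an assumption for illustration: the state encoding as a pair of positions, the action name, the reward of -500, and the values of ALPHA and GAMMA. The point is only that a table keyed on exact states learns nothing about a near-identical state it has not visited.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

# Q-table keyed on (state, action); unseen entries default to 0.0.
q_table = defaultdict(float)

def q_update(q, state, action, reward, next_state, next_actions):
    """Apply one tabular Q-learning update for an observed transition."""
    # A terminal next state has no actions, so its future value is 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    sample = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (sample - q[(state, action)])

# Hypothetical encoding: (pacman_position, ghost_position).
bad_state = ((1, 1), (1, 2))      # Pacman directly below a ghost
similar_state = ((3, 1), (3, 2))  # the same situation elsewhere on the board

# Pacman steps into the ghost: large negative reward, terminal next state.
q_update(q_table, bad_state, "North", -500.0, "TERMINAL", [])

learned = q_table[(bad_state, "North")]
unseen = q_table[(similar_state, "North")]
print(learned, unseen)
```

The visited entry moves toward the bad outcome, while the entry for the similar state stays at its default, which is the gap that generalization is meant to close.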
Feature-Based Representations
§ Solution: describe a state using a vector of features
  § Features are functions from states to real numbers (often 0/1) that capture important properties of the state
  § Example features:
    § Distance to closest ghost
    § Distance to closest dot
    § Number of ghosts
    § 1 / (dist to dot)^2
    § Is Pacman in a tunnel? (0/1)
    § … etc.
  § Can also describe a q-state (s, a) with features (e.g. action moves closer to food)

Linear Feature Functions
§ Using a feature representation, we can write a q function (or value function) for any state using a few weights:
§ Advantage: our experience is summed up in a few powerful numbers
§ Disadvantage: states may share features but be very different in value!

Function Approximation
§ Q-learning with linear q-functions:
  [Update equations not recovered; one is labeled "Exact Q's", the other "Approximate Q's"]
§ Intuitive interpretation:
  § Adjust weights of active features
  § E.g. if something unexpectedly bad happens, disprefer all states with that state's features
§ Formal justification: online least squares

Example: Q-Pacman

Linear regression
[Figure: two plots of example points. Caption: Given examples, predict given a new point]

Linear regression
[Figure: the same two plots, each with a point labeled "Prediction"]
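The equations on the Linear Feature Functions and Function Approximation slides did not survive in this text, so the following is a sketch under the usual assumption that the q-function is a weighted sum, Q(s,a) = w_1 f_1(s,a) + ... + w_n f_n(s,a), and that each weight is nudged by alpha * difference * f_i(s,a), where difference = [r + gamma * max_a' Q(s',a')] - Q(s,a). The feature names, starting weights, reward, and ALPHA are made up for illustration.

```python
ALPHA = 0.01  # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

def q_value(weights, features):
    """Linear q-function: weighted sum of the feature values of (s, a)."""
    return sum(weights[name] * value for name, value in features.items())

def approx_q_update(weights, features, reward, next_q_values):
    """Adjust the weights of the active features after one transition."""
    # An empty list of next q-values means the next state is terminal.
    best_next = max(next_q_values, default=0.0)
    difference = (reward + GAMMA * best_next) - q_value(weights, features)
    for name, value in features.items():
        weights[name] += ALPHA * difference * value
    return difference

# Hypothetical features of the q-state (s, a) that was just executed.
weights = {"closeness_to_dot": 4.0, "closeness_to_ghost": -1.0}
features = {"closeness_to_dot": 0.5, "closeness_to_ghost": 1.0}

q_before = q_value(weights, features)
# Something unexpectedly bad happens: Pacman is eaten (terminal state).
difference = approx_q_update(weights, features, -500.0, [])
q_after = q_value(weights, features)
print(q_before, difference, weights, q_after)
```

Because the update touches weights rather than table entries, every state that shares these features is dispreferred after a single bad experience, which matches the intuitive interpretation given on the slide.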
Ordinary Least Squares (OLS)
[Figure: plot with a gap labeled "Error or residual"]
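As a rough illustration of the residual idea on the OLS slide, the sketch below fits a line to a handful of points by minimizing the sum of squared residuals, using the textbook closed form for one input variable. The data points and function names are invented for the example and do not come from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys, slope, intercept):
    """Residual = observation minus prediction, one per example."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Made-up training examples.
xs = [0.0, 10.0, 20.0, 30.0]
ys = [20.0, 23.0, 24.0, 27.0]

slope, intercept = fit_line(xs, ys)
errors = residuals(xs, ys, slope, intercept)
total_squared_error = sum(e ** 2 for e in errors)

# Predict the value at a new point using the fitted line.
prediction = slope * 15.0 + intercept
print(slope, intercept, errors, total_squared_error, prediction)
```

Any other choice of slope and intercept gives a larger total squared error on these points, which is what "least squares" refers to.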

This note was uploaded on 08/26/2011 for the course CS 188 taught by Professor Staff during the Spring '08 term at University of California, Berkeley.
