
# SP11 CS 188 Lecture 12 -- Probability (6PP)


CS 188: Artificial Intelligence, Spring 2011
Lecture 12: Probability
3/2/2011, Pieter Abbeel, UC Berkeley
Many slides adapted from Dan Klein.

## Announcements

- P3 due on Monday (3/7) at 4:59pm
- W3 going out tonight
- Midterm Tuesday 3/15, 5pm-8pm
  - Closed notes, books, laptops; you may use a one-page (two-sided) cheat sheet of your own design (group design OK but not recommended)
- Monday 3/14: no lecture at the usual 5:30-7:00pm time
- Midterm review? Practice midterm?

## Today

- MDPs and reinforcement learning
  - Generalization: one of the most important concepts in machine learning!
  - Policy search
- Next, we'll start studying how to reason with probabilities
  - Diagnosis
  - Tracking objects
  - Speech recognition
  - Robot mapping
  - Lots more!
- Third part of the course: machine learning

## The Story So Far: MDPs and RL

Things we know how to do:

- Solve small MDPs exactly, offline
- Estimate values V^π(s) directly for a fixed policy π
- Estimate Q*(s, a) for the optimal policy while executing an exploration policy

Techniques:

- Value and policy iteration
- Temporal difference learning
- Q-learning
- Exploratory action selection

## Q-Learning

- In realistic situations, we cannot possibly learn about every single state!
  - Too many states to visit them all in training
  - Too many states to hold the Q-tables in memory
- Instead, we want to generalize:
  - Learn about a small number of training states from experience
  - Generalize that experience to new, similar states
- This is a fundamental idea in machine learning, and we'll see it over and over again

## Example: Pacman

(Slide figures: three similar Pacman board positions.)

- Let's say we discover through experience that this state is bad
- In naive Q-learning, we know nothing about this state or its Q-states
- Or even this one!
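The tabular Q-learning update listed under "Techniques" can be sketched as follows. This is a minimal sketch, not the course's project code; the states, actions, and constants are hypothetical:

```python
import random
from collections import defaultdict

ALPHA = 0.5    # learning rate (hypothetical value)
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate

# One table entry per (state, action) pair -- this is exactly what
# fails to scale when there are too many states to enumerate.
q_table = defaultdict(float)

def q_update(s, a, r, s_next, actions):
    """One temporal-difference step:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    best_next = max(q_table[(s_next, a2)] for a2 in actions)
    target = r + GAMMA * best_next
    q_table[(s, a)] += ALPHA * (target - q_table[(s, a)])

def select_action(s, actions):
    """Epsilon-greedy exploratory action selection."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a2: q_table[(s, a2)])
```

Every update touches only one table cell, which is why experience in one state tells us nothing about similar states -- the motivation for the feature-based generalization that follows.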

## Feature-Based Representations

- Solution: describe a state using a vector of features
  - Features are functions from states to real numbers (often 0/1) that capture important properties of the state
- Example features:
  - Distance to closest ghost
  - Distance to closest dot
  - Number of ghosts
  - 1 / (distance to dot)^2
  - Is Pacman in a tunnel? (0/1)
  - ... etc.
- Can also describe a Q-state (s, a) with features (e.g. "action moves closer to food")

## Linear Feature Functions

- Using a feature representation, we can write a Q-function (or value function) for any state using a few weights:

      Q(s, a) = w_1 f_1(s, a) + w_2 f_2(s, a) + ... + w_n f_n(s, a)

- Advantage: our experience is summed up in a few powerful numbers
- Disadvantage: states may share features but be very different in value!

## Function Approximation

- Q-learning with linear Q-functions:

      difference = [r + γ max_a' Q(s', a')] - Q(s, a)
      w_i <- w_i + α · difference · f_i(s, a)

- Intuitive interpretation:
  - Adjust the weights of active features
  - E.g. if something unexpectedly bad happens, disprefer all states with that state's features
- Formal justification: online least squares

## Example: Q-Pacman

(Slide figure: exact Q-values vs. approximate Q-values for a Pacman position.)

## Linear Regression

(Slide figures: 1D and 2D scatter plots of data points with a fitted prediction line/plane.)

- Given examples (x_i, y_i), predict y for a new point x
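The linear Q-function and its weight update can be sketched in Python. This is a minimal sketch under assumed conventions: the state encoding, feature names, and constants below are hypothetical, not the course's actual Pacman API:

```python
ALPHA = 0.01  # learning rate (hypothetical value)
GAMMA = 0.9   # discount factor

# A few illustrative weights, one per feature.
weights = {"bias": 0.0, "inv-dist-to-dot": 0.0, "ghost-one-step-away": 0.0}

def featurize(state, action):
    """Features of a q-state (s, a). The dict-based state here is a
    stand-in for a real game state."""
    dist = state["dist_to_dot"][action]   # distance to closest dot after acting
    ghost = state["ghost_near"][action]   # 1 if acting puts a ghost one step away
    return {"bias": 1.0,
            "inv-dist-to-dot": 1.0 / (dist + 1.0),
            "ghost-one-step-away": float(ghost)}

def q_value(state, action):
    # Q(s,a) = sum_i w_i * f_i(s,a)
    f = featurize(state, action)
    return sum(weights[k] * f[k] for k in f)

def update(state, action, reward, next_state, next_actions):
    """w_i <- w_i + alpha * difference * f_i(s,a), where
    difference = [r + gamma * max_a' Q(s',a')] - Q(s,a)."""
    best_next = max(q_value(next_state, a) for a in next_actions) if next_actions else 0.0
    difference = reward + GAMMA * best_next - q_value(state, action)
    for k, fk in featurize(state, action).items():
        weights[k] += ALPHA * difference * fk
```

Note how one surprising transition shifts every active feature's weight, so the change generalizes to all states sharing those features.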
## Ordinary Least Squares (OLS)

(Slide figure: data points with vertical error/residual lines to the fitted line.)
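OLS picks the line minimizing the sum of squared residuals (the vertical errors in the figure). A minimal one-dimensional sketch, using the textbook closed-form solution; the function names are illustrative:

```python
def ols_fit(xs, ys):
    """Fit y ~ w0 + w1*x by minimizing the sum of squared residuals."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Closed form: w1 = cov(x, y) / var(x), w0 = mean(y) - w1 * mean(x)
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx
    return w0, w1

def residuals(xs, ys, w0, w1):
    """Vertical errors between each observed y and the fitted line."""
    return [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
```

The online weight update in the function-approximation slide performs this same minimization incrementally, one sample at a time, instead of solving in closed form.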


