# Lecture 9 - CS 188: Artificial Intelligence, Spring 2010

CS 188: Artificial Intelligence, Spring 2010. Lecture 9: MDPs, 2/16/2010. Pieter Abbeel, UC Berkeley. Many slides adapted from Dan Klein.

## Announcements

- Assignments: P2 is due Thursday.
- We reserved Soda 271 on Wednesday, Feb 17, from 4 to 6. One of the GSIs will periodically drop in to see if he can provide any clarifications or assistance. It's a great opportunity to meet other students who might still be looking for a partner.
- Readings: for MDPs / reinforcement learning, we're using an online reading. The lecture version is the standard for this class.

## Example: Insurance

- Consider the lottery [0.5, \$1000; 0.5, \$0].
- What is its expected monetary value? (\$500)
- What is its certainty equivalent?
  - The monetary value acceptable in lieu of the lottery: about \$400 for most people.
  - The difference of \$100 is the insurance premium.
- There's an insurance industry because people will pay to reduce their risk.
- If everyone were risk-neutral, no insurance would be needed!

Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties' expected utility.

You own a car. Your lottery: L_Y = [0.8, \$0; 0.2, -\$200], i.e., a 20% chance of crashing. You do not want -\$200!

| Amount | Your utility U_Y |
|--------|------------------|
| \$0    | 0                |
| -\$50  | -150             |
| -\$200 | -1000            |

- U_Y(L_Y) = 0.2 * U_Y(-\$200) = -200
- U_Y(-\$50) = -150 > -200, so you prefer the certain loss of \$50 (the premium) to the lottery.

The insurance company buys your risk: L_I = [0.8, \$50; 0.2, -\$150], i.e., \$50 of revenue plus your lottery L_Y. The insurer is risk-neutral, so U(L) = U(EMV(L)), and U_I(L_I) = U(0.8 * 50 + 0.2 * (-150)) = U(\$10) > U(\$0). Both parties' expected utilities increase.

## Example: Human Rationality?

Famous example of Allais (1953):

- A: [0.8, \$4k; 0.2, \$0]
- B: [1.0, \$3k; 0.0, \$0]
- C: [0.2, \$4k; 0.8, \$0]
- D: [0.25, \$3k; 0.75, \$0]

Most people prefer B > A and C > D. But if U(\$0) = 0, then:

- B > A implies U(\$3k) > 0.8 U(\$4k)
- C > D implies 0.2 U(\$4k) > 0.25 U(\$3k), i.e., 0.8 U(\$4k) > U(\$3k)

No single utility function can satisfy both inequalities, so these common preferences are inconsistent with expected-utility theory.
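The expected-utility arithmetic above can be checked with a short sketch. The helper `expected_utility` and the table `U_you` are illustrative names, not part of the lecture's code:

```python
def expected_utility(lottery, U):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    return sum(p * U(x) for p, x in lottery)

# Your (risk-averse) utility table from the slide.
U_you = {0: 0, -50: -150, -200: -1000}.get

L_you = [(0.8, 0), (0.2, -200)]              # 20% chance of crashing
print(expected_utility(L_you, U_you))        # -200.0
print(U_you(-50))                            # -150: better than -200, so insure

# The risk-neutral insurer's utility is just monetary value.
L_ins = [(0.8, 50), (0.2, -150)]             # $50 premium plus your risk
print(expected_utility(L_ins, lambda x: x))  # 10.0 > 0, insurer also gains

# Allais (1953): B > A needs U(3k) > 0.8*U(4k), while C > D needs
# 0.8*U(4k) > U(3k) -- no utility function satisfies both at once.
```

Both sides come out ahead in expected utility, which is why the trade happens at all.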

## Reinforcement Learning

Basic idea:

- Receive feedback in the form of rewards.
- The agent's utility is defined by the reward function.
- The agent must learn to act so as to maximize expected rewards.

## Grid World

- The agent lives in a grid; walls block the agent's path.
- The agent's actions do not always go as planned:
  - 80% of the time, the action North takes the agent North (if there is no wall there).
  - 10% of the time, North takes the agent West; 10% of the time, East.
  - If there is a wall in the direction the agent would have been taken, the agent stays put.
- There is a small "living" reward each step; big rewards come at the end.
- Goal: maximize the sum of rewards.
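The noisy Grid World dynamics above can be sketched as a transition model that returns `(probability, next_state)` pairs. The names `transition`, `NORTH`, etc. are illustrative, not the course project's API:

```python
# Directions as (dx, dy) offsets; each action slips 10% to either side.
NORTH, SOUTH, EAST, WEST = (0, 1), (0, -1), (1, 0), (-1, 0)
PERPENDICULAR = {NORTH: (WEST, EAST), SOUTH: (EAST, WEST),
                 EAST: (NORTH, SOUTH), WEST: (SOUTH, NORTH)}

def transition(state, action, walls):
    """Return (probability, next_state) pairs: 80% intended direction,
    10% each perpendicular; moves into walls leave the agent in place."""
    outcomes = []
    slips = [(0.1, d) for d in PERPENDICULAR[action]]
    for prob, (dx, dy) in [(0.8, action)] + slips:
        nxt = (state[0] + dx, state[1] + dy)
        if nxt in walls:          # blocked: stay put
            nxt = state
        outcomes.append((prob, nxt))
    return outcomes

# From (1, 1) with a wall directly North, the 80% branch stays put.
print(transition((1, 1), NORTH, walls={(1, 2)}))
# [(0.8, (1, 1)), (0.1, (0, 1)), (0.1, (2, 1))]
```

Returning the full distribution (rather than sampling a successor) is what value iteration over an MDP needs: the Bellman backup sums over exactly these `(probability, next_state)` pairs.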

## This note was uploaded on 04/21/2010 for the course EECS 188 taught by Professor Cs188 during the Spring '01 term at Berkeley.

