Lecture 9 - CS 188: Artificial Intelligence, Spring 2010

CS 188: Artificial Intelligence, Spring 2010
Lecture 9: MDPs, 2/16/2010
Pieter Abbeel - UC Berkeley
Many slides adapted from Dan Klein

Announcements

- Assignments
  - P2 due Thursday
  - We reserved Soda 271 on Wednesday, Feb 17, from 4 to 6. One of the GSIs will periodically drop in to see if they can provide any clarifications or assistance. It's a great opportunity to meet other students who might still be looking for a partner.
- Readings
  - For MDPs / reinforcement learning, we're using an online reading.
  - The lecture version is the standard for this class.

Example: Insurance

- Consider the lottery [0.5, $1000; 0.5, $0]
  - What is its expected monetary value? ($500)
  - What is its certainty equivalent?
    - The monetary value acceptable in lieu of the lottery
    - About $400 for most people
  - The $100 difference is the insurance premium
    - There's an insurance industry because people will pay to reduce their risk
    - If everyone were risk-neutral, no insurance would be needed!

Example: Insurance

- Because people ascribe different utilities to different amounts of money, insurance agreements can increase both parties' expected utility.
- You own a car. Your lottery: L_Y = [0.8, $0; 0.2, -$200], i.e., a 20% chance of crashing. You do not want -$200!
  - U_Y(L_Y) = 0.8 * U_Y($0) + 0.2 * U_Y(-$200) = 0.2 * (-1000) = -200
  - U_Y(-$50) = -150, so paying a certain $50 premium is better than facing the lottery.

  Amount   Your Utility U_Y
  $0       0
  -$50     -150
  -$200    -1000
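The lottery arithmetic above can be sketched in a few lines. This is a minimal illustration, not code from the course: the dollar amounts and the utility table come from the slides, while the function and variable names (`expected_value`, `coin_flip`, `U_Y`, `L_Y`) are made up here.

```python
# A lottery is a list of (probability, outcome) pairs.

def expected_value(lottery, value=lambda x: x):
    """Expected value of a lottery under an arbitrary value function.

    With the default identity value function this is the expected
    monetary value (EMV); passing a utility function gives expected
    utility instead.
    """
    return sum(p * value(x) for p, x in lottery)

# The lottery [0.5, $1000; 0.5, $0] from the slide: EMV is $500.
coin_flip = [(0.5, 1000), (0.5, 0)]
print(expected_value(coin_flip))  # 500.0

# Your utility for money (table from the insurance slide).
U_Y = {0: 0, -50: -150, -200: -1000}

# Your lottery L_Y = [0.8, $0; 0.2, -$200].
L_Y = [(0.8, 0), (0.2, -200)]
print(expected_value(L_Y, U_Y.get))  # -200.0

# Paying a certain -$50 premium has utility -150 > -200,
# so buying insurance raises your expected utility.
```

The same helper computes both EMV and expected utility, which makes the risk-aversion point concrete: the lottery and the $50 premium differ only in which value function you plug in.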
Example: Insurance (continued)

- The insurance company buys the risk: L_I = [0.8, $50; 0.2, -$150], i.e., $50 in premium revenue plus your lottery L_Y.
- The insurer is risk-neutral: U(L) = U(EMV(L)).
  - U_I(L_I) = U(0.8 * $50 + 0.2 * (-$150)) = U($10) > U($0)

Example: Human Rationality?

- Famous example of Allais (1953):
  - A: [0.8, $4k; 0.2, $0]
  - B: [1.0, $3k; 0.0, $0]
  - C: [0.2, $4k; 0.8, $0]
  - D: [0.25, $3k; 0.75, $0]
- Most people prefer B > A and C > D.
- But if U($0) = 0, then:
  - B > A implies U($3k) > 0.8 U($4k)
  - C > D implies 0.2 U($4k) > 0.25 U($3k), i.e., 0.8 U($4k) > U($3k)
- These inequalities contradict each other: no single expected-utility function is consistent with both common preferences.
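The Allais contradiction can be checked mechanically. The sketch below is illustrative (the `prefers` helper and the trial utility values are not from the slides): fixing U($0) = 0 and U($3k) = 100, it sweeps candidate values of U($4k) and confirms that no choice makes both B > A and C > D hold.

```python
# Lotteries from the Allais (1953) example, as (probability, outcome) pairs.
A = [(0.8, 4000), (0.2, 0)]
B = [(1.0, 3000)]
C = [(0.2, 4000), (0.8, 0)]
D = [(0.25, 3000), (0.75, 0)]

def prefers(lottery_a, lottery_b, U):
    """True if lottery_a has strictly higher expected utility under U."""
    eu = lambda L: sum(p * U(x) for p, x in L)
    return eu(lottery_a) > eu(lottery_b)

# B > A forces U(3k) > 0.8 U(4k), while C > D forces 0.8 U(4k) > U(3k),
# so no utility scale (here, trial values of U(4k)) satisfies both.
for u4 in range(1, 500):
    U = {0: 0, 3000: 100, 4000: u4}.get
    assert not (prefers(B, A, U) and prefers(C, D, U))
print("no expected-utility function rationalizes both preferences")
```

The sweep is just a sanity check on the algebra: B > A requires u4 < 125 and C > D requires u4 > 125, so the conjunction fails for every candidate.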
Reinforcement Learning

- Basic idea:
  - Receive feedback in the form of rewards
  - The agent's utility is defined by the reward function
  - The agent must learn to act so as to maximize expected rewards

Grid World

- The agent lives in a grid; walls block the agent's path.
- The agent's actions do not always go as planned:
  - 80% of the time, the action North takes the agent North (if there is no wall there)
  - 10% of the time, North takes the agent West; 10% of the time, East
  - If there is a wall in the direction the agent would have been taken, the agent stays put
- There is a small "living" reward each step; big rewards come at the end.
- Goal: maximize the sum of rewards.
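The noisy Grid World dynamics above can be sketched as a transition function. This is one possible encoding, not the course's Pacman codebase: the grid representation, the `transitions` helper, and its parameters are assumptions, but the 0.8 / 0.1 / 0.1 noise model and the stay-put rule for blocked moves follow the slide.

```python
# Perpendicular "slip" directions for each intended action.
NOISE = {
    "N": ("W", "E"), "S": ("E", "W"),
    "E": ("N", "S"), "W": ("S", "N"),
}
# (row, col) displacement for each direction; row 0 is the top row.
DELTA = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def transitions(state, action, walls, rows, cols):
    """Return {next_state: probability} for one noisy Grid World move.

    The intended direction succeeds with probability 0.8; each
    perpendicular direction occurs with probability 0.1. A move into a
    wall or off the grid leaves the agent where it is.
    """
    def move(s, d):
        r, c = s[0] + DELTA[d][0], s[1] + DELTA[d][1]
        if not (0 <= r < rows and 0 <= c < cols) or (r, c) in walls:
            return s  # blocked: stay put
        return (r, c)

    probs = {}
    for d, p in [(action, 0.8), (NOISE[action][0], 0.1), (NOISE[action][1], 0.1)]:
        s2 = move(state, d)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

# From the top-left corner of an open 3x4 grid, "North" is blocked (edge)
# and "West" is blocked too, so the agent stays with probability 0.9
# and slips East with probability 0.1.
print(transitions((0, 0), "N", set(), 3, 4))
```

Merging the probability mass of moves that land on the same square (the `probs.get(s2, 0.0) + p` accumulation) is what makes blocked actions come out right: the 0.8 "go North into the wall" mass and the 0.1 "slip West into the wall" mass both collapse onto the current square.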

