# 383-Fall11-Lec17 - CMPSCI 383: Lecture 17, November 8, 2011


## CMPSCI 383: Artificial Intelligence

Lecture 17, November 8, 2011
Making Complex Decisions
Philip Thomas (TA)

## Disclaimer

I'm not covering everything in 17.1–17.3.
## Sequential Decision Problems

Chapter 16 was "one shot":
- Where should the airport be placed?
- Should I accept a certain bet?

What about problems where an agent must make a sequence of decisions? We assume that a decision will influence the future decisions that must be made.
- Robot control (helicopter / balancing)
- Elevator scheduling
- Anesthesia administration, DRAM schedulers, Backgammon...

## Black Sheep Wall

| Fully Observable | Partially Observable |
| --- | --- |
| Chess | Poker |
| Checkers | Blackjack |
| Tag | Marco Polo |

For now, we assume the problem is fully observable.
## A Simple Example

"Gridworld" with 2 goal states.
- Actions: Up, Down, Left, Right
- Fully observable: the agent knows where it is

[Figure: a 4x3 gridworld with terminal states +1 and -1 in the two rightmost cells of the top rows, a wall in the middle, and the START state at the bottom-left; each action moves in the intended direction with probability 0.8 and perpendicular to it with probability 0.1 each.]
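The gridworld above can be written down as a small Python sketch. The state coordinates, names, and layout below are my reading of the figure (columns 1–4, rows 1–3, wall at (2, 2), terminals at (4, 3) and (4, 2)), not code from the lecture:

```python
# Minimal sketch of the 4x3 gridworld from the slide. States are
# (column, row) pairs, 1-indexed as in the figure. The wall cell is
# not a state; the two terminal states carry rewards +1 and -1.
WALL = (2, 2)
TERMINALS = {(4, 3): +1, (4, 2): -1}
START = (1, 1)
ACTIONS = ("Up", "Down", "Left", "Right")

# All non-wall cells are states: 4 * 3 - 1 = 11 of them.
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]
```

This representation is enough to define the transition model and reward function on the following slides.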

## Transition Model

P(s' | s, a): the probability of reaching state s' when taking action a in state s.

[Figure: the 4x3 gridworld again; each action moves in the intended direction with probability 0.8 and slips to each perpendicular direction with probability 0.1.]
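The 0.8 / 0.1 / 0.1 transition model in the figure can be sketched as a function returning a distribution over next states. This is my own illustrative implementation (function names and the dict-based distribution are assumptions, not the lecture's code); moves into the wall or off the grid leave the agent where it is:

```python
# Sketch of P(s' | s, a) for the 4x3 gridworld: intended direction with
# probability 0.8, each perpendicular direction with probability 0.1.
WALL = (2, 2)

def _move(s, direction):
    """Deterministic move; bumping into the wall or edge stays put."""
    dc, dr = {"Up": (0, 1), "Down": (0, -1),
              "Left": (-1, 0), "Right": (1, 0)}[direction]
    s2 = (s[0] + dc, s[1] + dr)
    if s2 == WALL or not (1 <= s2[0] <= 4 and 1 <= s2[1] <= 3):
        return s
    return s2

PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Left": ("Up", "Down"), "Right": ("Up", "Down")}

def transition(s, a):
    """Return a dict mapping next states s' to P(s' | s, a)."""
    dist = {}
    for direction, p in [(a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)]:
        s2 = _move(s, direction)
        dist[s2] = dist.get(s2, 0.0) + p  # merge outcomes that coincide
    return dist
```

For example, taking Up from the start state (1, 1) reaches (1, 2) with probability 0.8, slips right to (2, 1) with probability 0.1, and stays at (1, 1) with probability 0.1 because the leftward slip hits the grid edge.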
## Markov Assumption

P(s' | s, a): the next state s' depends only on the current state s and the action a, not on the earlier history of states and actions.

[Figure: the 4x3 gridworld transition diagram (0.8 in the intended direction, 0.1 to each side).]

## Markov Assumption

... is it reasonable?
- Real-world problems where it applies?
- Real-world problems where it doesn't apply?
## Agent's Utility Function

Performance depends on the entire sequence of states and actions: the "environment history."
- In each state, the agent receives a reward R(s).
- The reward is real-valued; it may be positive or negative.
- Utility of an environment history = sum of the rewards received.
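The last bullet is a one-liner in code. A minimal undiscounted sketch (the function name, and the toy reward values in the example, are mine, not the lecture's):

```python
def history_utility(history, R):
    """Utility of an environment history: the sum of R(s) over its states."""
    return sum(R(s) for s in history)

# Toy example with an arbitrary illustrative reward function.
R = {"a": -1.0, "b": 2.0, "c": 5.0}.get
u = history_utility(["a", "a", "b", "c"], R)  # -1 - 1 + 2 + 5 = 5
```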

## Reward Function

P(s' | s, a) and R(s).

[Figure: the 4x3 gridworld with a reward of -0.04 in each nonterminal state and +1 / -1 in the two terminal states.]
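As a sketch, the gridworld reward function from the figure (using the textbook's -0.04 step cost for nonterminal states; the sign is dropped in the extracted figure, so treat it as my reading):

```python
def R(s):
    """Reward for being in state s: -0.04 per nonterminal state,
    +1 and -1 in the two terminal states."""
    return {(4, 3): 1.0, (4, 2): -1.0}.get(s, -0.04)
```

The small negative per-step reward nudges the agent to reach the +1 terminal quickly rather than wander.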
## Decision Rules

Decision rules say what to do in each state. They are often called policies, π. The action for state s is given by π(s).
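A policy for a finite state space can be sketched as a plain lookup table. The particular action choices below are illustrative only, not the lecture's (optimal) policy:

```python
# A policy pi maps each state to an action; pi(s) is just a lookup.
# Partial illustrative policy for the gridworld's top-left corridor.
pi = {(1, 1): "Up", (1, 2): "Up", (1, 3): "Right",
      (2, 3): "Right", (3, 3): "Right"}

def act(pi, s):
    """Return the action the policy prescribes in state s."""
    return pi[s]
```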
