CMPSCI 383: Artificial Intelligence
Lecture 17, November 8, 2011
Making Complex Decisions
Philip Thomas (TA)

Disclaimer

I'm not covering everything in 17.1–17.3.
Sequential Decision Problems

Chapter 16 was "one shot": Where should the airport be placed? Should I accept a certain bet?

What about problems where an agent must make a sequence of decisions? We assume that a decision will influence the future decisions that must be made.

Examples: robot control (helicopter / balancing), elevator scheduling, anesthesia administration, DRAM schedulers, backgammon…
Black Sheep Wall

Fully observable: Chess, Checkers, Tag
Partially observable: Poker, Blackjack, Marco Polo

For now, we assume the problem is fully observable.
A Simple Example

"Gridworld" with 2 goal states.
Actions: Up, Down, Left, Right.
Fully observable: the agent knows where it is.

[Figure: (a) a 4x3 gridworld with START in the lower-left corner and terminal states +1 and –1 in the rightmost column; (b) each action moves in the intended direction with probability 0.8 and perpendicular to it with probability 0.1 to each side.]
Transition Model

P(s' | s, a): the probability of reaching state s' when action a is taken in state s.

[Figure: the gridworld again; the intended move succeeds with probability 0.8, and the agent slips perpendicular with probability 0.1 to each side.]
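A minimal sketch of this transition model in Python. The 4x3 layout, the wall at (2, 2), and the terminal positions are taken from the textbook's standard gridworld figure (AIMA Fig. 17.1), not from the slide text itself:

```python
# Sketch of the gridworld transition model P(s' | s, a).
# Assumes the standard AIMA 4x3 grid: columns 1-4, rows 1-3,
# a wall at (2, 2), and terminals at (4, 3) and (4, 2).

WALL = (2, 2)
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Left": ("Up", "Down"), "Right": ("Up", "Down")}

def step(s, direction):
    """Deterministic move; bumping into the wall or an edge stays put."""
    x, y = s
    dx, dy = MOVES[direction]
    nxt = (x + dx, y + dy)
    if nxt == WALL or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3):
        return s
    return nxt

def transition(s, a):
    """Return {s': P(s' | s, a)}: 0.8 intended, 0.1 for each perpendicular slip."""
    dist = {}
    for direction, p in [(a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)]:
        nxt = step(s, direction)
        dist[nxt] = dist.get(nxt, 0.0) + p
    return dist

print(transition((1, 1), "Up"))   # {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}
```

Note that `transition` looks only at the current state and action, which is exactly the Markov assumption discussed on the next slide.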
Markov Assumption

P(s' | s, a): the next state depends only on the current state s and the action a, not on the history of earlier states.

[Figure: the same gridworld and 0.8 / 0.1 / 0.1 transition model.]
Markov Assumption

… is it reasonable?
Real-world problems where it applies?
Real-world problems where it doesn't apply?
Agent's Utility Function

Performance depends on the entire sequence of states and actions: the "environment history."
In each state, the agent receives a reward R(s).
The reward is real-valued; it may be positive or negative.
Utility of an environment history = sum of the rewards received.
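Written out, the additive case the slide describes is (using AIMA's notation; the book later generalizes this with a discount factor):

```latex
U_h([s_0, s_1, \ldots, s_n]) = \sum_{t=0}^{n} R(s_t)
```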
Reward Function

P(s' | s, a) and R(s).

[Figure: the gridworld with each nonterminal state annotated with its reward (–0.04 in the textbook's version), the +1 and –1 terminal states, and the 0.8 / 0.1 / 0.1 transition diagram.]
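A sketch of this reward function, assuming the textbook's –0.04 step cost for nonterminal states (the figure residue on the slide shows only ".04"):

```python
# Reward function R(s) for the same 4x3 gridworld (AIMA's numbers assumed).
TERMINAL_REWARDS = {(4, 3): 1.0, (4, 2): -1.0}

def reward(s):
    """R(s): +1 or -1 at the terminals, -0.04 in every other state."""
    return TERMINAL_REWARDS.get(s, -0.04)

# Utility of an environment history = sum of the rewards received.
history = [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3), (4, 3)]
print(sum(reward(s) for s in history))   # roughly 0.8 (5 * -0.04 + 1.0)
```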
Decision Rules

Decision rules say what to do in each state.
Often called policies, π.
The action for state s is given by π(s).
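A policy can be sketched as a plain mapping from states to actions. The particular action choices below are illustrative only, not the optimal policy derived later in the chapter:

```python
# A policy pi maps each nonterminal state to an action: a = pi(s).
pi = {
    (1, 1): "Up",    (1, 2): "Up",    (1, 3): "Right",
    (2, 1): "Left",  (2, 3): "Right",
    (3, 1): "Left",  (3, 2): "Up",    (3, 3): "Right",
    (4, 1): "Left",
}

def decide(s):
    """The decision rule: the action taken in state s is pi(s)."""
    return pi[s]
```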