5-MDPs and RL_solutions

5-MDPs and RL_solutions - CS188 Fall 2010 Section 5: MDPs...

Treasure Hunting

While Pacman is out collecting all the dots from mediumClassic, Ms. Pacman takes some time to go treasure hunting in the Gridworld island. Ever prepared, she has a map that shows where all the hazards are, and where the treasure is. From any unmarked square, Ms. Pacman can take the standard actions (N, S, E, W), but she is surefooted enough that her actions always succeed (i.e., there is no movement noise). If she lands in a hazard (H) square or a treasure (T) square, her only action is to call for an airlift (X), which takes her to the terminal ‘Done’ state; this results in a reward of -64 if she’s escaping a hazard, but +128 if she’s running off with the treasure. There is no “living reward.”

(a) What are the optimal values, V*, of each state in the above grid if γ = 0.5?

   128    64    32
   -64   -64    16
     2     4     8

(b) What are the Q-values for the last square on the second row (i.e., the one without fire)?

Q
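A minimal Python sketch of value iteration for this problem. It assumes the 3x3 layout implied by the values listed in (a): treasure in the top-left corner and hazards in the first two squares of the second row. It also assumes, following the usual Gridworld convention, that a move off the edge of the map leaves Ms. Pacman where she is. The names GRID, step, q_value, and value_iteration are hypothetical. Each non-terminal square is updated with V(s) = max_a [R(s, a, s') + γ V(s')], where R is 0 for ordinary moves because there is no living reward; Q-values use the same one-step lookahead.

```python
# Hypothetical reconstruction of the map (the original figure is not available):
# 'T' = treasure, 'H' = hazard, '.' = unmarked square.
GRID = [
    ["T", ".", "."],
    ["H", "H", "."],
    [".", ".", "."],
]
GAMMA = 0.5
AIRLIFT_REWARD = {"T": 128.0, "H": -64.0}  # reward for action X from T or H
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(r, c, action):
    """Deterministic move; bumping into the edge of the map keeps her in place (assumed)."""
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
        return nr, nc
    return r, c


def q_value(V, r, c, action):
    """Q(s, a) = R(s, a, s') + gamma * V(s'); ordinary moves earn 0 reward."""
    nr, nc = step(r, c, action)
    return 0.0 + GAMMA * V[nr][nc]


def value_iteration(iterations=50):
    """Repeated Bellman updates starting from V = 0 everywhere."""
    rows, cols = len(GRID), len(GRID[0])
    V = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_V = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                cell = GRID[r][c]
                if cell in AIRLIFT_REWARD:
                    # Only action is X: collect the reward, then enter 'Done' (value 0).
                    new_V[r][c] = AIRLIFT_REWARD[cell]
                else:
                    new_V[r][c] = max(q_value(V, r, c, a) for a in MOVES)
        V = new_V
    return V


if __name__ == "__main__":
    V = value_iteration()
    for row in V:
        print(row)
    # Q-values for the last square on the second row (row 1, column 2, 0-indexed).
    print({a: q_value(V, 1, 2, a) for a in MOVES})
```

Under these assumptions the iteration settles on the grid listed in (a), since each unmarked square's value is 128 · 0.5^d, where d is the length of its shortest hazard-free path to the treasure. For the square in (b) the sketch gives N: 16, S: 4, E: 8, W: -32; the E value depends on the assumed edge behavior.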