Treasure Hunting

While Pacman is out collecting all the dots from mediumClassic, Ms. Pacman takes some time to go treasure hunting in the Gridworld island. Ever prepared, she has a map that shows where all the hazards are, and where the treasure is. From any unmarked square, Ms. Pacman can take the standard actions (N, S, E, W), but she is surefooted enough that her actions always succeed (i.e., there is no movement noise). If she lands in a hazard (H) square or a treasure (T) square, her only action is to call for an airlift (X), which takes her to the terminal ‘Done’ state; this results in a reward of -64 if she’s escaping a hazard, but +128 if she’s running off with the treasure. There is no “living reward.”

(a) What are the optimal values, V*, of each state in the above grid if γ = 0.5?

128   64   32
-64  -64   16
  2    4    8

(b) What are the Q-values for the last square on the second row (i.e., the one without fire)? Q
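The values in part (a), and Q-values of the kind asked for in part (b), can be checked with a short value-iteration sketch. The grid layout used here is an assumption inferred from the listed values, since the figure itself is not reproduced: a 3x3 grid with the treasure in the top-left square and hazards in the first two squares of the second row. It is also assumed that a move off the edge of the grid leaves Ms. Pacman where she is. The names GRID, step, q_values, and value_iteration are hypothetical helpers introduced only for illustration.

```python
GAMMA = 0.5

# Assumed layout (not given in the text): T = treasure, H = hazard, . = unmarked.
GRID = [
    "T..",
    "HH.",
    "...",
]
EXIT_REWARD = {"T": 128.0, "H": -64.0}  # reward for the airlift action X
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
ROWS, COLS = len(GRID), len(GRID[0])


def step(state, action):
    """Deterministic move; assumption: moving off the grid means staying put."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state


def q_values(state, values):
    """Q(s, a) for every action available in `state`, given current values."""
    r, c = state
    cell = GRID[r][c]
    if cell in EXIT_REWARD:
        # Only action is X: collect the exit reward, then 'Done' (value 0).
        return {"X": EXIT_REWARD[cell]}
    # No living reward, so Q(s, a) = gamma * V(s').
    return {a: GAMMA * values[step(state, a)] for a in MOVES}


def value_iteration(sweeps=50):
    """Synchronous Bellman backups, starting from V = 0 everywhere."""
    values = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    for _ in range(sweeps):
        values = {s: max(q_values(s, values).values()) for s in values}
    return values


V = value_iteration()
Q = q_values((1, 2), V)  # last square on the second row

for r in range(ROWS):
    print([V[(r, c)] for c in range(COLS)])
print(Q)
```

Under these assumptions the computed values agree with the grid listed for part (a), and Q holds one entry per action (N, S, E, W) for the square in part (b).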

