cs188_sp09_mt1_sol

5. (21 points) MDPs: Robot Soccer

A soccer robot A is on a fast break toward the goal, starting in position 1. From positions 1 through 3, it can either shoot (S) or dribble the ball forward (D); from 4 it can only shoot. If it shoots, it either scores a goal (state G) or misses (state M). If it dribbles, it either advances a square or loses the ball, ending up in M.

[Figure: soccer field showing robot A, positions 1 through 4, and the Goal.]

In this MDP, the states are 1, 2, 3, 4, G and M, where G and M are terminal states. The transition model depends on the parameter $y$, which is the probability of dribbling success. Assume a discount of $\gamma = 1$.

$T(k, S, G) = \frac{k}{6}$ and $T(k, S, M) = 1 - \frac{k}{6}$ for $k \in \{1, 2, 3, 4\}$
$T(k, D, k+1) = y$ and $T(k, D, M) = 1 - y$ for $k \in \{1, 2, 3\}$
$R(k, S, G) = 1$ for $k \in \{1, 2, 3, 4\}$, and rewards are 0 for all other transitions.

(a) (2 pt) What is $V^\pi(1)$ for the policy $\pi$ that always shoots?

$V^\pi(1) = T(1, S, G) R(1, S, G) + T(1, S, M) R(1, S, M) = \frac{1}{6}$

(b) (2 pt) What is $Q^*(3, D)$ in terms of $y$?

$Q^*(3, D) = T(3, D, 4) (R(3, D, 4) + V^*(4)) + T(3, D, M) R(3, D, M)$
$= T(3, D, 4) V^*(4)$
$= T(3, D, 4) Q^*(4, S)$
$= T(3, D, 4) (T(4, S, G) R(4, S, G) + T(4, S, M) R(4, S, M))$
$= T(3, D, 4) T(4, S, G) R(4, S, G)$
$= \frac{2}{3} y$

(c) (2 pt) Using $y = \frac{3}{4}$, complete the first two iterations of value iteration.

i    V*_i(1)    V*_i(2)    V*_i(3)    V*_i(4)
1    1/6        1/3        1/2        2/3
2    1/4        3/8        1/2        2/3

(d) (2 pt) After how many iterations will value iteration compute the optimal values for all states?

After 3 iterations, the values will have converged when $y = \frac{3}{4}$. In the table above, only $V^*(1)$ has not yet converged. We note that for $y > \frac{3}{4}$, a fourth iteration would be required because a fast break has up to four transitions.

(e) (2 pt) For what range of values of $y$ is $Q^*(3, S) \ge Q^*(3, D)$?

$Q^*(3, S) \ge Q^*(3, D)$
$T(3, S, G) \cdot 1 \ge T(3, D, 4) T(4, S, G) \cdot 1$
$\frac{1}{2} \ge \frac{2}{3} y$
$\frac{3}{4} \ge y$
...
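The value-iteration answers in parts (c) and (d) can be checked numerically. The Python sketch below is not part of the original solution: the names `shoot_value` and `value_iteration` are hypothetical, and it assumes that all values start at 0 before the first iteration and that the terminal states G and M have value 0. It uses exact fractions so the results can be compared directly with the table in part (c).

```python
from fractions import Fraction

POSITIONS = (1, 2, 3, 4)


def shoot_value(k):
    """Expected return of shooting from k: T(k,S,G) * R(k,S,G) = k/6."""
    return Fraction(k, 6)


def value_iteration(y, iterations):
    """Return [V_0, V_1, ..., V_iterations] as dicts mapping position -> value.

    Assumptions (not stated in the source): V_0 is 0 everywhere, and the
    terminal states G and M have value 0, so they are left implicit.
    The discount is gamma = 1, as in the problem statement.
    """
    history = [{k: Fraction(0) for k in POSITIONS}]
    for _ in range(iterations):
        prev = history[-1]
        new = {}
        for k in POSITIONS:
            q_shoot = shoot_value(k)
            if k < 4:
                # Dribbling earns no reward; it reaches k + 1 with probability y.
                q_dribble = y * prev[k + 1]
                new[k] = max(q_shoot, q_dribble)
            else:
                # From position 4 the only action is to shoot.
                new[k] = q_shoot
        history.append(new)
    return history


y = Fraction(3, 4)
history = value_iteration(y, 4)
for i in range(1, 5):
    print(i, [str(history[i][k]) for k in POSITIONS])
```

With $y = \frac{3}{4}$, iterations 1 and 2 reproduce the two table rows in part (c), and iterations 3 and 4 are identical, which matches the claim in part (d) that three iterations suffice for this value of $y$.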

This note was uploaded on 08/30/2009 for the course CS 188 taught by Professor Staff during the Spring '08 term at University of California, Berkeley.
