cs188_sp09_mt1_sol 9

(h) (4 pt) What is the optimal policy $\pi^*$ when A doesn't know whether or not D is present?

Using $T(k, D, k+1) = \frac{5}{12}$, we find that $\pi^* = \{1: S,\ 2: S,\ 3: S,\ 4: S\}$.

We showed from part (e) that for any $y < \frac{3}{4}$, shooting is preferable to dribbling from state 3. Therefore, we know that $V^*(3) = Q^*(3, S)$. We can perform similar computations for states 1 and 2:

$$
\begin{aligned}
V^*(3) &= Q^*(3, S) = \tfrac{1}{2}, & \pi^*(3) &= S \\
Q^*(2, S) &= \tfrac{1}{3} \\
Q^*(2, D) &= \tfrac{5}{12} \cdot V^*(3) + \tfrac{7}{12} \cdot 0 = \tfrac{5}{24} \\
V^*(2) &= \max\bigl(Q^*(2, S),\ Q^*(2, D)\bigr) = \tfrac{1}{3}, & \pi^*(2) &= S \\
Q^*(1, S) &= \tfrac{1}{6} \\
Q^*(1, D) &= \tfrac{5}{12} \cdot V^*(2) + \tfrac{7}{12} \cdot 0 = \tfrac{5}{36} \\
V^*(1) &= \max\bigl(Q^*(1, S),\ Q^*(1, D)\bigr) = \tfrac{1}{6}, & \pi^*(1) &= S
\end{aligned}
$$

Under the second answer for part (g), similar computations give $\pi^* = \{1 :$
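The backward computation for states 3, 2, and 1 can be checked with a short script. This is a minimal sketch and not part of the original solution: the names `P_ADVANCE`, `Q_SHOOT`, and `backup` are hypothetical, the $Q^*(k, S)$ values are copied from the text rather than derived, and breaking ties in favor of shooting is an assumption (no tie occurs here).

```python
from fractions import Fraction

# T(k, D, k+1) when A is unsure whether D is present (value quoted in the solution).
P_ADVANCE = Fraction(5, 12)

# Q*(k, S) for states 1..3, copied from the solution text (not derived here).
Q_SHOOT = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}


def backup(v_next, q_shoot, p_advance=P_ADVANCE):
    """Return (V*(k), pi*(k), Q*(k, D)) given V*(k+1) and Q*(k, S).

    A failed dribble is worth 0, as in the solution's "7/12 * 0" term.
    Ties go to shooting (an assumption made for this sketch).
    """
    q_dribble = p_advance * v_next + (1 - p_advance) * 0
    if q_shoot >= q_dribble:
        return q_shoot, "S", q_dribble
    return q_dribble, "D", q_dribble


# Start from V*(3) = Q*(3, S), which the solution takes from part (e).
values = {3: Q_SHOOT[3]}
policy = {3: "S"}
q_dribble = {}

# Work backwards through states 2 and 1.
for k in (2, 1):
    values[k], policy[k], q_dribble[k] = backup(values[k + 1], Q_SHOOT[k])

for k in (3, 2, 1):
    print(k, values[k], policy[k], q_dribble.get(k))
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared directly with the fractions in the worked solution.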