cs188_sp09_mt1_sol 9

NAME: 9

(h) (4 pt) What is the optimal policy $\pi^*$ when A doesn't know whether or not D is present?

Using $T(k, D, k+1) = \frac{5}{12}$, we find that $\pi^* = \{1 : S,\ 2 : S,\ 3 : S,\ 4 : S\}$.

We showed in part (e) that for any $y < \frac{3}{4}$, shooting is preferable to dribbling from state 3. Therefore, we know that $V^*(3) = Q^*(3, S)$. We can perform similar computations for states 1 and 2:

$V^*(3) = Q^*(3, S) = \frac{1}{2}$
$\pi^*(3) = S$

$Q^*(2, S) = \frac{1}{3}$
$Q^*(2, D) = \frac{5}{12} \cdot V^*(3) + \frac{7}{12} \cdot 0 = \frac{5}{24}$
$V^*(2) = \max(Q^*(2, S),\ Q^*(2, D)) = \frac{1}{3}$
$\pi^*(2) = S$

$Q^*(1, S) = \frac{1}{6}$
$Q^*(1, D) = \frac{5}{12} \cdot V^*(2) + \frac{7}{12} \cdot 0 = \frac{5}{36}$
$V^*(1) = \max(Q^*(1, S),\ Q^*(1, D)) = \frac{1}{6}$
$\pi^*(1) = S$

Under the second answer for part (g), similar computations give $\pi^*$ = { 1 :
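The backward computation for states 3, 2, and 1 can be checked with exact fractions. The following is only a minimal sketch: the function name `backward_pass` and the dictionary layout are hypothetical, the shooting values $Q^*(k, S)$ and the transition probability $\frac{5}{12}$ are copied from the solution text, and a failed dribble is assumed to yield value 0, as in the $\frac{7}{12} \cdot 0$ terms. State 4 is omitted because its Q-values are not shown in this excerpt.

```python
from fractions import Fraction

# Numbers copied from the solution text (not derived here).
P_ADVANCE = Fraction(5, 12)  # T(k, D, k+1)
Q_SHOOT = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}


def backward_pass(q_shoot, p_advance):
    """Hypothetical helper: back up values from state 3 down to state 1."""
    values, policy, q_dribble = {}, {}, {}

    # State 3: the text takes V*(3) = Q*(3, S) from part (e).
    values[3] = q_shoot[3]
    policy[3] = "S"

    for k in (2, 1):
        # Dribbling reaches state k+1 with probability p_advance;
        # otherwise the value is assumed to be 0.
        q_dribble[k] = p_advance * values[k + 1] + (1 - p_advance) * 0
        if q_shoot[k] >= q_dribble[k]:
            values[k], policy[k] = q_shoot[k], "S"
        else:
            values[k], policy[k] = q_dribble[k], "D"

    return values, policy, q_dribble


values, policy, q_dribble = backward_pass(Q_SHOOT, P_ADVANCE)
for k in (3, 2, 1):
    print(k, policy[k], values[k], q_dribble.get(k))
```

Using `Fraction` rather than floats keeps comparisons such as $\frac{1}{6}$ versus $\frac{5}{36}$ exact, so the choice between S and D cannot be flipped by rounding.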

This note was uploaded on 08/30/2009 for the course CS 188 taught by Professor Staff during the Spring '08 term at University of California, Berkeley.
