(h) (4 pt) What is the optimal policy $\pi^*$ when $A$ doesn't know whether or not $D$ is present?
Using $T(k, D, k+1) = \frac{5}{12}$, we find that $\pi^* = \{1 : S,\ 2 : S,\ 3 : S,\ 4 : S\}$. We showed from part (e) that for any $y < \frac{3}{4}$, shooting is preferable to dribbling from state 3. Therefore, we know that $V^*(3) = Q^*(3, S)$.
We can perform similar computations for states 1 and 2:
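$$
\begin{aligned}
V^*(3) &= Q^*(3, S) = \frac{1}{2} \\
\pi^*(3) &= S \\[6pt]
Q^*(2, S) &= \frac{1}{3} \\
Q^*(2, D) &= \frac{5}{12} \cdot V^*(3) + \frac{7}{12} \cdot 0 = \frac{5}{24} \\
V^*(2) &= \max\bigl(Q^*(2, S),\, Q^*(2, D)\bigr) = \frac{1}{3} \\
\pi^*(2) &= S \\[6pt]
Q^*(1, S) &= \frac{1}{6} \\
Q^*(1, D) &= \frac{5}{12} \cdot V^*(2) + \frac{7}{12} \cdot 0 = \frac{5}{36} \\
V^*(1) &= \max\bigl(Q^*(1, S),\, Q^*(1, D)\bigr) = \frac{1}{6} \\
\pi^*(1) &= S
\end{aligned}
$$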
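The backups for states 2 and 1 can be checked mechanically. The following is a minimal sketch, not part of the original solution: the names `solve`, `Q_SHOOT`, and `P_DRIBBLE` are hypothetical, the shooting values and the dribble probability are copied from the computations above, and state 3 is fixed to shoot as established in part (e). State 4 is left out because its values are not stated here.

```python
from fractions import Fraction

# Numbers copied from the worked solution; the names are illustrative only.
P_DRIBBLE = Fraction(5, 12)  # T(k, D, k+1)
Q_SHOOT = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}  # Q*(k, S)


def solve(q_shoot, p_dribble):
    """Sweep from the highest listed state down to the lowest.

    Assumes the highest state shoots (part (e) for state 3) and that a
    failed dribble is worth 0, as in the equations above.
    """
    top = max(q_shoot)
    values = {top: q_shoot[top]}
    policy = {top: "S"}
    for k in range(top - 1, min(q_shoot) - 1, -1):
        q_s = q_shoot[k]
        q_d = p_dribble * values[k + 1]  # + (1 - p_dribble) * 0
        values[k] = max(q_s, q_d)
        policy[k] = "S" if q_s >= q_d else "D"
    return values, policy


values, policy = solve(Q_SHOOT, P_DRIBBLE)
for k in sorted(values):
    print(k, values[k], policy[k])
```

Exact fractions are used so the comparisons (for example 1/3 against 5/24) involve no rounding.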
Under the second answer for part (g), similar computations give $\pi^* = \{1 :$

