(h) (4 pt) What is the optimal policy $\pi^*$ when $A$ doesn’t know whether or not $D$ is present?
Using $T(k, D, k+1) = \frac{5}{12}$, we find that $\pi^* = \{1 : S,\ 2 : S,\ 3 : S,\ 4 : S\}$. We showed in part (e) that for any $y < \frac{3}{4}$, shooting is preferable to dribbling from state 3. Since $\frac{5}{12} < \frac{3}{4}$, we know that $V^*(3) = Q^*(3, S)$.
We can perform similar computations for states 1 and 2:

State 3:
$V^*(3) = Q^*(3, S) = \frac{1}{2}$
$\pi^*(3) = S$

State 2:
$Q^*(2, S) = \frac{1}{3}$
$Q^*(2, D) = \frac{5}{12} \cdot V^*(3) + \frac{7}{12} \cdot 0 = \frac{5}{24}$
$V^*(2) = \max(Q^*(2, S),\ Q^*(2, D)) = \frac{1}{3}$
$\pi^*(2) = S$

State 1:
$Q^*(1, S) = \frac{1}{6}$
$Q^*(1, D) = \frac{5}{12} \cdot V^*(2) + \frac{7}{12} \cdot 0 = \frac{5}{36}$
$V^*(1) = \max(Q^*(1, S),\ Q^*(1, D)) = \frac{1}{6}$
$\pi^*(1) = S$
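The backward computation above can be checked with a short script. This is only a sketch under stated assumptions: the shooting values for states 1 to 3 are copied from the $Q^*(k, S)$ figures in this solution, $V^*(3) = Q^*(3, S)$ is taken as given from part (e), and state 4 is left out because its values are not listed in this part. The names `P_DRIBBLE`, `Q_SHOOT`, and `solve` are hypothetical and chosen for illustration.

```python
from fractions import Fraction

# Assumed dribble success probability, T(k, D, k+1) = 5/12 from this solution.
P_DRIBBLE = Fraction(5, 12)

# Q*(k, S) for states 1..3, copied from the values stated in this solution.
Q_SHOOT = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}


def solve(p_dribble=P_DRIBBLE, q_shoot=Q_SHOOT):
    """Back up values from state 3 down to state 1.

    V*(3) = Q*(3, S) is taken as given (the part (e) result).
    A failed dribble is worth 0, so Q*(k, D) = p_dribble * V*(k + 1).
    """
    values = {3: q_shoot[3]}
    policy = {3: "S"}
    q_dribble = {}
    for k in (2, 1):
        q_dribble[k] = p_dribble * values[k + 1]
        if q_shoot[k] >= q_dribble[k]:
            values[k], policy[k] = q_shoot[k], "S"
        else:
            values[k], policy[k] = q_dribble[k], "D"
    return values, policy, q_dribble


values, policy, q_dribble = solve()
for k in (3, 2, 1):
    print(k, policy[k], values[k], q_dribble.get(k))
```

Exact fractions are used instead of floats so the comparisons $\frac{5}{24} < \frac{1}{3}$ and $\frac{5}{36} < \frac{1}{6}$ are not affected by rounding.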
Under the second answer for part (g), similar computations give $\pi^* = \{1 :$
 Spring '08
 Staff