EE292
Spring 2006
Analysis & Control of Markov Chains
May 29, 2006
Prof. Ben Van Roy
Homework Assignment 6: Solutions
1
The optimal policy is to keep the server on if it is on, and to turn the server on if it is off. See
the attached code. Starting from the policy that always keeps the server off, one iteration of policy
iteration was required to reach the optimal policy. Starting with a $J$ vector of all zeros, the greedy
policies with respect to the value iterates were optimal beyond the seventh iteration.
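As a rough illustration of the two procedures mentioned above, the following Python sketch runs policy iteration and value iteration on a small two-state model. It is not the attached code and does not use the homework's data: the transition probabilities `P`, the stage costs `G`, the discount factor `ALPHA`, and all function names are hypothetical, chosen only so that the example is self-contained.

```python
# Hypothetical two-state server model (all numbers are illustrative
# assumptions, not the homework's data).
# States: 0 = server off, 1 = server on.
# Actions: 0 = switch/keep off, 1 = switch/keep on.
ALPHA = 0.9                          # discount factor (assumed)
STATES = (0, 1)
ACTIONS = (0, 1)
# P[u][x][y]: probability of moving from state x to state y under action u.
P = {0: [[1.0, 0.0], [1.0, 0.0]],    # "off" always leads to state 0
     1: [[0.0, 1.0], [0.0, 1.0]]}    # "on" always leads to state 1
# G[x][u]: one-stage cost of taking action u in state x (assumed).
G = [[4.0, 3.0],
     [4.0, 1.0]]


def q_value(J, x, u):
    """One-stage cost plus discounted expected cost-to-go."""
    return G[x][u] + ALPHA * sum(P[u][x][y] * J[y] for y in STATES)


def greedy(J):
    """Greedy policy with respect to J (ties go to the lower action index)."""
    return [min(ACTIONS, key=lambda u: q_value(J, x, u)) for x in STATES]


def evaluate(mu, sweeps=2000):
    """Approximate J_mu by applying T_mu repeatedly, starting from zero."""
    J = [0.0 for _ in STATES]
    for _ in range(sweeps):
        J = [q_value(J, x, mu[x]) for x in STATES]
    return J


def policy_iteration(mu):
    """Return (policy, cost-to-go, number of policy improvements made)."""
    improvements = 0
    while True:
        J = evaluate(mu)
        new_mu = greedy(J)
        if new_mu == mu:
            return mu, J, improvements
        mu, improvements = new_mu, improvements + 1


def value_iteration(J, sweeps):
    """Return the final iterate and the greedy policy after each sweep."""
    policies = []
    for _ in range(sweeps):
        J = [min(q_value(J, x, u) for u in ACTIONS) for x in STATES]
        policies.append(greedy(J))
    return J, policies


mu_pi, J_pi, n_improvements = policy_iteration([0, 0])
J_vi, greedy_policies = value_iteration([0.0, 0.0], sweeps=200)
print("policy iteration:", mu_pi, J_pi, n_improvements)
print("value iteration greedy policies (first 10):", greedy_policies[:10])
```

Policy evaluation is done here by repeated application of $T_\mu$ rather than by solving the linear system, purely to keep the sketch free of external libraries. Inspecting `greedy_policies` shows the sweep after which the greedy policy stops changing for these assumed numbers.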
2
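(a) It is clear that $m = 1$ corresponds to value iteration. To see why $m = \infty$ corresponds to
policy iteration, it suffices to show that $\lim_{m \to \infty} T_{\mu_k}^m J_k = J_{\mu_k}$, where
$J_{\mu_k}$ denotes the cost-to-go function of the policy $\mu_k$. To see this, we first note that
$T_{\mu_k} J_{\mu_k} = J_{\mu_k}$. Moreover, for arbitrary $J, \bar{J}$,
$$\| T_{\mu_k} J - T_{\mu_k} \bar{J} \|_\infty \le \alpha \| J - \bar{J} \|_\infty ,$$
so that, applying this bound $m$ times with $J = J_k$ and $\bar{J} = J_{\mu_k}$, we must have
$$\| T_{\mu_k}^m J_k - J_{\mu_k} \|_\infty \le \alpha^m \| J_k - J_{\mu_k} \|_\infty ,$$
from which the result follows, since $\alpha < 1$.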
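The geometric convergence of repeated applications of $T_\mu$ to its fixed point can be checked numerically. The sketch below does this for a single fixed policy; the transition matrix `P_MU`, the stage costs `G_MU`, the discount factor `ALPHA`, and the starting vector `J_start` are hypothetical values chosen for illustration, not data from the assignment.

```python
# Numerical check of the bound
#   || T_mu^m J - J_mu ||_inf <= alpha^m || J - J_mu ||_inf
# for one fixed policy mu. All numbers are illustrative assumptions.
ALPHA = 0.9
P_MU = [[0.5, 0.5, 0.0],     # transition probabilities under mu (assumed)
        [0.1, 0.6, 0.3],
        [0.0, 0.2, 0.8]]
G_MU = [1.0, 2.0, 0.5]       # one-stage costs under mu (assumed)


def T_mu(J):
    """Apply the operator T_mu: (T_mu J)(x) = g(x) + alpha * sum_y P(x,y) J(y)."""
    return [G_MU[x] + ALPHA * sum(p * j for p, j in zip(P_MU[x], J))
            for x in range(len(J))]


def apply_m(J, m):
    """Apply T_mu to J a total of m times."""
    for _ in range(m):
        J = T_mu(J)
    return J


def sup_norm_diff(a, b):
    """Maximum-norm distance between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Approximate the fixed point J_mu by iterating until numerically converged.
J_mu = apply_m([0.0, 0.0, 0.0], 5000)

J_start = [10.0, -3.0, 7.0]              # arbitrary starting vector
gap0 = sup_norm_diff(J_start, J_mu)
errors = [sup_norm_diff(apply_m(J_start, m), J_mu) for m in range(51)]

# The small tolerance absorbs floating-point rounding.
bound_holds = all(err <= ALPHA ** m * gap0 + 1e-9
                  for m, err in enumerate(errors))
print("bound holds for m = 0..50:", bound_holds)
```

The same check could be repeated with other starting vectors; the bound does not depend on the choice of `J_start`.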
(b) Let us prove by induction that $J^* \le J_k \le T^k J_0$ and $T J_k \le J_k$ for all $k$. Observe
that the first invariant suffices to show that