# sol6 - EE292 Analysis Control of Markov Chains Prof Ben Van...

EE292 Spring 2006, Analysis & Control of Markov Chains, May 29, 2006, Prof. Ben Van Roy

Homework Assignment 6: Solutions

1. The optimal policy is to keep the server on if it is on, and to turn the server on if it is off. See the attached code. Starting from the policy that always keeps the server off, policy iteration required one iteration to reach the optimal policy. Starting from a $J$ vector of all zeros, the greedy policies with respect to the value iterates were optimal beyond the seventh iteration.

2. (a) It is clear that $m = 1$ corresponds to value iteration. To see why $m = \infty$ corresponds to policy iteration, it suffices to show that

$$\lim_{m \to \infty} T_{\mu_{k+1}}^m J_k = J_{\mu_{k+1}}.$$

To see this, we first note that $T_{\mu_{k+1}} J_{\mu_{k+1}} = J_{\mu_{k+1}}$. Moreover, for arbitrary $J, \bar{J}$,

$$\| T_{\mu_{k+1}} J - T_{\mu_{k+1}} \bar{J} \| \le \alpha \| J - \bar{J} \|,$$

so that, applying this bound $m$ times with $J = J_k$ and $\bar{J} = J_{\mu_{k+1}}$, we must have

$$\| T_{\mu_{k+1}}^m J_k - J_{\mu_{k+1}} \| \le \alpha^m \| J_k - J_{\mu_{k+1}} \|,$$

from which the result follows, since $\alpha < 1$.

(b) Let us prove by induction that $J^* \le J_k \le T^k J_0$ and $T J_k \le J_k$ for all $k$. Observe that the first invariant suffices to show that
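The claim in part 2(a), that one sweep of the fixed-policy operator per step reproduces value iteration while many sweeps approach exact policy evaluation, can be checked numerically. The following is a minimal sketch, not the attached course code: the two-state, two-action MDP (`P`, `g`, `alpha`) is hypothetical, the function names are illustrative, and a discounted cost-minimization setting is assumed.

```python
import numpy as np

def greedy_policy(J, P, g, alpha):
    # Q[a, s] = g[a, s] + alpha * sum_s' P[a, s, s'] * J[s']
    Q = g + alpha * P @ J
    return Q.argmin(axis=0)  # cost-minimizing action in each state

def apply_T_mu(J, mu, P, g, alpha):
    # One application of the fixed-policy operator T_mu to J.
    s = np.arange(len(J))
    return g[mu, s] + alpha * P[mu, s] @ J

def modified_policy_iteration(P, g, alpha, m, J0, iters):
    # Each outer step: take the greedy policy, then apply its operator m times.
    J = J0.copy()
    mu = greedy_policy(J, P, g, alpha)
    for _ in range(iters):
        mu = greedy_policy(J, P, g, alpha)
        for _ in range(m):
            J = apply_T_mu(J, mu, P, g, alpha)
    return J, mu

def policy_value(mu, P, g, alpha):
    # Exact policy evaluation: solve (I - alpha * P_mu) J = g_mu.
    s = np.arange(P.shape[1])
    return np.linalg.solve(np.eye(len(s)) - alpha * P[mu, s], g[mu, s])

# Hypothetical MDP (assumption, not from the assignment).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.5, 0.5], [0.6, 0.4]]])   # transitions under action 1
g = np.array([[1.0, 2.0],                  # g[a, s]: cost of action a in state s
              [1.5, 0.5]])
alpha = 0.9
J0 = np.zeros(2)

# m = 1 should coincide with plain value iteration.
J_m1, _ = modified_policy_iteration(P, g, alpha, m=1, J0=J0, iters=50)
J_vi = J0.copy()
for _ in range(50):
    J_vi = (g + alpha * P @ J_vi).min(axis=0)

# Large m: a single outer step should land close to the exact value of the
# greedy policy, which is what a policy iteration step computes.
J_big, mu1 = modified_policy_iteration(P, g, alpha, m=500, J0=J0, iters=1)
J_mu1 = policy_value(mu1, P, g, alpha)

print("max |J_m1 - J_vi|  :", np.abs(J_m1 - J_vi).max())
print("max |J_big - J_mu1|:", np.abs(J_big - J_mu1).max())
```

Both printed gaps should be at the level of floating-point error. Intermediate values of `m` trade cheaper inner sweeps against a less accurate evaluation of each greedy policy.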
