EE292, Spring 2006
Analysis & Control of Markov Chains
May 29, 2006
Prof. Ben Van Roy

Homework Assignment 6: Solutions

1. The optimal policy is to keep the server on if it is on, and to turn the server on if it is off. See the attached code. Starting at the policy of keeping the server off always, one iteration of policy iteration was required to reach the optimal policy. Starting with a J vector of all zeros, the greedy policies with respect to the value iterates were optimal beyond the seventh iteration.

2. (a) It is clear that $m = 1$ corresponds to value iteration. To see why $m = \infty$ corresponds to policy iteration, it suffices to show that

$$\lim_{m \to \infty} T_{\mu_k}^m J_k = J_{k+1}.$$

To see this, we first note that $T_{\mu_k} J_{k+1} = J_{k+1}$. Moreover, for arbitrary $J, \bar{J}$,

$$\| T_{\mu_k} J - T_{\mu_k} \bar{J} \| \le \alpha \| J - \bar{J} \|,$$

so that we must have

$$\| T_{\mu_k}^m J_k - J_{k+1} \| \le \alpha^m \| J_k - J_{k+1} \|,$$

from which the result follows....
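The contraction bound in 2(a) can be checked numerically. The sketch below is only an illustration under stated assumptions: the two-state, two-action MDP, its transition probabilities, its costs, and the discount factor 0.9 are all made up for the example and are not the server problem from the assignment. The norm is taken to be the maximum norm, and the helper names (apply_T_mu, greedy, evaluate_policy, contraction_check) are hypothetical. The code takes J_k = 0, finds the greedy policy mu_k, computes J_{k+1} as the exact fixed point of T_{mu_k}, and compares the error of T_{mu_k}^m J_k against alpha^m times the initial error.

```python
# Hypothetical discounted-cost MDP; every number here is illustrative.
ALPHA = 0.9  # assumed discount factor

# P[a][s][t]: probability of moving from state s to state t under action a
P = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.1, 0.9], [0.6, 0.4]],
]
# G[a][s]: expected one-stage cost of action a in state s
G = [[1.0, 2.0], [1.5, 0.5]]


def apply_T_mu(J, mu):
    """Apply the policy operator T_mu once to the cost vector J."""
    n = len(J)
    return [
        G[mu[s]][s] + ALPHA * sum(P[mu[s]][s][t] * J[t] for t in range(n))
        for s in range(n)
    ]


def greedy(J):
    """Return the cost-minimizing (greedy) policy with respect to J."""
    n = len(J)
    mu = []
    for s in range(n):
        q = [
            G[a][s] + ALPHA * sum(P[a][s][t] * J[t] for t in range(n))
            for a in range(len(P))
        ]
        mu.append(q.index(min(q)))
    return mu


def evaluate_policy(mu):
    """Solve (I - ALPHA * P_mu) J = g_mu exactly for the two-state case."""
    a = 1.0 - ALPHA * P[mu[0]][0][0]
    b = -ALPHA * P[mu[0]][0][1]
    c = -ALPHA * P[mu[1]][1][0]
    d = 1.0 - ALPHA * P[mu[1]][1][1]
    g0, g1 = G[mu[0]][0], G[mu[1]][1]
    det = a * d - b * c
    return [(d * g0 - b * g1) / det, (a * g1 - c * g0) / det]


def max_norm_dist(x, y):
    """Maximum-norm distance between two vectors."""
    return max(abs(u - v) for u, v in zip(x, y))


def contraction_check(max_m):
    """Return (m, error, bound) for m = 1..max_m, starting from J_k = 0."""
    J_k = [0.0, 0.0]
    mu_k = greedy(J_k)
    J_next = evaluate_policy(mu_k)  # policy iteration update (m = infinity)
    d0 = max_norm_dist(J_k, J_next)
    rows, J = [], J_k
    for m in range(1, max_m + 1):
        J = apply_T_mu(J, mu_k)  # J is now T_mu^m applied to J_k
        rows.append((m, max_norm_dist(J, J_next), ALPHA ** m * d0))
    return rows


if __name__ == "__main__":
    for m, err, bound in contraction_check(200)[::40]:
        print(m, err, bound)
```

In each printed row the error should not exceed the bound, and both shrink geometrically in m, which is the sense in which the update with m applications of T_{mu_k} approaches the policy iteration update as m grows.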