sol6 - EE292 Spring 2006 Analysis & Control of...

EE292 Spring 2006: Analysis & Control of Markov Chains
May 29, 2006
Prof. Ben Van Roy

Homework Assignment 6: Solutions

1

The optimal policy is to keep the server on if it is on, and to turn the server on if it is off. See the attached code. Starting from the policy of always keeping the server off, a single iteration of policy iteration was required to reach the optimal policy. Starting with a J vector of all zeros, the greedy policies with respect to the value iterates were optimal beyond the seventh iteration.

2 (a)

It is clear that $m = 1$ corresponds to value iteration: since $\mu_k$ is greedy with respect to $J_k$, we have $T_{\mu_k} J_k = T J_k$. To see why $m = \infty$ corresponds to policy iteration, it suffices to show that $\lim_{m \to \infty} T_{\mu_k}^m J_k = J_{k+1}$, where $J_{k+1} = J_{\mu_k}$ is the policy-iteration iterate, i.e. the value function of $\mu_k$. To see this, we first note that $T_{\mu_k} J_{k+1} = J_{k+1}$. Moreover, for arbitrary $J$ and $J'$, $\|T_{\mu_k} J - T_{\mu_k} J'\|_\infty \le \alpha \|J - J'\|_\infty$, where $\alpha \in (0, 1)$ is the discount factor, so that we must have $\|T_{\mu_k}^m J_k - J_{k+1}\|_\infty \le \alpha^m \|J_k - J_{k+1}\|_\infty$, from which the result follows.

...
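The argument in 2 (a) can be checked numerically with a minimal Python sketch of the iteration $J_{k+1} = T_{\mu_k}^m J_k$, with $\mu_k$ greedy with respect to $J_k$ (assumed here to be the scheme the problem defines, since the problem statement is not reproduced above). It uses a hypothetical two-state, two-action discounted-cost MDP: the transition probabilities `P`, costs `G`, discount factor `ALPHA`, and every function name are illustrative assumptions, not the server model of Problem 1 or the attached code. It checks that $m = 1$ reproduces one value-iteration step and that the sup-norm distance to $J_{\mu_k}$ is at most $\alpha^m$ times the initial distance.

```python
# Hypothetical discounted-cost MDP with 2 states and 2 actions.
# These numbers are illustrative assumptions, NOT the server model of Problem 1.
ALPHA = 0.9  # assumed discount factor
STATES = range(2)
ACTIONS = range(2)
# P[a][x][y]: probability of moving from state x to state y under action a
P = [
    [[0.8, 0.2], [0.3, 0.7]],  # action 0
    [[0.5, 0.5], [0.1, 0.9]],  # action 1
]
# G[a][x]: expected one-stage cost of taking action a in state x
G = [
    [2.0, 1.0],  # action 0
    [1.5, 3.0],  # action 1
]


def q_value(J, x, a):
    """g(x, a) + alpha * sum_y P_a(x, y) J(y)."""
    return G[a][x] + ALPHA * sum(P[a][x][y] * J[y] for y in STATES)


def T(J):
    """Bellman operator: minimize the Q-value over actions in each state."""
    return [min(q_value(J, x, a) for a in ACTIONS) for x in STATES]


def greedy(J):
    """Greedy (cost-minimizing) policy with respect to J."""
    return [min(ACTIONS, key=lambda a: q_value(J, x, a)) for x in STATES]


def T_mu(mu, J):
    """Bellman operator for the fixed stationary policy mu."""
    return [q_value(J, x, mu[x]) for x in STATES]


def policy_value(mu):
    """Solve (I - alpha * P_mu) J = g_mu exactly by Gauss-Jordan elimination."""
    n = len(STATES)
    # Augmented matrix [I - alpha * P_mu | g_mu]
    A = [[(1.0 if x == y else 0.0) - ALPHA * P[mu[x]][x][y] for y in STATES]
         + [G[mu[x]][x]] for x in STATES]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [A[r][k] - f * A[c][k] for k in range(n + 1)]
    return [A[x][n] / A[x][x] for x in range(n)]


def modified_policy_iteration_step(J, m):
    """One step J_{k+1} = T_{mu_k}^m J_k, with mu_k greedy w.r.t. J_k."""
    mu = greedy(J)
    for _ in range(m):
        J = T_mu(mu, J)
    return J


def sup_norm_dist(J1, J2):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(J1, J2))


if __name__ == "__main__":
    J0 = [0.0, 0.0]
    mu0 = greedy(J0)
    J_mu0 = policy_value(mu0)
    # m = 1 reproduces one value-iteration step T J0.
    step1 = modified_policy_iteration_step(J0, 1)
    print("m = 1 matches T J0:", sup_norm_dist(step1, T(J0)) < 1e-12)
    # The error after m applications of T_mu is at most alpha^m times the initial error.
    for m in (1, 5, 20, 100):
        err = sup_norm_dist(modified_policy_iteration_step(J0, m), J_mu0)
        bound = ALPHA ** m * sup_norm_dist(J0, J_mu0)
        print(f"m = {m:3d}: error {err:.3e} <= bound {bound:.3e}")
```

Policy evaluation here solves the linear system directly rather than iterating $T_{\mu}$, so the large-$m$ comparison is against an independently computed $J_{\mu_k}$ rather than against the iteration itself.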