EE292: Analysis & Control of Markov Chains, Spring 2006
Prof. Ben Van Roy
Homework Assignment 4: Solutions (May 10, 2006)

4.13 Let $x_k$ be the gambler's fortune at time $k$. Let $\epsilon_k = -1$ if the gambler loses at time $k$ and $\epsilon_k = 1$ if he wins, where $\Pr(\epsilon_k = 1) = 1 - \Pr(\epsilon_k = -1) = p > 1/2$. Let $u_k$ be the stake at time $k$; we require $u_k \le x_k$. The system evolves according to
$$x_{k+1} = x_k + \epsilon_k u_k.$$
The reward function is given by the terminal reward $E\{\ln(x_N)\}$.

We claim that $J_k(x_k) = A_k + \ln(x_k)$, where $A_k$ is independent of $x_k$. Note that this is trivially true for $k = N$, since $J_N(x_N) = \ln(x_N)$ (with $A_N = 0$). Assume that
$$J_{k+1}(x_{k+1}) = A_{k+1} + \ln(x_{k+1}).$$
Then
$$J_k(x_k) = \max_{u_k} E\{J_{k+1}(x_{k+1})\} = \max_{u_k} \left[ A_{k+1} + p \ln(x_k + u_k) + (1-p)\ln(x_k - u_k) \right].$$
Letting $I_k = p \ln(x_k + u_k) + (1-p)\ln(x_k - u_k)$, verify that the first-order optimality condition requires $u_k^* = (2p-1)x_k < x_k$ ($I_k$ is easily seen to be concave in $u_k$). Substituting $u_k^*$ back, we have $x_k + u_k^* = 2p\,x_k$ and $x_k - u_k^* = 2(1-p)x_k$, so
$$J_k(x_k) = A_{k+1} + p \ln(2p) + (1-p)\ln(2(1-p)) + \ln(x_k),$$
which confirms the claim, with $A_k = A_{k+1} + p \ln(2p) + (1-p)\ln(2(1-p))$.
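As a quick numerical sanity check (not part of the original solution), the sketch below maximizes the expected log-wealth $I_k(u) = p\ln(x+u) + (1-p)\ln(x-u)$ over a grid of stakes and compares the result against the closed-form optimum $u^* = (2p-1)x$ from the first-order condition. The values $p = 0.6$ and $x = 100$ are illustrative choices, not from the problem.

```python
import numpy as np

p = 0.6     # win probability, assumed p > 1/2 (illustrative value)
x = 100.0   # current fortune x_k (illustrative value)

def expected_log_wealth(u):
    # I(u) = p ln(x + u) + (1 - p) ln(x - u), the quantity maximized
    # over the stake u in the dynamic-programming step above
    return p * np.log(x + u) + (1 - p) * np.log(x - u)

# Brute-force maximization over a fine grid of feasible stakes 0 <= u < x
grid = np.linspace(0.0, 0.999 * x, 100001)
u_best = grid[np.argmax(expected_log_wealth(grid))]

# Closed-form optimum from the derivation: u* = (2p - 1) x
u_star = (2 * p - 1) * x

print(u_best, u_star)  # both close to 20.0 for p = 0.6, x = 100
```

The grid maximizer agrees with $(2p-1)x$ to within the grid spacing, consistent with $I_k$ being concave in $u_k$ with a unique interior maximum.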
This note was uploaded on 02/15/2011 for the course EECS 6.231 taught by Professor Bertsekas during the Spring '10 term at MIT.