EE292, Spring 2006: Analysis & Control of Markov Chains
May 10, 2006
Prof. Ben Van Roy

Homework Assignment 4: Solutions

4.13 Let $x_k$ be the gambler's fortune at time $k$. Let $\omega_k = -1$ if the gambler loses at time $k$ and $\omega_k = 1$ if he wins at time $k$, where $\Pr(\omega_k = 1) = 1 - \Pr(\omega_k = -1) = p > \tfrac{1}{2}$. Let $u_k$ be the stake at time $k$, so that $0 \le u_k \le x_k$. The system evolves according to

$$x_{k+1} = x_k + \omega_k u_k.$$

The reward is the expected terminal reward $E\{\ln(x_N)\}$.

We claim that $J_k(x_k) = A_k + \ln(x_k)$, where $A_k$ is independent of $x_k$. This is trivially true for $k = N$, since $J_N(x_N) = \ln(x_N)$. Assume that

$$J_{k+1}(x_{k+1}) = A_{k+1} + \ln(x_{k+1}).$$

Then

$$J_k(x_k) = \max_{u_k} E\{J_{k+1}(x_{k+1})\} = \max_{u_k} \left[ A_{k+1} + p \ln(x_k + u_k) + (1-p) \ln(x_k - u_k) \right].$$

Let $I_k = p \ln(x_k + u_k) + (1-p) \ln(x_k - u_k)$. The first-order optimality condition is

$$\frac{\partial I_k}{\partial u_k} = \frac{p}{x_k + u_k} - \frac{1-p}{x_k - u_k} = 0,$$

which requires $u_k^* = (2p - 1) x_k < x_k$ ($I_k$ is easily seen to be concave in $u_k$, so this stationary point is the maximizer). Substituting ...
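
The stake given by the first-order condition, $u_k^* = (2p - 1) x_k$, can be sanity-checked numerically. The following is a minimal sketch, not part of the original solution: it maximizes $I_k$ by brute-force grid search over feasible stakes and compares the result with the closed form. The function names `expected_log` and `argmax_stake` and the sample values of `x` and `p` are illustrative assumptions.

```python
import math


def expected_log(x, u, p):
    """I_k = p*ln(x + u) + (1 - p)*ln(x - u) for fortune x, stake u, win probability p."""
    return p * math.log(x + u) + (1 - p) * math.log(x - u)


def argmax_stake(x, p, n=10_000):
    """Grid search for the stake maximizing I_k over u = x*i/n, i = 0..n-1.

    The endpoint u = x is skipped because ln(x - u) is undefined there.
    """
    best_u, best_val = 0.0, expected_log(x, 0.0, p)
    for i in range(1, n):
        u = x * i / n
        val = expected_log(x, u, p)
        if val > best_val:
            best_u, best_val = u, val
    return best_u


if __name__ == "__main__":
    x, p = 10.0, 0.6          # hypothetical fortune and win probability (p > 1/2)
    u_grid = argmax_stake(x, p)
    u_formula = (2 * p - 1) * x
    print("grid search:", u_grid, "closed form:", u_formula)
```

Because $I_k$ is concave in $u_k$, the grid maximizer should agree with the closed form up to the grid spacing `x / n`.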