
sol4 - EE292 Spring 2006 Analysis & Control of Markov...


EE292, Spring 2006: Analysis & Control of Markov Chains
May 10, 2006
Prof. Ben Van Roy

Homework Assignment 4: Solutions

4.13 Let $x_k$ be the gambler's fortune at time $k$. Let $\omega_k = -1$ if the gambler loses at time $k$ and $\omega_k = 1$ if he wins at time $k$, where $\Pr(\omega_k = 1) = 1 - \Pr(\omega_k = -1) = p > \tfrac{1}{2}$. Let $u_k$ be the stake at time $k$, so that $0 \le u_k \le x_k$. The system evolves according to

$$x_{k+1} = x_k + \omega_k u_k.$$

The reward is the terminal reward $E\{\ln(x_N)\}$. We claim that $J_k(x_k) = A_k + \ln(x_k)$, where $A_k$ is independent of $x_k$. This is trivially true for $k = N$, since $J_N(x_N) = \ln(x_N)$. Assume that

$$J_{k+1}(x_{k+1}) = A_{k+1} + \ln(x_{k+1}).$$

Then

$$J_k(x_k) = \max_{u_k} E\{J_{k+1}(x_{k+1})\} = \max_{u_k} \left[ A_{k+1} + p \ln(x_k + u_k) + (1-p) \ln(x_k - u_k) \right].$$

Letting $I_k = p \ln(x_k + u_k) + (1-p) \ln(x_k - u_k)$, verify that the first-order optimality condition requires $u_k^* = (2p - 1) x_k < x_k$ ($I_k$ is easily seen to be concave in $u_k$). Substituting ...
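The first-order condition for the stake can be checked by hand and numerically. Setting $dI_k/du_k = p/(x_k + u_k) - (1-p)/(x_k - u_k) = 0$ gives $p(x_k - u_k) = (1-p)(x_k + u_k)$, hence $u_k^* = (2p - 1) x_k$. The sketch below compares this closed form with a brute-force grid search over admissible stakes. The function names and the values $x = 100$, $p = 0.6$ are illustrative assumptions and do not appear in the original solution.

```python
import math


def expected_log_growth(x, u, p):
    """I_k = p*ln(x + u) + (1 - p)*ln(x - u) for fortune x, stake u, win probability p."""
    return p * math.log(x + u) + (1 - p) * math.log(x - u)


def closed_form_stake(x, p):
    """Stake from the first-order condition: u* = (2p - 1) * x."""
    return (2 * p - 1) * x


def grid_search_stake(x, p, steps=100_000):
    """Brute-force maximizer of I_k over a uniform grid on [0, x).

    The endpoint u = x is excluded because ln(x - u) is undefined there.
    """
    best_u = 0.0
    best_val = expected_log_growth(x, 0.0, p)
    for i in range(1, steps):
        u = x * i / steps
        val = expected_log_growth(x, u, p)
        if val > best_val:
            best_u, best_val = u, val
    return best_u


# Illustrative values (assumptions, not taken from the assignment).
x, p = 100.0, 0.6
u_star = closed_form_stake(x, p)
u_grid = grid_search_stake(x, p)
print(f"closed form: {u_star:.4f}, grid search: {u_grid:.4f}")
```

Because $I_k$ is concave in $u_k$, the grid maximizer should agree with the closed form up to the grid spacing, which is one way to sanity-check the stated optimum for any $p > \tfrac{1}{2}$.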