sol4 - EE292 Spring 2006 Analysis & Control of...


EE292 Spring 2006, Analysis & Control of Markov Chains
May 10, 2006
Prof. Ben Van Roy

Homework Assignment 4: Solutions

4.13 Let $x_k$ be the gambler's fortune at time $k$. Let $w_k = -1$ if the gambler loses at time $k$ and $w_k = 1$ if he wins at time $k$, where $\Pr(w_k = 1) = 1 - \Pr(w_k = -1) = p > \tfrac{1}{2}$. Let $u_k$ be the stake (the amount bet) at time $k$. We have that $u_k \le x_k$. The system evolves according to:

$$x_{k+1} = x_k + w_k u_k$$

The reward is the expected terminal reward $E\{\ln(x_N)\}$. We claim that $J_k(x_k) = A_k + \ln(x_k)$, where $A_k$ is independent of $x_k$. Note that this is trivially true for $k = N$, since $J_N(x_N) = \ln(x_N)$. Assume that:

$$J_{k+1}(x_{k+1}) = A_{k+1} + \ln(x_{k+1})$$

Then,

$$J_k(x_k) = \max_{u_k} E\{J_{k+1}(x_{k+1})\} = \max_{u_k} \left[ A_{k+1} + p \ln(x_k + u_k) + (1-p) \ln(x_k - u_k) \right]$$

Letting $I_k = p \ln(x_k + u_k) + (1-p) \ln(x_k - u_k)$, verify that the first-order optimality condition $\partial I_k / \partial u_k = \frac{p}{x_k + u_k} - \frac{1-p}{x_k - u_k} = 0$ requires $u_k^* = (2p-1)\, x_k < x_k$ ($I_k$ is easily seen to be concave in $u_k$). Substituting...
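The optimality condition above can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names `expected_log` and `best_stake`, the grid resolution, and the sample value `p = 0.6` are all assumptions chosen for illustration. It searches a grid of stakes for the maximizer of $I_k$ and prints it next to $(2p-1)x_k$, together with $I_k(u_k^*) - \ln(x_k)$, which should not depend on $x_k$ if the claimed form $J_k(x_k) = A_k + \ln(x_k)$ holds.

```python
import math

def expected_log(x, u, p):
    """I_k = p*ln(x + u) + (1 - p)*ln(x - u) for fortune x, stake u, win probability p."""
    return p * math.log(x + u) + (1 - p) * math.log(x - u)

def best_stake(x, p, grid=100_000):
    """Grid search (illustrative only) for the stake in [0, x) that maximizes expected_log."""
    # u = x is excluded because ln(x - u) is undefined there.
    candidates = (x * i / grid for i in range(grid))
    return max(candidates, key=lambda u: expected_log(x, u, p))

p = 0.6  # assumed sample win probability, p > 1/2
for x in (1.0, 5.0, 40.0):
    u_star = best_stake(x, p)
    gain = expected_log(x, u_star, p) - math.log(x)
    print(f"x={x:5.1f}  u*={u_star:.4f}  (2p-1)x={(2 * p - 1) * x:.4f}  I_k(u*)-ln(x)={gain:.6f}")
```

A grid search is used here rather than a root finder so that the check does not rely on the first-order condition it is meant to confirm.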
This note was uploaded on 02/15/2011 for the course EECS 6.231 taught by Professor Bertsekas during the Spring '10 term at MIT.
