sol8 - EE292 Analysis Control of Markov Chains Prof Ben Van...

EE292, Spring 2006
June 9, 2006
Prof. Ben Van Roy

Homework Assignment 8: Solutions

1 a) We consider each customer as a bandit, so that we have $\sum_i x_i$ bandits in all. The state of a bandit is simply the queue the bandit belongs to (so that a bandit can be in one of $n + 1$ possible states). The probability that a bandit in state $i$ transitions to state $j$ is simply $p_{ij}$ for $i > 0$; $p_{00} = 1$. The bandit transitions only upon being pulled, after some random service time whose distribution is exponential with mean $\tau_k$, where $k$ is the queue the bandit is in.

Now, as opposed to finding a policy that minimizes $E\left[\int_0^\infty e^{-\beta t} \left(\sum_i g_i x_i(t)\right) dt\right]$, one can consider maximizing $E\left[\int_0^\infty e^{-\beta t} \sum_{i=0}^{n} (g_{\max} - g_i)\, x_i(t)\, dt\right]$. This in turn is equivalent to maximizing rewards for a problem wherein a bandit, upon being pulled in state $k$, receives the random reward $g(x_k) = e^{-\beta T_k}(g_k - G_k)$, where $T_k$ is exponentially distributed with mean $\tau_k$ and
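As a side calculation that is not part of the original solution, the discount factor $e^{-\beta T_k}$ appearing in the reward can be averaged in closed form: for an exponential service time with mean $\tau_k$, $E[e^{-\beta T_k}] = 1/(1 + \beta \tau_k)$. The Python sketch below checks this against a Monte Carlo estimate. The function names and the numerical values of `beta` and `tau` are illustrative assumptions, not taken from the assignment.

```python
import math
import random


def expected_discount(beta: float, tau: float) -> float:
    """Closed form of E[exp(-beta * T)] for T exponential with mean tau."""
    return 1.0 / (1.0 + beta * tau)


def simulated_discount(beta: float, tau: float,
                       n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[exp(-beta * T)] for T exponential with mean tau."""
    rng = random.Random(seed)
    rate = 1.0 / tau  # random.expovariate expects the rate, i.e. 1 / mean
    total = 0.0
    for _ in range(n_samples):
        service_time = rng.expovariate(rate)
        total += math.exp(-beta * service_time)
    return total / n_samples


if __name__ == "__main__":
    beta, tau = 0.1, 2.0  # illustrative values only
    print("closed form:", expected_discount(beta, tau))
    print("simulated:  ", simulated_discount(beta, tau))
```

The two printed numbers should agree to within sampling error; a larger `n_samples` tightens the match.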