# sol8 - EE292 Analysis Control of Markov Chains Prof Ben Van...

EE292 Spring 2006, June 9, 2006, Prof. Ben Van Roy

Homework Assignment 8: Solutions

1 a) We consider each customer as a bandit, so that we have $\sum_i x_i$ bandits in all. The state of a bandit is simply the queue the bandit belongs to, so a bandit can be in one of $n + 1$ possible states. The probability that a bandit in state $i$ transitions to state $j$ is $p_{ij}$ for $i > 0$, and $p_{00} = 1$. A bandit transitions only upon being pulled, after a random service time that is exponentially distributed with mean $\tau_k$, where $k$ is the queue the bandit is in.

Now, as opposed to finding a policy that minimizes

$$E\left[\int_0^\infty e^{-\beta t} \Big(\sum_i g_i x_i(t)\Big)\, dt\right],$$

one can consider maximizing

$$E\left[\int_0^\infty e^{-\beta t} \sum_{i=0}^{n} (g_{\max} - g_i)\, x_i(t)\, dt\right].$$

The two problems have the same optimal policies because the total number of bandits $\sum_{i=0}^{n} x_i(t)$ does not change over time (state 0 is absorbing), so the second objective equals a constant minus the first. This in turn is equivalent to maximizing rewards for a problem wherein a bandit, upon being pulled in state $k$, receives the random reward $g(x_k) = e^{-\beta T_k}(g_k - G_k)$, where $T_k$ is exponentially distributed with mean $\tau_k$ and
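
The random reward above contains the discount factor $e^{-\beta T_k}$ with $T_k$ exponentially distributed with mean $\tau_k$. As a minimal sketch, the standard identity $E[e^{-\beta T_k}] = 1/(1 + \beta \tau_k)$ can be checked numerically. The function names and parameter values below are illustrative assumptions and are not taken from the solution set.

```python
import math
import random


def expected_discount(beta: float, tau: float) -> float:
    """Closed form of E[exp(-beta * T)] for T exponential with mean tau."""
    return 1.0 / (1.0 + beta * tau)


def simulated_discount(beta: float, tau: float,
                       n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[exp(-beta * T)] for the same T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # random.expovariate takes the rate, which is 1 / mean.
        service_time = rng.expovariate(1.0 / tau)
        total += math.exp(-beta * service_time)
    return total / n_samples


if __name__ == "__main__":
    beta, tau = 0.1, 2.0  # hypothetical discount rate and mean service time
    print(expected_discount(beta, tau))
    print(simulated_discount(beta, tau))
```

The closed form and the simulated estimate should agree up to sampling error, which shrinks as `n_samples` grows.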
