sol8 - EE292 Analysis & Control of Markov Chains Prof....

EE292, Spring 2006
June 9, 2006
Prof. Ben Van Roy

Homework Assignment 8: Solutions

1 a) We consider each customer as a bandit, so that we have $\sum_i x_i$ bandits in all. The state of a bandit is simply the queue the bandit belongs to (so a bandit can be in one of $n + 1$ possible states). The probability that a bandit in state $i$ transitions to state $j$ is $p_{ij}$ for $i > 0$, and $p_{00} = 1$. A bandit transitions only upon being pulled, after a random service time that is exponentially distributed with mean $\tau_k$, where $k$ is the queue the bandit is in.

Now, as opposed to finding a policy that minimizes $E\left[\int_0^\infty e^{-\beta t} \left(\sum_i g_i x_i(t)\right) dt\right]$, one can consider maximizing $E\left[\int_0^\infty e^{-\beta t} \sum_{i=0}^{n} (g_{\max} - g_i)\, x_i(t)\, dt\right]$. This in turn is equivalent to maximizing rewards for a problem wherein a bandit, upon being pulled in state $k$, receives the random reward $g(x_k) = e^{-\beta T_k}(g_k - G_k)$, where $T_k$ is exponentially distributed with mean $\tau_k$ and
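The step from the minimization to the maximization above can be made explicit with a short derivation. This is a sketch under assumptions not spelled out in the visible text: there are no external arrivals, so the total number of bandits $N = \sum_{i=0}^{n} x_i(t)$ is constant in $t$ (customers that have left sit in the absorbing state $0$), and $g_0 = 0$, so including $i = 0$ in the cost sum changes nothing. The symbol $N$ is introduced here for illustration only.

```latex
\begin{align*}
E\left[\int_0^\infty e^{-\beta t} \sum_{i=0}^{n} (g_{\max} - g_i)\, x_i(t)\, dt\right]
  &= g_{\max}\, E\left[\int_0^\infty e^{-\beta t} \sum_{i=0}^{n} x_i(t)\, dt\right]
     - E\left[\int_0^\infty e^{-\beta t} \sum_{i=0}^{n} g_i\, x_i(t)\, dt\right] \\
  &= \frac{g_{\max} N}{\beta}
     - E\left[\int_0^\infty e^{-\beta t} \sum_{i} g_i\, x_i(t)\, dt\right].
\end{align*}
```

Under these assumptions the first term does not depend on the policy, so a policy maximizes the left-hand side exactly when it minimizes the original discounted cost.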

This note was uploaded on 02/15/2011 for the course EECS 6.231 taught by Professor Bertsekas during the Spring '10 term at MIT.
