EE292
Spring 2006
June 9, 2006
Prof. Ben Van Roy
Homework Assignment 8: Solutions
1
a)
We consider each customer as a bandit, so that we have $\sum_i x_i$ bandits in all. The state of a bandit is simply the queue the bandit belongs to, so a bandit can be in one of $n+1$ possible states.
The probability that a bandit in state $i$ transitions to state $j$ is simply $p_{ij}$ for $i > 0$, and $p_{00} = 1$. The bandit transitions only upon being pulled, after a random service time that is exponentially distributed with mean $\tau_k$, where $k$ is the queue the bandit is in.
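The transition model above can be made concrete with a small simulation. This is a minimal sketch, not part of the original solution: the number of queues ($n = 2$), the routing matrix `P`, the mean service times `TAU`, and the helper names `pull` and `serve_until_departure` are all hypothetical. Each pull draws an exponential service time with mean $\tau_k$ and then routes the bandit according to row $k$ of $(p_{ij})$, with state 0 absorbing.

```python
import random

# Hypothetical example with n = 2 queues; state 0 means the customer has left.
# Row i of P holds the routing probabilities p_ij; row 0 is absorbing (p_00 = 1).
P = [
    [1.0, 0.0, 0.0],
    [0.3, 0.0, 0.7],  # from queue 1: leave w.p. 0.3, go to queue 2 w.p. 0.7
    [0.6, 0.4, 0.0],  # from queue 2: leave w.p. 0.6, return to queue 1 w.p. 0.4
]
# Hypothetical mean service times tau_k for queues 1 and 2 (no service in state 0).
TAU = [None, 2.0, 0.5]


def pull(state, rng):
    """Serve a bandit in queue `state` once.

    Returns (service_time, next_state): the service time is exponential with
    mean TAU[state], and the next state is drawn from row `state` of P.
    """
    if state == 0:
        raise ValueError("state 0 is absorbing; a departed customer is never pulled")
    service_time = rng.expovariate(1.0 / TAU[state])  # expovariate takes the rate 1/mean
    next_state = rng.choices(range(len(P)), weights=P[state])[0]
    return service_time, next_state


def serve_until_departure(state, rng):
    """Pull one bandit repeatedly until it reaches state 0; return its path and total time."""
    path, elapsed = [state], 0.0
    while state != 0:
        dt, state = pull(state, rng)
        elapsed += dt
        path.append(state)
    return path, elapsed


rng = random.Random(0)
path, elapsed = serve_until_departure(1, rng)
print(path, round(elapsed, 3))
```

Because a bandit only moves when it is pulled, the simulated path is the sequence of queues one customer visits if it is served without interruption; in the scheduling problem the server may interleave pulls of different bandits, which changes when each transition happens but not where it goes.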
Now, as opposed to finding a policy that minimizes
$$E\left[\int_0^\infty e^{-\beta t}\Big(\sum_i g_i x_i(t)\Big)\,dt\right],$$
one can consider maximizing
$$E\left[\int_0^\infty e^{-\beta t}\sum_{i=0}^{n}(g_{\max}-g_i)\,x_i(t)\,dt\right].$$
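The step from the cost objective to the reward objective can be spelled out as follows. This is a sketch under stated assumptions, not taken from the original: departed customers incur no cost ($g_0 = 0$), so the first sum may run over $i = 0, \dots, n$; $g_{\max} = \max_i g_i$, so every reward rate $g_{\max} - g_i$ is nonnegative; $\beta > 0$; and $N = \sum_{i=0}^{n} x_i(0)$ denotes the (given) total number of customers. Since state 0 is absorbing, bandits are only moved between states, never created or destroyed, so $\sum_{i=0}^{n} x_i(t) = N$ for all $t$:

```latex
\sum_{i=0}^{n}(g_{\max}-g_i)\,x_i(t)
  = g_{\max}\sum_{i=0}^{n}x_i(t) - \sum_{i=0}^{n} g_i\,x_i(t)
  = g_{\max}N - \sum_{i} g_i\,x_i(t),
\qquad\text{so}\qquad
E\left[\int_0^\infty e^{-\beta t}\sum_{i=0}^{n}(g_{\max}-g_i)\,x_i(t)\,dt\right]
  = \frac{g_{\max}N}{\beta}
  - E\left[\int_0^\infty e^{-\beta t}\sum_{i} g_i\,x_i(t)\,dt\right].
```

The term $g_{\max}N/\beta$ does not depend on the policy, so under these assumptions a policy maximizes the reward objective exactly when it minimizes the original discounted cost.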
This in turn is equivalent to maximizing rewards for a problem wherein a bandit, upon being pulled in state $k$, receives the random reward $g(x_k) = e^{-\beta T_k}(g_k - G_k)$, where $T_k$ is exponentially distributed with mean $\tau_k$ and