Problem Set 6 MS&E 221
Due: Friday, March 1 11:59PM
(Retailer): Consider a retailer selling a product for which the sales price per item
is $100. The item costs the retailer $50 per item, and deliveries cost $100 per delivery. The demand
is Poisson distributed with a mean of 2 per day. The retailer's store capacity for this item is 4 units.
When the retailer orders additional items at the end of each day, the items are delivered to the store
by the next day's opening. The retailer is seeking an optimal inventory ordering policy.
The retailer uses an infinite-horizon discounted cash flow formulation with discount rate α
such that e−α = 0.99.

Question 6.1
(a) Use either value or policy iteration for n = 25, 100, 400 iterations to compute an approximate
optimal policy. Based on the policy you derived, how many items should be ordered with
0, 1, 2, 3, 4 items on hand at the end of a retail day? (In your write-up, specify the state space,
the actions, the reward function and how states transition as a function of the actions.)
(b) Use the primal linear program discussed in class to compute the optimal policy and compare it
with the policy you derived in the previous part.
Answer: Define Xn as the number of units on hand at the end of day n − 1. So, the state space is
{0, 1, 2, 3, 4}. The action set is {0, 1, 2, 3, 4}, where each action a is the number of items ordered.
Write u = min(x + a, 4) for the inventory after the order arrives; with demand D ∼ Poisson(2), the
expected one-day revenue is 100 E[min(u, D)]. The reward function is therefore

r(x, a) =
0, for a = 0, x = 0;
100 ( Σ_{k=0}^{u−1} k e−2 2^k/k! + u (1 − Σ_{k=0}^{u−1} e−2 2^k/k!) ), for a = 0, x ≠ 0;
100 ( Σ_{k=0}^{u−1} k e−2 2^k/k! + u (1 − Σ_{k=0}^{u−1} e−2 2^k/k!) ) − 100 − 50a, for a > 0,

where x denotes the state and a the action.
Let an be the action for day n − 1 and dn be the demand on day n.
Then Xn+1 = max(min(Xn + an , 4) − dn , 0).
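As a cross-check of the MATLAB implementations below, the same model can be set up and solved with value iteration in a few lines of Python. This is only a sketch: NumPy/SciPy, the P[x, a, x'] array layout, and all names here are my own choices, not part of the assignment.

```python
# Cross-check of the MDP in Python; a sketch only -- NumPy/SciPy, the
# P[x, a, x'] array layout, and all names are my own choices.
import numpy as np
from scipy.stats import poisson

CAP, GAMMA, MEAN = 4, 0.99, 2.0          # capacity, discount e^(-alpha), demand mean
states = actions = range(CAP + 1)

def expected_revenue(u):
    # 100 * E[min(u, D)] with D ~ Poisson(2): sell what is demanded, up to u.
    ks = np.arange(u)
    return 100 * (np.sum(ks * poisson.pmf(ks, MEAN))
                  + u * (1 - poisson.cdf(u - 1, MEAN)))

reward = np.zeros((CAP + 1, CAP + 1))
P = np.zeros((CAP + 1, CAP + 1, CAP + 1))
for x in states:
    for a in actions:
        u = min(x + a, CAP)
        reward[x, a] = expected_revenue(u) - (100 + 50 * a if a > 0 else 0)
        for d in range(u + 1):                    # demand d leaves max(u - d, 0) units
            P[x, a, max(u - d, 0)] += poisson.pmf(d, MEAN)
        P[x, a, 0] += 1 - poisson.cdf(u, MEAN)    # demand exceeding u empties the shelf

# Value iteration: v(x) <- max_a [ r(x,a) + gamma * sum_x' P(x,a,x') v(x') ]
v = np.zeros(CAP + 1)
for _ in range(400):
    v = np.max(reward + GAMMA * (P @ v), axis=1)
policy = np.argmax(reward + GAMMA * (P @ v), axis=1)
print(policy)
```

The argmax of the final Bellman right-hand side should agree with the order-up-to-4 policy reported in part (a): order 4 in state 0 and nothing otherwise.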
(a) Based on the code implementation below for value iteration, the approximate optimal policy
using value iteration is:
• For n = 25: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 100: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 400: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
Based on the code implementation below for policy iteration, the approximate optimal policy
using policy iteration is:
• For n = 25: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 100: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 400: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0

MATLAB Code for Value Iteration:
%reward function
reward = zeros(5,5);
for x = 0:4
for a = 0:4
u = min(x + a,4);
if u==0
continue
end
for k=0:(u-1)
reward(x+1,a+1) = reward(x+1,a+1) + 100*k*poisspdf(k,2);
end
reward(x+1,a+1) = reward(x+1,a+1) + 100*u*(1 - poisscdf(u-1,2));
if a>0
reward(x+1,a+1) = reward(x+1,a+1) - 100 - 50*a;
end
end
end
%transitions
P = zeros(5,5,5);
for a=0:4
for x=0:4
u = min(x+a,4);
for d=0:u
next_x = max(u - d,0);
P(a+1,x+1,next_x+1) = P(a+1,x+1,next_x+1) + poisspdf(d,2);
end
P(a+1,x+1,0+1) = P(a+1,x+1,0+1) + (1- poisscdf(u,2));
end
end
dsc_rate = 0.99; %discount rate
n = 400; %n is the number of iterations
%main body of algorithm
v = [0, 0, 0, 0, 0]'; %initialization
R= zeros(5);
for iter= 1: n
%create a 5,5 matrix where (i,j)the element
%is the RHS of the Bellman equation
R= zeros(5);
for x=0:4 %the nested loop can be optimized using vector representation
for a=0:4
p0 = zeros(1,5);
p0(1,:) = P(a+1,x+1,:);
R(x+1,a+1) = reward(x+1,a+1) + dsc_rate * p0 * v;
end
end
v = max(R,[],2); %take the row maximum
end

MATLAB Code for Policy Iteration:
%reward function
reward = zeros(5,5);
for x = 0:4
for a = 0:4
u = min(x + a,4);
if u==0
continue
end
for k=0:(u-1)
reward(x+1,a+1) = reward(x+1,a+1) + 100*k*poisspdf(k,2);
end
reward(x+1,a+1) = reward(x+1,a+1) + 100*u*(1 - poisscdf(u-1,2));
if a>0
reward(x+1,a+1) = reward(x+1,a+1) - 100 - 50*a;
end
end
end
%transitions
P = zeros(5,5,5);
for a=0:4
for x=0:4
u = min(x+a,4);
for d=0:u
next_x = max(u - d,0);
P(a+1,x+1,next_x+1) = P(a+1,x+1,next_x+1) + poisspdf(d,2);
end
P(a+1,x+1,0+1) = P(a+1,x+1,0+1) + (1- poisscdf(u,2));
end
end
dsc_rate = 0.99; %discount rate
n = 25; %n is the number of iterations
%main body of algorithm
pi = [0,0,0,0,0]';
for iter=1: n
Q = zeros(5);
for x = 0:4
for y = 0:4
Q(x+1,y+1) = P(pi(x+1)+1, x+1,y+1);
end
end
r = zeros(5,1);
for x= 0:4
r(x+1,1) = reward(x+1,pi(x+1)+1);
end
u = (eye(5) - dsc_rate * Q)\r;
R = zeros(5);
for x=0:4
for a=0:4
p0 = zeros(1,5);
p0(1,:) = P(a+1,x+1,:);
R(x+1,a+1) = reward(x+1,a+1) + dsc_rate * p0 * u;
end
end
% get new policy
[v, pi] = max(R,[],2); %take the row maximum
pi = pi -1;
end

(b) Based on the code implementation below, the optimal policy is:
π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
MATLAB Code for the Linear Program method:
%reward function
reward = zeros(5,5);
for x = 0:4
for a = 0:4
u = min(x + a,4);
if u==0
continue
end
for k=0:(u-1)
reward(x+1,a+1) = reward(x+1,a+1) + 100*k*poisspdf(k,2);
end
reward(x+1,a+1) = reward(x+1,a+1) + 100*u*(1 - poisscdf(u-1,2));
if a>0
reward(x+1,a+1) = reward(x+1,a+1) - 100 - 50*a;
end
end
end
%transitions
P = zeros(5,5,5);
for a=0:4
for x=0:4
u = min(x+a,4);
for d=0:u
next_x = max(u - d,0);
P(a+1,x+1,next_x+1) = P(a+1,x+1,next_x+1) + poisspdf(d,2);
end
P(a+1,x+1,0+1) = P(a+1,x+1,0+1) + (1- poisscdf(u,2));
end
end
dsc_rate = 0.99; %discount rate
f = ones(5,1);
Q1 = zeros(5);
Q1(:,:) = P(1,:,:);
A1 = eye(5) - dsc_rate * Q1;
r1 = reward(:,1);
Q2 = zeros(5);
Q2(:,:) = P(2,:,:);
A2 = eye(5) - dsc_rate * Q2;
r2 = reward(:,2);
Q3 = zeros(5);
Q3(:,:) = P(3,:,:);
A3 = eye(5) - dsc_rate * Q3;
r3 = reward(:,3);
Q4 = zeros(5);
Q4(:,:) = P(4,:,:);
A4 = eye(5) - dsc_rate * Q4;
r4 = reward(:,4);
Q5 = zeros(5);
Q5(:,:) = P(5,:,:);
A5 = eye(5) - dsc_rate * Q5;
r5 = reward(:,5);
A = [A1; A2; A3; A4; A5];
b = [r1; r2; r3; r4; r5];
v = linprog(f,(-1)*A,(-1)*b);

Question 6.2 (Optimal Stopping Time): We wish to compute an optimal stopping time policy T
to maximize

Σ_{j=0}^{T−1} d(Xj) + r(XT),

where (Xn, n ≥ 0) is a Markov chain with finite state space S. This type of problem arises in the
pricing of American put options, since one needs to pay back any dividends received prior to the
exercise of the put.
(a) Write down the Bellman optimality equation for the value function of the problem.
(b) Write down a linear program for which the solution will be the value function.
Answer: (a) Define v(x) = sup_T E_x[ Σ_{j=0}^{T−1} d(Xj) + r(XT) ]. Then the Bellman optimality
equation is

v(x) = max( r(x), d(x) + Σ_{y∈S} p(x, y) v(y) ),

where p(x, y) denotes the probability of the Markov chain (Xn)n transitioning from state x to y.
Then for stopping states, v(x) = r(x). For non-stopping states, v(x) = d(x) + Σ_{y∈S} p(x, y) v(y).
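To make the fixed-point characterization concrete, here is a small Python sketch on an invented 3-state chain (the matrix p, running payoff d, and terminal payoff r are made up for illustration; d is taken nonpositive so that stopping in finite time is optimal and the iteration converges):

```python
# Numerical illustration of the Bellman equation v = max(r, d + P v); the
# 3-state chain p, running payoff d, and terminal payoff r are invented for
# illustration (d is nonpositive so stopping in finite time is optimal).
import numpy as np

p = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])    # transition probabilities p(x, y)
d = np.array([-1.0, -0.5, -0.2])   # running payoff collected while continuing
r = np.array([0.0, 1.0, 5.0])      # terminal payoff when stopping

v = r.copy()                       # start at the terminal payoff and iterate
for _ in range(200):
    v = np.maximum(r, d + p @ v)   # Bellman operator; monotone and bounded here

stop = r >= d + p @ v              # states where stopping attains the max
print(v, stop)
```

With these numbers the iteration settles at v = [0, 2, 5]: the middle state strictly prefers to continue (v(1) = 2 > r(1) = 1), while the other two states stop.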
(b) The linear program is

min Σ_{x∈S} v(x)
s.t. v(x) ≥ r(x) for all x ∈ S,
v(x) ≥ d(x) + Σ_{y∈S} p(x, y) v(y) for all x ∈ S.

Question 6.3 (Visit to Clinic):
Suppose that you have the flu and are waiting at a clinic to
see a doctor. There are k ≥ 2 doctors who are all currently seeing a patient. Each doctor i takes a
random amount of time Ti for each patient, for i = 1, . . . , k. Suppose that the Ti's are independent
with Ti ∼ Expo(λi) for some λi > 0, i = 1, . . . , k.
You need to provide your reasoning for the following questions (no points will be given otherwise). (a) Let Ri be the remaining time for doctor i to nish treating their current patient. For each
i = 1, . . . , k , what is the distribution of Ri ?
(b) Let W be the time you have to wait until you see a doctor. What is the distribution of W ?
(c) What is the probability that doctor 1 finishes treating her current patient before doctor 2 does?
(d) What is the probability that doctor i is the first to finish treating her current patient?
(e) What is the expected total amount of time you spend at the clinic? (Assume that you leave
immediately after a doctor treats you.)
Answer: (a) By the memoryless property, we have Ri ∼ Expo(λi) for i = 1, . . . , k.
(b) We have W = min_{1≤i≤k} Ri. Since the Ri's are independent, we have from part (a) that

P(W ≥ t) = P(min_{1≤i≤k} Ri ≥ t) = P(Ri ≥ t for all 1 ≤ i ≤ k) = Π_{i=1}^{k} P(Ri ≥ t)
= Π_{i=1}^{k} e−λit = exp(−(Σ_{i=1}^{k} λi) t).

Hence, W ∼ Expo(Σ_{i=1}^{k} λi).

(c) We wish to calculate P(R1 < R2). By conditioning on R1, we have
P(R1 < R2) = ∫_0^∞ P(R1 < R2 | R1 = t) λ1 e−λ1t dt = ∫_0^∞ P(t < R2) λ1 e−λ1t dt
= ∫_0^∞ e−λ2t λ1 e−λ1t dt = ∫_0^∞ λ1 e−(λ1+λ2)t dt = λ1/(λ1 + λ2).

(d) We wish to calculate P(Ri ≤ S\i) where S\i := min_{j≠i} Rj. Noting that S\i ∼ Expo(Σ_{j≠i} λj)
by logic similar to part (b), we conclude as in part (c) that

P(Ri ≤ S\i) = λi/(λi + Σ_{j≠i} λj) = λi/Σ_{j=1}^{k} λj.

(e) First, we condition on the event Ri < min_{j≠i} Rj:
E[Time spent in clinic | Ri < min_{j≠i} Rj] = E[W | Ri < min_{j≠i} Rj] + E[Ti | Ri < min_{j≠i} Rj]
= E[Ri | Ri < min_{j≠i} Rj] + E[Ti]
= 1/Σ_{j=1}^{k} λj + 1/λi,

where in the last equality we used the fact that Ri = W if Ri < min_{j≠i} Rj together with part (b).
Hence, we have
E[Time spent in clinic] = Σ_{i=1}^{k} E[Time spent in clinic | Ri < min_{j≠i} Rj] · P(Ri < min_{j≠i} Rj)
= Σ_{i=1}^{k} (1/Σ_{j=1}^{k} λj + 1/λi) · λi/Σ_{j=1}^{k} λj
= (k + 1)/Σ_{j=1}^{k} λj.
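The closed-form answers in parts (b), (d) and (e) can be sanity-checked by simulation. A Python sketch, with arbitrary illustrative rates that are not part of the problem statement:

```python
# Monte Carlo sanity check of parts (b), (d) and (e); the rates below are
# arbitrary illustrative values (k = 3 doctors), not part of the problem.
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([1.0, 2.0, 3.0])
k, n = len(lam), 200_000

R = rng.exponential(1 / lam, size=(n, k))  # remaining times R_i ~ Expo(lambda_i)
W = R.min(axis=1)                          # waiting time W = min_i R_i
winner = R.argmin(axis=1)                  # the doctor who frees up first
T = rng.exponential(1 / lam[winner])       # your treatment time ~ Expo(lambda_winner)

print(W.mean())              # part (b): should be close to 1/sum(lam) = 1/6
print((winner == 0).mean())  # part (d): should be close to lam[0]/sum(lam) = 1/6
print((W + T).mean())        # part (e): should be close to (k+1)/sum(lam) = 4/6
```

With k = 3 and Σλj = 6, the three empirical averages should land near 1/6, 1/6 and 4/6 respectively.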
Question 6.4 (Manufacturing): Consider a manufacturing process with batches of raw materials
coming in. Suppose that the interarrival times of batches are i.i.d. exponential random variables
with rate λ and their processing times are i.i.d. exponential random variables with rate µ. Due to
financial constraints, there is only one machine, which can only process one batch of raw materials
at any given instance. We assume that once a batch gets processed by the machine, it leaves the
system immediately.
Let X(t) denote the number of batches in the system at time t and let Tj := inf{t ≥ 0 : X(t) = j}
be the hitting time to have j batches in the system. Let Pi,j be the probability that when the
number of batches in the system changes from i, it moves to j batches. Clearly, we have Pi,j = 0
for j ∉ {i − 1, i + 1}.
(a) Derive an expression for Pi,j in terms of λ and µ.
(b) Using first-step analysis, derive an expression for Ei [Tj ] in terms of Ei+1 [Tj ] and Ei−1 [Tj ] (and
λ, µ).
(c) What is E0 [T1 ]? Based on your answer and the relationship derived in part (b), show that

Ei[Ti+1] = (1/λ) Σ_{k=0}^{i} (µ/λ)^k.
(d) Let λ = 2.0 and µ = 3.0. Starting with no raw material in the system, what is the expected
amount of time until the system has j ≥ 1 batches in the system? Explain your reasoning carefully.
Answer:
Let X(t) denote the number of batches in the system at time t. We note that this is an
M/M/1 queueing model as covered in class. Let Ti,j be the time it takes to have j batches in the
system, starting from i batches in the system. Note that E[Ti,j ] = Ei [Tj ]. (a) Let S1 , S2 be independent exponential random variables with respective rates λ and µ. Then,
since arrivals propel the queue upwards by one slot, we have
Pi,i+1 = P(S1 < S2) = ∫_0^∞ P(t < S2 | S1 = t) λ e−λt dt = ∫_0^∞ λ e−(λ+µ)t dt = λ/(λ + µ).

Similarly, we have Pi,i−1 = µ/(λ + µ).

(b) Starting from i, let τi be the time at which the Markov chain X(t) leaves state i. Letting S1, S2
be as in part (a), we have that τi = min(S1, S2) in distribution. By conditioning on the first
transition, we have for i ≥ 1
Ei [Tj ] = Ei [Tj |X(τi ) = i + 1]Pi (X(τi ) = i + 1) + Ei [Tj |X(τi ) = i − 1]Pi (X(τi ) = i − 1)
= Ei [Tj |X(τi ) = i + 1]Pi,i+1 + Ei [Tj |X(τi ) = i − 1]Pi,i−1
= (E[min(S1 , S2 )| min(S1 , S2 ) = S1 ] + Ei [Tj − τi |X(τi ) = i + 1]) Pi,i+1
+ (E[min(S1 , S2 )| min(S1 , S2 ) = S2 ] + Ei [Tj − τi |X(τi ) = i − 1]) Pi,i−1
= E[min(S1 , S2 )] + Ei [Tj − τi |X(τi ) = i + 1]Pi,i+1 + Ei [Tj − τi |X(τi ) = i − 1]Pi,i−1 .
Now, note that E[min(S1, S2)] = 1/(λ + µ) since min(S1, S2) ∼ Expo(λ + µ). Applying the strong
Markov property in the preceding display, we obtain

Ei[Tj] = 1/(λ + µ) + λ/(λ + µ) Ei+1[Tj] + µ/(λ + µ) Ei−1[Tj].

(c) We have E0[T1] = E[S1] = 1/λ. From part (b), we obtain
E1[T2] = 1/(λ + µ) + λ/(λ + µ) · 0 + µ/(λ + µ) · E0[T2]
= 1/(λ + µ) + µ/(λ + µ) · (E0[T1] + E1[T2])
= 1/(λ + µ) + µ/(λ + µ) · (1/λ + E1[T2]).

Rearranging, we have E1[T2] = (1/λ)(1 + µ/λ).

In general, note that
Ei[Ti+1] = 1/(λ + µ) + λ/(λ + µ) · 0 + µ/(λ + µ) · Ei−1[Ti+1]
= 1/(λ + µ) + µ/(λ + µ) · (Ei−1[Ti] + Ei[Ti+1]).

Rearranging,

Ei[Ti+1] = (1/λ)(1 + µ Ei−1[Ti]).

We recursively conclude that

Ei[Ti+1] = (1/λ) Σ_{k=0}^{i} (µ/λ)^k.
E0 [Tj ] = E0 [T1 ] + · · · + Ej−1 [Tj ] = j−1
X Ek [Tk+1 ]. (1) k=0 Note from part (c) that (here we used the formula i+1
1 − µλ
Ei [Ti+1 ] =
λ−µ
Pn−1
k=0 r= 1−rn
1−r .) E0 [Tj ] = 1 j−
λ−µ Plugging this expression into (1), we conclude that
µ 1− µ j
λ λ−µ 10 = 3 · 1.5j − j − 3. ...