
Problem Set 6, MS&E 221
Due: Friday, March 1, 11:59 PM

Question 6.1 (Retailer): Consider a retailer selling a product for which the sales price is $100 per item. The item costs the retailer $50 per item, and deliveries cost $100 per delivery. Demand is Poisson distributed with a mean of 2 per day. The retailer's store capacity for this item is 4 units. When the retailer orders additional items at the end of a day, the items are delivered to the store by the next day's opening. The retailer is seeking an optimal inventory ordering policy, using an infinite-horizon discounted cash flow formulation with discount rate α such that e^(-α) = 0.99.

(a) Use either value or policy iteration for n = 25, 100, 400 iterations to compute an approximate optimal policy. Based on the policy you derived, how many items should be ordered with 0, 1, 2, 3, 4 items on hand at the end of a retail day? (In your write-up, specify the state space, the actions, the reward function, and how states transition as a function of the actions.)

(b) Use the primal linear program discussed in class to compute the optimal policy and compare it with the policy you derived in the previous part.

Answer: Define X_n as the number of units at the end of day n - 1. The state space is {0, 1, 2, 3, 4}. The action set is {0, 1, 2, 3, 4}, where each element represents the number of items ordered. Writing u = min(x + a, 4) for the stock level after ordering, the expected one-day reward is

r(x, a) =
  0,                                                                          for a = 0, x = 0;
  100 [ Σ_{k=0}^{u-1} k e^(-2) 2^k / k! + u (1 - Σ_{k=0}^{u-1} e^(-2) 2^k / k!) ],              for a = 0, x ≠ 0;
  100 [ Σ_{k=0}^{u-1} k e^(-2) 2^k / k! + u (1 - Σ_{k=0}^{u-1} e^(-2) 2^k / k!) ] - 100 - 50a,  for a > 0;

where x denotes the state and a the action. Let a_n be the action for day n - 1 and d_n the demand on day n. Then X_{n+1} = max(min(X_n + a_n, 4) - d_n, 0).
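As a quick sanity check on the reward formula above, here is a short Python sketch (the helper names are ours, not part of the assignment) that evaluates r(x, a) directly from the Poisson demand distribution:

```python
import math

def poisson_pmf(k, lam=2.0):
    # P(D = k) for Poisson demand with mean 2
    return math.exp(-lam) * lam**k / math.factorial(k)

def reward(x, a, price=100, unit_cost=50, delivery=100, cap=4):
    # Expected one-day reward r(x, a): revenue from expected sales
    # min(D, u) at stock level u = min(x + a, 4), minus ordering costs.
    u = min(x + a, cap)
    expected_sales = sum(k * poisson_pmf(k) for k in range(u))
    expected_sales += u * (1 - sum(poisson_pmf(k) for k in range(u)))
    r = price * expected_sales
    if a > 0:
        r -= delivery + unit_cost * a
    return r

print(round(reward(0, 0), 2))  # 0.0: no stock and no order earns nothing
print(round(reward(1, 0), 2))  # sell-from-stock case
print(round(reward(0, 4), 2))  # immediate reward of a full order is negative
```

Note that r(0, 4) is negative: ordering only pays off through the discounted future value, which is what the dynamic programming methods below capture.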
(a) Based on the code implementation below for value iteration, the approximate optimal policy is:

• For n = 25: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 100: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 400: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0

Based on the code implementation below for policy iteration, the approximate optimal policy is the same:

• For n = 25: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 100: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0
• For n = 400: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0

MATLAB Code for Value Iteration:

    % reward function
    reward = zeros(5,5);
    for x = 0:4
        for a = 0:4
            u = min(x + a, 4);
            if u == 0
                continue
            end
            for k = 0:(u-1)
                reward(x+1,a+1) = reward(x+1,a+1) + 100*k*poisspdf(k,2);
            end
            reward(x+1,a+1) = reward(x+1,a+1) + 100*u*(1 - poisscdf(u-1,2));
            if a > 0
                reward(x+1,a+1) = reward(x+1,a+1) - 100 - 50*a;
            end
        end
    end

    % transitions
    P = zeros(5,5,5);
    for a = 0:4
        for x = 0:4
            u = min(x+a,4);
            for d = 0:u
                next_x = max(u - d, 0);
                P(a+1,x+1,next_x+1) = P(a+1,x+1,next_x+1) + poisspdf(d,2);
            end
            % demand exceeding the stock level empties the shelf
            P(a+1,x+1,0+1) = P(a+1,x+1,0+1) + (1 - poisscdf(u,2));
        end
    end

    dsc_rate = 0.99;  % discount rate
    n = 400;          % number of iterations

    % main body of the algorithm
    v = [0, 0, 0, 0, 0]';  % initialization
    for iter = 1:n
        % R(i,j) is the RHS of the Bellman equation for state i, action j
        R = zeros(5);
        for x = 0:4  % the nested loop could be vectorized
            for a = 0:4
                p0 = zeros(1,5);
                p0(1,:) = P(a+1,x+1,:);
                R(x+1,a+1) = reward(x+1,a+1) + dsc_rate * p0 * v;
            end
        end
        v = max(R,[],2);  % take the row maximum
    end
    [~, pi] = max(R,[],2);
    pi = pi - 1  % greedy policy: pi(x+1) is the order quantity in state x

MATLAB Code for Policy Iteration:

    % reward function and transitions: identical to the value iteration code above

    dsc_rate = 0.99;  % discount rate
    n = 25;           % number of iterations

    % main body of the algorithm
    pi = [0,0,0,0,0]';
    for iter = 1:n
        % transition matrix under the current policy
        Q = zeros(5);
        for x = 0:4
            for y = 0:4
                Q(x+1,y+1) = P(pi(x+1)+1, x+1, y+1);
            end
        end
        % reward under the current policy
        r = zeros(5,1);
        for x = 0:4
            r(x+1,1) = reward(x+1, pi(x+1)+1);
        end
        % policy evaluation
        u = (eye(5) - dsc_rate * Q) \ r;
        % policy improvement
        R = zeros(5);
        for x = 0:4
            for a = 0:4
                p0 = zeros(1,5);
                p0(1,:) = P(a+1,x+1,:);
                R(x+1,a+1) = reward(x+1,a+1) + dsc_rate * p0 * u;
            end
        end
        % get new policy
        [v, pi] = max(R,[],2);  % take the row maximum
        pi = pi - 1;
    end

(b) Based on the code implementation below, the optimal policy is:

π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0

which agrees with the policy from part (a).

MATLAB Code for the Linear Programming Method:

    % reward function and transitions: identical to the value iteration code above

    dsc_rate = 0.99;  % discount rate
    f = ones(5,1);
    Q1 = zeros(5); Q1(:,:) = P(1,:,:); A1 = eye(5) - dsc_rate * Q1; r1 = reward(:,1);
    Q2 = zeros(5); Q2(:,:) = P(2,:,:); A2 = eye(5) - dsc_rate * Q2; r2 = reward(:,2);
    Q3 = zeros(5); Q3(:,:) = P(3,:,:); A3 = eye(5) - dsc_rate * Q3; r3 = reward(:,3);
    Q4 = zeros(5); Q4(:,:) = P(4,:,:); A4 = eye(5) - dsc_rate * Q4; r4 = reward(:,4);
    Q5 = zeros(5); Q5(:,:) = P(5,:,:); A5 = eye(5) - dsc_rate * Q5; r5 = reward(:,5);
    A = [A1; A2; A3; A4; A5];
    b = [r1; r2; r3; r4; r5];
    % linprog solves min f'v s.t. (-A)v <= (-b), i.e. v >= r_a + dsc_rate*Q_a*v for each action a
    v = linprog(f, (-1)*A, (-1)*b)
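For readers without MATLAB, the value-iteration script can be mirrored in plain Python; this is a line-for-line port (the Poisson helpers below stand in for poisspdf/poisscdf):

```python
import math

def ppdf(k, lam=2.0):
    # Poisson pmf, stand-in for MATLAB's poisspdf(k, 2)
    return math.exp(-lam) * lam**k / math.factorial(k)

def pcdf(k, lam=2.0):
    # Poisson cdf, stand-in for MATLAB's poisscdf(k, 2)
    return sum(ppdf(j, lam) for j in range(k + 1))

CAP, GAMMA, N_ITER = 4, 0.99, 400

# expected one-day reward r(x, a)
reward = [[0.0] * 5 for _ in range(5)]
for x in range(5):
    for a in range(5):
        u = min(x + a, CAP)
        if u == 0:
            continue
        r = sum(100 * k * ppdf(k) for k in range(u))
        r += 100 * u * (1 - pcdf(u - 1))
        if a > 0:
            r -= 100 + 50 * a
        reward[x][a] = r

# transition probabilities P[a][x][y]
P = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
for a in range(5):
    for x in range(5):
        u = min(x + a, CAP)
        for d in range(u + 1):
            P[a][x][max(u - d, 0)] += ppdf(d)
        P[a][x][0] += 1 - pcdf(u)  # demand beyond stock empties the shelf

# value iteration: v <- max_a [ r(x,a) + gamma * sum_y P(y|x,a) v(y) ]
v = [0.0] * 5
for _ in range(N_ITER):
    v = [max(reward[x][a] + GAMMA * sum(P[a][x][y] * v[y] for y in range(5))
             for a in range(5)) for x in range(5)]

# greedy policy with respect to the converged values
policy = [max(range(5), key=lambda a: reward[x][a]
              + GAMMA * sum(P[a][x][y] * v[y] for y in range(5)))
          for x in range(5)]
print(policy)  # should match the policy reported above
```

Running this reproduces the policy stated in parts (a) and (b): order up to capacity when the shelf is empty, otherwise order nothing.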
Question 6.2 (Optimal Stopping Time): We wish to compute an optimal stopping time T for a problem to maximize

E[ Σ_{j=0}^{T-1} d(X_j) + r(X_T) ],

where (X_n, n ≥ 0) is a Markov chain with finite state space S. This type of problem arises in the pricing of American put options, since one needs to pay back any dividends received prior to the exercise of the put.

(a) Write down the Bellman optimality equation for the value function of the problem.

(b) Write down a linear program whose solution is the value function.

Answer: (a) Define v(x) = sup_T E_x[ Σ_{j=0}^{T-1} d(X_j) + r(X_T) ]. Then the Bellman optimality equation is

v(x) = max( r(x), d(x) + Σ_{y ∈ S} p(x, y) v(y) ),

where p(x, y) denotes the probability of the Markov chain (X_n)_n transitioning from state x to state y. For stopping states, v(x) = r(x); for non-stopping states, v(x) = d(x) + Σ_{y ∈ S} p(x, y) v(y).

(b) The linear program is

min Σ_{x ∈ S} v(x)
s.t. v(x) ≥ r(x) for all x ∈ S,
     v(x) ≥ d(x) + Σ_{y ∈ S} p(x, y) v(y) for all x ∈ S.

Question 6.3 (Visit to Clinic): Suppose that you have the flu and are waiting at a clinic to see a doctor. There are k ≥ 2 doctors who are all currently seeing a patient. Each doctor i takes a random amount of time T_i per patient, for i = 1, ..., k. Suppose that the T_i's are independent with T_i ∼ Expo(λ_i) for some λ_i > 0, i = 1, ..., k. You need to provide your reasoning for the following questions (no points will be given otherwise).

(a) Let R_i be the remaining time for doctor i to finish treating their current patient. For each i = 1, ..., k, what is the distribution of R_i?

(b) Let W be the time you have to wait until you see a doctor. What is the distribution of W?

(c) What is the probability that doctor 1 finishes treating her current patient before doctor 2 does?

(d) What is the probability that doctor i is the first to finish treating her current patient?

(e) What is the expected total amount of time you spend at the clinic?
(Assume that you leave immediately after a doctor treats you.)

Answer: (a) By the memoryless property, R_i ∼ Expo(λ_i) for i = 1, ..., k.

(b) We have W = min_{1≤i≤k} R_i. Since the R_i's are independent, we have from part (a) that

P(W ≥ t) = P(min_{1≤i≤k} R_i ≥ t) = P(R_i ≥ t for all 1 ≤ i ≤ k)
         = Π_{i=1}^k P(R_i ≥ t) = Π_{i=1}^k exp(-λ_i t) = exp( -(Σ_{i=1}^k λ_i) t ).

Hence, W ∼ Expo(Σ_{i=1}^k λ_i).

(c) We wish to calculate P(R_1 < R_2). By conditioning on R_1, we have

P(R_1 < R_2) = ∫_0^∞ P(R_1 < R_2 | R_1 = t) λ_1 e^(-λ_1 t) dt = ∫_0^∞ P(t < R_2) λ_1 e^(-λ_1 t) dt
             = ∫_0^∞ e^(-λ_2 t) λ_1 e^(-λ_1 t) dt = ∫_0^∞ λ_1 e^(-(λ_1+λ_2) t) dt = λ_1 / (λ_1 + λ_2).

(d) We wish to calculate P(R_i ≤ S_{\i}), where S_{\i} := min_{j≠i} R_j. Noting that S_{\i} ∼ Expo(Σ_{j≠i} λ_j) by the same logic as in part (b), we conclude as in part (c) that

P(R_i ≤ S_{\i}) = λ_i / Σ_{j=1}^k λ_j.

(e) First, we condition on R_i < min_{j≠i} R_j:

E[Time spent in clinic | R_i < min_{j≠i} R_j] = E[W | R_i < min_{j≠i} R_j] + E[T_i | R_i < min_{j≠i} R_j]
                                              = E[R_i | R_i < min_{j≠i} R_j] + E[T_i]
                                              = 1/(Σ_{i=1}^k λ_i) + 1/λ_i,

where in the last equality we used the fact that R_i = W if R_i < min_{j≠i} R_j, together with part (b). Hence,

E[Time spent in clinic] = Σ_{i=1}^k E[Time spent in clinic | R_i < min_{j≠i} R_j] P(R_i < min_{j≠i} R_j)
                        = Σ_{i=1}^k ( 1/(Σ_{j=1}^k λ_j) + 1/λ_i ) · λ_i / (Σ_{j=1}^k λ_j)
                        = (k + 1) / Σ_{j=1}^k λ_j.

Question 6.4 (Manufacturing): Consider a manufacturing process with batches of raw materials coming in. Suppose that the interarrival times of batches are i.i.d. exponential random variables with rate λ and their processing times are i.i.d. exponential random variables with rate µ. Due to financial constraints, there is only one machine, which can process only one batch of raw materials at any given instant. We assume that once a batch is processed by the machine, it leaves the system immediately.

Let X(t) denote the number of batches in the system at time t and let T_j := inf{t ≥ 0 : X(t) = j} be the hitting time to have j batches in the system.
Let P_{i,j} be the probability that when the number of batches in the system changes from i, it moves to j batches. Clearly, we have P_{i,j} = 0 for j ≠ i - 1, i + 1.

(a) Derive an expression for P_{i,j} in terms of λ and µ.

(b) Using first-step analysis, derive an expression for E_i[T_j] in terms of E_{i+1}[T_j] and E_{i-1}[T_j] (and λ, µ).

(c) What is E_0[T_1]? Based on your answer and the relationship derived in part (b), show that

E_i[T_{i+1}] = (1/λ) Σ_{k=0}^{i} (µ/λ)^k.

(d) Let λ = 2.0 and µ = 3.0. Starting with no raw material in the system, what is the expected amount of time until the system has j ≥ 1 batches? Explain your reasoning carefully.

Answer: Let X(t) denote the number of batches in the system at time t. We note that this is the M/M/1 queueing model covered in class. Let T_{i,j} be the time it takes to have j batches in the system, starting from i batches; note that E[T_{i,j}] = E_i[T_j].

(a) Let S_1, S_2 be independent exponential random variables with respective rates λ and µ. Since an arrival moves the queue up by one, we have

P_{i,i+1} = P(S_1 < S_2) = ∫_0^∞ P(t < S_2 | S_1 = t) λ e^(-λt) dt = ∫_0^∞ λ e^(-(λ+µ)t) dt = λ / (λ + µ).

Similarly, P_{i,i-1} = µ / (λ + µ).

(b) Starting from i, let τ_i be the time at which the Markov chain X(t) leaves state i. Letting S_1, S_2 be as in part (a), τ_i has the same distribution as min(S_1, S_2). By conditioning on the first transition, we have for i ≥ 1

E_i[T_j] = E_i[T_j | X(τ_i) = i+1] P_i(X(τ_i) = i+1) + E_i[T_j | X(τ_i) = i-1] P_i(X(τ_i) = i-1)
         = E_i[T_j | X(τ_i) = i+1] P_{i,i+1} + E_i[T_j | X(τ_i) = i-1] P_{i,i-1}
         = ( E[min(S_1,S_2) | min(S_1,S_2) = S_1] + E_i[T_j - τ_i | X(τ_i) = i+1] ) P_{i,i+1}
           + ( E[min(S_1,S_2) | min(S_1,S_2) = S_2] + E_i[T_j - τ_i | X(τ_i) = i-1] ) P_{i,i-1}
         = E[min(S_1,S_2)] + E_i[T_j - τ_i | X(τ_i) = i+1] P_{i,i+1} + E_i[T_j - τ_i | X(τ_i) = i-1] P_{i,i-1}.

Now, note that E[min(S_1, S_2)] = 1/(λ + µ), since min(S_1, S_2) ∼ Expo(λ + µ). Applying the strong Markov property in the preceding display, we obtain

E_i[T_j] = 1/(λ+µ) + (λ/(λ+µ)) E_{i+1}[T_j] + (µ/(λ+µ)) E_{i-1}[T_j].

(c) We have E_0[T_1] = E[S_1] = 1/λ. From part (b), we obtain

E_1[T_2] = 1/(λ+µ) + (λ/(λ+µ)) · 0 + (µ/(λ+µ)) E_0[T_2]
         = 1/(λ+µ) + (µ/(λ+µ)) (E_0[T_1] + E_1[T_2])
         = 1/(λ+µ) + (µ/(λ+µ)) (1/λ + E_1[T_2]).

Rearranging, we have

E_1[T_2] = (1/λ) (1 + µ/λ).

In general, note that

E_i[T_{i+1}] = 1/(λ+µ) + (λ/(λ+µ)) · 0 + (µ/(λ+µ)) E_{i-1}[T_{i+1}]
             = 1/(λ+µ) + (µ/(λ+µ)) (E_{i-1}[T_i] + E_i[T_{i+1}]).

Rearranging,

E_i[T_{i+1}] = (1/λ) (1 + µ E_{i-1}[T_i]).

We conclude recursively that

E_i[T_{i+1}] = (1/λ) Σ_{k=0}^{i} (µ/λ)^k.

(d) We wish to compute E_0[T_j]. By the strong Markov property, we have

E_0[T_j] = E_0[T_1] + ... + E_{j-1}[T_j] = Σ_{k=0}^{j-1} E_k[T_{k+1}].   (1)

Note from part (c), using the geometric series formula Σ_{k=0}^{n-1} r^k = (1 - r^n)/(1 - r), that

E_i[T_{i+1}] = (1 - (µ/λ)^(i+1)) / (λ - µ).

Plugging this expression into (1), we conclude that

E_0[T_j] = (1/(λ-µ)) ( j - µ (1 - (µ/λ)^j) / (λ - µ) ) = 3 · 1.5^j - j - 3,

where the last equality uses λ = 2, µ = 3.
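The closed form in part (d) can be checked numerically against the first-step recursion; here is a small Python sketch (our own helper name) under the stated rates λ = 2, µ = 3:

```python
lam, mu = 2.0, 3.0

def hitting_time(j):
    # E_0[T_j] = sum_{k=0}^{j-1} E_k[T_{k+1}], where each increment follows
    # the recursion E_k[T_{k+1}] = (1 + mu * E_{k-1}[T_k]) / lam from part (c)
    total, inc = 0.0, 1.0 / lam  # inc starts at E_0[T_1] = 1/lam
    for _ in range(j):
        total += inc
        inc = (1 + mu * inc) / lam
    return total

# closed form from part (d): 3 * 1.5**j - j - 3
for j in range(1, 8):
    assert abs(hitting_time(j) - (3 * 1.5**j - j - 3)) < 1e-9

print(hitting_time(1), hitting_time(2))  # 0.5 1.75
```

Because µ > λ, the machine drains batches faster than they arrive, so E_0[T_j] grows exponentially in j, as the 1.5^j term shows.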