

n = 25; % n is the number of iterations

% main body of algorithm
pi = [0,0,0,0,0]';
for iter = 1:n
    % transition matrix under the current policy
    Q = zeros(5);
    for x = 0:4
        for y = 0:4
            Q(x+1,y+1) = P(pi(x+1)+1, x+1, y+1);
        end
    end
    % one-period reward under the current policy
    r = zeros(5,1);
    for x = 0:4
        r(x+1,1) = reward(x+1, pi(x+1)+1);
    end
    % policy evaluation: solve (I - dsc_rate*Q) u = r
    u = (eye(5) - dsc_rate * Q) \ r;
    % policy improvement: one-step lookahead value of every (state, action) pair
    R = zeros(5);
    for x = 0:4
        for a = 0:4
            p0 = zeros(1,5);
            p0(1,:) = P(a+1, x+1, :);
            R(x+1,a+1) = reward(x+1,a+1) + dsc_rate * p0 * u;
        end
    end
    % get new policy
    [v, pi] = max(R, [], 2); % take the row maximum
    pi = pi - 1;
end

(b) Based on the code implementation below, the optimal policy is: π(0) = 4, π(1) = 0, π(2) = 0, π(3) = 0, π(4) = 0.

MATLAB code for the Linear Program method:

%reward function
reward = zeros(5,5);
Copyright © Peter Glynn. All rights reserved. MS&E 221 Spring 2020

for x = 0:4
    for a = 0:4
        u = min(x + a, 4);
        if u == 0
            continue
        end
        % expected sales term 100*E[min(D,u)] with D ~ Poisson(2)
        for k = 0:(u-1)
            reward(x+1,a+1) = reward(x+1,a+1) + 100*k*poisspdf(k,2);
        end
        reward(x+1,a+1) = reward(x+1,a+1) + 100*u*(1 - poisscdf(u-1,2));
        % ordering cost when a > 0
        if a > 0
            reward(x+1,a+1) = reward(x+1,a+1) - 100 - 50*a;
        end
    end
end

%transitions
P = zeros(5,5,5);
for a = 0:4
    for x = 0:4
        u = min(x + a, 4);
        % next state is max(u - D, 0); demand D > u also leads to state 0
        for d = 0:u
            next_x = max(u - d, 0);
            P(a+1,x+1,next_x+1) = P(a+1,x+1,next_x+1) + poisspdf(d,2);
        end
        P(a+1,x+1,0+1) = P(a+1,x+1,0+1) + (1 - poisscdf(u,2));
    end
end

dsc_rate = 0.99; %discount rate
f = ones(5,1);
Q1 = zeros(5);
Q1(:,:) = P(1,:,:);
A1 = eye(5) - dsc_rate * Q1;
r1 = reward(:,1);
Q2 = zeros(5);
Q2(:,:) = P(2,:,:);
A2 = eye(5) - dsc_rate * Q2;
r2 = reward(:,2);
Q3 = zeros(5);
Q3(:,:) = P(3,:,:);
A3 = eye(5) - dsc_rate * Q3;
r3 = reward(:,3);
Q4 = zeros(5);
Q4(:,:) = P(4,:,:);
A4 = eye(5) - dsc_rate * Q4;
r4 = reward(:,4);
Q5 = zeros(5);
Q5(:,:) = P(5,:,:);
A5 = eye(5) - dsc_rate * Q5;
r5 = reward(:,5);
A = [A1; A2; A3; A4; A5];
b = [r1; r2; r3; r4; r5];
% minimize sum(v) subject to A*v >= b, passed to linprog as (-A)*v <= -b
v = linprog(f, (-1)*A, (-1)*b)

Question 4.3 (Optimal Stopping Time): We wish to compute an optimal stopping time policy $T$ for a problem to maximize
$$E\left[\sum_{j=0}^{T-1} d(X_j) + r(X_T)\right],$$
where $(X_n, n \ge 0)$ is a Markov chain with finite state space $S$. This type of problem arises in the pricing of American put options, since one needs to pay back any dividends received prior to the exercise of the put.
(a) Write down the Bellman optimality equation for the value function of the problem.
(b) Write down a linear program for which the solution will be the value function.
Answer:
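A Bellman optimality equation consistent with the constraints of the linear program in (b) is
$$v(x) = \max\left\{ r(x),\; d(x) + \sum_{y \in S} p(x,y)\, v(y) \right\}, \qquad x \in S,$$
and an optimal rule stops the first time the chain enters $\{x \in S : v(x) = r(x)\}$. As an illustration only, the Python sketch below solves this equation by value iteration starting from $v_0 = r$ (the value of stopping immediately). The three-state chain, the values of `d` and `r`, and the function name `optimal_stopping_value` are hypothetical, and the sketch assumes $d(x) < 0$ in every state (dividends that must be repaid), which keeps the undiscounted iterates bounded so the iteration converges.

```python
from typing import List, Tuple


def optimal_stopping_value(
    P: List[List[float]],
    d: List[float],
    r: List[float],
    tol: float = 1e-12,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Solve v(x) = max(r(x), d(x) + sum_y P[x][y] * v(y)) by value iteration.

    Assumes d(x) < 0 for all x (hypothetical setting for this sketch).
    Starting from v_0 = r, the iterates increase monotonically to v.
    """
    n = len(r)

    def continuation(v: List[float]) -> List[float]:
        # Collect d(x) for one more step, then follow v from the next state.
        return [d[x] + sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]

    v = list(r)  # v_0 = r: value of stopping immediately
    for _ in range(max_iter):
        new_v = [max(rx, cx) for rx, cx in zip(r, continuation(v))]
        diff = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if diff < tol:
            break
    else:
        raise RuntimeError("value iteration did not converge")

    # Stopping region: states where stopping is at least as good as continuing.
    cont = continuation(v)
    stop_set = [x for x in range(n) if r[x] >= cont[x] - 1e-9]
    return v, stop_set


if __name__ == "__main__":
    # Hypothetical 3-state chain with negative "dividends" d(x).
    P = [[0.5, 0.5, 0.0],
         [0.25, 0.5, 0.25],
         [0.0, 0.5, 0.5]]
    d = [-0.1, -0.1, -0.1]
    r = [0.0, 1.0, 3.0]
    v, stop = optimal_stopping_value(P, d, r)
    print([round(x, 6) for x in v], stop)  # prints [2.2, 2.4, 3.0] [2]
```

In this toy chain stopping is optimal only in state 2, where $r = 3$ beats the continuation value $-0.1 + 0.5(2.4) + 0.5(3) = 2.6$; in states 0 and 1 it pays to keep repaying the small dividend while waiting to reach state 2. The resulting $v$ is also feasible for the linear program in (b), with one constraint holding with equality in every state.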
(b) The linear program is
$$\begin{aligned}
\min \quad & \sum_{x \in S} v(x) \\
\text{s.t.} \quad & v(x) \ge r(x) \quad \text{for all } x \in S, \\
& v(x) \ge d(x) + \sum_{y \in S} p(x,y)\, v(y) \quad \text{for all } x \in S.
\end{aligned}$$

Question 4.4 (Visit to Clinic): Suppose that you have the flu and are waiting at a clinic to see a doctor. There are k
