# hw6-program


Hint: the following is the pseudocode for this problem. Let the learning rate be α.

```
Mx = 5; My = 4;                  % board size
Gx = 5; Gy = 4;                  % goal
γ = 0.9;                         % discount factor
initialize all Q values to 0;

for t = 1 to some number         % do the learning many times
    cur_x = 1; cur_y = 1; iter = 0;
    do
        iter++;
        % randomly pick the next action
        r = rand(0, 1);
        if (r < 0.25)                          % left
            if (cur_x == 1) continue;          % can't move further left
            else action = left; next_x = cur_x - 1; next_y = cur_y;
            end
        % ...then a similar process for the other three actions: right, up, down
        end

        if (next_x == Gx && next_y == Gy)
            reward = 100; max_Q = 0;           % goal is terminal
        else
            reward = 0; max_Q = max over a of Q(next_x, next_y, a);
        end
        Q(cur_x, cur_y, action) = (1 - α) * Q(cur_x, cur_y, action) + α * (reward + γ * max_Q);

        if (next_x == Gx && next_y == Gy) break;
        cur_x = next_x; cur_y = next_y;
    end
end

% print grid, optimal policy, Q value
for x = 1 to Mx, y = 1 to My
    mm = max over a of Q(x, y, a);
    print the action corresponding to mm
...
```
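As a worked example of the update line, take a hypothetical learning rate α = 0.5 (the handout leaves α unspecified), assume "right" moves to x + 1 (mirroring left = x - 1), and start from all-zero Q values. Two moves from (4, 4) into the goal (5, 4), followed by one move from (3, 4) to (4, 4), assuming no other Q value at (4, 4) has been updated yet, give:

$$
\begin{aligned}
Q(4,4,\text{right}) &\leftarrow (1-0.5)\cdot 0 + 0.5\,(100 + 0.9\cdot 0) = 50 \\
Q(4,4,\text{right}) &\leftarrow (1-0.5)\cdot 50 + 0.5\,(100 + 0.9\cdot 0) = 75 \\
Q(3,4,\text{right}) &\leftarrow (1-0.5)\cdot 0 + 0.5\,(0 + 0.9\cdot 75) = 33.75
\end{aligned}
$$

Because the moves are deterministic, repeating the updates drives each entry toward the fixed point $Q(s,a) = \text{reward} + \gamma \max_{a'} Q(s',a')$: 100 for a move into the goal, $0.9 \cdot 100 = 90$ for a move one step further back, 81 for two steps back, and so on.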

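The pseudocode maps fairly directly onto Python. The sketch below is one minimal version under assumptions the handout does not fix: a hypothetical learning rate `ALPHA = 0.5`, a hypothetical episode count of 2000, "up" meaning y + 1 and "down" meaning y - 1, and a fixed random seed for reproducibility. Exploration is uniformly random as in the pseudocode, and off-board moves are simply redrawn, mirroring the `continue`.

```python
import random

# Board size and goal from the pseudocode (1-indexed coordinates).
MX, MY = 5, 4
GX, GY = 5, 4
GAMMA = 0.9
# Assumption: the handout leaves the learning rate unspecified; 0.5 is hypothetical.
ALPHA = 0.5

# Assumption: only "left" (x - 1) is spelled out in the source; the other
# directions, especially up = y + 1 and down = y - 1, are a chosen convention.
ACTIONS = {
    "left": (-1, 0),
    "right": (1, 0),
    "up": (0, 1),
    "down": (0, -1),
}


def train(episodes=2000, seed=0):
    """Tabular Q-learning with uniformly random action selection."""
    rng = random.Random(seed)
    q = {(x, y, a): 0.0
         for x in range(1, MX + 1) for y in range(1, MY + 1) for a in ACTIONS}
    for _ in range(episodes):
        cur_x, cur_y = 1, 1
        while True:
            # Each of the four actions is picked with probability 0.25.
            action = rng.choice(list(ACTIONS))
            dx, dy = ACTIONS[action]
            next_x, next_y = cur_x + dx, cur_y + dy
            # A move off the board is rejected and a new action is drawn.
            if not (1 <= next_x <= MX and 1 <= next_y <= MY):
                continue
            if (next_x, next_y) == (GX, GY):
                reward, max_q = 100.0, 0.0  # goal is terminal: no future value
            else:
                reward = 0.0
                max_q = max(q[(next_x, next_y, a)] for a in ACTIONS)
            key = (cur_x, cur_y, action)
            q[key] = (1 - ALPHA) * q[key] + ALPHA * (reward + GAMMA * max_q)
            if (next_x, next_y) == (GX, GY):
                break
            cur_x, cur_y = next_x, next_y
    return q


def greedy_policy(q):
    """Map every non-goal cell to the action with the largest Q value."""
    return {
        (x, y): max(ACTIONS, key=lambda a: q[(x, y, a)])
        for x in range(1, MX + 1)
        for y in range(1, MY + 1)
        if (x, y) != (GX, GY)
    }


if __name__ == "__main__":
    q = train()
    policy = greedy_policy(q)
    # Policy grid, top row (largest y) first; G marks the goal.
    for y in range(MY, 0, -1):
        row = ["G" if (x, y) == (GX, GY) else policy[(x, y)][0].upper()
               for x in range(1, MX + 1)]
        print(" ".join(row))
    # Best Q value per cell, same layout.
    for y in range(MY, 0, -1):
        print(" ".join(f"{max(q[(x, y, a)] for a in ACTIONS):6.2f}"
                       for x in range(1, MX + 1)))
```

With deterministic moves, the learned values should approach 100 · 0.9^d, where d is the Manhattan distance from the cell the action lands in to the goal (d = 0 for a move into the goal). Cells where two moves are equally good, such as (1, 1), may print either one.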
## This note was uploaded on 01/25/2012 for the course CS 6375 taught by Professor Yangliu during the Spring '09 term at University of Texas at Dallas, Richardson.
