if (next_x == Gx and next_y == Gy)
    reward = 100; max_Q = 0;
else
    reward = 0; max_Q = max_a(Q(next_x, next_y, action));
Q(cur_x, cur_y, action) = (1-α) * Q(cur_x, cur_y, action) + α * (reward + γ * max_Q)
if (next_x == Gx && next_y == Gy) break;
cur_x = next_x; cur_y = next_y;
end
% print grid, optimal policy, Q value.
For x=1 to Mx; y=1 to My
    mm = max_a(Q(x, y, action))
    Print corresponding action to mm....
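The update rule above can be tried out with a minimal Python sketch. Several details are assumptions, because the excerpt does not show them: the four-action set, moves that are clipped at the grid border, purely random action selection, a random start cell per episode, 0-indexed coordinates, and the default values of alpha, gamma, and the episode count. The helper names step, train, greedy_action, and greedy_path are hypothetical and are not taken from the course material.

```python
import random

# Assumed action set; the source excerpt does not list the actions.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
NAMES = list(ACTIONS)


def step(x, y, name, mx, my):
    """Move one cell; moves off the grid are clipped (assumption)."""
    dx, dy = ACTIONS[name]
    return min(max(x + dx, 0), mx - 1), min(max(y + dy, 0), my - 1)


def train(mx=4, my=4, gx=3, gy=3, alpha=0.5, gamma=0.9,
          episodes=2000, max_steps=200, seed=0):
    """Tabular Q-learning using the update rule from the excerpt."""
    rng = random.Random(seed)
    q = {(x, y, a): 0.0 for x in range(mx) for y in range(my) for a in NAMES}
    for _ in range(episodes):
        cur_x, cur_y = rng.randrange(mx), rng.randrange(my)
        if (cur_x, cur_y) == (gx, gy):
            continue  # nothing to learn when starting on the goal
        for _ in range(max_steps):
            action = rng.choice(NAMES)  # random exploration (assumption)
            next_x, next_y = step(cur_x, cur_y, action, mx, my)
            at_goal = (next_x, next_y) == (gx, gy)
            if at_goal:
                reward, max_q = 100.0, 0.0
            else:
                reward = 0.0
                max_q = max(q[(next_x, next_y, a)] for a in NAMES)
            old = q[(cur_x, cur_y, action)]
            q[(cur_x, cur_y, action)] = (
                (1 - alpha) * old + alpha * (reward + gamma * max_q)
            )
            if at_goal:
                break
            cur_x, cur_y = next_x, next_y
    return q


def greedy_action(q, x, y):
    """Action with the largest Q value in cell (x, y)."""
    return max(NAMES, key=lambda a: q[(x, y, a)])


def greedy_path(q, start, goal, mx, my, limit=50):
    """Follow the greedy policy from start; stop at goal or after limit moves."""
    path = [start]
    while path[-1] != goal and len(path) <= limit:
        x, y = path[-1]
        path.append(step(x, y, greedy_action(q, x, y), mx, my))
    return path
```

Calling q = train() and then greedy_action(q, x, y) for every cell yields the kind of policy printout described in the final loop of the pseudocode. The entry for the goal cell itself carries no information, since its Q values are never updated.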
This note was uploaded on 01/25/2012 for the course CS 6375 taught by Professor Yangliu during the Spring '09 term at University of Texas at Dallas, Richardson.