# hw9sol


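Sample solution for MDP problem:

With gamma = 1 (no discounting), value iteration gives something like this (an asterisk marks the preferred action in state 1):

| state | action | i = 0 | i = 1 | i = 2 | i = 3 | i = 4 | i = 5 | etc. |
|---|---|---|---|---|---|---|---|---|
| state 1 | a1 | 0 | -100 | -100.1 * | -100.1001 * | -100.1001001 * | -100.1001001 * | |
| state 1 | a2 | | -90 * | -101.079 | -101.17901 | -101.1891 | -101.1891001 | |
| state 2 | a3 | 0 | -11 | -11.09 | -11.1001 | -11.1001001 | -11.1001001 | |
| state 3 = goal | | 0 | 0 | 0 | 0 | 0 | 0 | |

So, in state 1, action 1 is preferred over action 2.

But with gamma = 0.9, value iteration gives something like this: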

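| state | action | i = 1 | i = 2 | i = 3 | i = 4 | i = 5 | etc. |
|---|---|---|---|---|---|---|---|
| state 1 | a1 | -100 | -100.081 | -100.089974 | -100.0900476 | -100.090055 | |
| state 1 | a2 | -90 * | -99.9711 * | -100.0529011 * | -100.0610766 * | -100.0611468 * | |
| state 2 | a3 | -11 | -11.081 | -11.08997399 | -11.09008098 | -11.09005497 | |
| state 3 = goal | | | | | | | |

So, in state 1, action 2 is preferred over action 1.

[Figure: MDP diagram with states s1, s2, and s3 (goal), actions a1, a2, and a3, edge labels -100, -90, and -11, and transition probabilities 0.999 and 0.001.]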
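The two runs above can be reproduced with a short script. The following is only a sketch, not the course's reference code: the transition structure (which action leads where, with probability 0.999 or 0.001) is an assumption inferred from the tabulated numbers and the diagram labels, and the names `MDP`, `q_values`, `value_iteration`, and `preferred_action` are hypothetical. Because the source gives its figures as "something like this", the exact output may differ from the tables in some digits.

```python
# Sketch of value iteration on the three-state MDP from the solution above.
# ASSUMPTION: the transition structure below is inferred from the tabulated
# numbers and the diagram labels; it is not spelled out in the source.

GOAL = "s3"

# MDP[state][action] = (immediate reward, [(probability, next_state), ...])
MDP = {
    "s1": {
        "a1": (-100.0, [(0.999, "s3"), (0.001, "s1")]),
        "a2": (-90.0, [(0.999, "s2"), (0.001, "s1")]),
    },
    "s2": {
        "a3": (-11.0, [(0.999, "s3"), (0.001, "s1")]),
    },
}


def q_values(values, gamma):
    """One Bellman backup: Q(s, a) = r + gamma * sum_s' P(s'|s,a) * V(s')."""
    return {
        state: {
            action: reward + gamma * sum(p * values[nxt] for p, nxt in outcomes)
            for action, (reward, outcomes) in actions.items()
        }
        for state, actions in MDP.items()
    }


def value_iteration(gamma, iterations):
    """Run synchronous value iteration from V = 0; return the final Q-values."""
    values = {"s1": 0.0, "s2": 0.0, GOAL: 0.0}
    q = {}
    for _ in range(iterations):
        q = q_values(values, gamma)
        # V(s) is the best Q-value available in s; the goal stays at 0.
        values = {state: max(q[state].values()) for state in q}
        values[GOAL] = 0.0
    return q


def preferred_action(q, state):
    """Return the action with the highest Q-value in the given state."""
    return max(q[state], key=q[state].get)


for gamma in (1.0, 0.9):
    q = value_iteration(gamma, iterations=50)
    print(f"gamma = {gamma}: preferred action in s1 is {preferred_action(q, 's1')}")
```

Under these assumptions, the preference in state 1 flips between the two discount factors: with gamma = 0.9 the second step of the a2 route (the -11 paid in state 2) is discounted, which makes a2 slightly cheaper than paying -100 at once.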

## This note was uploaded on 09/10/2008 for the course CS 460 taught by Professor Svenkoenig during the Fall '08 term at Urbana.
