Suppose now we wish to associate rewards with actions instead of states; i.e., R(a, i) is the reward for doing action a in state i. How should the Bellman equation be rewritten to use R(a, i) instead of R(i)?

(e) (5) Can any finite search problem be translated exactly into a Markov decision problem, such that an optimal solution of the latter is also an optimal solution of the former? If so, explain precisely how to translate the problem AND how to translate the solution back; if not, explain precisely why not (e.g., give a counterexample).

(f) (5) In this part we will apply the value iteration algorithm to the MDP that corresponds to the above search problem. Assume that each state has an initial value estimate of 0. Copy and complete the following table, showing the value of each state after each iteration and the optimal action choice given those values. Co...
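One standard way to answer the Bellman-rewrite question, sketched here with M^a_{ij} assumed as the notation for the probability of reaching state j from state i under action a (the excerpt does not show the original formulation): the reward term moves inside the maximization over actions, since it now depends on which action is taken.

```latex
% State-reward form:
U(i) = R(i) + \max_a \gamma \sum_j M^a_{ij}\, U(j)

% Action-reward form, with R(a, i) inside the max:
U(i) = \max_a \Big[ R(a, i) + \gamma \sum_j M^a_{ij}\, U(j) \Big]
```

The two forms coincide when R(a, i) is the same for every action a in state i.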
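The value-iteration procedure part (f) asks for can be sketched as follows. The exam's actual search problem and table are not reproduced in this excerpt, so the states, actions, rewards, and discount factor below are hypothetical stand-ins; only the update rule itself (with action-dependent rewards R(a, i), as in the previous part, and all values initialized to 0) follows the text.

```python
# Value-iteration sketch for part (f). States, actions, rewards, and GAMMA
# are made-up illustrative values, not the exam's actual problem.

GAMMA = 0.9  # assumed discount factor

# State 1 is absorbing with value 0, so only state 0 needs updating.
# transitions[i][a] -> list of (probability, next_state) pairs
transitions = {
    0: {"stay": [(1.0, 0)], "go": [(1.0, 1)]},
}
# R(a, i): reward for doing action a in state i (made-up numbers)
reward = {("stay", 0): 2.0, ("go", 0): 10.0}

def value_iteration(n_iters):
    """Return final values and the per-iteration history of estimates."""
    U = {0: 0.0, 1: 0.0}  # each state starts at value 0, per the exam setup
    history = []
    for _ in range(n_iters):
        new_U = dict(U)
        for i, acts in transitions.items():
            # Bellman update with action rewards:
            # U(i) <- max_a [ R(a, i) + gamma * sum_j P(j|i,a) * U(j) ]
            new_U[i] = max(
                reward[(a, i)] + GAMMA * sum(p * U[j] for p, j in outcomes)
                for a, outcomes in acts.items()
            )
        U = new_U
        history.append(dict(U))
    return U, history

def best_action(i, U):
    """Optimal action choice in state i given the current value estimates."""
    return max(
        transitions[i],
        key=lambda a: reward[(a, i)]
        + GAMMA * sum(p * U[j] for p, j in transitions[i][a]),
    )

U, history = value_iteration(3)
```

With these made-up numbers, the estimate for state 0 goes 10.0, 11.0, 11.9 over the three iterations, and once the value of state 0 is high enough, "stay" overtakes "go" as the optimal choice; a table like the one the exam asks for would record exactly this per-iteration value and action for each state.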
This note was uploaded on 05/17/2009 for the course CS 188 taught by Professor Staff during the Spring '08 term at University of California, Berkeley.