Suppose now we wish to associate rewards with actions instead of states; i.e., R(a, i) is the reward for doing action a in state i. How should the Bellman equation be rewritten to use R(a, i) instead of R(i)?

(e) (5) Can any finite search problem be translated exactly into a Markov decision problem, such that an optimal solution of the latter is also an optimal solution of the former? If so, explain precisely how to translate the problem AND how to translate the solution back; if not, explain precisely why not (e.g., give a counterexample).

(f) (5) In this part we will apply the value iteration algorithm to the MDP that corresponds to the above search problem. Assume that each state has an initial value estimate of 0. Copy and complete the following table, showing the value of each state after each iteration and the optimal action choice given those values. Co...
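Parts (d) and (f) both involve the value iteration update. The sketch below rests on assumptions this excerpt does not supply: a discount factor gamma and a transition model given as lists of (probability, next_state) pairs. Under those assumptions, one common form of the update with action rewards is V(i) <- max_a [R(a, i) + gamma * sum_j T(i, a, j) V(j)]. The code applies this update starting from all-zero estimates and records each iteration's values and greedy actions, which matches the shape of the table part (f) asks for. The three-state instance (S, A, G), its action names, and the choice to encode step costs as negative rewards are hypothetical. They are not the search problem referenced above.

```python
from typing import Dict, List, Optional, Tuple

# transitions[state][action] -> list of (probability, next_state) pairs (assumed format)
Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]
# rewards[(action, state)] -> R(a, i), the reward for doing action a in state i
Rewards = Dict[Tuple[str, str], float]


def value_iteration(
    transitions: Transitions, rewards: Rewards, gamma: float, iterations: int
) -> List[Tuple[Dict[str, float], Dict[str, Optional[str]]]]:
    """Run value iteration from all-zero estimates; return (values, greedy policy) per iteration."""
    values = {s: 0.0 for s in transitions}  # initial value estimate of 0 for every state
    history = []
    for _ in range(iterations):
        new_values: Dict[str, float] = {}
        policy: Dict[str, Optional[str]] = {}
        for s, actions in transitions.items():
            if not actions:  # terminal state: no actions, so its value stays 0
                new_values[s] = 0.0
                policy[s] = None
                continue
            best_action, best_q = None, float("-inf")
            for a, outcomes in actions.items():
                # Q(a, s) = R(a, s) + gamma * expected value of the successor state
                q = rewards[(a, s)] + gamma * sum(p * values[t] for p, t in outcomes)
                if q > best_q:  # strict '>' keeps the first action on ties
                    best_action, best_q = a, q
            new_values[s] = best_q
            policy[s] = best_action
        values = new_values  # synchronous update: all states use the previous iteration's values
        history.append((dict(values), policy))
    return history


# Hypothetical deterministic instance: step costs become negative rewards; G is the goal.
example_transitions: Transitions = {
    "S": {"go_A": [(1.0, "A")], "go_G": [(1.0, "G")]},
    "A": {"go_G": [(1.0, "G")]},
    "G": {},
}
example_rewards: Rewards = {("go_A", "S"): -1.0, ("go_G", "S"): -5.0, ("go_G", "A"): -1.0}

if __name__ == "__main__":
    for k, (vals, pol) in enumerate(
        value_iteration(example_transitions, example_rewards, gamma=1.0, iterations=3), start=1
    ):
        print(k, vals, pol)
```

On this hypothetical instance, the first iteration gives S = -1, A = -1, G = 0. From the second iteration on, the values are S = -2, A = -1, G = 0, and go_A is the greedy choice in S throughout.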
