Computer Science 188 - Fall 1997 - Russell - Final Exam

Suppose now we wish to associate rewards with actions instead of states, i.e., R(a, i) is the reward for doing a in state i. How should the Bellman equation be rewritten to use R(a, i) instead of R(i)?

(e) (5) Can any finite search problem be translated exactly into a Markov decision problem, such that an optimal solution of the latter is also an optimal solution of the former? If so, explain precisely how to translate the problem AND how to translate the solution back; if not, explain precisely why not (e.g., give a counterexample).

(f) (5) In this part we will apply the value iteration algorithm to the MDP that corresponds to the above search problem. Assume that each state has an initial value estimate of 0. Copy and complete the following table, showing the value of each state after each iteration and the optimal action choice given those values. Co...
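For reference, the value-iteration procedure in part (f) can be sketched as follows. This is a minimal illustration, not the exam's intended solution: the exam's actual search problem is not shown in this excerpt, so the two-state MDP, its transitions T, and the reward numbers below are made up. It does use the action-dependent reward form R(a, i) asked about above, with the Bellman update V(i) = max_a [ R(a, i) + gamma * sum_j T(i, a, j) V(j) ].

```python
# Value iteration on a tiny hypothetical MDP with action-dependent
# rewards R(a, i), starting from an initial value estimate of 0 for
# every state, as part (f) specifies.

states = ["s0", "s1"]          # assumed states (not from the exam)
actions = ["a", "b"]           # assumed actions (not from the exam)

# T[(i, a)] -> list of (probability, next_state) pairs; deterministic here.
T = {
    ("s0", "a"): [(1.0, "s1")],
    ("s0", "b"): [(1.0, "s0")],
    ("s1", "a"): [(1.0, "s1")],
    ("s1", "b"): [(1.0, "s0")],
}

# R[(a, i)] -> reward for doing action a in state i (assumed numbers).
R = {
    ("a", "s0"): 1.0,
    ("b", "s0"): 0.0,
    ("a", "s1"): 0.0,
    ("b", "s1"): 2.0,
}

gamma = 0.9  # discount factor (assumed)

# Each state starts with a value estimate of 0.
V = {s: 0.0 for s in states}

for _ in range(50):
    # Bellman update with action-dependent rewards:
    #   V(i) = max_a [ R(a, i) + gamma * sum_j T(i, a, j) * V(j) ]
    V = {
        i: max(
            R[(a, i)] + gamma * sum(p * V[j] for p, j in T[(i, a)])
            for a in actions
        )
        for i in states
    }

# Optimal action choice in each state, given the converged values.
policy = {
    i: max(
        actions,
        key=lambda a, i=i: R[(a, i)] + gamma * sum(p * V[j] for p, j in T[(i, a)]),
    )
    for i in states
}
print(V, policy)
```

Printing V after each sweep instead of only at the end would reproduce exactly the kind of table the exam asks you to fill in: one row per iteration, one column per state, plus the greedy action.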