
5 / 5 pts
Question 20
Alice wishes to derive an algorithmic procedure analogous to Q-value iteration in this new formulation. For each letter (A), (B), (C), and (D), fill in a single entry for the term corresponding to the correct equation to implement X-value iteration. N(s) refers to the neighboring states of state s (i.e., the set of states s' that can possibly be reached by taking some action from state s). Select one bubble per row to form the whole equation.

New formulation: X(s, s') ← (A) [(B) + (C)(D)]

Choose the correct value for each term in the formulation above.

For A, select the correct option:
1.
2. 1
3.

For B, select the correct option:
1.
2. 0
3. 1

For C, select the correct option:
1.
2. 0
3. 1

For D, select the correct option:
1.
2.
3.
4.
5.
6. V(s'')
7. X(s', s'')
8. X(s, s'')

Answer 1: 2
Answer 2: 1
Answer 3: 1
Answer 4: 5

Conceptual questions about the new formulation

1 / 1 pts
Question 21
In a deterministic MDP like the one above, can value iteration with the new value function X(s, s') learn the optimal policy?
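Assuming option 2 for (A) is the constant 1 (a deterministic transition has probability 1) and the selected options for (B), (C), and (D) are R(s, I(s, s'), s'), γ, and max over s'' ∈ N(s') of X(s', s''), the recorded answers give the update X(s, s') ← R(s, I(s, s'), s') + γ max_{s'' ∈ N(s')} X(s', s''), which mirrors Q-value iteration with X(s, s') standing in for Q(s, I(s, s')). The minimal sketch below runs that update on a small hypothetical deterministic MDP and compares the greedy policy read off the X-values with the one from ordinary value iteration. The states, actions, rewards, γ = 0.9, and the helper names (such as action_to, which plays the role of I(s, s')) are all invented for illustration.

```python
"""Sketch: X-value iteration vs. value iteration on a hypothetical deterministic MDP."""

GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic MDP: NEXT[s][a] is the unique successor s',
# REWARD[s][a] is R(s, a, s'). "T" is terminal (no actions, value 0).
NEXT = {
    "A": {"right": "B", "down": "C"},
    "B": {"right": "T", "left": "A"},
    "C": {"right": "T", "up": "A"},
    "T": {},
}
REWARD = {
    "A": {"right": 0.0, "down": 1.0},
    "B": {"right": 10.0, "left": 0.0},
    "C": {"right": 2.0, "up": 0.0},
    "T": {},
}


def neighbors(s):
    """N(s): the states reachable from s by some action."""
    return set(NEXT[s].values())


def action_to(s, s_next):
    """Plays the role of I(s, s'): the action that moves s to s_next."""
    for a, t in NEXT[s].items():
        if t == s_next:
            return a
    raise KeyError((s, s_next))


def x_value_iteration(iters=200):
    """Batch update X(s, s') <- R(s, I(s, s'), s') + GAMMA * max_{s'' in N(s')} X(s', s'')."""
    X = {(s, t): 0.0 for s in NEXT for t in neighbors(s)}
    for _ in range(iters):
        new_X = {}
        for (s, t) in X:
            # A terminal s' has an empty N(s'), so its future value is 0.
            future = max((X[(t, u)] for u in neighbors(t)), default=0.0)
            new_X[(s, t)] = REWARD[s][action_to(s, t)] + GAMMA * future
        X = new_X
    return X


def value_iteration(iters=200):
    """Standard value iteration: V(s) <- max_a [R(s, a, s') + GAMMA * V(s')]."""
    V = {s: 0.0 for s in NEXT}
    for _ in range(iters):
        V = {
            s: max((REWARD[s][a] + GAMMA * V[t] for a, t in NEXT[s].items()), default=0.0)
            for s in NEXT
        }
    return V


def policy_from_x(X):
    """pi(s) = I(s, argmax_{s' in N(s)} X(s, s')) for every non-terminal s."""
    return {
        s: action_to(s, max(neighbors(s), key=lambda t: X[(s, t)]))
        for s in NEXT
        if NEXT[s]
    }


def policy_from_v(V):
    """pi(s) = argmax_a [R(s, a, s') + GAMMA * V(s')] for every non-terminal s."""
    return {
        s: max(NEXT[s], key=lambda a: REWARD[s][a] + GAMMA * V[NEXT[s][a]])
        for s in NEXT
        if NEXT[s]
    }


if __name__ == "__main__":
    X, V = x_value_iteration(), value_iteration()
    print(policy_from_x(X))  # {'A': 'right', 'B': 'right', 'C': 'up'}
    print(policy_from_v(V))  # {'A': 'right', 'B': 'right', 'C': 'up'}
```

In this toy MDP, max_{s' ∈ N(s)} X(s, s') reproduces V(s) (for example V(A) = 9 and V(C) = 8.1), so both extraction rules pick the same actions here. One run on one small example is an illustration only, not a general answer to Question 21.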
2 / 2 pts
Question 22
We can also extend this formulation of value iteration with the new value function, X(s, s'), to non-deterministic dynamics T(s, a, s') as follows. For each transition (s, a, s', R(s, a, s')) we take, we update the corresponding X(s, s') value according to equation (2), where we replace all occurrences of I(s, s') by the action we actually take, a. After the X-values converge, we extract the policy with.

Will this new X-value iteration process always converge to the same policy as vanilla value iteration in environments with non-deterministic dynamics T(s, a, s')?
No
Yes

Pacman is taking Intro to Ghost Intelligence at a university this semester. It is 7 days before the midterm, but Pacman is still procrastinating! Pacman still has 1 Electronic Homework (E), 1 Written Homework (W), and 1 Project (P) to finish before the exam. Each of them takes 1 day to complete, and Pacman can work on at most one task each day. Pacman also needs 2 days to review the course material before the exam (R1 and R2). Pacman needs your help to assign the dates to complete these tasks!

Pacman formulates the problem as a CSP, where the tasks (E, W, P, R1, R2) are variables, each with domain {1, ..., 7}, representing the seven days from now until the exam. Pacman wants the assignments of tasks to meet the following constraints:
1. Each task (E, W, P, R1, R2) must be assigned to a different day.
2. Both the Electronic Homework (E) and the Project (P) are due in 4 days, so they must be finished on day 1, 2, 3, or 4.
3. Since we use R1 and R2 to represent the first and the second day of reviewing for the exam, we assume R1 < R2, and the two days for reviewing (R1, R2) must also not be
4. Pacman must finish all the assignments (E, W, P) before starting to review for the exam (R1).
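A minimal sketch of Pacman's CSP as plain backtracking search, assuming days are the integers 1 to 7 and variables are assigned in the fixed order E, W, P, R1, R2 (no variable-ordering heuristics, forward checking, or arc consistency). It encodes constraint 1, constraint 2, R1 < R2 from constraint 3, and constraint 4; the truncated remainder of constraint 3 is not modeled, so the solutions it enumerates satisfy only those listed checks. All function names are hypothetical.

```python
"""Sketch: Pacman's scheduling CSP solved by plain backtracking search."""

VARIABLES = ["E", "W", "P", "R1", "R2"]  # assumed fixed assignment order
DOMAIN = range(1, 8)  # days 1..7


def consistent(assignment):
    """Check every constraint whose variables are all assigned.

    Encodes constraint 1 (all different), constraint 2 (E and P on days 1-4),
    R1 < R2 from constraint 3, and constraint 4 (E, W, P before R1).
    The truncated second clause of constraint 3 is NOT modeled here.
    """
    days = list(assignment.values())
    if len(days) != len(set(days)):
        return False
    for task in ("E", "P"):
        if task in assignment and assignment[task] > 4:
            return False
    if "R1" in assignment and "R2" in assignment:
        if not assignment["R1"] < assignment["R2"]:
            return False
    if "R1" in assignment:
        for task in ("E", "W", "P"):
            if task in assignment and not assignment[task] < assignment["R1"]:
                return False
    return True


def backtrack(assignment, solutions):
    """Extend a consistent partial assignment one variable at a time."""
    if len(assignment) == len(VARIABLES):
        solutions.append(dict(assignment))
        return
    var = VARIABLES[len(assignment)]
    for day in DOMAIN:
        assignment[var] = day
        if consistent(assignment):  # prune inconsistent partial assignments
            backtrack(assignment, solutions)
        del assignment[var]


def solve():
    """Return every complete assignment that passes consistent()."""
    solutions = []
    backtrack({}, solutions)
    return solutions


if __name__ == "__main__":
    solutions = solve()
    print(len(solutions))  # 102
    print(solutions[0])  # {'E': 1, 'W': 2, 'P': 3, 'R1': 4, 'R2': 5}
```

Because every encoded constraint can be checked as soon as its variables are assigned, pruning a partial assignment never discards a valid complete one; adding the missing part of constraint 3 would only mean adding another check to consistent().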


Term: Winter
