Question 20

5 / 5 pts

Alice wishes to derive an algorithmic procedure that is analogous to Q-value iteration in this new formulation. For each letter (A), (B), (C), and (D), fill in a single entry for the term corresponding to the correct equation to implement X-value iteration. N(s) refers to the neighboring states of state s (i.e., the set of states s' that can possibly be reached by taking some action from state s). Select one bubble per row to form the whole equation.

New Formulation: X(s, s') = (A) [(B) + (C)(D)]

Choose the correct value for the above new formulation.

For A, select the correct option:
1. [...]
2. 1
3. [...]

For B, select the correct option:
1. [...]
2. 0
3. 1

For C, select the correct option:
1. [...]
2. 0
3. 1

For D, select the correct option:
1. [...]
2. [...]
3. [...]
4. [...]
5. [...]
6. V(s'')
7. X(s', s'')
8. X(s, s'')

Answer 1: 2
Answer 2: 1
Answer 3: 1
Answer 4: 5

Conceptual questions about the new formulation

Question 21

1 / 1 pts

In a deterministic MDP like above, can value iteration with the new value function X(s, s') learn the optimal policy?
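To make the idea of X-value iteration on a deterministic MDP concrete, here is a minimal Python sketch. It assumes the update takes the form X(s, s') = R(s, s') + gamma * max over s'' in N(s') of X(s', s''), which is one reading of "analogous to Q-value iteration" and not necessarily the exact equation the quiz intends. It also assumes the policy is extracted by moving to the neighbor with the highest X-value. The function names, the toy graph, the rewards, and the discount are all hypothetical.

```python
def x_value_iteration(neighbors, reward, gamma, iterations=100):
    """Iterate X(s, s') over every edge of a deterministic MDP.

    neighbors: dict mapping each state s to a list of reachable states N(s)
    reward:    dict mapping each edge (s, s') to its reward (assumed form)
    """
    # Start every edge value at zero.
    X = {(s, s2): 0.0 for s, succ in neighbors.items() for s2 in succ}
    for _ in range(iterations):
        # Assumed update: edge reward plus discounted best edge value out of s'.
        # A state with no neighbors contributes 0 (treated as terminal).
        X = {
            (s, s2): reward[(s, s2)]
            + gamma * max((X[(s2, s3)] for s3 in neighbors.get(s2, ())), default=0.0)
            for (s, s2) in X
        }
    return X


def extract_policy(neighbors, X):
    """Assumed extraction rule: in each state, move to the neighbor with the largest X."""
    return {
        s: max(succ, key=lambda s2: X[(s, s2)])
        for s, succ in neighbors.items()
        if succ
    }


# Hypothetical toy graph: A can move to B or C, and both lead to terminal T.
neighbors = {"A": ["B", "C"], "B": ["T"], "C": ["T"], "T": []}
reward = {("A", "B"): 0, ("A", "C"): 1, ("B", "T"): 10, ("C", "T"): 0}

X = x_value_iteration(neighbors, reward, gamma=0.5)
policy = extract_policy(neighbors, X)
```

In this toy graph the immediate reward favors moving from A to C, but the discounted reward available from B is larger, so the extracted policy moves from A to B.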
Question 22

2 / 2 pts

We can also extend this formulation of value iteration with the new value function, X(s, s'), to non-deterministic dynamics T(s, a, s') as follows. For each transition (s, a, s', R(s, a, s')) we take, we update the corresponding X(s, s') value according to equation (2), where we replace all occurrences of I(s, s') by the action we actually take, a. After the X-values converge, we extract the policy with [...].

Will this new X-value iteration process always converge to the same policy as vanilla value iteration in environments with non-deterministic dynamics in T(s, a, s')?

No
Yes

Pacman is taking Intro to Ghost Intelligence at a university this semester. It is 7 days before the midterm, but Pacman is still procrastinating! Pacman still has 1 Electronic Homework (E), 1 Written Homework (W), and 1 Project (P) to finish before the exam. Each of them takes 1 day to complete, and Pacman can only work on at most one task every day. Also, Pacman needs 2 days to review the course material before the exam (R1 and R2). Pacman needs your help to assign the dates to complete these tasks!

Pacman formulates the problem as a CSP, where the tasks (E, W, P, R1, R2) are variables, each with domain {1, ..., 7}, representing the seven days from now until the exam.

Pacman wants the assignments of tasks to meet the following constraints:
1. Each task (E, W, P, R1, R2) must be assigned to a different day.
2. Both the Electronic Homework (E) and Project (P) are due in 4 days, so they must be finished in days 1, 2, 3, or 4.
3. Since we use R1 and R2 to represent the first and the second day of reviewing for the exam, we assume R1 < R2, and the two days for reviewing (R1, R2) must also not be [...].
4. Pacman must finish all the assignments (E, W, P) before starting to review for the exam (R1).
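The CSP above is small enough to check by plain enumeration. The following Python sketch lists every assignment that satisfies the constraints that are fully stated: different days, E and P by day 4, R1 before R2, and all assignments before R1. The second half of constraint 3 is cut off in the text, so it is deliberately not encoded here, and the result is therefore a superset of the true solution set. The names `satisfies` and `solve` are illustrative only.

```python
from itertools import permutations

DAYS = range(1, 8)                      # domain {1, ..., 7}
TASKS = ("E", "W", "P", "R1", "R2")     # CSP variables


def satisfies(a):
    """Check the fully stated constraints on an assignment dict a."""
    return (
        a["E"] <= 4 and a["P"] <= 4                 # constraint 2: E and P due in 4 days
        and a["R1"] < a["R2"]                       # constraint 3, ordering part only
        and max(a["E"], a["W"], a["P"]) < a["R1"]   # constraint 4: assignments before review
        # The remaining part of constraint 3 is truncated in the text and is NOT encoded.
    )


def solve():
    """Enumerate assignments of distinct days to the five tasks."""
    solutions = []
    # Drawing permutations of distinct days enforces constraint 1.
    for days in permutations(DAYS, len(TASKS)):
        assignment = dict(zip(TASKS, days))
        if satisfies(assignment):
            solutions.append(assignment)
    return solutions


solutions = solve()
```

Enumeration is used here only because the domain is tiny. A real CSP solver would instead use backtracking with constraint propagation, which this sketch does not attempt.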

