CS 221 Problem Set #3: Markov Decision Processes and Computer Vision

Due by 9:30am on Tuesday, November 10. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

1 Written part (70 points)

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

1. [20 points] Markov decision processes

Consider an MDP with finite state and action spaces, and discount factor $\gamma < 1$. Let $B$ be the Bellman update operator, with $V$ a vector of values for each state. I.e., if $V' = B(V)$, then

$$V'(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} P_{sa}(s') V(s').$$

In this problem, we will prove that iterations of the Bellman update converge to a unique solution.

(a) [3 points] We will first prove a simple lemma. Prove that the following holds for any two functions $f, g : A \mapsto \mathbb{R}$:

$$\left| \max_a f(a) - \max_a g(a) \right| \le \max_a \left| f(a) - g(a) \right|$$

(Hint: you may find the quantities $a_f^* = \arg\max_a f(a)$ and $a_g^* = \arg\max_a g(a)$ useful.)

(b) [10 points] We will now prove that, for any two finite-valued vectors $V_1$, $V_2$, it holds true that

$$\| B(V_1) - B(V_2) \|_\infty \le \gamma \| V_1 - V_2 \|_\infty,$$

where $\| V \|_\infty = \max_{s \in S} |V(s)|$.

i. [5 points] Let $V_1' = B(V_1)$ and $V_2' = B(V_2)$. Using the definition of $B(V_1)$ and $B(V_2)$ above, and part 1(a), show that the following holds for any $s$:

$$| V_1'(s) - V_2'(s) | \le \gamma \max_a \left| \sum_{s'} P_{sa}(s') \left( V_1(s') - V_2(s') \right) \right|$$

ii. [5 points] Now show that, for any $s$:

$$\max_a \left| \sum_{s'} P_{sa}(s') \left( V_1(s') - V_2(s') \right) \right| \le \| V_1 - V_2 \|_\infty.$$

Then, using part 1(b)i, conclude that

$$\| B(V_1) - B(V_2) \|_\infty \le \gamma \| V_1 - V_2 \|_\infty.$$

(Hint: You may find the triangle inequality useful: $\left| \sum_i x_i \right| \le \sum_i |x_i|$.)

(c) [7 points] We say that $V$ is a fixed point of $B$ if $B(V) = V$.
Using the result from part 1(b), prove that B has at most one fixed point, i.e., that there is at most one solution to the Bellman equations. You may assume that B has at least one fixed point. Note: Some closely related results are also mentioned in the course textbook, but without proof. It is not okay to just cite those results without also giving a formal proof of them yourself!...
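
The claims in parts (b) and (c) can be checked numerically before attempting the proofs. The following is a minimal Python sketch, not part of the original assignment and not a substitute for a formal proof. It assumes a small randomly generated MDP, and the helper names (make_random_mdp, bellman_update, max_norm_diff, iterate) are hypothetical, chosen only for this illustration. It tests the max-norm contraction inequality on random pairs of value vectors, then runs repeated Bellman updates from two different starting points to see whether they approach the same vector.

```python
import random


def make_random_mdp(n_states, n_actions, seed=0):
    """Build a small random MDP (illustrative assumption, not from the problem set).

    Returns rewards R[s] and transition probabilities P[s][a][s_next].
    """
    rng = random.Random(seed)
    R = [rng.uniform(-1.0, 1.0) for _ in range(n_states)]
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])  # normalize to a distribution
        P.append(rows)
    return R, P


def bellman_update(V, R, P, gamma):
    """Return B(V), with B(V)(s) = R(s) + gamma * max_a sum_s' P_sa(s') V(s')."""
    return [
        R[s] + gamma * max(
            sum(p * v for p, v in zip(P[s][a], V)) for a in range(len(P[s]))
        )
        for s in range(len(V))
    ]


def max_norm_diff(V1, V2):
    """Max-norm distance ||V1 - V2||_inf."""
    return max(abs(x - y) for x, y in zip(V1, V2))


def iterate(V, R, P, gamma, n_iters):
    """Apply the Bellman update n_iters times, starting from V."""
    for _ in range(n_iters):
        V = bellman_update(V, R, P, gamma)
    return V


gamma = 0.9
n_states = 5
R, P = make_random_mdp(n_states=n_states, n_actions=3, seed=0)
rng = random.Random(1)

# Part (b): check ||B(V1) - B(V2)|| <= gamma * ||V1 - V2|| on random pairs.
contraction_holds = True
for _ in range(1000):
    V1 = [rng.uniform(-10.0, 10.0) for _ in range(n_states)]
    V2 = [rng.uniform(-10.0, 10.0) for _ in range(n_states)]
    lhs = max_norm_diff(bellman_update(V1, R, P, gamma),
                        bellman_update(V2, R, P, gamma))
    rhs = gamma * max_norm_diff(V1, V2)
    # Small tolerance for floating-point rounding.
    contraction_holds = contraction_holds and lhs <= rhs + 1e-12

# Part (c): two different starting points should approach the same vector.
V_a = iterate([0.0] * n_states, R, P, gamma, 500)
V_b = iterate([100.0] * n_states, R, P, gamma, 500)
gap = max_norm_diff(V_a, V_b)
residual = max_norm_diff(V_a, bellman_update(V_a, R, P, gamma))

print("contraction held on all samples:", contraction_holds)
print("gap between the two runs:", gap)
print("fixed-point residual:", residual)
```

A passing run only shows that the inequality held on the sampled vectors for this one MDP. The written answer still has to establish it for every pair $V_1$, $V_2$.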
This note was uploaded on 12/15/2009 for the course CS 221 taught by Professors Koller and Ng during the Fall '09 term at Stanford.