CS 221 Problem Set #3: Markov Decision Processes and Computer Vision

Due by 9:30am on Tuesday, November 10. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

1 Written part (70 points)

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

1. [20 points] Markov decision processes

Consider an MDP with finite state and action spaces, and discount factor $\gamma < 1$. Let $B$ be the Bellman update operator, with $V$ a vector of values for each state. That is, if $V' = B(V)$, then

$$V'(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} P_{sa}(s') \, V(s').$$

In this problem, we will prove that iterations of the Bellman update converge to a unique solution.

(a) [3 points] We will first prove a simple lemma. Prove that the following holds for any two functions $f, g : A \to \mathbb{R}$:

$$\left| \max_a f(a) - \max_a g(a) \right| \le \max_a \left| f(a) - g(a) \right|.$$

(Hint: you may find the quantities $a_f = \arg\max_a f(a)$ and $a_g = \arg\max_a g(a)$ useful.)

(b) [10 points] We will now prove that, for any two finite-valued vectors $V_1, V_2$, it holds that

$$\| B(V_1) - B(V_2) \|_\infty \le \gamma \, \| V_1 - V_2 \|_\infty,$$

where $\|V\|_\infty = \max_{s \in S} |V(s)|$.

i. [5 points] Let $V_1' = B(V_1)$ and $V_2' = B(V_2)$. Using the definitions of $B(V_1)$ and $B(V_2)$ above, and part 1(a), show that the following holds for any $s$:

$$\left| V_1'(s) - V_2'(s) \right| \le \gamma \max_a \left| \sum_{s'} P_{sa}(s') \big( V_1(s') - V_2(s') \big) \right|.$$

ii. [5 points] Now show that, for any $s$:

$$\max_a \left| \sum_{s'} P_{sa}(s') \big( V_1(s') - V_2(s') \big) \right| \le \| V_1 - V_2 \|_\infty.$$

Then, using part 1(b)i, conclude that $\| B(V_1) - B(V_2) \|_\infty \le \gamma \, \| V_1 - V_2 \|_\infty$. (Hint: you may find the triangle inequality useful: $\left| \sum_i x_i \right| \le \sum_i |x_i|$.)

(c) [7 points] We say that $V$ is a fixed point of $B$ if $B(V) = V$.
Using the result from part 1(b), prove that $B$ has at most one fixed point, i.e., that there is at most one solution to the Bellman equations. You may assume that $B$ has at least one fixed point. Note: Some closely related results are also mentioned in the course textbook, but without proof. It is not okay to just cite those results without also giving a formal proof of them yourself!...
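As a numerical sanity check (not a substitute for the proofs requested above), the following minimal Python sketch builds a small random MDP and tests each claim of Problem 1: the lemma from part (a), the contraction inequality from part (b), and, for part (c), that repeatedly applying $B$ from two very different starting vectors (value iteration) lands on the same fixed point. The helper names (`bellman_update`, `sup_norm`, `diff`, `random_mdp`, `value_iteration`), the nested-list layout `P[s][a][t]` for $P_{sa}(t)$, and the choices $\gamma = 0.9$, 5 states, and 3 actions are illustrative assumptions, not part of the assignment.

```python
import random


def bellman_update(V, R, P, gamma):
    """Apply the Bellman operator B once.

    V, R: lists indexed by state s.
    P[s][a][t]: probability P_sa(t) of moving to state t after action a in state s.
    Returns V' with V'(s) = R(s) + gamma * max_a sum_t P_sa(t) * V(t).
    """
    n = len(V)
    return [
        R[s] + gamma * max(
            sum(P[s][a][t] * V[t] for t in range(n)) for a in range(len(P[s]))
        )
        for s in range(n)
    ]


def sup_norm(V):
    """||V||_inf = max_s |V(s)|."""
    return max(abs(v) for v in V)


def diff(V1, V2):
    """Elementwise difference V1 - V2."""
    return [x - y for x, y in zip(V1, V2)]


def random_mdp(num_states, num_actions, rng):
    """Hypothetical toy MDP: random rewards and random transition distributions."""
    R = [rng.uniform(-1.0, 1.0) for _ in range(num_states)]
    P = []
    for _ in range(num_states):
        rows = []
        for _ in range(num_actions):
            weights = [rng.random() for _ in range(num_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])  # each P_sa sums to 1
        P.append(rows)
    return R, P


def value_iteration(V, R, P, gamma, tol=1e-10, max_iters=10_000):
    """Apply B repeatedly until successive iterates differ by less than tol."""
    for _ in range(max_iters):
        V_next = bellman_update(V, R, P, gamma)
        if sup_norm(diff(V_next, V)) < tol:
            return V_next
        V = V_next
    return V


rng = random.Random(0)
NUM_STATES, NUM_ACTIONS, GAMMA = 5, 3, 0.9
R, P = random_mdp(NUM_STATES, NUM_ACTIONS, rng)

# Part (a): |max f - max g| <= max |f - g| on random f, g over the action set.
for _ in range(1000):
    f = [rng.gauss(0.0, 1.0) for _ in range(NUM_ACTIONS)]
    g = [rng.gauss(0.0, 1.0) for _ in range(NUM_ACTIONS)]
    assert abs(max(f) - max(g)) <= max(abs(x - y) for x, y in zip(f, g)) + 1e-12

# Part (b): ||B(V1) - B(V2)||_inf <= gamma * ||V1 - V2||_inf on random pairs.
for _ in range(1000):
    V1 = [rng.gauss(0.0, 10.0) for _ in range(NUM_STATES)]
    V2 = [rng.gauss(0.0, 10.0) for _ in range(NUM_STATES)]
    lhs = sup_norm(diff(bellman_update(V1, R, P, GAMMA),
                        bellman_update(V2, R, P, GAMMA)))
    assert lhs <= GAMMA * sup_norm(diff(V1, V2)) + 1e-9

# Part (c): two very different starting vectors converge to the same fixed point.
V_a = value_iteration([0.0] * NUM_STATES, R, P, GAMMA)
V_b = value_iteration([rng.gauss(0.0, 100.0) for _ in range(NUM_STATES)], R, P, GAMMA)
print("distance between the two limits:", sup_norm(diff(V_a, V_b)))
```

The small additive tolerances only absorb floating-point rounding. Random trials can fail to find a counterexample but cannot establish the inequalities, which is why each part asks for a formal argument.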
This note was uploaded on 12/15/2009 for the course CS 221, taught by Professors Koller and Ng during the Fall '09 term at Stanford.