cs221-ps3


CS 221 Problem Set #3: Markov Decision Processes and Computer Vision

Due by 9:30am on Tuesday, November 10. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

1 Written part (70 points)

NOTE: These questions require thought, but do not require long answers. Please try to be as concise as possible.

1. [20 points] Markov decision processes

Consider an MDP with finite state and action spaces, and discount factor $\gamma < 1$. Let $B$ be the Bellman update operator, with $V$ a vector of values for each state. I.e., if $V' = B(V)$, then

$$V'(s) = R(s) + \gamma \max_{a \in A} \sum_{s' \in S} P_{sa}(s') V(s').$$

In this problem, we will prove that iterations of the Bellman update converge to a unique solution.

(a) [3 points] We will first prove a simple lemma. Prove that the following holds for any two functions $f, g : A \to \mathbb{R}$:

$$\left| \max_a f(a) - \max_a g(a) \right| \le \max_a \left| f(a) - g(a) \right|$$

(Hint: you may find the quantities $a_f = \arg\max_a f(a)$ and $a_g = \arg\max_a g(a)$ useful.)

(b) [10 points] We will now prove that, for any two finite-valued vectors $V_1$, $V_2$, it holds that

$$\| B(V_1) - B(V_2) \|_\infty \le \gamma \| V_1 - V_2 \|_\infty,$$

where $\| V \|_\infty = \max_{s \in S} |V(s)|$.

i. [5 points] Let $V_1' = B(V_1)$ and $V_2' = B(V_2)$. Using the definition of $B(V_1)$ and $B(V_2)$ above, and part 1(a), show that the following holds for any $s$:

$$\left| V_1'(s) - V_2'(s) \right| \le \gamma \max_a \left| \sum_{s'} P_{sa}(s') \left( V_1(s') - V_2(s') \right) \right|$$

ii. [5 points] Now show that, for any $s$:

$$\max_a \left| \sum_{s'} P_{sa}(s') \left( V_1(s') - V_2(s') \right) \right| \le \| V_1 - V_2 \|_\infty.$$

Then, using part 1(b)i, conclude that $\| B(V_1) - B(V_2) \|_\infty \le \gamma \| V_1 - V_2 \|_\infty$. (Hint: You may find the triangle inequality useful: $\left| \sum_i x_i \right| \le \sum_i |x_i|$.)

(c) [7 points] We say that $V$ is a fixed point of $B$ if $B(V) = V$. Using the result from part 1(b), prove that $B$ has at most one fixed point, i.e., that there is at most one solution to the Bellman equations. You may assume that $B$ has at least one fixed point.

Note: Some closely related results are also mentioned in the course textbook, but without proof. It is not okay to just cite those results without also giving a formal proof of them yourself!...
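As a numerical sanity check (not a proof, and not part of the original handout), the contraction claim in part 1(b) can be tried out on a small random MDP. The sketch below is a minimal illustration under stated assumptions: the names `bellman_update`, `max_norm_diff`, and `random_mdp` are hypothetical helpers, the discount factor is called `gamma` and set to 0.9, and the MDP (4 states, 3 actions) is generated at random purely for illustration.

```python
import random


def bellman_update(V, R, P, gamma):
    """One Bellman update: V'(s) = R(s) + gamma * max_a sum_{s'} P[s][a][s'] * V[s']."""
    n_states = len(V)
    return [
        R[s] + gamma * max(
            sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            for a in range(len(P[s]))
        )
        for s in range(n_states)
    ]


def max_norm_diff(V1, V2):
    """Max-norm distance between two value vectors: max_s |V1(s) - V2(s)|."""
    return max(abs(x - y) for x, y in zip(V1, V2))


def random_mdp(n_states, n_actions, rng):
    """Random rewards and transition probabilities (illustrative assumption)."""
    R = [rng.uniform(-1.0, 1.0) for _ in range(n_states)]
    P = []
    for _ in range(n_states):
        per_action = []
        for _ in range(n_actions):
            weights = [rng.random() for _ in range(n_states)]
            total = sum(weights)
            # Normalise so each P[s][a] is a probability distribution over s'.
            per_action.append([w / total for w in weights])
        P.append(per_action)
    return R, P


rng = random.Random(0)
gamma = 0.9  # assumed discount factor, strictly less than 1
R, P = random_mdp(4, 3, rng)
V1 = [rng.uniform(-10.0, 10.0) for _ in range(4)]
V2 = [rng.uniform(-10.0, 10.0) for _ in range(4)]

# Part 1(b): one update should shrink the max-norm distance by at least gamma.
before = max_norm_diff(V1, V2)
after = max_norm_diff(bellman_update(V1, R, P, gamma),
                      bellman_update(V2, R, P, gamma))
print(after <= gamma * before + 1e-12)

# Part 1(c): repeated updates from two different starts should agree.
for _ in range(500):
    V1 = bellman_update(V1, R, P, gamma)
    V2 = bellman_update(V2, R, P, gamma)
print(max_norm_diff(V1, V2) < 1e-9)
```

Both checks are what the contraction property predicts: the distance between the two value vectors shrinks by a factor of at least `gamma` per update, so two different starting points are driven toward the same vector. A passing check on one random MDP only illustrates the statement; the written proof is still required.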
This note was uploaded on 12/15/2009 for the course CS 221 taught by Professors Koller and Ng during the Fall '09 term at Stanford.
