CS 221, Autumn 2007
Exercise Set #5
Handout #21

1. MDPs: Reward Functions

   In class we discussed Markov Decision Problems (MDPs) formulated with a reward function R(s) defined only over states. Sometimes MDPs are formulated with a reward function R(s, a) that also depends on the action taken, or with a reward function R(s, a, s') that also depends on the outcome state.

   (a) Write the Bellman updates for these formulations.

   (b) Show how an MDP with reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

   (c) Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

2. Probability Review: Good and Bad News

   After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability of testing negative given that you do not have the disease). ...
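Not part of the handout, but as a quick reference for working the two problems above, here is a minimal LaTeX sketch of the standard Bellman (value-iteration) update for the state-only formulation R(s) discussed in class, together with Bayes' rule, which is the identity problem 2 exercises. The symbols gamma (discount factor) and P(s' | s, a) (transition model) are the usual ones and are assumptions about the course's notation, not taken from the preview.

   % Standard Bellman update for the R(s) formulation, with discount
   % factor \gamma and transition model P(s' \mid s, a):
   V_{k+1}(s) \;=\; R(s) + \gamma \max_{a} \sum_{s'} P(s' \mid s, a)\, V_k(s')

   % Bayes' rule for problem 2, writing D for "have the disease" and
   % + for "test positive":
   P(D \mid +) \;=\; \frac{P(+ \mid D)\, P(D)}
                          {P(+ \mid D)\, P(D) + P(+ \mid \neg D)\, P(\neg D)}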