CS 221, Autumn 2007
Exercise Set #5
Handout #21

1. MDPs: Reward Functions

In class we discussed Markov Decision Problems (MDPs) formulated with a reward function R(s) defined just over states. Sometimes MDPs are formulated with a reward function R(s, a) that also depends on the action taken, or with a reward function R(s, a, s′) that also depends on the outcome state.

(a) Write the Bellman updates for these formulations.

(b) Show how an MDP with reward function R(s, a, s′) can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

(c) Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

2. Probability Review: Good and Bad News

After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability ...
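
As a reference point for Problem 1, the state-only formulation R(s) mentioned there can be sketched in code. This is a minimal sketch, not the handout's solution: it assumes a discount factor gamma, a transition table T[s][a] given as a list of (probability, next_state) pairs, and a made-up two-state MDP. All names and numbers below are hypothetical and do not come from the handout.

```python
def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-8):
    """Value iteration for an MDP whose reward R(s) depends only on the state.

    T[s][a] is a list of (probability, next_state) pairs (assumed layout).
    Returns a dict mapping each state to its approximate optimal value.
    """
    V = {s: 0.0 for s in states}
    while True:
        new_V = {}
        delta = 0.0
        for s in states:
            # Bellman update with state-only reward:
            #   V(s) <- R(s) + gamma * max_a sum_{s'} P(s' | s, a) * V(s')
            best = max(
                sum(p * V[s2] for p, s2 in T[s][a]) for a in actions
            )
            new_V[s] = R[s] + gamma * best
            delta = max(delta, abs(new_V[s] - V[s]))
        V = new_V
        if delta < tol:
            return V


# Hypothetical two-state MDP: state "B" pays reward 1, state "A" pays 0.
states = ["A", "B"]
actions = ["stay", "move"]
T = {
    "A": {"stay": [(1.0, "A")], "move": [(1.0, "B")]},
    "B": {"stay": [(1.0, "B")], "move": [(1.0, "A")]},
}
R = {"A": 0.0, "B": 1.0}

V = value_iteration(states, actions, T, R)
print(V)
```

In this toy example the fixed point satisfies V(B) = 1 + 0.9 V(B) and V(A) = 0.9 V(B), so the values approach 10 and 9. The formulations in parts (a) through (c) change where the reward term sits relative to the maximization and the expectation; the sketch leaves that step to the exercise.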
This note was uploaded on 11/30/2009 for the course CS 221 taught by Professors Koller and Ng during the Winter '09 term at Stanford.