CS 221, Autumn 2007
Exercise Set #5
Handout #21

1. MDPs: Reward Functions

In class we discussed Markov Decision Problems (MDPs) formulated with a reward function R(s) defined just over states. Sometimes MDPs are formulated with a reward function R(s, a) that also depends on the action taken, or a reward function R(s, a, s') that also depends on the outcome state.

(a) Write the Bellman updates for these formulations.

(b) Show how an MDP with reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

(c) Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

2. Probability Review: Good and Bad News

After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and that the test is 99% accurate (i.e., the probability of testing positive given that you have the disease is 0.99, as is the probability...
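
As a starting point for Problem 1(a), the sketch below shows one way the Bellman update might look in code for the most general formulation, R(s, a, s'), with R(s) and R(s, a) handled as special cases. It is an illustration, not the handout's solution: the function names, the dictionary layout for transitions, the toy two-state MDP, and the discount factor are all assumptions made for this example. The update used is V(s) <- max_a sum_s' P(s' | s, a) [R(s, a, s') + gamma V(s')].

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
# Assumed layout: transitions[(s, a)] is a list of (next_state, probability) pairs.
Transitions = Dict[Tuple[State, Action], List[Tuple[State, float]]]
Reward = Callable[[State, Action, State], float]


def bellman_backup(s: State, actions: List[Action], transitions: Transitions,
                   reward: Reward, values: Dict[State, float], gamma: float) -> float:
    """One Bellman update for state s with a reward of the form R(s, a, s')."""
    return max(
        sum(p * (reward(s, a, s2) + gamma * values[s2])
            for s2, p in transitions[(s, a)])
        for a in actions
    )


def value_iteration(states: List[State], actions: List[Action],
                    transitions: Transitions, reward: Reward,
                    gamma: float = 0.9, sweeps: int = 200) -> Dict[State, float]:
    """Apply the Bellman update to every state for a fixed number of sweeps."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        values = {s: bellman_backup(s, actions, transitions, reward, values, gamma)
                  for s in states}
    return values


def lift_state_reward(r_s: Callable[[State], float]) -> Reward:
    """Treat R(s) as an R(s, a, s') that ignores the action and the outcome."""
    return lambda s, a, s2: r_s(s)


def lift_state_action_reward(r_sa: Callable[[State, Action], float]) -> Reward:
    """Treat R(s, a) as an R(s, a, s') that ignores the outcome state."""
    return lambda s, a, s2: r_sa(s, a)


# Hypothetical toy MDP: two states, deterministic moves, reward 1 for being in "B".
toy_states = ["A", "B"]
toy_actions = ["stay", "go"]
toy_transitions: Transitions = {
    ("A", "stay"): [("A", 1.0)],
    ("A", "go"): [("B", 1.0)],
    ("B", "stay"): [("B", 1.0)],
    ("B", "go"): [("A", 1.0)],
}
toy_reward = lift_state_reward(lambda s: 1.0 if s == "B" else 0.0)
toy_values = value_iteration(toy_states, toy_actions, toy_transitions,
                             toy_reward, gamma=0.5)
```

Because the transition probabilities for each (s, a) sum to 1, lifting R(s) this way gives the same update as V(s) <- R(s) + gamma max_a sum_s' P(s' | s, a) V(s'). In the toy MDP the values converge to roughly V(B) = 2 and V(A) = 1.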
