ex5 - CS221 Exercise Set #5


1. MDPs with Random Stopping Times

Suppose we have a Markov Decision Process (MDP) $M = (S, A, P_{sa}, \gamma, R)$, where $S$ is a discrete state space with $n$ states, and the rewards are discounted by a factor $\gamma$. (Recall that $P_{sa}(s')$ is the transition model and $R(s)$ is the reward function.) We can view this process as a game where we begin in some state $s \in S$ and take turns selecting actions and transitioning to new states, accumulating rewards along the way. At the $n$th turn, we first receive some (discounted) reward, $\gamma^n R(s)$, for the current state $s$. Then, we select an action $a \in A$ and transition, randomly, to a new state $s'$ according to the probabilities $P_{sa}(s')$. Since the discount factor is $\gamma < 1$, our rewards become smaller and smaller as the game goes on. (Hence, the optimal strategy will try to accumulate big rewards early.)

Now consider a slight modification of this game. At the start of each turn we receive an undiscounted reward, $R(s)$, and then flip a biased coin that lands heads with probability $\epsilon$, $\epsilon < 1$. If the coin lands heads, then the game is stopped and we are left with whatever reward we have accumulated thus far. Otherwise, we choose our action and we transition to the next state according to $P_{sa}$, as usual.

We will now show that this new game can be expressed as an MDP. In addition, we'll also show that the value of this game (i.e., the largest reward we expect to gain from playing it) is equivalent to the discounted reward in the original MDP, $M$ ...
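The claimed equivalence can be checked numerically on a toy problem. The following is only a sketch under stated assumptions: the two-state, two-action MDP (the tables `R` and `P`) is made up for illustration and is not part of the exercise, the stopped game is modeled by adding one absorbing zero-reward state, and the coin's heads probability (`eps` in the code) is paired with the discount factor (`gamma` in the code) through gamma = 1 - eps, which is the identification this sketch assumes. Turn t of the stopped game is reached with probability (1 - eps)^t, which is why that pairing is the natural one to try.

```python
# Hypothetical toy MDP, invented for illustration only.
STATES = [0, 1]
ACTIONS = [0, 1]
R = [1.0, 0.0]  # R[s]: reward received in state s
P = {           # P[(s, a)][s2]: probability of moving from s to s2 under action a
    (0, 0): [0.9, 0.1],
    (0, 1): [0.2, 0.8],
    (1, 0): [0.5, 0.5],
    (1, 1): [0.1, 0.9],
}


def expected_next(V, s, a):
    """Expected value of the next state when taking action a in state s."""
    return sum(P[(s, a)][s2] * V[s2] for s2 in STATES)


def discounted_values(gamma, iters=2000):
    """Value iteration for the original discounted MDP:
    V(s) = R(s) + gamma * max_a sum_s2 P_sa(s2) V(s2)."""
    V = [0.0 for _ in STATES]
    for _ in range(iters):
        V = [R[s] + gamma * max(expected_next(V, s, a) for a in ACTIONS)
             for s in STATES]
    return V


def stopping_values(eps, iters=2000):
    """Undiscounted value iteration for the stopped game.
    After collecting R(s), the game moves to an absorbing zero-reward
    END state with probability eps; otherwise it follows P_sa."""
    end_value = 0.0  # END is absorbing and pays nothing, so its value stays 0
    V = [0.0 for _ in STATES]
    for _ in range(iters):
        V = [R[s] + max(eps * end_value + (1.0 - eps) * expected_next(V, s, a)
                        for a in ACTIONS)
             for s in STATES]
    return V


if __name__ == "__main__":
    eps = 0.1
    print("discounted MDP, gamma = 1 - eps:", discounted_values(1.0 - eps))
    print("stopped game, heads prob. eps:  ", stopping_values(eps))
```

On this toy problem the two value vectors should agree up to floating-point error. That is a sanity check of the statement for one example, not a proof; the exercise still asks for the general argument.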