mdps-f10

Slide 1: Markov Decision Processes
(Based in part on slides by Alan Fern, Craig Boutilier, and Daniel Weld)
Slide 2: Atomic Model for Stochastic Environments with Generalized Rewards
- Deterministic worlds + goals of attainment
  - Atomic model: graph search
  - Propositional models: the PDDL planning we discussed
- Stochastic worlds + generalized rewards
  - An action can take you to any of a set of states with known probability
  - You get a reward for visiting each state
  - Objective is to maximize your "cumulative" reward
Slide 4: Optimal policies depend on horizon and rewards
(Figure only; details not recoverable from this preview.)
Slide 5: Classical Planning Assumptions
(Agent-environment diagram: percepts are perfect, the world is fully observable, actions are instantaneous and deterministic, and the agent is the sole source of change.)
Slide 6: Stochastic/Probabilistic Planning: the Markov Decision Process (MDP) Model
(Same agent-environment diagram, except that actions are now stochastic: percepts are perfect, the world is fully observable, actions are instantaneous, and the agent is the sole source of change.)
Slide 7: Types of Uncertainty
- Disjunctive (used by non-deterministic planning): the next state could be one of a set of states.
- Stochastic/probabilistic: the next state is drawn from a probability distribution over the set of states.
How are these models related?
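One way to see the relationship (a sketch under my own framing, not stated on the slide): a stochastic model refines a disjunctive one, because the set of possible next states is just the support of the transition distribution. The states and probabilities below are invented for illustration.

import random  # not needed here, but kept for symmetry with later sketches

# Hypothetical next-state model for a single (state, action) pair.
stochastic_next = {"s1": 0.7, "s2": 0.2, "s3": 0.1}   # Pr(s' | s, a), made up

# The disjunctive (non-deterministic) view keeps only which states are
# possible, discarding the probabilities.
disjunctive_next = {s for s, p in stochastic_next.items() if p > 0}

print(disjunctive_next)   # {'s1', 's2', 's3'} (set order may vary)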
Slide 8: Markov Decision Processes
- An MDP has four components: S, A, R, T:
  - A (finite) state set S (|S| = n)
  - A (finite) action set A (|A| = m)
  - A (Markov) transition function T(s, a, s') = Pr(s' | s, a)
    - Probability of going to state s' after taking action a in state s
    - How many parameters does it take to represent?
  - A bounded, real-valued (Markov) reward function R(s)
    - Immediate reward we get for being in state s
    - For example, in a goal-based domain R(s) may equal 1 for goal states and 0 for all others
    - Can be generalized to include action costs: R(s, a)
    - Can be generalized to be a stochastic function
- Can easily generalize to countable or continuous state and action spaces (but the algorithms will be different)
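As a concrete sketch of the four components (the states, actions, and numbers below are invented, not from the slides), here is a minimal tabular MDP in Python. The comment also answers the parameter-count question: a full table for T(s, a, s') has |S|·|A|·|S| = n²m entries, of which n·m·(n-1) are free since each (s, a) row must sum to 1.

# A minimal tabular MDP: S, A, R, T. All names and numbers are illustrative.
S = ["s0", "s1", "goal"]          # finite state set, |S| = n = 3
A = ["left", "right"]             # finite action set, |A| = m = 2

# Reward function R(s): 1 for the goal state, 0 elsewhere
# (the goal-based example mentioned on the slide).
R = {"s0": 0.0, "s1": 0.0, "goal": 1.0}

# Transition function T(s, a, s') = Pr(s' | s, a).
# A full table has n * m * n = 18 entries; each (s, a) row sums to 1,
# so only n * m * (n - 1) = 12 parameters are free.
T = {
    ("s0", "left"):  {"s0": 0.9, "s1": 0.1, "goal": 0.0},
    ("s0", "right"): {"s0": 0.1, "s1": 0.8, "goal": 0.1},
    ("s1", "left"):  {"s0": 0.8, "s1": 0.2, "goal": 0.0},
    ("s1", "right"): {"s0": 0.0, "s1": 0.3, "goal": 0.7},
    ("goal", "left"):  {"goal": 1.0},   # goal is absorbing in this toy example
    ("goal", "right"): {"goal": 1.0},
}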
Slide 9: Graphical View of MDP
(Diagram of a trajectory: state S_t yields reward R_t; S_t and action A_t determine S_{t+1}, which yields R_{t+1}; S_{t+1} and A_{t+1} determine S_{t+2}, and so on.)
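The chain above can be read as a generative process: observe S_t, collect R_t = R(S_t), choose A_t, then sample S_{t+1} from T. A hedged sketch of rolling out one trajectory, reusing the toy A, R, T from the previous sketch; a uniformly random policy stands in for an actual plan.

import random

def sample_trajectory(A, R, T, s0, horizon):
    """Roll out S_t, A_t, R_t for `horizon` steps under a random policy."""
    s, total_reward = s0, 0.0
    for t in range(horizon):
        total_reward += R[s]                         # collect R_t = R(S_t)
        a = random.choice(A)                         # pick A_t (random stand-in policy)
        dist = T[(s, a)]                             # Pr(S_{t+1} | S_t, A_t)
        states = list(dist)
        probs = [dist[x] for x in states]
        s = random.choices(states, weights=probs)[0] # sample S_{t+1}
    return total_reward

# Using the toy A, R, T defined in the earlier sketch:
print(sample_trajectory(A, R, T, s0="s0", horizon=10))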
Slide 10: Assumptions
- First-order Markovian dynamics (history independence)
  - Pr(S_{t+1} | A_t, S_t, A_{t-1}, S_{t-1}, ..., S_0) = Pr(S_{t+1} | A_t, S_t)
  - The next state depends only on the current state and current action
- First-order Markovian reward process
  - Pr(R_t | A_t, S_t, A_{t-1}, S_{t-1}, ..., S_0) = Pr(R_t | A_t, S_t)
  - The reward depends only on the current state and action
  - As described earlier, we will assume the reward is specified by a deterministic function R(s), i.e. Pr(R_t = R(S_t) | A_t, S_t) = 1
- Stationary dynamics and reward
  - Pr(S_{t+1} | A_t, S_t) = Pr(S_{k+1} | A_k, S_k) for all t, k
  - The world dynamics do not depend on the absolute time
- Full observability
  - Though we can't predict exactly which state we will reach when we execute an action, once it is realized we know what it is
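A hedged sketch of what these assumptions buy, again reusing the toy T from above: a simulator step whose whole signature is (s, a), with no history argument (Markov property), no time-step argument (stationarity), and the sampled state returned exactly (full observability).

import random

def step(s, a):
    """One environment step. Only (s, a) is needed: no history, no clock."""
    dist = T[(s, a)]                      # the same table is used at every step (stationarity)
    states = list(dist)
    probs = [dist[x] for x in states]
    return random.choices(states, weights=probs)[0]   # the realized state is fully observed

print(step("s0", "right"))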
Slide 11: Policies ("plans" for MDPs)
- Nonstationary policy [even though we have stationary dynamics and reward??]
  - π : S × T → A, where T is the set of non-negative integers
  - π(s, t) is the action to do at state s with t stages-to-go
  - What if we want to keep acting indefinitely?
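As an illustration (the action choices below are invented, not computed by any algorithm), a nonstationary policy for a finite horizon is just a table indexed by state and stages-to-go, using the toy MDP sketched earlier.

# pi(s, t): the action to take in state s with t stages-to-go.
# Entries are made up for illustration, not optimal for the toy MDP.
pi = {
    ("s0", 2): "right", ("s1", 2): "right", ("goal", 2): "left",
    ("s0", 1): "right", ("s1", 1): "right", ("goal", 1): "left",
}

def act(s, stages_to_go):
    return pi[(s, stages_to_go)]

print(act("s0", 2))   # 'right'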