POMDPs
Slides based on Hansen et al.'s tutorial and Russell & Norvig, 3rd ed., Sec. 17.4

Planning using Partially Observable Markov Decision Processes: A Tutorial

Presenters:
- Eric Hansen, Mississippi State University
- Daniel Bernstein, University of Massachusetts Amherst
- Zhengzhu Feng, University of Massachusetts Amherst
- Rong Zhou, Mississippi State University

Introduction and foundations
- Definition of POMDP
- Goals, rewards and optimality criteria
- Examples and applications
- Computational complexity
- Belief states and Bayesian conditioning

Planning under partial observability
[Figure: agent-environment loop: the agent sends actions to the environment, receives imperfect observations, and pursues a goal]

Two approaches to planning under partial observability
- Nondeterministic planning: uncertainty is represented by a set of possible states, and no possibility is considered more likely than any other.
- Probabilistic (decision-theoretic) planning: uncertainty is represented by a probability distribution over possible states.
This tutorial considers the second, more general approach.

Markov models

                         Prediction            Planning
  Fully observable       Markov chain          MDP (Markov decision process)
  Partially observable   Hidden Markov model   POMDP (partially observable MDP)

Definition of POMDP
[Figure: influence diagram over hidden states s_1, s_2, ..., with observations z_1, z_2, ..., actions a_1, a_2, ..., and rewards r_1, r_2, ...]

Goals, rewards and optimality criteria
- Rewards are additive and time-separable, and the objective is to maximize expected total reward.
- Traditional planning goals can be encoded in the reward function. Example: achieving a state satisfying property P at minimal cost is encoded by making every state satisfying P a zero-reward absorbing state and assigning all other states negative reward.
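The "Belief states and Bayesian conditioning" item above can be sketched in code. A POMDP agent cannot observe the hidden state, so it maintains a belief (a probability distribution over states) and updates it after each action and observation by Bayes' rule. The array layout below (T[a][s, s'] for transitions, O[a][s', z] for observations) and the two-state numbers in the usage example are illustrative assumptions, not models from the slides.

```python
import numpy as np

# Sketch of belief update by Bayesian conditioning.
# T[a][s, s'] = P(s' | s, a);  O[a][s', z] = P(z | s', a)

def belief_update(b, a, z, T, O):
    """Posterior belief over states after taking action a and observing z."""
    predicted = b @ T[a]                     # predict: P(s' | b, a)
    unnormalized = predicted * O[a][:, z]    # correct: weight by P(z | s', a)
    norm = unnormalized.sum()                # = P(z | b, a)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / norm

# Hypothetical two-state, one-action, two-observation model for illustration.
T = {0: np.array([[0.9, 0.1],
                  [0.2, 0.8]])}
O = {0: np.array([[0.7, 0.3],
                  [0.4, 0.6]])}
b = np.array([0.5, 0.5])                     # uniform prior belief
posterior = belief_update(b, a=0, z=0, T=T, O=O)
```

The key point the slides build on later is that this belief is a sufficient statistic for the observation history, which is what turns a POMDP into a (continuous-state) belief-state MDP.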
- POMDPs allow partial satisfaction of goals and trade-offs among competing goals.
- The planning horizon can be finite, infinite, or indefinite.

Machine maintenance
- Canonical application of POMDPs in operations research.

Robot navigation
- Canonical application of POMDPs in AI.
- Toy example from Russell & Norvig's AI textbook: a grid world with a Start cell and terminal states of reward +1 and -1; each move goes in the intended direction with probability 0.8 and slips to either side with probability 0.1.
- Actions: N, S, E, W, Stop.
- Observations: sense surrounding walls.

Many other applications
- Helicopter control [Bagnell & Schneider 2001]
- Dialogue management [Roy, Pineau & Thrun 2000]
- Preference elicitation [Boutilier 2002]
- Optimal search and sensor scheduling [Krishnamurthy & Singh 2000]
- Medical diagnosis and treatment [Hauskrecht & Fraser 2000]
- Packet scheduling in computer networks [Chang et al. 2000; Bent & Van Hentenryck 2004]

Computational complexity
- Finite-horizon: PSPACE-hard [Papadimitriou & Tsitsiklis 1987]; NP-complete if unobservable.
- Infinite-horizon: undecidable [Madani, Hanks & Condon 1999]; NP-hard to approximate [Lusena, Goldsmith & Mundhenk 2001]; NP-hard for the memoryless or bounded-memory control problem [Littman 1994; Meuleau et al. 1999].

Planning for fully observable MDPs
- Dynamic programming: value iteration [Bellman 1957], policy iteration [Howard 1960].
- Scaling up: state aggregation and factored representation...
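The value iteration mentioned above for fully observable MDPs can be sketched as a repeated Bellman backup until the values stop changing. This is a minimal illustration in the spirit of [Bellman 1957]; the two-state MDP at the bottom is an invented toy example, not one from the slides.

```python
import numpy as np

def value_iteration(T, R, gamma=0.9, eps=1e-6):
    """T[a] is an |S|x|S| matrix of P(s' | s, a); R is an |S| reward vector."""
    V = np.zeros(len(R))
    while True:
        # Bellman backup: Q[a, s] = R[s] + gamma * sum_s' P(s' | s, a) V[s']
        Q = np.array([R + gamma * T[a] @ V for a in range(len(T))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmax(axis=0)   # values and a greedy policy
        V = V_new

# Toy MDP: state 1 is absorbing with reward 1; action 1 moves toward it
# with probability 0.8, while action 0 stays put.
T = [np.eye(2),                              # action 0: stay
     np.array([[0.2, 0.8],
               [0.0, 1.0]])]                 # action 1: try to move
R = np.array([0.0, 1.0])
V, policy = value_iteration(T, R)
```

Convergence follows because the backup is a gamma-contraction; the slides' point is that this state-indexed table no longer works under partial observability, where the backup must instead operate on beliefs.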
This note was uploaded on 03/11/2012 for the course CSE 571 taught by Professor Baral during the Fall '08 term at ASU.