POMDPs

Slides based on Hansen et al.'s tutorial and Russell & Norvig (3rd ed.), Sec. 17.4


Planning using Partially Observable Markov Decision Processes: A Tutorial

Presenters:
- Eric Hansen, Mississippi State University
- Daniel Bernstein, University of Massachusetts/Amherst
- Zhengzhu Feng, University of Massachusetts/Amherst
- Rong Zhou, Mississippi State University

Introduction and foundations
- Definition of POMDP
- Goals, rewards and optimality criteria
- Examples and applications
- Computational complexity
- Belief states and Bayesian conditioning

Planning under partial observability
[Slide figure: agent-environment loop. The agent acts on the environment, receives an imperfect observation of it, and pursues a goal.]

Two approaches to planning under partial observability
- Nondeterministic planning
  - Uncertainty is represented by a set of possible states
  - No possibility is considered more likely than any other
- Probabilistic (decision-theoretic) planning
  - Uncertainty is represented by a probability distribution over possible states
- This tutorial takes the second, more general approach

Markov models
                       | Prediction           | Planning
Fully observable       | Markov chain         | MDP (Markov decision process)
Partially observable   | Hidden Markov model  | POMDP (partially observable Markov decision process)

Definition of POMDP
[Slide figure: two-slice influence diagram over hidden states s1, s2, ...; observations z1, z2, ...; actions a1, a2, ...; rewards r1, r2, ...]

Goals, rewards and optimality criteria
- Rewards are additive and time-separable; the objective is to maximize expected total reward
- Traditional planning goals can be encoded in the reward function
  - Example: achieving a state satisfying property P at minimal cost is encoded by making any state satisfying P a zero-reward absorbing state and assigning all other states negative reward (sketched in code after these slides)
- A POMDP allows partial satisfaction of goals and tradeoffs among competing goals
- The planning horizon can be finite, infinite, or indefinite

Machine maintenance
- Canonical application of POMDPs in Operations Research

Robot navigation
- Canonical application of POMDPs in AI
- Toy example from Russell & Norvig's AI textbook
- Actions: N, S, E, W, Stop
- Observations: sense surrounding walls
[Slide figure: the 4x3 grid world with a Start cell and terminal rewards +1 and -1; each move goes in the intended direction with probability 0.8 and slips to either perpendicular direction with probability 0.1 each (sketched in code after these slides).]

Many other applications
- Helicopter control [Bagnell & Schneider 2001]
- Dialogue management [Roy, Pineau & Thrun 2000]
- Preference elicitation [Boutilier 2002]
- Optimal search and sensor scheduling [Krishnamurthy & Singh 2000]
- Medical diagnosis and treatment [Hauskrecht & Fraser 2000]
- Packet scheduling in computer networks [Chang et al. 2000; Bent & Van Hentenryck 2004]

Computational complexity
- Finite-horizon
  - PSPACE-hard [Papadimitriou & Tsitsiklis 1987]
  - NP-complete if unobservable
- Infinite-horizon
  - Undecidable [Madani, Hanks & Condon 1999]
  - NP-hard for ε-approximation [Lusena, Goldsmith & Mundhenk 2001]
  - NP-hard for the memoryless or bounded-memory control problem [Littman 1994; Meuleau et al. 1999]

Planning for fully observable MDPs
- Dynamic programming
  - Value iteration [Bellman 1957] (sketched in code after these slides)
  - Policy iteration [Howard 1960]
- Scaling up
  - State aggregation and factored representation...
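The "Definition of POMDP" slide is a diagram in the original; textually, a POMDP is the tuple (S, A, Z, T, O, R). A minimal container sketch in Python; the field names and array layout are my assumptions, not from the slides:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class POMDP:
        """A POMDP as the tuple (S, A, Z, T, O, R)."""
        n_states: int        # |S|, hidden states
        n_actions: int       # |A|
        n_observations: int  # |Z|
        T: np.ndarray        # (A, S, S): T[a, s, s2] = P(s2 | s, a)
        O: np.ndarray        # (A, S, Z): O[a, s2, z] = P(z | s2, a)
        R: np.ndarray        # (S,): reward for being in each state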
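The "Belief states and Bayesian conditioning" item in the outline refers to the standard POMDP belief update: b'(s') is proportional to O(z | s', a) * sum over s of T(s' | s, a) * b(s). A minimal NumPy sketch; the name belief_update and the array layout are illustrative assumptions:

    import numpy as np

    def belief_update(b, a, z, T, O):
        """Bayesian conditioning of a POMDP belief state.

        b : (S,) prior belief over hidden states
        a : action index
        z : observation index
        T : (A, S, S) array, T[a, s, s2] = P(s2 | s, a)
        O : (A, S, Z) array, O[a, s2, z] = P(z | s2, a)
        """
        predicted = b @ T[a]              # P(s2 | b, a) = sum_s b(s) T[a, s, s2]
        unnormalized = O[a, :, z] * predicted
        p_z = unnormalized.sum()          # P(z | b, a), the normalizing constant
        if p_z == 0.0:
            raise ValueError("observation z has zero probability under belief b")
        return unnormalized / p_z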
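The goal-encoding example from the "Goals, rewards and optimality criteria" slide (reach a state satisfying P at minimal cost) can be written down directly. A sketch assuming a unit step cost; encode_goal, satisfies_P, and the dictionary layout are my own illustrative names:

    def encode_goal(states, actions, satisfies_P, T):
        """Encode "achieve P at minimal cost" as rewards plus absorbing goals.

        states, actions : iterables of state / action labels
        satisfies_P     : predicate on states
        T               : dict mapping (s, a) -> {s2: prob}, modified in place
        Returns a reward dict: 0 in goal states, -1 (one unit of cost) elsewhere.
        """
        reward = {}
        for s in states:
            if satisfies_P(s):
                reward[s] = 0.0
                for a in actions:
                    T[(s, a)] = {s: 1.0}   # goal states are absorbing
            else:
                reward[s] = -1.0
        return reward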
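For the robot-navigation example, the 0.8 / 0.1 / 0.1 numbers on the slide are the grid world's noisy motion model. A sketch; the helper names are mine:

    LEFT_OF  = {"N": "W", "W": "S", "S": "E", "E": "N"}
    RIGHT_OF = {"N": "E", "E": "S", "S": "W", "W": "N"}

    def motion_outcomes(action):
        """Map an intended action to [(actual direction, probability), ...].

        The intended direction succeeds with probability 0.8; the agent slips
        to each perpendicular direction with probability 0.1.
        """
        if action == "Stop":
            return [("Stop", 1.0)]
        return [(action, 0.8), (LEFT_OF[action], 0.1), (RIGHT_OF[action], 0.1)]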
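The final slide mentions value iteration [Bellman 1957] for fully observable MDPs. A minimal sketch of the standard algorithm, assuming a state-based reward and a discount factor (both assumptions; the slides do not fix these conventions):

    import numpy as np

    def value_iteration(T, R, gamma=0.95, eps=1e-6):
        """Compute the optimal value function of a fully observable MDP.

        T     : (A, S, S) array of transition probabilities
        R     : (S,) array of state rewards
        gamma : discount factor in (0, 1)
        """
        V = np.zeros(T.shape[1])
        while True:
            # Bellman backup: V(s) = max_a [ R(s) + gamma * sum_s2 T[a,s,s2] V(s2) ]
            V_new = (R + gamma * (T @ V)).max(axis=0)
            if np.max(np.abs(V_new - V)) < eps:
                return V_new
            V = V_new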