cs685-mdps

# cs685-mdps - Probabilistic Robotics Planning and...

This preview shows pages 1–4.

## Probabilistic Robotics

Planning and Control: Markov Decision Processes

### Problem Classes

• Deterministic vs. stochastic actions
• Full vs. partial observability
• Today: how to make decisions under uncertainty

### Uncertainty and decisions

• Previously: how to do state estimation under uncertainty
• Uncertainty can affect how the robot makes decisions
• How to encode preferences between different outcomes of the plans (e.g. going to the airport: lots of options, risks)
• Utility theory: reasoning about preferences (utility: the quality of being useful)
• Every state has some utility
• Decision theory = probability theory + utility theory
• Principle of maximum expected utility: an agent is rational if it chooses the action with the highest expected utility

### Designing control systems

• Often, in addition to stability, observability and controllability, we want some form of optimality
• The goal is for the trajectory to optimize a certain performance index (e.g. time travelled, fuel cost, quadratic cost for trajectory tracking …)
• Techniques from the calculus of variations are used to solve for functions which maximize the performance index V
• Special class of systems: n-stage decision processes
• Find V and the choices of action such that V is maximal
• Blackboard example: recursive computation of V in the deterministic case (in a grid world this is similar to a wavefront planner)
• Principle of dynamic programming: decompose the problem into n stages; at each stage, relaxation
• Next: what if the outcomes of actions are uncertain?

### Making decisions under uncertainty

Suppose I believe the following:

• P(A_25 gets me there on time | …) = 0.04
• P(A_90 gets me there on time | …) = 0.70
• P(A_120 gets me there on time | …) = 0.95
• P(A_1440 gets me there on time | …) = 0.9999

• Which action to choose? It depends on my preferences for missing the flight vs. time spent waiting, etc.
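The recursive computation of V in the deterministic grid case, mentioned as a blackboard example under "Designing control systems", can be sketched as a wavefront expansion from the goal. This is only an illustration, not the lecture's own code: the grid encoding (`'#'` for obstacles, `'.'` for free cells), the 4-connected moves and the unit step cost are assumptions, and the function name `wavefront` is hypothetical. With a reward of -1 per step, maximizing V is the same as minimizing the step count computed here.

```python
from collections import deque


def wavefront(grid, goal):
    """Return a dict mapping each reachable free cell to its step count to `goal`.

    Assumptions (not from the slides): `grid` is a list of equal-length strings,
    '#' marks an obstacle, any other character is free, moves are 4-connected,
    and `goal` is a free (row, col) cell.
    """
    rows, cols = len(grid), len(grid[0])
    steps = {goal: 0}          # cost-to-go at the goal is zero
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != '#' and (nr, nc) not in steps:
                # One more stage of the recursion: neighbor = current + 1 step.
                steps[(nr, nc)] = steps[(r, c)] + 1
                frontier.append((nr, nc))
    return steps


GRID = ["...",
        ".#.",
        "..."]
STEPS_TO_GOAL = wavefront(GRID, (0, 0))
```

Following decreasing step counts from any cell yields a shortest path to the goal, which is the deterministic counterpart of acting greedily with respect to V.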
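As a rough illustration of the principle of maximum expected utility applied to the airport beliefs above, the sketch below scores each departure option. The on-time probabilities are the ones listed in the slide; everything else is an assumption made for the example: the reading of each action's number as minutes left before the flight, the utility values, the per-minute waiting cost, and the helper names `expected_utility` and `best_action`.

```python
# On-time probabilities from the slide, keyed by the action's number,
# read here (an assumption) as minutes left before the flight.
P_ON_TIME = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}

# Hypothetical preferences, NOT from the lecture: catching the flight is
# worth 1000, missing it is worth 0, and each minute of waiting costs 1.
U_CATCH_FLIGHT = 1000.0
U_MISS_FLIGHT = 0.0
COST_PER_MINUTE = 1.0


def expected_utility(minutes_early, p_on_time):
    """Expected utility of leaving `minutes_early` minutes before the flight."""
    outcome = p_on_time * U_CATCH_FLIGHT + (1.0 - p_on_time) * U_MISS_FLIGHT
    return outcome - COST_PER_MINUTE * minutes_early


def best_action(p_on_time_by_action):
    """Pick the action with the highest expected utility."""
    return max(
        p_on_time_by_action,
        key=lambda a: expected_utility(a, p_on_time_by_action[a]),
    )


CHOICE = best_action(P_ON_TIME)
```

Under these made-up preferences the 120-minute option wins; a different waiting cost or a larger penalty for missing the flight would shift the choice, which is the point of the slide's question.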
• Utility theory is used to represent and infer preferences
• Decision theory = probability theory + utility theory

### A simple knowledge-based agent

• The agent must be able to:
  • Represent states, actions, etc.
  • Incorporate new percepts
  • Update internal representations of the world
  • Deduce hidden properties of the world
  • Deduce appropriate actions

### Markov decision processes

• Framework for representing complex multi-stage decision problems in the presence of uncertainty
• Efficient solutions
• Models the dynamics of the environment under different actions
• Markov assumption: the next state depends on the previous state and the action, not on the past

### Markov Decision Process

• Formal definition: 4-tuple (X, U, T, R)
• Set of states X (finite)
• Set of actions U (finite)
• Transition model T : X × U × X → [0,1], giving the transition probability for each action and state
• Reward model R : X × U × X → ℝ

### Example

• Robot navigating on the grid
• 4 actions: up, down, left, right
• Effects of moves are stochastic, we may end up in...
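The 4-tuple (X, U, T, R) from the formal definition can be written down as a small data structure. This is a minimal sketch, not the course's implementation: the class name `MDP`, its methods, and the two-state toy problem are all hypothetical, and the toy numbers are invented rather than taken from the grid example, whose details are cut off in this preview.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str
Triple = Tuple[State, Action, State]


@dataclass
class MDP:
    states: List[State]          # X, finite
    actions: List[Action]        # U, finite
    T: Dict[Triple, float]       # T(x, u, x') in [0, 1]; missing keys mean 0
    R: Dict[Triple, float]       # R(x, u, x'); missing keys mean 0

    def is_valid(self, tol: float = 1e-9) -> bool:
        """Check that T(x, u, .) is a probability distribution for every (x, u)."""
        for x in self.states:
            for u in self.actions:
                probs = [self.T.get((x, u, x2), 0.0) for x2 in self.states]
                if any(p < 0.0 or p > 1.0 for p in probs):
                    return False
                if abs(sum(probs) - 1.0) > tol:
                    return False
        return True

    def expected_reward(self, x: State, u: Action) -> float:
        """Expected immediate reward of taking action u in state x."""
        return sum(
            self.T.get((x, u, x2), 0.0) * self.R.get((x, u, x2), 0.0)
            for x2 in self.states
        )


# Invented two-state problem: "go" switches state with probability 0.8.
TOY = MDP(
    states=["a", "b"],
    actions=["stay", "go"],
    T={
        ("a", "stay", "a"): 1.0,
        ("b", "stay", "b"): 1.0,
        ("a", "go", "b"): 0.8, ("a", "go", "a"): 0.2,
        ("b", "go", "a"): 0.8, ("b", "go", "b"): 0.2,
    },
    R={("a", "go", "b"): 1.0},
)
```

Storing T and R as sparse dictionaries keyed by (x, u, x') mirrors the signatures X × U × X → [0,1] and X × U × X → ℝ directly; for larger state spaces an array indexed by state and action would be the more usual choice.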

## This note was uploaded on 04/07/2010 for the course CS 685 taught by Professor Luke,s during the Fall '08 term at George Mason.

