16.410/413 Principles of Autonomy and Decision Making
Lecture 22: Markov Decision Processes I
Emilio Frazzoli
Aeronautics and Astronautics
Massachusetts Institute of Technology
November 29, 2010
Assignments
Readings: Lecture notes; [AIMA] Ch. 17.1-3.
Outline
1. Markov Decision Processes
From deterministic to stochastic planning problems

A basic planning model for deterministic systems (e.g., graph/tree search algorithms) is:

Planning Model (transition system + goal)
A (discrete, deterministic) feasible planning model is defined by:
- A countable set of states S.
- A countable set of actions A.
- A transition relation → ⊆ S × A × S.
- An initial state s_1 ∈ S.
- A set of goal states S_G ⊂ S.

We considered the case in which the transition relation is purely deterministic: if (s, a, s′) are in relation, i.e., (s, a, s′) ∈ →, or, more concisely, s −a→ s′, then taking action a from state s will always take the state to s′.

Can we extend this model to include (probabilistic) uncertainty in the transitions?
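To make this concrete, here is a minimal Python sketch of a deterministic transition system; the states, actions, and transition table are invented for illustration and are not from the lecture. The relation → is stored as a dictionary from (state, action) pairs to the unique successor, and a breadth-first search recovers an action sequence to the goal:

```python
from collections import deque

# Hypothetical planning model: every (s, a) pair has at most one
# successor, so the relation can be stored as a plain dictionary.
A = {"go", "stay"}            # actions
T = {                         # transition relation: s -a-> s'
    ("s1", "go"): "s2",
    ("s2", "go"): "s3",
    ("s3", "go"): "s4",
    ("s2", "stay"): "s2",
}

def plan(s_init, goals):
    """Breadth-first search for an action sequence from s_init to a goal."""
    frontier = deque([(s_init, [])])
    visited = {s_init}
    while frontier:
        s, actions = frontier.popleft()
        if s in goals:
            return actions
        for a in A:
            s_next = T.get((s, a))
            if s_next is not None and s_next not in visited:
                visited.add(s_next)
                frontier.append((s_next, actions + [a]))
    return None  # no feasible plan exists

print(plan("s1", {"s4"}))  # ['go', 'go', 'go']
```

Determinism is what makes this representation work: each key maps to exactly one successor, so a plan can be an open-loop action sequence.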
Markov Decision Process

Instead of a (deterministic) transition relation, let us define transition probabilities; let us also introduce a reward (or cost) structure:

Markov Decision Process (stochastic transition system + reward)
A Markov Decision Process (MDP) is defined by:
- A countable set of states S.
- A countable set of actions A.
- A transition probability function T : S × A × S → ℝ+.
- An initial state s_0 ∈ S.
- A reward function R : S × A × S → ℝ+.

In other words: if action a is applied from state s, a transition to state s′ occurs with probability T(s, a, s′). Furthermore, every time a transition is made from s to s′ using action a, a reward R(s, a, s′) is collected.
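As a rough sketch (the two-state example and all names here are hypothetical, not from the lecture), the only structural change from the deterministic model is that each (s, a) pair now indexes a probability distribution over successors, and every transition carries a reward:

```python
import random

# Hypothetical two-state MDP. T[(s, a)] lists (successor, probability)
# pairs; R maps each transition (s, a, s') to its reward.
T = {
    ("s0", "a"): [("s0", 0.3), ("s1", 0.7)],
    ("s1", "a"): [("s1", 1.0)],
}
R = {
    ("s0", "a", "s0"): 0.0,
    ("s0", "a", "s1"): 1.0,
    ("s1", "a", "s1"): 0.0,
}

def step(s, a):
    """Sample s' ~ T(s, a, .) and return the successor and its reward."""
    successors, probs = zip(*T[(s, a)])
    s_next = random.choices(successors, weights=probs, k=1)[0]
    return s_next, R[(s, a, s_next)]

s, r = step("s0", "a")  # e.g. ('s1', 1.0) with probability 0.7
```

Because applying a from s no longer determines s′ uniquely, a fixed open-loop action sequence is no longer sufficient on its own.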