Markov decision process: influence diagram
States, actions, initial state s1, (expected) costs C(s,a) ∈ [Cmin, Cmax], transitions T(s' | s, a)

Influence diagrams
Like a Bayes net, except:
‣ diamond nodes are costs/rewards
‣ must have no children
‣ square nodes are decisions
‣ we pick the CPTs (before seeing anything)
‣ minimize expected cost
Circles are ordinary r.v.s as before
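As a rough illustration (not from the notes), here is a tiny influence-diagram-style calculation in Python: one square decision node, one circle chance node, one diamond cost node with no children. All names and numbers are invented for the example. Choosing the decision amounts to picking its CPT up front (before seeing anything) and minimizing expected cost.

```python
# Chance node CPT P(x) and diamond cost node C(d, x); values are made up.
P_x = {"rain": 0.3, "sun": 0.7}
cost = {
    ("umbrella", "rain"): 1.0, ("umbrella", "sun"): 1.0,
    ("nothing",  "rain"): 5.0, ("nothing",  "sun"): 0.0,
}

def expected_cost(d):
    # Expected cost of decision d under the chance node's distribution.
    return sum(P_x[x] * cost[(d, x)] for x in P_x)

# Pick the decision (i.e., the decision node's CPT) with minimum expected cost.
best = min(["umbrella", "nothing"], key=expected_cost)
```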
Markov decision process: state space diagram
(state space diagram: goal state has all costs = 0, self-transition 100%)
States, actions, costs C(s,a) ∈ [Cmin, Cmax], transitions T(s' | s, a), initial state s1
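A minimal sketch of this definition as explicit Python tables; the states, actions, costs, and transition probabilities below are made up for illustration and are not from the notes.

```python
import random

# Toy MDP: states, actions, costs C(s, a) in [Cmin, Cmax], transitions T(s' | s, a).
states  = ["far", "near", "goal"]
actions = ["walk", "run"]

# The goal state has all costs = 0.
C = {
    ("far",  "walk"): 1.0, ("far",  "run"): 2.0,
    ("near", "walk"): 1.0, ("near", "run"): 2.0,
    ("goal", "walk"): 0.0, ("goal", "run"): 0.0,
}

# The goal state self-transitions with probability 100%.
T = {
    ("far",  "walk"): {"far": 0.5, "near": 0.5},
    ("far",  "run"):  {"far": 0.2, "near": 0.7, "goal": 0.1},
    ("near", "walk"): {"near": 0.5, "goal": 0.5},
    ("near", "run"):  {"near": 0.1, "goal": 0.9},
    ("goal", "walk"): {"goal": 1.0},
    ("goal", "run"):  {"goal": 1.0},
}

s1 = "far"  # initial state

def step(s, a):
    """Return the cost C(s, a) and a sample s' ~ T(s' | s, a)."""
    dist = T[(s, a)]
    s_next = random.choices(list(dist), weights=list(dist.values()))[0]
    return C[(s, a)], s_next
```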
Choosing actions
Execution trace: τ = (s1, a1, c1, s2, a2, c2, …)
‣ c1 = C(s1, a1), c2 = C(s2, a2), etc.
‣ s2 ~ T(s | s1, a1), s3 ~ T(s | s2, a2), etc.
Policy π: S → A
‣ or randomized, π(a | s)
Trace from π: a1 ~ π(a | s1), etc.
‣ τ is then an r.v. with known distribution
‣ we’ll write τ ~ π (rest of MDP implicit)
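Continuing the sketch above (same toy MDP and the hypothetical step helper), sampling a finite prefix of a trace τ ~ π for a deterministic policy might look like this:

```python
def sample_trace(policy, s1, horizon=20):
    """Sample a finite prefix of τ = (s1, a1, c1, s2, a2, c2, ...) under a
    deterministic policy, i.e. a function from states to actions."""
    trace, s = [], s1
    for _ in range(horizon):
        a = policy(s)            # a_t = π(s_t); for a randomized π, sample a_t ~ π(a | s_t) instead
        c, s_next = step(s, a)   # c_t = C(s_t, a_t), s_{t+1} ~ T(s' | s_t, a_t)
        trace.append((s, a, c))
        s = s_next
    return trace

# Example: a trace from the "always run" policy.
tau = sample_trace(lambda s: "run", s1)
```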
Choosing good actions
Discount γ ∈ (0, 1)
Value of a policy: J^π = E_{τ ~ π}[ Σ_t γ^t c_t ]
Objective: J* = min_π J^π, with π* ∈ arg min_π J^π
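One concrete (assumed, not from the notes) way to read this objective: estimate J^π by Monte Carlo, averaging truncated discounted cost sums over sampled traces, then compare candidate policies. This reuses the sample_trace sketch above; horizon and trace counts are arbitrary.

```python
def estimate_value(policy, s1, gamma=0.9, horizon=50, n_traces=1000):
    """Monte Carlo estimate of J^π = E_{τ~π}[ Σ_t γ^t c_t ], truncated at a
    finite horizon (here the first cost is weighted by γ^0)."""
    total = 0.0
    for _ in range(n_traces):
        tau = sample_trace(policy, s1, horizon)
        total += sum(gamma**t * c for t, (_, _, c) in enumerate(tau))
    return total / n_traces

# Comparing two candidate policies: the better one has the smaller J^π.
J_walk = estimate_value(lambda s: "walk", s1)
J_run  = estimate_value(lambda s: "run",  s1)
```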
Why a...