Lecture 21 Notes



... node: optimal cost if starting from parent's s, choosing incoming a
‣ call this Q*(s, a)
‣ action-value function
Similarly, J^π(s) and Q^π(s, a)

The update equations
For ⋅ node:
$$Q^*(s, a) = (1 - \gamma)\, C(s, a) + \gamma\, \mathbb{E}\left[ J^*(s') \mid s' \sim T(\cdot \mid s, a) \right]$$
For state node:
$$J^*(s) = \min_a Q^*(s, a)$$
(1 − γ) × immediate cost + γ × future cost

Updates for a fixed policy
For ⋅ node:
$$Q^\pi(s, a) = (1 - \gamma)\, C(s, a) + \gamma\, \mathbb{E}\left[ J^\pi(s') \mid s' \sim T(\cdot \mid s, a) \right]$$
For state node:
$$J^\pi(s) = \mathbb{E}\left[ Q^\pi(s, a) \mid a \sim \pi(\cdot \mid s) \right]$$
(1 − γ) × immediate cost + γ × future cost

Speeding it up
Can't do DPLL-style pruning: outcome node depends on all children
Can do some pruning: e.g., low-probability outcomes when branch is already clearly bad
Or, use scenarios: subsample outcomes at each expectation node
‣ with enough samples, good estimate of value of each expectation

Receding-horizon planning
Stop building tree at 2k levels, evaluate leaf nodes with heuristic h(s)
‣ or at 2k–1 levels, evaluate with h(s, a)
Minimal guarantees, but often works well in practice
Can also use adaptive horizon
Just as in deterministic search, a good heuristic is essential!

Good heuristic
Good heuristic: h...
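
Example (not from the lecture): a minimal Python sketch of the update equations combined with receding-horizon planning and optional scenario sampling. The two-state MDP, its costs, the heuristic values, and all names (`ACTIONS`, `COST`, `T`, `h`, `q_value`, `j_value`, `best_action`) are assumptions made up for illustration. Here `k` counts remaining action steps, so the tree has 2k levels and leaves are evaluated with h(s).

```python
import random

GAMMA = 0.5  # discount factor (assumed value for this toy example)

# Hypothetical toy MDP: "A" is the start state, "G" is an absorbing goal.
ACTIONS = {"A": ["go", "jump"], "G": ["stay"]}
COST = {("A", "go"): 1.0, ("A", "jump"): 4.0, ("G", "stay"): 0.0}
# T[(s, a)] lists (next_state, probability) pairs.
T = {
    ("A", "go"): [("G", 0.5), ("A", 0.5)],
    ("A", "jump"): [("G", 1.0)],
    ("G", "stay"): [("G", 1.0)],
}


def h(s):
    """Made-up heuristic estimate of the cost-to-go at a leaf state."""
    return 0.0 if s == "G" else 2.0


def q_value(s, a, k, n_samples=None, rng=None):
    """Outcome-node value: (1 - gamma) * C(s, a) + gamma * E[J(s')]."""
    outcomes = T[(s, a)]
    if n_samples is None:
        # Exact expectation: every child of the outcome node is expanded.
        future = sum(p * j_value(s2, k - 1) for s2, p in outcomes)
    else:
        # Scenarios: subsample outcomes and average their values.
        states = [s2 for s2, _ in outcomes]
        weights = [p for _, p in outcomes]
        draws = rng.choices(states, weights=weights, k=n_samples)
        total = sum(j_value(s2, k - 1, n_samples, rng) for s2 in draws)
        future = total / n_samples
    return (1 - GAMMA) * COST[(s, a)] + GAMMA * future


def j_value(s, k, n_samples=None, rng=None):
    """State-node value: min over actions, or h(s) once the horizon is hit."""
    if k == 0:
        return h(s)
    return min(q_value(s, a, k, n_samples, rng) for a in ACTIONS[s])


def best_action(s, k):
    """Action with the lowest exact k-step lookahead cost."""
    return min(ACTIONS[s], key=lambda a: q_value(s, a, k))


print(j_value("A", 1))      # 1.0
print(j_value("A", 2))      # 0.75
print(best_action("A", 2))  # go
print(j_value("A", 2, n_samples=20, rng=random.Random(0)))  # sampled estimate
```

In a receding-horizon loop, the planner would execute `best_action(s, k)`, observe the next state, and replan from there. The sampled variant trades exactness for a branching factor of `n_samples` at each expectation node.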

This note was uploaded on 01/24/2014 for the course CS 15-780 taught by Professor Bryant during the Fall '09 term at Carnegie Mellon.
