Lecture 21 Notes



## Why a discount factor?

‣ A1: to make the sums finite

‣ A2: interest rate $1/\gamma - 1$ per period

‣ A3: model mismatch: probability $(1-\gamma)$ that something unexpected happens on each step and my plan goes out the window

## Recursive expression

$$
J^\pi = \mathbb{E}\left[\frac{1-\gamma}{\gamma}\sum_t \gamma^t c_t \;\middle|\; \tau \sim \pi\right] = \mathbb{E}\left[J(\tau) \mid \tau \sim \pi\right]
$$

$$
\begin{aligned}
J(\tau) &= \frac{1-\gamma}{\gamma}\left[\gamma c_1 + \gamma^2 c_2 + \gamma^3 c_3 + \dots\right] \\
&= (1-\gamma)\,c_1 + \gamma\,\frac{1-\gamma}{\gamma}\left(\gamma c_2 + \gamma^2 c_3 + \dots\right) \\
&= (1-\gamma)\,c_1 + \gamma\,J(\tau^+)
\end{aligned}
$$

That is, $(1-\gamma)$ × immediate cost + $\gamma$ × future cost.

## Tree search

Example parameters: $\gamma = 0.5$, transitions = 0.5.

‣ Root node = current state

‣ Alternating levels: action and outcome (min and expectation)

‣ Build out tree until goal or until $\gamma^t$ small enough

## Interpreting the result

Number at each node: optimal cost if starting from state $s$ instead of $s_1$

‣ call this $J^*(s)$, so $J^* = J^*(s_1)$

‣ state-value function

Number at each ...
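The recursion $J(\tau) = (1-\gamma)c_1 + \gamma J(\tau^+)$ and the min/expectation tree search can be checked numerically. The following is a minimal Python sketch, not the lecture's own code. The names `normalized_cost`, `tree_search`, and `TOY_MODEL`, the model format, the toy costs, and the stopping tolerance `tol` are all assumptions made for illustration, as is treating the goal as a zero-cost terminal state and reading $\tau^+$ as $\tau$ with its first step dropped.

```python
from typing import Dict, List, Sequence, Tuple


def normalized_cost(costs: Sequence[float], gamma: float) -> float:
    """J(tau) = (1 - gamma)/gamma * sum_{t>=1} gamma^t * c_t for a finite cost list."""
    total = sum(gamma ** t * c for t, c in enumerate(costs, start=1))
    return (1 - gamma) / gamma * total


# Hypothetical model format (an assumption, not from the lecture):
# model[state][action] = [(probability, immediate_cost, next_state), ...]
Model = Dict[str, Dict[str, List[Tuple[float, float, str]]]]

# Hypothetical toy problem with 0.5 transition probabilities.
TOY_MODEL: Model = {
    "s1": {
        "risky": [(0.5, 1.0, "goal"), (0.5, 1.0, "s1")],
        "safe": [(1.0, 2.0, "goal")],
    },
}


def tree_search(model: Model, state: str, goal: str, gamma: float,
                discount: float = 1.0, tol: float = 1e-3) -> float:
    """Approximate J*(state) by expanding the tree from `state`.

    Action levels take a min, outcome levels take an expectation.
    Expansion stops at the goal or once gamma^t (tracked in `discount`)
    falls below `tol`; both cases contribute zero further cost (assumption).
    """
    if state == goal or discount < tol:
        return 0.0
    best = float("inf")
    for outcomes in model[state].values():
        # Expectation over outcomes of (1 - gamma) * immediate + gamma * future.
        expected = sum(
            p * ((1 - gamma) * cost
                 + gamma * tree_search(model, nxt, goal, gamma, discount * gamma, tol))
            for p, cost, nxt in outcomes
        )
        best = min(best, expected)  # min over actions
    return best


if __name__ == "__main__":
    gamma = 0.5
    costs = [1.0, 2.0, 3.0]
    direct = normalized_cost(costs, gamma)
    recursive = (1 - gamma) * costs[0] + gamma * normalized_cost(costs[1:], gamma)
    print(direct, recursive)  # the two values should agree
    print(tree_search(TOY_MODEL, "s1", "goal", gamma))
```

In this toy model the "risky" action satisfies $J = 0.5 + 0.25\,J$ when followed forever, so the truncated tree should come out close to $2/3$, below the cost of 1 for the "safe" action. Truncating on $\gamma^t$ keeps the recursion finite even though the "risky" action can loop back to `s1`.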

## This note was uploaded on 01/24/2014 for the course CS 15-780 taught by Professor Bryant during the Fall '09 term at Carnegie Mellon.
