Why a discount factor?
A1: to make the sums finite
A2: interest rate 1/γ – 1 per period
A3: model mismatch
‣ probability (1–γ) that something unexpected happens on each step and my plan goes out the window

Recursive expression
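\[
J^{\pi} = E\left[ \frac{1-\gamma}{\gamma} \sum_{t=1}^{\infty} \gamma^{t} c_t \;\middle|\; \tau \sim \pi \right] = E\left[ J(\tau) \mid \tau \sim \pi \right]
\]
\[
\begin{aligned}
J(\tau) &= \frac{1-\gamma}{\gamma} \left[ \gamma c_1 + \gamma^{2} c_2 + \gamma^{3} c_3 + \dots \right] \\
        &= (1-\gamma)\, c_1 + \gamma \, \frac{1-\gamma}{\gamma} \left( \gamma c_2 + \gamma^{2} c_3 + \dots \right) \\
        &= (1-\gamma)\, c_1 + \gamma \, J(\tau^{+})
\end{aligned}
\]
(1–γ) × immediate cost + γ × future cost

Tree search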
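As a sanity check on the recursion J(τ) = (1–γ) c1 + γ J(τ+), here is a minimal numerical sketch. It assumes τ+ means the trajectory τ with its first step dropped, and that costs are zero after the listed steps. The function name `normalized_cost` and the cost values are hypothetical, chosen only for illustration.

```python
def normalized_cost(costs, gamma):
    """Return (1 - gamma)/gamma * sum_{t>=1} gamma**t * c_t for a finite list c_1, c_2, ..."""
    total = sum(gamma ** t * c for t, c in enumerate(costs, start=1))
    return (1 - gamma) / gamma * total


gamma = 0.5
costs = [4.0, 2.0, 8.0, 1.0]  # hypothetical c_1..c_4, zero cost afterwards (assumption)

# Left side: J(tau) computed directly from the definition.
lhs = normalized_cost(costs, gamma)

# Right side: (1 - gamma) * immediate cost + gamma * J(tau+),
# where tau+ is the same trajectory with its first step removed.
rhs = (1 - gamma) * costs[0] + gamma * normalized_cost(costs[1:], gamma)

print(lhs, rhs)

# With a constant cost c on a long trajectory, the (1 - gamma)/gamma factor
# brings the discounted sum back to roughly c itself.
constant = normalized_cost([3.0] * 60, gamma)
print(constant)
```

The two printed values on the first line should agree up to floating-point error. The constant-cost case suggests one way to read the (1–γ)/γ factor: it rescales the discounted sum so that J stays on the same scale as a single step's cost.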
[Figure: example search tree, γ = 0.5, transitions = 0.5]
Root node = current state
Alternating levels: action and outcome
‣ min and expectation
Build out tree until goal or until γ^t small enough

Interpreting the result
Number at each node: optimal cost if starting from state s instead of s1
‣ call this J*(s); so, J* = J*(s1)
‣ state-value function
Number at each ⋅...
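The tree search described above (alternating min and expectation levels, stopping at a goal or once γ^t is small) can be sketched as a short recursion. This is only an illustration under stated assumptions: the three-state model, its costs and probabilities, and the names `tree_value`, `model`, `is_goal` and `tol` are all hypothetical. The goal is assumed to cost nothing once reached, and truncated leaves are valued at 0.

```python
def tree_value(state, model, gamma, is_goal, depth=0, tol=1e-6):
    """Approximate J*(state) by expanding the min/expectation tree.

    model[state][action] is a list of (probability, cost, next_state) outcomes.
    Expansion stops at a goal state or once gamma**depth < tol.
    """
    if is_goal(state) or gamma ** depth < tol:
        return 0.0  # assumption: no further cost at the goal or past the cutoff
    best = float("inf")
    for action, outcomes in model[state].items():
        # Outcome level: expectation of
        # (1 - gamma) * immediate cost + gamma * future cost.
        q = sum(
            p * ((1 - gamma) * cost
                 + gamma * tree_value(nxt, model, gamma, is_goal, depth + 1, tol))
            for p, cost, nxt in outcomes
        )
        # Action level: keep the cheapest action.
        best = min(best, q)
    return best


# Hypothetical toy problem: from s1, "safe" goes through s2, while "risky"
# reaches the goal half the time and otherwise stays in s1.
model = {
    "s1": {
        "safe": [(1.0, 2.0, "s2")],
        "risky": [(0.5, 1.0, "goal"), (0.5, 1.0, "s1")],
    },
    "s2": {"go": [(1.0, 1.0, "goal")]},
}

j_s1 = tree_value("s1", model, gamma=0.5, is_goal=lambda s: s == "goal")
j_s2 = tree_value("s2", model, gamma=0.5, is_goal=lambda s: s == "goal")
print(j_s1, j_s2)
```

Each call returns the number stored at one node of the tree, so `tree_value("s1", ...)` plays the role of J* = J*(s1) and `tree_value("s2", ...)` the role of J*(s2). For this toy model the fixed-point equation J = 0.5 + 0.25 J gives J*(s1) = 2/3, which the truncated tree should approach closely.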
This note was uploaded on 01/24/2014 for the course CS 15780 taught by Professor Bryant during the Fall '09 term at Carnegie Mellon.