

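$\Phi(V)(s) = \max_{a} \Big\{ \mathrm{E}[r(s,a)] + \gamma \sum_{s' \in S} \Pr[s' \mid s, a]\, V(s') \Big\}$, or, in vector form, $\Phi(V) = \max_{\pi} \{ R_\pi + \gamma P_\pi V \}$.

ValueIteration(V0)
    V ← V0                              ▷ V0 arbitrary value
    while ‖V − Φ(V)‖∞ ≥ (1−γ)ε/γ do
        V ← Φ(V)
    return Φ(V)

Mehryar Mohri - Foundations of Machine Learning page 20

VI Algorithm - Convergence

Theorem: for any initial value $V_0$, the sequence defined by $V_{n+1} = \Phi(V_n)$ converges to $V^*$.

Proof: we show that $\Phi$ is $\gamma$-contracting for $\|\cdot\|_\infty$, which implies the existence and uniqueness of a fixed point for $\Phi$ (Banach fixed-point theorem).

• For any $s \in S$, let $a^*(s)$ be the maximizing action defining $\Phi(V)(s)$. Then, for $s \in S$ and any $U$, since $\Phi(U)(s)$ is at least the value of action $a^*(s)$ under $U$,

$$
\begin{aligned}
\Phi(V)(s) - \Phi(U)(s)
&\le \Phi(V)(s) - \Big[ \mathrm{E}[r(s, a^*(s))] + \gamma \sum_{s' \in S} \Pr[s' \mid s, a^*(s)]\, U(s') \Big] \\
&= \gamma \sum_{s' \in S} \Pr[s' \mid s, a^*(s)]\, [V(s') - U(s')] \\
&\le \gamma \sum_{s' \in S} \Pr[s' \mid s, a^*(s)]\, \|V - U\|_\infty = \gamma \|V - U\|_\infty .
\end{aligned}
$$

Exchanging the roles of $U$ and $V$ bounds $\Phi(U)(s) - \Phi(V)(s)$ in the same way, hence $\|\Phi(V) - \Phi(U)\|_\infty \le \gamma \|V - U\|_\infty$.

Complexity and Optimality

Complexity: convergence in $O(\log \frac{1}{\epsilon})$ iterations. Observe that

$$\|V_{n+1} - V_n\|_\infty \le \gamma \|V_n - V_{n-1}\|_\infty \le \gamma^n \|\Phi(V_0) - V_0\|_\infty .$$

Thus,

$$\gamma^n \|\Phi(V_0) - V_0\|_\infty \le \frac{(1-\gamma)\epsilon}{\gamma} \;\Rightarrow\; n = O\Big(\log \frac{1}{\epsilon}\Big).$$

$\epsilon$-Optimality: let $V_{n+1}$ be the value returned. Then, using $V^* = \Phi(V^*)$ and $V_{n+1} = \Phi(V_n)$,

$$\|V^* - V_{n+1}\|_\infty \le \|V^* - \Phi(V_{n+1})\|_\infty + \|\Phi(V_{n+1}) - V_{n+1}\|_\infty \le \gamma \|V^* - V_{n+1}\|_\infty + \gamma \|V_{n+1} - V_n\|_\infty .$$

Thus,

$$\|V^* - V_{n+1}\|_\infty \le \frac{\gamma}{1-\gamma} \|V_{n+1} - V_n\|_\infty \le \epsilon ,$$

where the last inequality is the stopping condition $\|V_n - \Phi(V_n)\|_\infty < \frac{(1-\gamma)\epsilon}{\gamma}$.

VI Algorithm - Example

[Figure: two-state MDP with states 1 and 2; edges labeled action/[probability, reward]. From state 1: a/[3/4, 2] back to 1, a/[1/4, 2] to 2, b/[1, 2] to 2. From state 2: c/[1, 2] back to 2, d/[1, 3] to 1.]

$$V_{n+1}(1) = \max\Big\{ 2 + \gamma\Big[\tfrac{3}{4} V_n(1) + \tfrac{1}{4} V_n(2)\Big],\; 2 + \gamma V_n(2) \Big\}$$
$$V_{n+1}(2) = \max\big\{ 3 + \gamma V_n(1),\; 2 + \gamma V_n(2) \big\}.$$

For $V_0(1) = -1$, $V_0(2) = 1$, $\gamma = 1/2$: $V_1(1) = \max\{2 + \frac{1}{2}(-\frac{3}{4} + \frac{1}{4}),\, 2 + \frac{1}{2}\} = \frac{5}{2}$ and $V_1(2) = \max\{3 - \frac{1}{2},\, 2 + \frac{1}{2}\} = \frac{5}{2}$, so $V_1(1) = V_1(2) = 5/2$. But $V^*(1) = 14/3$, $V^*(2) = 16/3$.

Policy Iteration Algorithm

PolicyIteration(π0)
    π ← π0...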
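The value iteration procedure and the two-state example above can be checked numerically. The following is a minimal Python sketch, not code from the slides. It assumes a hypothetical dict encoding `MDP[s][action] = (E[r(s, action)], {s_next: Pr[s_next | s, action]})` built by reading the figure's edge labels as action/[probability, reward]; the names `MDP`, `bellman`, `sup_norm_diff` and `value_iteration` are illustrative. `bellman` implements $\Phi$, and `value_iteration` follows the ValueIteration pseudocode, including its $(1-\gamma)\epsilon/\gamma$ stopping threshold.

```python
# Value iteration on the two-state example MDP: a minimal sketch.
# Hypothetical encoding (not from the slides):
#   MDP[s][action] = (E[r(s, action)], {s_next: Pr[s_next | s, action]})

GAMMA = 0.5

# Edge labels read as action/[probability, reward]; both edges of action "a"
# carry reward 2, so E[r(1, a)] = 2.
MDP = {
    1: {"a": (2.0, {1: 0.75, 2: 0.25}), "b": (2.0, {2: 1.0})},
    2: {"c": (2.0, {2: 1.0}), "d": (3.0, {1: 1.0})},
}


def bellman(V, mdp=MDP, gamma=GAMMA):
    """Phi(V)(s) = max_a { E[r(s, a)] + gamma * sum_{s'} Pr[s' | s, a] V(s') }."""
    return {
        s: max(
            reward + gamma * sum(p * V[t] for t, p in trans.items())
            for reward, trans in actions.values()
        )
        for s, actions in mdp.items()
    }


def sup_norm_diff(U, V):
    """||U - V||_inf for value functions stored as dicts over the same states."""
    return max(abs(U[s] - V[s]) for s in U)


def value_iteration(V0, eps, mdp=MDP, gamma=GAMMA):
    """Apply V <- Phi(V) while ||V - Phi(V)||_inf >= (1 - gamma) * eps / gamma,
    then return Phi(V) together with the number of updates performed."""
    threshold = (1 - gamma) * eps / gamma
    V = dict(V0)
    phi_V = bellman(V, mdp, gamma)  # cache Phi(V) so each step applies Phi once
    n_updates = 0
    while sup_norm_diff(V, phi_V) >= threshold:
        V = phi_V
        phi_V = bellman(V, mdp, gamma)
        n_updates += 1
    return phi_V, n_updates


if __name__ == "__main__":
    V0 = {1: -1.0, 2: 1.0}
    print(bellman(V0))  # {1: 2.5, 2: 2.5}, i.e. V_1(1) = V_1(2) = 5/2
    V_hat, n = value_iteration(V0, eps=1e-6)
    # By the epsilon-optimality bound, V_hat lies within 1e-6 of V* = (14/3, 16/3).
    print(V_hat, n)
```

Caching $\Phi(V)$ between loop tests avoids evaluating the operator twice per iteration while returning exactly the $\Phi(V)$ the pseudocode returns. With $\gamma = 1/2$ and $\|\Phi(V_0) - V_0\|_\infty = 7/2$, the complexity bound above implies at most 22 updates for $\epsilon = 10^{-6}$.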