lecture_11

# For any $s \in S$, let $a^*(s)$ be the maximizing action defining $\Phi(V)(s)$


For all $s \in S$, the operator $\Phi$ is defined by

$$\Phi(V)(s) = \max_{a} \Big\{ \mathrm{E}[r(s, a)] + \gamma \sum_{s' \in S} \Pr[s' \mid s, a]\, V(s') \Big\},$$

or, in vector form, $\Phi(V) = \max_{\pi} \{ R_\pi + \gamma P_\pi V \}$.

    ValueIteration(V0)
        V ← V0                              ▷ V0 arbitrary value
        while ‖V − Φ(V)‖ ≥ (1 − γ)ε/γ do
            V ← Φ(V)
        return Φ(V)

Mehryar Mohri - Foundations of Machine Learning, page 20

## VI Algorithm - Convergence

Theorem: for any initial value $V_0$, the sequence defined by $V_{n+1} = \Phi(V_n)$ converges to $V^*$.

Proof: we show that $\Phi$ is $\gamma$-contracting for $\|\cdot\|_\infty$, which implies the existence and uniqueness of a fixed point for $\Phi$.

- For any $s \in S$, let $a^*(s)$ be the maximizing action defining $\Phi(V)(s)$. Then, for $s \in S$ and any $U$,

$$
\begin{aligned}
\Phi(V)(s) - \Phi(U)(s)
&\le \Phi(V)(s) - \Big( \mathrm{E}[r(s, a^*(s))] + \gamma \sum_{s' \in S} \Pr[s' \mid s, a^*(s)]\, U(s') \Big) \\
&= \gamma \sum_{s' \in S} \Pr[s' \mid s, a^*(s)]\, [V(s') - U(s')] \\
&\le \gamma \sum_{s' \in S} \Pr[s' \mid s, a^*(s)]\, \|V - U\|_\infty = \gamma \|V - U\|_\infty .
\end{aligned}
$$

## Complexity and Optimality

Complexity: convergence in $O(\log \frac{1}{\epsilon})$ iterations. Observe that

$$\|V_{n+1} - V_n\|_\infty \le \gamma \|V_n - V_{n-1}\|_\infty \le \gamma^n \|\Phi(V_0) - V_0\|_\infty .$$

Thus,

$$\gamma^n \|\Phi(V_0) - V_0\|_\infty \le \frac{(1 - \gamma)\epsilon}{\gamma} \;\Rightarrow\; n = O\Big(\log \frac{1}{\epsilon}\Big).$$

$\epsilon$-Optimality: let $V_{n+1}$ be the value returned. Then, using $V^* = \Phi(V^*)$, $V_{n+1} = \Phi(V_n)$, and the contraction property,

$$
\begin{aligned}
\|V^* - V_{n+1}\|_\infty
&\le \|V^* - \Phi(V_{n+1})\|_\infty + \|\Phi(V_{n+1}) - V_{n+1}\|_\infty \\
&\le \gamma \|V^* - V_{n+1}\|_\infty + \gamma \|V_{n+1} - V_n\|_\infty .
\end{aligned}
$$

Thus,

$$\|V^* - V_{n+1}\|_\infty \le \frac{\gamma}{1 - \gamma} \|V_{n+1} - V_n\|_\infty \le \epsilon .$$

## VI Algorithm - Example

[Figure: two-state MDP with states 1 and 2; transitions labeled action/[probability, reward]: a/[3/4, 2], a/[1/4, 2], b/[1, 2], d/[1, 3], c/[1, 2].]

$$V_{n+1}(1) = \max\Big\{ 2 + \gamma \Big( \tfrac{3}{4} V_n(1) + \tfrac{1}{4} V_n(2) \Big),\; 2 + \gamma V_n(2) \Big\}$$

$$V_{n+1}(2) = \max\big\{ 3 + \gamma V_n(1),\; 2 + \gamma V_n(2) \big\}.$$

For $V_0(1) = -1$, $V_0(2) = 1$, $\gamma = 1/2$: $V_1(1) = V_1(2) = 5/2$. But $V^*(1) = 14/3$, $V^*(2) = 16/3$.

## Policy Iteration Algorithm

    PolicyIteration(π0)
        π ← π0...
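To make the value iteration slides concrete, here is a minimal Python sketch of the ValueIteration pseudocode applied to the two-state example. It is an illustration under stated assumptions, not the lecture's own code: the names `ACTIONS`, `bellman`, and `value_iteration` are hypothetical, the rewards and transition probabilities are read off the two update equations for $V_{n+1}(1)$ and $V_{n+1}(2)$, and states 1 and 2 are stored at array indices 0 and 1.

```python
import numpy as np

GAMMA = 0.5  # discount factor used in the example

# Assumed encoding of the two-state example: for each state, a list of
# (expected reward, transition probabilities over [state 1, state 2]).
ACTIONS = {
    0: [(2.0, np.array([0.75, 0.25])),  # state 1, action a
        (2.0, np.array([0.0, 1.0]))],   # state 1, action b
    1: [(3.0, np.array([1.0, 0.0])),    # state 2, action d
        (2.0, np.array([0.0, 1.0]))],   # state 2, action c
}


def bellman(V, gamma=GAMMA):
    """Apply Phi: for each state, maximize reward + gamma * expected next value."""
    return np.array([
        max(r + gamma * (p @ V) for r, p in ACTIONS[s])
        for s in sorted(ACTIONS)
    ])


def value_iteration(V0, eps=1e-8, gamma=GAMMA):
    """Iterate V <- Phi(V) until ||V - Phi(V)||_inf < (1 - gamma) * eps / gamma."""
    V = np.asarray(V0, dtype=float)
    threshold = (1 - gamma) * eps / gamma
    while np.max(np.abs(V - bellman(V, gamma))) >= threshold:
        V = bellman(V, gamma)
    return bellman(V, gamma)


V1 = bellman(np.array([-1.0, 1.0]))      # one step from V0 = (-1, 1)
V_star = value_iteration([-1.0, 1.0])    # approximate fixed point

print("V1 =", V1)
print("V* ~", V_star)
```

The first step should reproduce the slide's $V_1(1) = V_1(2) = 5/2$, and the returned vector should agree with $V^*(1) = 14/3$, $V^*(2) = 16/3$ up to the tolerance `eps`. Calling `bellman` on two arbitrary vectors and comparing sup-norm distances is a quick numerical check of the $\gamma$-contraction property used in the proof.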