
Robust Value Function Approximation Using Bilinear Programming
Marek Petrik, Shlomo Zilberstein

• New formulation of value function approximation for MDPs: approximate bilinear programming (ABP)
• Minimizes a bound on policy loss (PL)

LSPI
• Iterative optimization
• No convergence guarantee
• Weak PL guarantees: L2

ALP
• Ignores the policy
• Potentially large error
• Weak PL guarantees: L1

ABP
• Concurrently optimizes value and policy
• Strong PL guarantees: L∞
• Approximate algorithm for solving ABP ≈ convergent version of API

[Figure: policies vs. values diagrams for LSPI, ALP, and ABP; value error and policy error compared in the L∞ norm]
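The "strong PL guarantees: L∞" point refers to bounding policy loss by the L∞ Bellman residual of the approximate value function. A standard bound of this form (the slide's exact constants may differ) for a policy π that is greedy with respect to an approximate value function v is:

$$\|v^* - v^{\pi}\|_\infty \;\le\; \frac{2\gamma}{1-\gamma}\,\|v - Lv\|_\infty,$$

where $v^*$ is the optimal value function, $L$ is the Bellman optimality operator, and $\gamma$ is the discount factor. Minimizing the right-hand side over the representable value functions, as ABP does, therefore directly minimizes a bound on the policy loss.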
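The quantity the L∞ guarantee is stated in, the Bellman residual ‖v − Lv‖∞, is easy to compute directly. Below is a minimal sketch on a hypothetical 2-state, 2-action MDP (the transition matrices, rewards, and discount are made up for illustration; this is not the paper's ABP solver):

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP: P[a] is the transition matrix for action a,
# r[a] the reward vector for action a (indexed by state).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.5, 0.8]])

def bellman_residual(v):
    # (Lv)(s) = max_a [ r(s,a) + gamma * sum_s' P(s'|s,a) v(s') ]
    q = r + gamma * np.einsum('aij,j->ai', P, v)  # Q-values, shape (A, S)
    Lv = q.max(axis=0)                            # greedy backup per state
    return np.abs(v - Lv).max()                   # L-infinity residual

print(bellman_residual(np.zeros(2)))  # prints 1.0
```

Methods like LSPI or ALP control this residual only indirectly (and in weaker L2 or L1 senses); ABP's bilinear formulation optimizes the value function and policy together so that the L∞ residual, and hence the policy-loss bound, is minimized.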