On Measurability of Value Functions and Representation of Randomized Policies in Markov Decision Processes

Eugene A. Feinberg
W.A. Harriman School for Management and Policy
State University of New York at Stony Brook
Stony Brook, NY 11794-3775

January 1996

This paper deals with a discrete-time Markov Decision Process with Borel state and action spaces. We show that the set of all strategic measures generated by randomized stationary policies is Borel. Combined with known results, this fact implies measurability of the sets of strategic measures generated by stationary, Markov, and randomized Markov policies. We consider applications of these measurability results to two groups of problems: (i) measurability of value functions for various classes of policies and (ii) integral representation of strategic measures for randomized Markov and arbitrary randomized policies through strategic measures for the corresponding nonrandomized policies.

1. Introduction. The foundations of dynamic programming for problems with uncountable state spaces were built by David Blackwell (1965, 1965a) and his student Ralph Strauch (1966). Over the past thirty years, these pioneering results have been developed in various directions, such as: (i) problems with more general measurability conditions (Blackwell, Orkin, Freedman 1974, Freedman 1974, Bertsekas and Shreve 1978, Schäl and Sudderth 1987), (ii) problems with more general summation assumptions than positive, negative, and discounted dynamic programming problems (Hinderer 1970, Dynkin and Yushkevich 1979, Schäl 1983, Feinberg 1982, 1982a, 1992, Schäl and Sudderth 1987), and (iii) dynamic programming on compact sets (Schäl 1975, Balder 1989). These directions represent only a small part of the research stimulated by David Blackwell's work on dynamic programming.
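Item (ii) of the abstract has an elementary finite analogue that may help fix ideas. In a toy MDP with finitely many states and actions and a finite horizon, the strategic measure (the distribution of histories induced by an initial distribution and a policy) of a randomized Markov policy equals a mixture of strategic measures of nonrandomized Markov policies, where each pair (time, state) picks its action independently with the randomized policy's probabilities. The sketch below checks this with exact arithmetic. All names and numbers (`P0`, `P`, `PI`, `HORIZON`, the helper functions) are hypothetical, and the product-weight construction is one simple choice for the finite case, not the paper's Borel-space construction, which is where the measurability results above become essential.

```python
import itertools
from fractions import Fraction

# Hypothetical toy MDP: two states, two actions, two decision epochs (t = 0, 1).
STATES = (0, 1)
ACTIONS = (0, 1)
HORIZON = 2

# Hypothetical initial distribution and transition kernel P[s][a][s'].
P0 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P = {
    0: {0: {0: Fraction(3, 4), 1: Fraction(1, 4)}, 1: {0: Fraction(1, 3), 1: Fraction(2, 3)}},
    1: {0: {0: Fraction(1, 2), 1: Fraction(1, 2)}, 1: {0: Fraction(1, 5), 1: Fraction(4, 5)}},
}

# Hypothetical randomized Markov policy: PI[t][s][a] = prob. of action a in state s at epoch t.
PI = [
    {0: {0: Fraction(1, 3), 1: Fraction(2, 3)}, 1: {0: Fraction(1, 2), 1: Fraction(1, 2)}},
    {0: {0: Fraction(1, 4), 1: Fraction(3, 4)}, 1: {0: Fraction(1), 1: Fraction(0)}},
]


def histories():
    """Enumerate all histories (s0, a0, s1, a1, ..., s_T) with T = HORIZON."""
    for states in itertools.product(STATES, repeat=HORIZON + 1):
        for actions in itertools.product(ACTIONS, repeat=HORIZON):
            h = []
            for t in range(HORIZON):
                h += [states[t], actions[t]]
            h.append(states[HORIZON])
            yield tuple(h)


def strategic_measure(choose):
    """Distribution of histories; choose(t, s, a) is the prob. of action a in s at epoch t."""
    measure = {}
    for h in histories():
        prob = P0[h[0]]
        for t in range(HORIZON):
            s, a, s_next = h[2 * t], h[2 * t + 1], h[2 * t + 2]
            prob *= choose(t, s, a) * P[s][a][s_next]
        measure[h] = prob
    return measure


def randomized_measure():
    """Strategic measure of the randomized Markov policy PI."""
    return strategic_measure(lambda t, s, a: PI[t][s][a])


def deterministic_policies():
    """Each nonrandomized Markov policy assigns one action to every (t, s) pair."""
    keys = [(t, s) for t in range(HORIZON) for s in STATES]
    for choice in itertools.product(ACTIONS, repeat=len(keys)):
        yield dict(zip(keys, choice))


def mixture_measure():
    """Mix deterministic strategic measures with weight prod_{t,s} PI[t][s][phi(t, s)]."""
    total = {h: Fraction(0) for h in histories()}
    for phi in deterministic_policies():
        weight = Fraction(1)
        for (t, s), a in phi.items():
            weight *= PI[t][s][a]
        # Indicator kernel: the deterministic policy phi plays phi[(t, s)] with probability 1.
        measure = strategic_measure(lambda t, s, a: Fraction(int(phi[(t, s)] == a)))
        for h, p in measure.items():
            total[h] += weight * p
    return total


if __name__ == "__main__":
    mu = randomized_measure()
    print(sum(mu.values()))
    print(mu == mixture_measure())
```

Running the script should print `1` and then `True`. The equality holds because a history visits exactly one state at each epoch, so summing the product weights over all deterministic policies consistent with that history leaves only the factors PI[t][s_t][a_t].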
One of the remarkable discoveries in these pioneering papers by Blackwell (1965, 1965a) and Strauch (1966) was that value functions may fail to be measurable in the standard Borel sense, but they are measurable in a more general sense: they are universally measurable. More precisely, if the objective is to maximize the expected total rewards, the value function is upper semianalytic and therefore universally measurable. This discovery established connections between dynamic programming and the theory of analytic sets (Lusin 1927, Kuratowski 1966), an area of pure mathematics developed in the first part of the twentieth century, and stimulated additional research in topology, set theory, and analysis, including new developments related to selection theorems (Wagner 1977). It also allowed Blackwell, Strauch, and later researchers to write optimality operators and optimality equations for dynamic programming problems with uncountable state spaces and to analyze these equations....
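A short derivation may show where upper semianalyticity comes from. Assume, as notation not fixed in this excerpt, a Borel state space $X$, nonempty action sets $A(x)$ whose graph $\mathrm{Gr}(A)=\{(x,a): a\in A(x)\}$ is Borel, a one-step reward $r$, and a transition kernel $q$. For maximizing expected total rewards, the optimality equation takes the form below. Even when the bracketed function $g(x,a)$ is Borel on $\mathrm{Gr}(A)$, each superlevel set of its supremum is a projection of a Borel set, hence analytic, and analytic sets are universally measurable:

```latex
v(x) = \sup_{a \in A(x)} \underbrace{\Big[\, r(x,a) + \int_X v(y)\, q(dy \mid x,a) \Big]}_{g(x,a)},
\qquad x \in X,

\{x \in X : v(x) > c\}
  = \operatorname{proj}_X \{(x,a) \in \mathrm{Gr}(A) : g(x,a) > c\},
\qquad c \in \mathbb{R}.
```

Since projections of Borel sets need not be Borel, $v$ need not be Borel measurable; it is, however, upper semianalytic (every set $\{v > c\}$ is analytic) and therefore universally measurable.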