Mathematical Methods of Operations Research manuscript No. (will be inserted by the editor)

Non-Randomized Policies for Constrained Markov Decision Processes

Richard C. Chen (1), Eugene A. Feinberg (2)

(1) Radar Division, Naval Research Laboratory, Code 5341, Washington DC 20375, USA, (202) 767-3417
(2) Department of Applied Mathematics and Statistics, State University of New York, Stony Brook, NY 11794-3600, USA, (631) 632-7189

Received: date / Revised version: date

Abstract This paper addresses constrained Markov decision processes, with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the series of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.

1 Introduction

This paper addresses constrained Markov decision processes (MDPs) with expected discounted total cost criteria and constraints, which are controlled by policies that are restricted to be non-randomized. The dynamic programming approach introduced in [3,5] is extended. Specifically, this paper describes how to construct optimal policies by using the dynamic programming equations presented in [5]. In [5], dynamic programming equations were introduced, and the infinite horizon dynamic programming operator was shown to be a contraction mapping, but methods for finding optimal policies were not presented.

Send offprint requests to: Richard Chen

Additionally, for the class of non-randomized policies, it is shown here that the series of finite horizon value functions converges to the infinite horizon value function. For a particular problem, this fact was established in [2].
In view of the dynamic programming approach considered in this paper, the convergence of the series of finite-horizon value functions to the infinite horizon value function can be interpreted as value iteration for constrained MDPs. The convergence of another value iteration scheme follows from [5], in which it was shown that the infinite horizon dynamic programming operator corresponding to the constrained MDP is a contraction mapping. As a consequence, repeatedly composing it yields a series of functions that converge to the infinite horizon optimal cost function. For randomized policies, the convergence of the series of finite horizon value functions to the infinite horizon value function was established in [1] for constrained MDPs by using a different approach.

The dynamic programming approach to constrained MDPs has also been studied in [6] and [7]. In [6], it was utilized for optimization of the total expected costs subject to sample-path constraints. In [7], a dynamic programming approach was applied to constrained MDPs with the expected total cost criteria, as is the case here, although [7] considers the case of...
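To make the value-iteration interpretation concrete, the following is a minimal sketch of the classical scheme for an *unconstrained* discounted MDP: repeatedly composing the dynamic programming operator drives any initial function to the infinite horizon optimal cost function, which is the operator's unique fixed point. The two-state, two-action model, its transition probabilities, costs, and discount factor are all made-up illustrative data, not taken from the paper; the constrained operator studied in [5] acts on an augmented state that also tracks the constraint budget, which this toy example omits.

```python
# Toy value iteration for a discounted MDP (illustrative data only).
BETA = 0.9  # discount factor in (0, 1); BETA is the contraction modulus

# Two states, two actions. P[a][s][t] is the probability of moving from
# state s to state t under action a; C[a][s] is the one-stage cost.
P = [
    [[0.8, 0.2], [0.3, 0.7]],   # action 0
    [[0.5, 0.5], [0.9, 0.1]],   # action 1
]
C = [
    [1.0, 2.0],                 # action 0 costs per state
    [1.5, 0.5],                 # action 1 costs per state
]

def bellman(v):
    """One application of the dynamic programming operator T:
    (Tv)(s) = min_a [ C[a][s] + BETA * sum_t P[a][s][t] * v(t) ]."""
    return [
        min(C[a][s] + BETA * sum(P[a][s][t] * v[t] for t in range(2))
            for a in range(2))
        for s in range(2)
    ]

# Repeated composition T^n v converges geometrically (rate BETA) to the
# unique fixed point, i.e. the infinite horizon optimal cost function.
v = [0.0, 0.0]
for _ in range(500):
    v_next = bellman(v)
    if max(abs(x - y) for x, y in zip(v_next, v)) < 1e-12:
        break
    v = v_next
```

After the loop, `v` approximates the fixed point of `T`; the stopping rule works precisely because `T` is a contraction, so successive iterates getting close together implies closeness to the fixed point.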