Mathematical Methods of Operations Research manuscript No.
(will be inserted by the editor)

Non-Randomized Policies for Constrained Markov Decision Processes

Richard C. Chen 1, Eugene A. Feinberg 2

1 Radar Division, Naval Research Laboratory, Code 5341, Washington DC 20375, USA, (202) 767-3417
2 Department of Applied Mathematics and Statistics, State University of New York, Stony Brook, NY 11794-3600, USA, (631) 632-7189

Received: date / Revised version: date

Send offprint requests to: Richard Chen

Abstract This paper addresses constrained Markov decision processes (MDPs), with expected discounted total cost criteria, which are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.

1 Introduction

This paper addresses constrained Markov decision processes (MDPs) with expected discounted total cost criteria and constraints, which are controlled by policies restricted to be non-randomized. The dynamic programming approach introduced in [3,5] is extended. Specifically, this paper describes how to construct optimal policies by using the dynamic programming equations presented in [5]. In [5], dynamic programming equations were introduced, and the infinite horizon dynamic programming operator was shown to be a contraction mapping, but methods for finding optimal policies were not presented.

Additionally, for the class of non-randomized policies, it is shown here that the sequence of finite horizon value functions converges to the infinite horizon value function. For a particular problem, this fact was established in [2]. In view of the dynamic programming approach considered in this paper, the convergence of the sequence of finite-horizon value functions to the infinite-horizon value function can be interpreted as value iteration for constrained MDPs. The convergence of another value iteration scheme follows from [5], in which it was shown that the infinite horizon dynamic programming operator corresponding to the constrained MDP is a contraction mapping. As a consequence, repeatedly composing it yields a sequence of functions that converges to the infinite horizon optimal cost function. For randomized policies, the convergence of the sequence of finite horizon value functions to the infinite horizon value function was established in [1] for constrained MDPs by using a different approach.

The dynamic programming approach to constrained MDPs has also been studied in [6] and [7]. In [6], it was utilized for optimization of the total expected costs subject to sample-path constraints. In [7], a dynamic programming approach was applied to constrained MDPs with the expected total cost criteria, as is the case here, although [7] considers the case of...
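For orientation, the standard constrained discounted MDP problem, which the setting of this paper follows up to notation, can be stated as follows; the symbols below are illustrative rather than the paper's own. Given an initial state x, a one-step cost c, constraint costs d_1, ..., d_K with budgets D_1, ..., D_K, and a discount factor beta in (0,1),

\[
\min_{\pi \in \Pi} \; \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t} c(x_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t} d_k(x_t, a_t)\right] \le D_k, \qquad k = 1, \dots, K,
\]

where, in the setting considered here, \Pi is the class of non-randomized (deterministic, possibly history-dependent) policies.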
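The value iteration interpretation mentioned above can be illustrated with a minimal sketch for an unconstrained discounted MDP: repeatedly applying the (contraction) dynamic programming operator produces a sequence of functions that converges geometrically to the optimal cost function, from which a greedy non-randomized policy is read off. The operator studied in the paper acts on a state augmented with constraint information, so the sketch below, with hypothetical transition and cost data, only illustrates the unconstrained mechanism, not the paper's construction.

import numpy as np

def value_iteration(P, c, beta=0.9, tol=1e-8, max_iter=10_000):
    """P[a]: |S| x |S| transition matrix for action a; c[a]: length-|S| cost vector."""
    actions = list(P)
    n = P[actions[0]].shape[0]
    v = np.zeros(n)                                  # v_0: arbitrary starting function
    for _ in range(max_iter):
        # Dynamic programming operator: (Tv)(x) = min_a [ c(x,a) + beta * sum_y p(y|x,a) v(y) ]
        q = np.stack([c[a] + beta * P[a] @ v for a in actions])
        v_new = q.min(axis=0)
        if np.max(np.abs(v_new - v)) < tol:          # contraction mapping => geometric convergence
            v = v_new
            break
        v = v_new
    # Greedy (non-randomized) policy extracted from the approximate fixed point
    q = np.stack([c[a] + beta * P[a] @ v for a in actions])
    greedy = [actions[i] for i in q.argmin(axis=0)]
    return v, greedy

# Hypothetical two-state, two-action example
P = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.1, 0.9], [0.6, 0.4]])}
c = {0: np.array([1.0, 2.0]), 1: np.array([0.5, 3.0])}
v_star, pi_star = value_iteration(P, c)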