Mathematical Methods of Operations Research manuscript No. (will be inserted by the editor)

Non-Randomized Policies for Constrained Markov Decision Processes

Richard C. Chen¹, Eugene A. Feinberg²

¹ Radar Division, Naval Research Laboratory, Code 5341, Washington DC 20375, USA, (202) 767-3417
² Department of Applied Mathematics and Statistics, State University of New York, Stony Brook, NY 11794-3600, USA, (631) 632-7189

Send offprint requests to: Richard Chen

Received: date / Revised version: date

Abstract  This paper addresses constrained Markov decision processes with expected discounted total cost criteria that are controlled by non-randomized policies. A dynamic programming approach is used to construct optimal policies. The convergence of the sequence of finite horizon value functions to the infinite horizon value function is also shown. A simple example illustrating an application is presented.

1 Introduction

This paper addresses constrained Markov decision processes (MDPs) with expected discounted total cost criteria and constraints, controlled by policies that are restricted to be non-randomized. The dynamic programming approach introduced in [3,5] is extended. Specifically, this paper describes how to construct optimal policies by using the dynamic programming equations presented in [5]. In [5], dynamic programming equations were introduced and the infinite horizon dynamic programming operator was shown to be a contraction mapping, but methods for finding optimal policies were not presented.
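For orientation, the following display is a minimal, standard formulation of a constrained discounted MDP; the symbols x, a_t, c, d_k, q_k, and β are generic placeholders and are not necessarily the notation used in the body of the paper.

\[
\begin{aligned}
\underset{\pi \in \Pi_{\mathrm{NR}}}{\operatorname{minimize}} \quad & V(x,\pi) \;=\; \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, c(x_{t},a_{t})\right] \\
\text{subject to} \quad & D_{k}(x,\pi) \;=\; \mathbb{E}^{\pi}_{x}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, d_{k}(x_{t},a_{t})\right] \;\le\; q_{k}, \qquad k = 1,\dots,K,
\end{aligned}
\]

where x is the initial state, β ∈ (0,1) is the discount factor, c is the one-stage cost, the d_k are constraint costs with bounds q_k, and Π_NR denotes the class of non-randomized policies over which the optimization is carried out here.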
Additionally, for the class of non-randomized policies, it is shown here that the sequence of finite horizon value functions converges to the infinite horizon value function. For a particular problem, this fact was established in [2]. In view of the dynamic programming approach considered in this paper, the convergence of the sequence of finite-horizon value functions to the infinite-horizon value function can be interpreted as value iteration for constrained MDPs. The convergence of another value iteration scheme follows from [5], in which it was shown that the infinite horizon dynamic programming operator corresponding to the constrained MDP is a contraction mapping. As a consequence, repeatedly composing it yields a sequence of functions that converges to the infinite horizon optimal cost function. For randomized policies, the convergence of the sequence of finite horizon value functions to the infinite horizon value function was established in [1] for constrained MDPs by using a different approach.
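To make the contraction argument concrete, the following is a small illustrative sketch of value iteration for an ordinary (unconstrained) discounted MDP: repeated application of the β-contractive Bellman operator drives the iterates to the optimal cost function. The function value_iteration and the instance data are hypothetical and for illustration only; this is not the constrained dynamic programming operator of [5] or the policy-construction procedure developed in this paper.

```python
# Value iteration for an ordinary (unconstrained) discounted MDP.
# The Bellman operator T is a beta-contraction in the sup norm, so the
# iterates T^k v0 converge geometrically to the optimal cost function.
import numpy as np

def value_iteration(P, c, beta, tol=1e-9, max_iter=100_000):
    """P: array (A, S, S) with P[a, s, s'] = transition probability,
    c: array (S, A) of one-stage costs, beta: discount factor in (0, 1)."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)                              # initial guess v_0
    for _ in range(max_iter):
        # (T v)(s) = min_a [ c(s, a) + beta * sum_{s'} P(s' | s, a) v(s') ]
        q = c + beta * np.einsum("ask,k->sa", P, v)
        v_new = q.min(axis=1)
        if np.max(np.abs(v_new - v)) < tol:             # sup-norm stopping rule
            break
        v = v_new
    return v_new, q.argmin(axis=1)                      # values and a greedy policy

# Tiny hypothetical instance: 2 states, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
c = np.array([[1.0, 2.0], [0.5, 0.3]])    # c[s, a]
v_star, policy = value_iteration(P, c, beta=0.9)
```

This loop only illustrates the fixed-point mechanism behind the value-iteration interpretation above; the constrained operator of [5] and the construction of optimal non-randomized policies are the subject of the later sections.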
The dynamic programming approach to constrained MDPs has also been studied in [6] and [7]. In [6], it was utilized for optimization of the total expected costs subject to sample-path constraints. In [7], a dynamic programming approach was applied to constrained MDPs with the expected total cost criteria, as is the case here, although [7] considers the case of randomized policies versus the non-randomized policies assumed here.