Mathematical Methods of Operations Research manuscript No.
(will be inserted by the editor)

Compactness of the Space of Non-Randomized Policies in Countable-State Sequential Decision Processes

Richard C. Chen · Eugene A. Feinberg

Received: date / Accepted: date

Richard Chen
Naval Research Laboratory, Code 5341, 4555 Overlook Ave. SW, Washington, DC 20375
Tel.: 202-767-3417, Fax: 202-404-8687, E-mail: [email protected]

Eugene Feinberg
Dept. of Applied Math. and Stat., State University of New York, Stony Brook, NY 11794-3600

Abstract  For sequential decision processes with countable state spaces, we prove compactness of the set of strategic measures corresponding to non-randomized policies. For the Borel state case, this set may not be compact [14, p. 170], in spite of the compactness of the set of strategic measures corresponding to all policies [17,2]. We use the compactness result from this paper to show the existence of optimal policies for countable-state constrained optimization of expected discounted and nonpositive rewards, when optimality is considered within the class of non-randomized policies. This paper also studies the convergence of a value-iteration algorithm for such constrained problems.

Keywords  Markov Decision Processes · Compactness · Non-Randomized Policies

1 Introduction

In many fields, including engineering, physics, and economics, mathematical models are used to predict the behavior of systems so that they can be controlled in ways that achieve desired levels of performance. A Markov decision process (MDP) is such a mathematical construct: it incorporates the notions of a system state and system dynamics. In an MDP, the system dynamics are probabilistic in the sense that transitions from state to state are specified by given transition probabilities. These transition probabilities can be controlled according to a feedback control policy, whose aim is to optimize system performance according to given criteria. A sequential decision process (SDP) is a generalization of an MDP in which the transition probabilities depend not only on the current state but on the entire past history of states and actions.
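To fix ideas for readers less familiar with these models, the standard setup can be summarized as follows; the notation here is generic and illustrative, not the paper's own. An MDP is specified by a tuple
\[
(\mathbb{X},\, \mathbb{A},\, p,\, r),
\]
where \(\mathbb{X}\) is the (here countable) state space, \(\mathbb{A}(x)\) is the set of actions available in state \(x\), \(p(y \mid x, a)\) is the probability of a transition to state \(y\) when action \(a\) is chosen in state \(x\), and \(r(x, a)\) is the one-step reward. In an SDP, the transition law may depend on the entire history \(h_t = (x_0, a_0, \dots, x_{t-1}, a_{t-1}, x_t)\), so transitions are governed by \(p(y \mid h_t, a_t)\) rather than by \(p(y \mid x_t, a_t)\) alone.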
MDPs and SDPs can be applied to diverse areas such as production, signal processing, telecommunications, and speech recognition. In general, a policy for an MDP or SDP may be randomized in the sense that the action taken does not depend deterministically on the current system state, but rather is chosen according to a probability distribution that the policy specifies as a function of the system history. However, in some cases it may be desirable, as a matter of doctrine, to restrict the space of feasible policies to non-randomized policies. We adopt this view here and assume that optimization is to be done over the space of non-randomized policies.
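The distinction between the two policy classes can be stated formally; again, the notation is generic rather than the paper's. A randomized policy \(\pi\) specifies, for each history \(h_t\), a probability distribution \(\pi(\,\cdot \mid h_t)\) on the action set \(\mathbb{A}(x_t)\), from which the action \(a_t\) is sampled. A non-randomized policy \(\varphi\) instead selects a single action \(a_t = \varphi(h_t) \in \mathbb{A}(x_t)\); equivalently, it is the special case in which \(\pi(\,\cdot \mid h_t)\) is a point mass:
\[
\pi(a \mid h_t) \;=\;
\begin{cases}
1, & a = \varphi(h_t),\\[2pt]
0, & \text{otherwise.}
\end{cases}
\]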
