ChenFeinberg2 - Mathematical Methods of Operations Research...

Mathematical Methods of Operations Research manuscript No. (will be inserted by the editor)

Compactness of the Space of Non-Randomized Policies in Countable-State Sequential Decision Processes

Richard C. Chen · Eugene A. Feinberg

Richard Chen
Naval Research Laboratory, Code 5341, 4555 Overlook Ave. SW, Washington, DC 20375
Tel.: 202-767-3417, Fax: 202-404-8687, E-mail: richard.chen@nrl.navy.mil

Eugene Feinberg
Dept. of Applied Math. and Stat., State University of New York, Stony Brook, NY 11794-3600

Received: date / Accepted: date

Abstract

For sequential decision processes with countable state spaces, we prove compactness of the set of strategic measures corresponding to nonrandomized policies. For the Borel state case, this set may not be compact [14, p. 170] in spite of compactness of the set of strategic measures corresponding to all policies [17,2]. We use the compactness result from this paper to show the existence of optimal policies for countable-state constrained optimization of expected discounted and nonpositive rewards, when the optimality is considered within the class of nonrandomized policies. This paper also studies the convergence of a value-iteration algorithm for such constrained problems.

Keywords Markov Decision Processes · Compactness · Non-Randomized Policies

1 Introduction

In many fields including engineering, physics, and economics, mathematical models are used to predict behaviors of systems so that they can be controlled in ways which realize desirable levels of performance. A Markov decision process (MDP) is such a mathematical construct that incorporates the notions of a system state and system dynamics. In MDPs, the system dynamics are probabilistic in the sense that transitions from state to state are specified by given transition probabilities. These transition probabilities can be controlled according to a feedback control policy, whose aim is to optimize system performance according to given criteria. A sequential decision process (SDP) is a generalization of an MDP in which transition probabilities depend not only on the current state, but the entire past history of states and actions.

MDPs and SDPs can be applied to diverse areas such as production, signal processing, telecommunications, and speech recognition.

In general, a policy for an MDP or SDP may be randomized in the sense that the action taken does not depend deterministically on the current system state, but rather is chosen according to a probability distribution that is specified by the policy according to the system history. However, in some cases, it may be desirable as a matter of doctrine to restrict the space of feasible policies to non-randomized policies....
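The distinction between randomized and non-randomized policies can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state, two-action setting, the particular `transition` kernel, and the names `randomized_policy`, `nonrandomized_policy`, `as_distribution`, and `simulate` are all hypothetical. The sketch only shows that a randomized policy maps a history to a probability distribution over actions, while a non-randomized policy maps a history to a single action, and that in an SDP the next-state distribution may depend on the whole history.

```python
import random
from typing import Callable, Dict, Tuple

# A history is (x0, a0, x1, a1, ..., xn): states at even positions, actions at odd ones.
History = Tuple[int, ...]
Distribution = Dict[int, float]
ACTIONS = (0, 1)  # hypothetical two-action example


def transition(history: History, action: int) -> Distribution:
    """SDP-style kernel (assumed): the next-state law depends on the whole history."""
    states = history[::2]
    p_one = min(0.9, 0.2 + 0.1 * states.count(0) + 0.1 * action)
    return {0: 1.0 - p_one, 1: p_one}


def randomized_policy(history: History) -> Distribution:
    """Maps a history to a probability distribution over actions."""
    visits = history[::2].count(history[-1])  # visits to the current state so far
    p = 1.0 / (1 + visits)
    return {0: p, 1: 1.0 - p}


def nonrandomized_policy(history: History) -> int:
    """Maps a history to exactly one action, with no randomness involved."""
    visits = history[::2].count(history[-1])
    return 0 if visits % 2 == 1 else 1


def as_distribution(policy: Callable[[History], int]) -> Callable[[History], Distribution]:
    """Views a non-randomized policy as a randomized one with point-mass distributions."""
    def wrapped(history: History) -> Distribution:
        chosen = policy(history)
        return {a: (1.0 if a == chosen else 0.0) for a in ACTIONS}
    return wrapped


def sample(dist: Distribution, rng: random.Random) -> int:
    """Draws one outcome from a finite distribution."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def simulate(policy: Callable[[History], Distribution], x0: int, steps: int,
             rng: random.Random) -> History:
    """Generates a history of the given number of steps under the given policy."""
    history: History = (x0,)
    for _ in range(steps):
        action = sample(policy(history), rng)
        next_state = sample(transition(history, action), rng)
        history = history + (action, next_state)
    return history
```

Under these assumptions, `simulate(as_distribution(nonrandomized_policy), 0, 5, random.Random(0))` produces a history in which every action is fully determined by the preceding states and actions, whereas `simulate(randomized_policy, ...)` may produce different actions after the same history.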