Mathematical Methods of Operations Research manuscript No. (will be inserted by the editor)

Compactness of the Space of Non-Randomized Policies in Countable-State Sequential Decision Processes

Richard C. Chen · Eugene A. Feinberg

Received: date / Accepted: date

Richard Chen
Naval Research Laboratory, Code 5341, 4555 Overlook Ave. SW, Washington, DC 20375
Tel.: 202-767-3417, Fax: 202-404-8687, E-mail: firstname.lastname@example.org

Eugene Feinberg
Dept. of Applied Math. and Stat., State University of New York, Stony Brook, NY 11794-3600

Abstract For sequential decision processes with countable state spaces, we prove compactness of the set of strategic measures corresponding to nonrandomized policies. For the Borel state case, this set may not be compact [14, p. 170] in spite of compactness of the set of strategic measures corresponding to all policies [17,2]. We use the compactness result from this paper to show the existence of optimal policies for countable-state constrained optimization of expected discounted and nonpositive rewards, when the optimality is considered within the class of nonrandomized policies. This paper also studies the convergence of a value-iteration algorithm for such constrained problems.

Keywords Markov Decision Processes · Compactness · Non-Randomized Policies

1 Introduction

In many fields, including engineering, physics, and economics, mathematical models are used to predict the behavior of systems so that they can be controlled in ways that realize desirable levels of performance. A Markov decision process (MDP) is such a mathematical construct that incorporates the notions of a system state and system dynamics. In MDPs, the system dynamics are probabilistic in the sense that transitions from state to state are specified by given transition probabilities. These transition probabilities can be controlled according to a feedback control policy, whose aim is to optimize system performance according to given criteria. A sequential decision process (SDP) is a generalization of an MDP in which transition probabilities depend not only on the current state but also on the entire past history of states and actions.

MDPs and SDPs can be applied to diverse areas such as production, signal processing, telecommunications, and speech recognition. In general, a policy for an MDP or SDP may be randomized in the sense that the action taken does not depend deterministically on the current system state, but rather is chosen according to a probability distribution that is specified by the policy according to the system history. However, in some cases, it may be desirable as a matter of doctrine to restrict the space of feasible policies to non-randomized policies....
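The distinction between randomized and non-randomized policies, and the history dependence that separates an SDP from an MDP, can be illustrated with a small simulation. The following Python sketch is not taken from the paper: the action set, the two policies, and the transition rule are all hypothetical and were chosen only to make the definitions concrete. A policy is modeled as a function from the history to a probability distribution over actions, and it is non-randomized when every such distribution puts probability 1 on a single action.

```python
import random

# Hypothetical action set, for illustration only.
ACTIONS = ["stay", "move"]

# A history is stored as a flat list [x0, a0, x1, a1, ..., xn]:
# states (ints) at even positions, actions (strings) at odd positions.


def randomized_policy(history):
    """Hypothetical randomized policy: a fair coin flip at every history."""
    return {"stay": 0.5, "move": 0.5}


def nonrandomized_policy(history):
    """Hypothetical non-randomized policy: 'move' until two moves were made."""
    moves_so_far = sum(1 for item in history if item == "move")
    action = "move" if moves_so_far < 2 else "stay"
    return {action: 1.0}  # all probability mass on one action


def is_nonrandomized_at(policy, history):
    """True if the policy selects a single action with probability 1 here."""
    return max(policy(history).values()) == 1.0


def transition(history, action):
    """Hypothetical SDP transition law: depends on the whole history.

    'stay' keeps the current state. 'move' advances to state + 1 with
    probability 1 / (1 + visits), where visits counts how often the
    current state appears among the states of the history.
    """
    state = history[-1]
    if action == "stay":
        return {state: 1.0}
    visits = sum(1 for s in history[::2] if s == state)
    p = 1.0 / (1 + visits)
    return {state + 1: p, state: 1.0 - p}


def sample(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights)[0]


def simulate(policy, x0, steps, rng):
    """Generate a history of the given number of steps under the policy."""
    history = [x0]
    for _ in range(steps):
        action = sample(policy(history), rng)
        next_state = sample(transition(history, action), rng)
        history += [action, next_state]
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate(nonrandomized_policy, 0, 4, rng))
    print(simulate(randomized_policy, 0, 4, rng))
```

Under the non-randomized policy the sequence of actions is fully determined by the observed history, even though the states remain random. Under the randomized policy the actions are themselves random draws.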