Optimality of Deterministic Policies for Certain Stochastic Control Problems with Multiple Criteria and Constraints

Eugene A. Feinberg
State University of New York at Stony Brook, Stony Brook, NY 11794-3600
[email protected]

For single-criterion stochastic control and sequential decision problems, optimal policies, if they exist, are typically nonrandomized. For problems with multiple criteria and constraints, optimal nonrandomized policies may not exist and, if optimal policies exist, they are typically randomized. In this paper we discuss certain conditions that lead to optimality of nonrandomized policies. In the most interesting situations, these conditions do not impose convexity assumptions on the action sets and reward functions.

1 Introduction

In many applications, the system performance is measured by multiple criteria. For example, in finance such criteria measure returns and risks; in manufacturing they may be production volumes, quality of outputs, and costs; in service operations they include service levels and operating costs.

For problems with multiple criteria, the natural approach is to optimize one of the criteria subject to inequality constraints on the other criteria. In other words, for a problem with $K + 1$ criteria $W(\pi), W_1(\pi), \ldots, W_K(\pi)$, where $\pi$ is a policy, the natural approach is to find a policy $\pi$ that is a solution to the following problem:

$$\text{maximize} \quad W(\pi) \tag{1}$$

$$\text{subject to} \quad W_k(\pi) \ge C_k, \quad k = 1, \ldots, K, \tag{2}$$

where $C_1, \ldots, C_K$ are given numbers. For example, since it is possible to consider $W_{k+1}(\pi) = -W_k(\pi)$, this approach can be used to find policies satisfying interval constraints $a \le W_k(\pi) \le b$.

Optimal solutions of problem (1, 2), if they exist, are typically randomized, with the number of randomization procedures limited by the number of constraints $K$; see [1, 16]. If there are no constraints, i.e. $K = 0$, optimal policies are nonrandomized.

The following simple example illustrates that it is possible for every optimal policy of a constrained problem to be randomized. Consider a one-step problem in which a decision-maker chooses between two decisions $a$ and $b$. There are two reward functions $r$ and $r_1$ defined as $r(a) = r_1(b) = 0$ and $r_1(a) = r(b) = 1$. The decision-maker selects action $a$ with probability $\pi(a)$ and action $b$ with probability $\pi(b)$, where $\pi(a) + \pi(b) = 1$. The criteria are

$$W_k(\pi) = \pi(a) r_k(a) + \pi(b) r_k(b), \quad k = 0, 1,$$

where $W_0 = W$ and $r_0 = r$. Then the problem

$$\text{maximize} \quad W(\pi) \tag{3}$$

$$\text{subject to} \quad W_1(\pi) \ge 1/2 \tag{4}$$

is equivalent to the following linear program (LP):

$$\text{maximize} \quad \pi(b)$$

$$\text{subject to} \quad \pi(a) \ge 1/2, \quad \pi(a) + \pi(b) = 1, \quad \pi(a) \ge 0, \quad \pi(b) \ge 0.$$

...
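The one-step example can be checked numerically. The following is a minimal sketch, not part of the original text: it substitutes π(b) = 1 − π(a), so the LP has a single variable, and it evaluates the objective at the endpoints of the feasible interval, where a linear objective must attain its maximum. The names `criteria` and `solve_example` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# Rewards from the example: r(a) = r1(b) = 0 and r1(a) = r(b) = 1.
r = {"a": Fraction(0), "b": Fraction(1)}
r1 = {"a": Fraction(1), "b": Fraction(0)}
C1 = Fraction(1, 2)  # constraint level in (4)


def criteria(p_a):
    """Return (W, W1) for the policy that picks a with probability p_a."""
    p_b = 1 - p_a
    w = p_a * r["a"] + p_b * r["b"]
    w1 = p_a * r1["a"] + p_b * r1["b"]
    return w, w1


def solve_example():
    """Maximize W subject to W1 >= C1 over p_a in [0, 1].

    Both criteria are linear in p_a, so the feasible set is an interval
    and an optimum lies at one of its endpoints: p_a = 0, p_a = 1, or the
    point where the constraint holds with equality.
    """
    w1_at_0 = criteria(Fraction(0))[1]
    w1_at_1 = criteria(Fraction(1))[1]
    slope = w1_at_1 - w1_at_0
    candidates = [Fraction(0), Fraction(1)]
    if slope != 0:
        p_tight = (C1 - w1_at_0) / slope  # constraint is tight here
        if 0 <= p_tight <= 1:
            candidates.append(p_tight)
    # Keep feasible candidates only; max() raises ValueError if none exist,
    # which cannot happen for the data of this example.
    feasible = [p for p in candidates if criteria(p)[1] >= C1]
    return max(feasible, key=lambda p: criteria(p)[0])


p_opt = solve_example()
print("optimal pi(a):", p_opt, "with W =", criteria(p_opt)[0])
```

Under these assumptions the sketch returns π(a) = 1/2 with W = 1/2. Of the two nonrandomized policies, always choosing b violates constraint (4), and always choosing a is feasible but yields W = 0, so the optimum of this example is attained only by a randomized policy.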
This note was uploaded on 12/06/2011 for the course MATH 101 taught by Professor Eugene A. Feinberg during the Fall '11 term at State University of New York.