
Optimality of Deterministic Policies for Certain Stochastic Control Problems with Multiple Criteria and Constraints

Eugene A. Feinberg
State University of New York at Stony Brook, Stony Brook, NY 11794-3600
[email protected]

For single-criterion stochastic control and sequential decision problems, optimal policies, if they exist, are typically nonrandomized. For problems with multiple criteria and constraints, optimal nonrandomized policies may not exist and, if optimal policies exist, they are typically randomized. In this paper we discuss certain conditions that lead to optimality of nonrandomized policies. In the most interesting situations, these conditions do not impose convexity assumptions on the action sets and reward functions.

1 Introduction

In many applications, the system performance is measured by multiple criteria. For example, in finance, such criteria measure returns and risks; in manufacturing, such criteria may be production volumes, quality of outputs, and costs; in service operations, performance criteria include service levels and operating costs.

For problems with multiple criteria, the natural approach is to optimize one of the criteria subject to inequality constraints on the other criteria. In other words, for a problem with $K + 1$ criteria $W_0(\pi), W_1(\pi), \ldots, W_K(\pi)$, where $\pi$ is a policy, the natural approach is to find a policy $\pi$ that is a solution to the following problem

$$\text{maximize } W_0(\pi) \tag{1}$$
$$\text{subject to } W_k(\pi) \ge C_k, \quad k = 1, \ldots, K, \tag{2}$$

where $C_1, \ldots, C_K$ are given numbers. For example, since it is possible to consider $W_{k+1}(\pi) = -W_k(\pi)$, this approach can be used to find policies satisfying interval constraints $a \le W_k(\pi) \le b$.

Optimal solutions of problem (1, 2), if they exist, are typically randomized with the number of randomization procedures limited by the number of
constraints $K$; see [1, 16]. If there are no constraints, i.e. $K = 0$, optimal policies are nonrandomized.

The following simple example illustrates that it is possible that any optimal policy for a constrained problem is randomized. Consider a one-step problem in which a decision-maker chooses between two decisions $a$ and $b$. There are two reward functions $r_0$ and $r_1$ defined as $r_0(a) = r_1(b) = 0$ and $r_1(a) = r_0(b) = 1$. The decision-maker selects action $a$ with probability $\pi(a)$ and action $b$ with probability $\pi(b)$, where $\pi(a) + \pi(b) = 1$. The criteria are $W_k(\pi) = \pi(a) r_k(a) + \pi(b) r_k(b)$, $k = 0, 1$. Then the problem

$$\text{maximize } W_0(\pi) \tag{3}$$
$$\text{subject to } W_1(\pi) \ge 1/2 \tag{4}$$

is equivalent to the following linear program (LP)

$$\text{maximize } \pi(b)$$
$$\text{subject to } \pi(a) \ge 1/2, \quad \pi(a) + \pi(b) = 1, \quad \pi(a) \ge 0, \quad \pi(b) \ge 0.$$

This LP has the unique optimal solution $\pi(a) = \pi(b) = 1/2$. Therefore, the optimal policy is randomized.

In many applications, implementation of randomized policies is not natural. In many cases, it is more natural to apply nonrandomized policies when they are optimal. In addition, it appears that the use of randomization procedures increases the variance of the performance criteria. Also, from the computational point of view, finding the best randomized policy in many cases is easy, because this can be done by using linear programming. Finding
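
The one-step example above can be checked numerically. The sketch below is only an illustration, not the paper's method: because the constraint $\pi(a) + \pi(b) = 1$ leaves a single free variable, it replaces the LP with an exact grid search over $\pi(a)$ using rational arithmetic. The function names and the grid resolution `GRID` are assumptions introduced here.

```python
from fractions import Fraction

# Rewards from the one-step example: r0(a) = r1(b) = 0, r1(a) = r0(b) = 1.
r0 = {"a": Fraction(0), "b": Fraction(1)}
r1 = {"a": Fraction(1), "b": Fraction(0)}
C1 = Fraction(1, 2)  # right-hand side of constraint (4)


def criteria(p_a):
    """Return (W0, W1) for the policy that selects a with probability p_a."""
    p_b = 1 - p_a
    w0 = p_a * r0["a"] + p_b * r0["b"]
    w1 = p_a * r1["a"] + p_b * r1["b"]
    return w0, w1


def best_policy(candidates):
    """Among candidates with W1 >= C1, return (p_a, W0) with the largest W0."""
    feasible = [(p, criteria(p)[0]) for p in candidates if criteria(p)[1] >= C1]
    return max(feasible, key=lambda pair: pair[1])


# Hypothetical grid resolution (an assumption, not taken from the paper).
GRID = 100
randomized = [Fraction(i, GRID) for i in range(GRID + 1)]
deterministic = [Fraction(0), Fraction(1)]  # always b, or always a

best_rand = best_policy(randomized)
best_det = best_policy(deterministic)
print("best randomized    (pi(a), W0):", best_rand)
print("best deterministic (pi(a), W0):", best_det)
```

Under these assumptions, the search over randomized policies returns $\pi(a) = 1/2$ with $W_0 = 1/2$, while the only feasible nonrandomized policy is "always $a$" with $W_0 = 0$, which agrees with the LP solution stated in the example.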