Optimality of Deterministic Policies for Certain Stochastic Control Problems with Multiple Criteria and Constraints

Eugene A. Feinberg

State University of New York at Stony Brook, Stony Brook, NY 11794-3600
[email protected]
For single-criterion stochastic control and sequential decision problems, optimal policies, if they exist, are typically nonrandomized. For problems with multiple criteria and constraints, optimal nonrandomized policies may not exist and, if optimal policies exist, they are typically randomized. In this paper we discuss certain conditions that lead to optimality of nonrandomized policies. In the most interesting situations, these conditions do not impose convexity assumptions on the action sets and reward functions.
1 Introduction
In many applications, the system performance is measured by multiple criteria. For example, in finance, such criteria measure returns and risks; in manufacturing, they may be production volumes, quality of outputs, and costs; in service operations, performance criteria include service levels and operating costs.
For problems with multiple criteria, the natural approach is to optimize one of the criteria subject to inequality constraints on the other criteria. In other words, for a problem with K + 1 criteria W_0(π), W_1(π), . . . , W_K(π), where π is a policy, the natural approach is to find a policy π that is a solution to the following problem:
maximize  W_0(π)                                  (1)

subject to  W_k(π) ≥ C_k,   k = 1, . . . , K,     (2)
where C_1, . . . , C_K are given numbers. For example, since it is possible to consider W_{K+1}(π) = −W_k(π), this approach can be used to find policies satisfying interval constraints a ≤ W_k(π) ≤ b.
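Concretely (a reconstruction of this standard reduction, not verbatim from the text), the interval constraint a ≤ W_k(π) ≤ b is encoded by two constraints of the form (2):

```latex
W_k(\pi) \ge a, \qquad W_{K+1}(\pi) = -W_k(\pi) \ge -b .
```

The first inequality enforces the lower endpoint directly, and the second, applied to the negated criterion, enforces W_k(π) ≤ b.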
Optimal solutions of problem (1), (2), if they exist, are typically randomized, with the number of randomization procedures limited by the number of
constraints K; see [1, 16]. If there are no constraints, i.e., K = 0, optimal policies are nonrandomized. The following simple example illustrates that it is possible that every optimal policy for a constrained problem is randomized.
Consider a one-step problem in which a decision-maker chooses between two actions a and b. There are two reward functions r_0 and r_1 defined as r_0(a) = r_1(b) = 0 and r_1(a) = r_0(b) = 1. The decision-maker selects action a with probability π(a) and action b with probability π(b), where π(a) + π(b) = 1.
The criteria are W_k(π) = π(a) r_k(a) + π(b) r_k(b), k = 0, 1.
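The two-action example above is small enough to evaluate directly. The following is a minimal sketch, not from the paper; the names `rewards` and `criterion` are illustrative:

```python
# Reward functions of the example: r_0(a) = r_1(b) = 0, r_1(a) = r_0(b) = 1.
rewards = {
    0: {"a": 0.0, "b": 1.0},  # r_0
    1: {"a": 1.0, "b": 0.0},  # r_1
}

def criterion(k, pi_a):
    """W_k(pi) = pi(a) r_k(a) + pi(b) r_k(b), where pi(b) = 1 - pi(a)."""
    return pi_a * rewards[k]["a"] + (1.0 - pi_a) * rewards[k]["b"]

# The randomized policy pi(a) = pi(b) = 1/2 gives W_0 = W_1 = 1/2,
# while either deterministic policy drives one criterion to 0.
print(criterion(0, 0.5), criterion(1, 0.5))  # 0.5 0.5
```

Note that the two deterministic policies give (W_0, W_1) = (1, 0) and (0, 1), so neither satisfies both a reward target and the constraint (4) below simultaneously.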
Then the problem

maximize  W_0(π)                                  (3)

subject to  W_1(π) ≥ 1/2                          (4)
is equivalent to the following linear program (LP):

maximize  π(b)
subject to  π(a) ≥ 1/2,
            π(a) + π(b) = 1,
            π(a) ≥ 0,  π(b) ≥ 0.
This LP has the unique optimal solution π(a) = π(b) = 1/2. Therefore, the optimal policy is randomized.
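Since π(b) = 1 − π(a), this LP reduces to choosing the smallest feasible π(a), which a brute-force search confirms. This is a hypothetical sanity check, not part of the paper:

```python
# Brute-force check of the LP: maximize pi(b) subject to
# pi(a) >= 1/2, pi(a) + pi(b) = 1, pi(a) >= 0, pi(b) >= 0.
# We search over pi(a) on a fine grid of the unit interval.

def solve_lp(steps=1000):
    best = None  # (objective pi(b), argument pi(a))
    for i in range(steps + 1):
        pi_a = i / steps
        if pi_a < 0.5:           # infeasible: violates pi(a) >= 1/2
            continue
        pi_b = 1.0 - pi_a        # equality constraint pi(a) + pi(b) = 1
        if best is None or pi_b > best[0]:
            best = (pi_b, pi_a)
    return best

pi_b, pi_a = solve_lp()
print(pi_a, pi_b)  # 0.5 0.5
```

The objective π(b) = 1 − π(a) is decreasing in π(a), so the constraint π(a) ≥ 1/2 is active at the optimum, which is why the solution is unique and strictly randomized.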
In many applications, implementation of randomized policies is not natural. In many cases, it is more natural to apply nonrandomized policies when they are optimal. In addition, it appears that the use of randomization procedures increases the variance of the performance criteria. Also, from the computational point of view, finding the best randomized policy is in many cases easy, because this can be done by using linear programming. Finding