MATHEMATICS OF OPERATIONS RESEARCH
Vol. 32, No. 4, November 2007, pp. 769–783
issn 0364-765X | eissn 1526-5471 | 07 | 3204 | 0769
doi 10.1287/moor.1070.0269
© 2007 INFORMS
Optimality Inequalities for Average Cost Markov Decision
Processes and the Stochastic Cash Balance Problem
Eugene A. Feinberg
Department of Applied Mathematics & Statistics, State University of New York at Stony Brook,
Stony Brook, New York 11794, efeinberg@notes.cc.sunysb.edu, http://www.ams.sunysb.edu/~feinberg/
Mark E. Lewis
School of Operations Research & Industrial Engineering, Cornell University, 226 Rhodes Hall, Ithaca, New York 14853,
mark.lewis@cornell.edu, http://www.orie.cornell.edu/orie/people/faculty/profile.cfm?netid=mark.lewis
For general state and action space Markov decision processes, we present sufficient conditions for the existence of solutions
of the average cost optimality inequalities. These conditions also imply the convergence of both the optimal discounted cost
value function and policies to the corresponding objects for the average costs per unit time case. Inventory models are natural
applications of our results. We describe structural properties of average cost optimal policies for the cash balance problem,
an inventory control problem where the demand may be negative and the decision-maker can produce or scrap inventory.
We also show the convergence of optimal thresholds in the finite horizon case to those under the expected discounted cost
criterion and those under the expected discounted costs to those under the average costs per unit time criterion.
Key words: Markov decision process; average cost per unit time; optimality inequality; optimal policy; inventory control
MSC2000 subject classification: Primary: 90C40 (Markov and semi-Markov decision processes); secondary: 90B05
(inventory, storage, reservoirs)
OR/MS subject classification: Primary: dynamic programming/optimal control/Markov/infinite state; secondary:
inventory/production/uncertainty/stochastic
History: Received April 27, 2006; revised September 7, 2006.
1. Introduction. In a discrete-time Markov decision process (MDP) the usual method to study the average
cost criterion is to find a solution to the average cost optimality equations. A policy that achieves the minimum
in this system of equations is then average cost optimal. When the state and action spaces are infinite, one
may be required to replace the equations with inequalities, yet the conclusions are the same; a policy that
achieves the minimum in the inequalities is average cost optimal. Schäl [27] provides two groups of general
conditions that imply the existence of a solution to the average cost optimality inequalities (ACOI). The first
group, referred to as Assumptions (W) in Schäl [27], requires weak continuity of the transition probabilities.
The second group, Assumptions (S), requires setwise continuity of the transition probabilities. In either case, for
each state a compact action set was assumed in Schäl [27]. The purpose of this paper is to adapt Schäl’s [27]
conditions to problems with noncompact action sets, in particular to those related to inventory control. As was