THEORY OF PROBABILITY AND ITS APPLICATIONS

Volume XXIII, Number 2, 1978
THE EXISTENCE OF A STATIONARY ε-OPTIMAL POLICY FOR A FINITE MARKOV CHAIN

E. A. FAINBERG

(Translated by K. Durr)
In this paper we investigate the problem of optimal control of a Markov chain with a finite number of states when the control sets are compact subsets of a metric space. The goal of the control is to maximize the average reward per unit step.
For the case of finite control and state sets the existence of a stationary optimal policy was proved in [1] and [2]. In [3]–[5] it was proved that for a controlled Markov process with a finite state space, compact control sets, and continuous reward and transition functions an optimal policy may fail to exist. In this paper it is proved that if the state space is finite, the control sets are compact, the transition functions are continuous, and the reward functions are upper semicontinuous, then for any positive ε there exists a stationary ε-optimal policy. The average reward here can be understood as either the lower or the upper limit of the average reward per unit step.
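A minimal computational sketch (not from the paper) of the criterion being maximized: under a fixed stationary policy the chain has a single transition matrix P and a per-step reward r(s), and when the induced chain is ergodic the lower and upper limits of the average reward per unit step coincide and equal the reward averaged over the stationary distribution π. The matrix P and rewards r below are illustrative assumptions, not data from the paper.

```python
def stationary_distribution(P, iters=10_000):
    """Approximate the stationary distribution of an ergodic chain
    by power iteration: repeatedly apply pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def average_reward(P, r):
    """Long-run average reward per unit step under an ergodic chain:
    sum over states s of pi(s) * r(s)."""
    pi = stationary_distribution(P)
    return sum(p * ri for p, ri in zip(pi, r))

# Two-state example: the chain induced by one fixed stationary policy
# (hypothetical numbers).  The stationary distribution is (5/6, 1/6),
# so the average reward is 5/6.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r = [1.0, 0.0]  # reward 1 in state 0, reward 0 in state 1

print(round(average_reward(P, r), 4))  # → 0.8333
```

A stationary policy is ε-optimal if the value it attains in this sense falls short of the supremum over all policies by at most ε in every initial state.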
For the case of the lower limit the existence of a stationary ε-optimal policy was proved in [4]. Examples in [3] and [4] show that if the above restrictions on the control sets, the transition functions, and the reward functions are not satisfied, then a stationary ε-optimal policy may fail to exist for some positive ε. If the state space is not finite then, as the example in [6] shows, a stationary ε-optimal policy may fail to exist even in the case of finite control sets.
Observe that if the number of states is two, then, according to [7], under the assumptions made in this paper there exists a stationary optimal policy. In [7]–[9] sufficient conditions for the existence of stationary optimal policies were studied, imposing certain restrictions on the control sets additional to the requirements of the present paper.
In [8] it was proved that for compact convex control sets coinciding with the sets of transition probabilities and concave continuous reward functions a stationary optimal policy exists if every stationary policy defines an ergodic Markov chain without transient states. In [9] it was shown that under the conditions of [8] it is sufficient to require that not every, but rather at least one, stationary policy define an ergodic Markov chain without transient states.
In [7] two sufficient conditions were given. They consist in adding to the assumptions of the present paper one of the following restrictions: (i) every stationary policy defines an ergodic Markov chain with one ergodic class and, possibly, transient states; (ii) for each state the set of transition probabilities contains a finite number of extreme points.