THEORY OF PROBABILITY AND ITS APPLICATIONS
Volume XXVII, 1982
NON-RANDOMIZED MARKOV AND SEMI-MARKOV STRATEGIES IN DYNAMIC PROGRAMMING
E. A. FAINBERG
(Translated by W. U. Sirk)
1. Introduction
In a non-homogeneous controllable Markov model with a total reward criterion, discrete time, an infinite horizon, and Borel spaces of states and controls, let a strategy $\pi$ and an initial measure $\mu$ be given. The following two statements are proved in this paper:
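(a) (Theorem 3) for any $K < +\infty$, there exists a non-randomized Markov strategy $\varphi$ such that

$$
w(\mu,\varphi) \ge
\begin{cases}
w(\mu,\pi), & \text{if } w(\mu,\pi) < +\infty,\\
K, & \text{if } w(\mu,\pi) = +\infty;
\end{cases}
\tag{1}
$$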
(b) (Theorem 4) for any measurable function $K(x) < +\infty$ given on a set of initial states $X_0$, there exists a non-randomized semi-Markov strategy $\varphi'$ such that, for any $x \in X_0$,

$$
w(x,\varphi') \ge
\begin{cases}
w(x,\pi), & \text{if } w(x,\pi) < +\infty,\\
K(x), & \text{if } w(x,\pi) = +\infty.
\end{cases}
\tag{2}
$$
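In a model with finitely many states and actions and a finite horizon, the analogue of statement (a) is elementary: backward induction produces a non-randomized Markov strategy that is optimal, and hence majorizes any given strategy. The sketch below is a toy illustration of ours, not the paper's construction. The paper has to handle Borel spaces, an infinite horizon and possibly infinite rewards, and in that setting optimal strategies need not exist in general. The sketch checks that $w(\mu,\varphi) \ge w(\mu,\pi)$ for a randomized Markov $\pi$. The functions `evaluate` and `backward_induction` and all model data are hypothetical.

```python
def evaluate(mu, P, r, policy):
    """Expected total reward sum_{t<N} E[r_t(x_t, a_t)] of a Markov strategy.

    mu[x]: initial probabilities; P[t][x][a][y]: transition probabilities;
    r[t][x][a]: one-step rewards; policy[t][x][a]: probability of action a
    in state x at time t (0/1 entries give a non-randomized strategy).
    """
    dist = list(mu)
    total = 0.0
    for t in range(len(r)):
        nxt = [0.0] * len(dist)
        for x, px in enumerate(dist):
            for a, pa in enumerate(policy[t][x]):
                mass = px * pa  # probability that x_t = x and a_t = a
                total += mass * r[t][x][a]
                for y, pxy in enumerate(P[t][x][a]):
                    nxt[y] += mass * pxy
        dist = nxt  # distribution of x_{t+1}
    return total


def backward_induction(P, r):
    """Non-randomized Markov strategy maximizing the N-step total reward."""
    n_states = len(r[0])
    v = [0.0] * n_states  # value after the last step
    policy = [None] * len(r)
    for t in reversed(range(len(r))):
        new_v, decisions = [], []
        for x in range(n_states):
            q = [r[t][x][a] + sum(p * v[y] for y, p in enumerate(P[t][x][a]))
                 for a in range(len(r[t][x]))]
            best = max(range(len(q)), key=q.__getitem__)
            new_v.append(q[best])
            decisions.append([1.0 if a == best else 0.0 for a in range(len(q))])
        policy[t] = decisions
        v = new_v
    return policy, v  # v[x] is the optimal N-step value from state x


# Hypothetical toy model: 2 states, 2 actions, horizon N = 3.
N = 3
P = [[[[0.9, 0.1], [0.2, 0.8]],   # from state 0 under actions 0, 1
      [[0.5, 0.5], [0.0, 1.0]]]   # from state 1 under actions 0, 1
     for _ in range(N)]
r = [[[1.0, 0.0], [0.0, 2.0]],    # t = 0
     [[0.5, 1.5], [1.0, 0.0]],    # t = 1
     [[2.0, 0.0], [0.0, 3.0]]]    # t = 2: rewards depend on t (non-homogeneous)
mu = [0.5, 0.5]
pi = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(N)]  # randomized Markov strategy

phi, v0 = backward_induction(P, r)
w_pi, w_phi = evaluate(mu, P, r, pi), evaluate(mu, P, r, phi)
print(f"w(mu, pi) = {w_pi:.4f}, w(mu, phi) = {w_phi:.4f}")
```

The truncation to a finite horizon is what makes this shortcut available. With an infinite horizon and unbounded or infinite total rewards, it no longer applies directly.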
The quantities $w(\mu,\pi)$ and $w(x,\pi)$ are the expected total rewards under the strategy $\pi$ with initial measure $\mu$ and with initial state $x$, respectively.
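In the usual notation for the total-reward criterion, these quantities can be written as shown below, provided the expectations and the integral are defined. The following notation is assumed here and is not taken from the text above:

- $r_t$ is the one-step reward at time $t$, time-dependent because the model is non-homogeneous.
- $x_t$ and $a_t$ are the state and the control at time $t$.
- $\mathsf{E}^{\pi}_{\mu}$ is the expectation under $\pi$ and $\mu$.
- $X$ is the state space and $\delta_x$ is the unit mass at $x$.

$$
w(\mu,\pi) = \mathsf{E}^{\pi}_{\mu} \sum_{t=0}^{\infty} r_t(x_t,a_t), \qquad
w(x,\pi) = w(\delta_x,\pi), \qquad
w(\mu,\pi) = \int_{X} w(x,\pi)\,\mu(dx).
$$

The last identity reflects the fact that the process started from $\mu$ is a $\mu$-mixture of the processes started from fixed initial states.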
Controllable Markov models with Borel state spaces were first studied by Blackwell [1], [2]. Blackwell also studied whether such models admit Markov and semi-Markov strategies that majorize arbitrary strategies.
These investigations were continued by Strauch [3], who considered three cases: positive (P) and negative (N) dynamic programming, and dynamic programming with discounting (D). For the cases D and N, one of the fundamental results of [3] (Theorem 4.3) concerns non-randomized Markov strategies $\varphi$ and semi-Markov strategies $\varphi'$. It states that such strategies exist with $w(\mu,\varphi) \ge w(\mu,\pi)$ and $w(x,\varphi') \ge w(x,\pi)$ for all initial states $x$.
In all three cases, D, N and P, it was assumed in [3] that $w(\mu,\pi) < +\infty$ for all $\mu$ and $\pi$. For this reason, the constant $K$ and the function $K(x)$ were not considered there.
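A toy example of ours, not from the paper, illustrates why (1) and (2) contain $K$ and $K(x)$ once $w(\mu,\pi) = +\infty$ is allowed. In the example, a randomized strategy has infinite expected reward, while every non-randomized strategy earns a finite reward. No non-randomized strategy can therefore match $w(x_0,\pi)$, but for every finite $K$ some non-randomized strategy earns at least $K$. The action space is countable, hence Borel. The model, the initial state $x_0$ and all function names are hypothetical.

```python
import math

# Hypothetical one-step model (illustration only): from the initial state x0
# the controller picks an action a in {1, 2, 3, ...}, receives reward a, and
# moves to an absorbing state where all further rewards are 0.

C = 6.0 / math.pi ** 2  # normalizes pi(a) = C / a**2, since sum 1/a**2 = pi**2 / 6


def pi_prob(a: int) -> float:
    """Randomized strategy pi: probability of choosing action a at x0."""
    return C / a ** 2


def partial_expected_reward(n_terms: int) -> float:
    """Partial sums of w(x0, pi) = sum_a pi(a) * a = C * sum_a 1/a (divergent)."""
    return sum(pi_prob(a) * a for a in range(1, n_terms + 1))


def nonrandomized_action_for(K: float) -> int:
    """A non-randomized strategy picks a single action a, so w(x0, phi) = a < +inf.

    Choosing a = max(1, ceil(K)) gives w(x0, phi) >= K, as required by (1).
    """
    return max(1, math.ceil(K))


for n in (10, 1_000, 100_000):
    print(n, round(partial_expected_reward(n), 3))  # grows like C * ln(n)
for K in (2.5, 1e6):
    print(K, nonrandomized_action_for(K))
```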
For the case P ([3], Theorem 4.4), it was proved that for any $\varepsilon > 0$ there exist non-randomized Markov strategies $\varphi$ and semi-Markov strategies $\varphi'$ such that $w(\mu,\varphi) \ge w(\mu,\pi) - \varepsilon$ and $w(x,\varphi') \ge w(x,\pi) - \varepsilon$ for all initial states $x$.
In [3] it