THEORY PROBAB. APPL.
Vol. 32, No.
Translated from Russian Journal

SUFFICIENT CLASSES OF STRATEGIES IN DISCRETE DYNAMIC PROGRAMMING II.
LOCALLY STATIONARY STRATEGIES*

E. A. FAINBERG

(Translated by Merle Ellis)

6. The main results. This paper is a continuation of [1]. Throughout we examine a homogeneous controlled Markov model $\{X, A(\cdot), p, r\}$ with discrete time, countable state space $X$, sets of controls $A(x)$, $x \in X$, transition function $p$ and payoff function $r$. For the initial state $x \in X$ and strategy $\pi \in \Pi$, where $\Pi$ is the set of all strategies, the value of the criterion $w^\pi(x)$ is the expectation of the total payoff on the infinite horizon. The price of the model is denoted by $v$, and the price of the class of stationary strategies $S$ is denoted by $s$. If the payoff function is replaced by its negative part $r^-$, then the value of the criterion is denoted by $w^\pi_-$; the corresponding prices are denoted by $v_-$ and $s_-$. The price $v_+$ is defined similarly when $r$ is replaced by $r^+$ (according to [2], $v_+ = s_+$). As before, we assume that the general convergence condition (4.7) holds, which according to [3], [4] is equivalent to $v_+ < \infty$ (if in some relation for functions the argument is omitted, this means that the relation holds for all values of the argument). Since the price of the model coincides with the price of the class of nonrandomized strategies (this follows from Corollary 4.3), throughout what follows we understand $\Pi$ to be the set of nonrandomized strategies.
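
For orientation, the quantities named in this paragraph can be written out as follows. This is only a sketch of the standard total-payoff definitions: the expectation symbol $\mathsf{E}_x^\pi$, the process notation $x_t, a_t$, and the letter $\varphi$ for a stationary strategy are assumed here and do not appear in the text above.

```latex
w^{\pi}(x) = \mathsf{E}_x^{\pi} \sum_{t=0}^{\infty} r(x_t, a_t), \qquad
v(x) = \sup_{\pi \in \Pi} w^{\pi}(x), \qquad
s(x) = \sup_{\varphi \in S} w^{\varphi}(x).
```

Since $S \subseteq \Pi$, these definitions give $s \le v$ immediately.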
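
For a strategy $\pi \in \Pi$ and $\bar h_n = x_0 a_0 \cdots x_n a_n \in (X \times A)^{n+1}$, $n = 0, 1, \ldots$, we define the strategy $\bar h_n \pi$ obtained from $\pi$ if the control is performed at time $n + 1$ and prior to this time the sequence of states and controls $\bar h_n$ was observed, i.e., $\bar h_n \pi = \sigma$ and for any prehistory $h'_i = x'_0 a'_0 \cdots x'_i \in H_i = (X \times A)^i \times X$, $i = 0, 1, \ldots$,

$$\sigma_i(h'_i) = \pi_{n+i+1}(\bar h_n h'_i),$$

where $\bar h_n h'_i = x_0 a_0 \cdots x_n a_n x'_0 a'_0 \cdots x'_i$. By definition it is assumed that $\bar h_{-1} = \emptyset$ and $\bar h_{-1} \pi = \pi$. For $\pi \in \Pi$ and $h_n \in H_n$, $n = 0, 1, \ldots$, we put $w^\pi(h_n) = w^{\bar h_{n-1} \pi}(x_n)$, where $\bar h_{n-1}$ is the projection of $h_n$ onto $(X \times A)^n$, i.e., $h_n = \bar h_{n-1} x_n$. Thus $w^\pi(h_n)$ is the expected total return under the strategy $\pi$ from step $n$ on, given the prehistory $h_n$.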
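
The shift of a strategy by an observed sequence of states and controls can be illustrated with a minimal Python sketch. It assumes a nonrandomized strategy is represented as a function from a prehistory (a tuple of alternating states and controls, ending in a state) to a control; the names `shift`, `pi`, and the labels `"x"`, `"y"`, `"a"`, `"b"` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Tuple

History = Tuple[str, ...]            # (x0, a0, ..., xi): alternating states and controls
Strategy = Callable[[History], str]  # nonrandomized strategy: prehistory -> control


def shift(prefix: Tuple[str, ...], pi: Strategy) -> Strategy:
    """Return the strategy sigma with sigma(h') = pi(prefix + h').

    `prefix` plays the role of x0 a0 ... xn an, so it must consist of
    complete (state, control) pairs; the empty prefix returns pi unchanged.
    """
    if len(prefix) % 2 != 0:
        raise ValueError("prefix must consist of complete (state, control) pairs")

    def sigma(h: History) -> str:
        return pi(prefix + h)

    return sigma


# Hypothetical strategy: choose "b" once state "y" has been visited, else "a".
def pi(h: History) -> str:
    states = h[0::2]  # states sit at the even positions of a prehistory
    return "b" if "y" in states else "a"


sigma = shift(("y", "a"), pi)
print(pi(("x",)))             # a
print(sigma(("x",)))          # b
print(shift((), pi)(("x",)))  # a
```

Because the time index of a decision is determined by the length of the prehistory, a single function on prehistories of all lengths stands in for the whole sequence of decision rules in this sketch.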

Let $g: H \to [0; +\infty[$, where $H = \bigcup_{n=0}^{\infty} H_n$. The strategy $\pi$ is said to be persistently $g$-optimal if

$$w^\pi(h_n) \ge v(x_n) - g(h_n) \quad \text{for all } h_n = x_0 a_0 \cdots x_n \in H.$$

With the exception of Theorem 8.4, this paper considers throughout persistently $g$-optimal strategies for $g(h_n) = g(x_n)$, i.e., $g: X \to [0; +\infty[$.

* Received by the editors February 22, 1984.

For a function $g: X \to [-\infty; +\infty]$ we introduce the operators

$$P^a g(x) = \sum_{z \in X} p(z \mid x, a)\, g(z),$$
$$P g(x) = \sup \{P^a g(x) : a \in A(x)\},$$
$$T^a g(x) = r(x, a) + P^a g(x),$$
$$T g(x) = \sup \{T^a g(x) : a \in A(x)\},$$

which are assumed to be defined for $P g^+ < \infty$.
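
A small numerical sketch may help fix the four operators. The two-state model below (states `"x"`, `"y"`, controls `"stay"`, `"go"`, and all numbers) is entirely hypothetical; because its control sets are finite, the suprema reduce to maxima.

```python
# Hypothetical finite model, for illustration only.
A = {"x": ["stay", "go"], "y": ["stay"]}       # control sets A(x)
p = {                                          # transition function p(z | x, a)
    ("x", "stay"): {"x": 1.0},
    ("x", "go"): {"x": 0.5, "y": 0.5},
    ("y", "stay"): {"y": 1.0},
}
r = {("x", "stay"): 0.0, ("x", "go"): -1.0, ("y", "stay"): 0.0}  # payoff r(x, a)


def P_a(g, x, a):
    """P^a g(x): expectation of g at the next state under control a."""
    return sum(prob * g[z] for z, prob in p[(x, a)].items())


def P(g, x):
    """P g(x): supremum of P^a g(x) over a in A(x) (a maximum here)."""
    return max(P_a(g, x, a) for a in A[x])


def T_a(g, x, a):
    """T^a g(x) = r(x, a) + P^a g(x)."""
    return r[(x, a)] + P_a(g, x, a)


def T(g, x):
    """T g(x): supremum of T^a g(x) over a in A(x) (a maximum here)."""
    return max(T_a(g, x, a) for a in A[x])


g = {"x": 0.0, "y": 4.0}
print(P_a(g, "x", "go"))  # 2.0
print(P(g, "x"))          # 2.0
print(T_a(g, "x", "go"))  # 1.0
print(T(g, "x"))          # 1.0
print(T(g, "y"))          # 4.0
```

In a countable model with infinite control sets the sums and suprema need not be attained or finite, which is why the text restricts attention to functions with $P g^+ < \infty$.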

We denote by $\Phi$ the set of nonnegative functions $g$ on $X$ satisfying $Pg < \infty$. It is well known that if $v_+ < \infty$, then $v^+ \in \Phi$ and $v = Tv$.

For $g: X \to [-\infty; +\infty[$, $g^+ \in \Phi$ and $Y \subseteq X$ we denote by $L_0(g, Y)$ the set of all nonnegative functions $l$ on $X$ such that:
(i) $l(x) = 0$ for $x \in Y$,
(ii) $l(x) > 0$ and $l(x) \ge \max \{g(x), Pl(x)\}$ for $x \in X \setminus Y$.
We denote by $L(g) = L_0(g, \emptyset)$ the set of positive excessive majorants of $g$.
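
Conditions (i) and (ii) can be checked mechanically on a finite model. The sketch below is an illustration under assumed data: the two-state model, the function name `in_L0`, and the sample functions are all hypothetical, and the supremum in $Pl$ is a maximum because the control sets are finite.

```python
# Hypothetical finite model, for illustration only.
STATES = ["x", "y"]
A = {"x": ["stay", "go"], "y": ["stay"]}       # control sets A(x)
p = {                                          # transition function p(z | x, a)
    ("x", "stay"): {"x": 1.0},
    ("x", "go"): {"x": 0.5, "y": 0.5},
    ("y", "stay"): {"y": 1.0},
}


def P(l, x):
    """P l(x): largest expected value of l at the next state over a in A(x)."""
    return max(sum(prob * l[z] for z, prob in p[(x, a)].items()) for a in A[x])


def in_L0(l, g, Y):
    """Check conditions (i) and (ii) for l with respect to g and the set Y."""
    for x in STATES:
        if x in Y:
            if l[x] != 0:                      # (i): l vanishes on Y
                return False
        elif not (l[x] > 0 and l[x] >= max(g[x], P(l, x))):
            return False                       # (ii) fails somewhere off Y
    return True


g = {"x": 1.0, "y": 3.0}
print(in_L0({"x": 3.0, "y": 3.0}, g, set()))   # True
print(in_L0({"x": 1.0, "y": 3.0}, g, set()))   # False
print(in_L0({"x": 1.0, "y": 0.0}, g, {"y"}))   # True
```

The second call fails because $Pl(x) = 2 > 1 = l(x)$ there, so that $l$ majorizes $g$ but is not excessive at `"x"`.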