TV1987 - THEORY PROBAB. APPL. Vol. 32, No. Translated from...


THEORY PROBAB. APPL. Vol. 32, No. Translated from Russian Journal

SUFFICIENT CLASSES OF STRATEGIES IN DISCRETE DYNAMIC PROGRAMMING II. LOCALLY STATIONARY STRATEGIES*

E. A. FAINBERG

(Translated by Merle Ellis)

* Received by the editors February 22, 1984.

6. The main results. This paper is a continuation of [1]. Throughout we examine a homogeneous controlled Markov model $\{X, A(\cdot), p, r\}$ with discrete time, countable state space $X$, sets of controls $A(x)$, $x \in X$, transition function $p$ and payoff function $r$. For the initial state $x \in X$ and strategy $\pi \in \Pi$, where $\Pi$ is the set of all strategies, the value of the criterion $w^\pi(x)$ is the expectation of the total payoff on the infinite horizon. The price of the model is denoted by $v$, and the price of the class of stationary strategies $S$ is denoted by $s$. If the payoff function is replaced by its negative part $r^-$, then the value of the criterion is denoted by $w^\pi_-$; the corresponding prices are denoted by $v_-$ and $s_-$. $v_+$ is defined similarly when $r$ is replaced by $r^+$ (according to [2], $v_+ = s_+$). As before, we assume fulfilled the general convergence condition (4.7), which according to [3], [4] is equivalent to $v_+ < \infty$ (if in some relation for the functions the argument is omitted, this means that the relation holds for all values of the argument).

Since the price of the model coincides with the price of the class of nonrandomized strategies (this follows from Corollary 4.3), we shall understand throughout what follows by $\Pi$ the set of nonrandomized strategies.

For a strategy $\pi \in \Pi$ and $h_n = x_0 a_0 \cdots x_n a_n \in (X \times A)^{n+1}$, $n = 0, 1, \ldots$, we define the strategy $h_n\pi$ obtained from $\pi$ if the control is performed at time $n + 1$ and prior to this time the sequence of states and controls $h_n$ was observed, i.e., $h_n\pi = \sigma$ and for any prehistory $h'_i = x'_0 a'_0 \cdots x'_i \in H_i = (X \times A)^i \times X$, $i = 0, 1, \ldots$,

$$\sigma_i(h'_i) = \pi_{n+i+1}(h_n h'_i),$$

where $h_n h'_i = x_0 a_0 \cdots x_n a_n x'_0 a'_0 \cdots x'_i$. By definition it is assumed that $h_{-1} = \emptyset$ and $h_{-1}\pi = \pi$.

For $\pi \in \Pi$ and $h_n \in H_n$, $n = 0, 1, \ldots$, we put

$$w^\pi(h_n) = w^{h_{n-1}\pi}(x_n),$$

where $h_{n-1}$ is the projection of $h_n$ onto $(X \times A)^n$, i.e., $h_n = h_{n-1} x_n$. Thus $w^\pi(h_n)$ is the expected total return under the strategy $\pi$ from the step $n$ under the condition of the prehistory $h_n$.

Let $g: H \to [0; +\infty[$, where $H = \bigcup_{n=0}^{\infty} H_n$. The strategy $\pi$ is said to be persistently $g$-optimal if

$$w^\pi(h_n) \ge v(x_n) - g(h_n)$$

for all $h_n = x_0 a_0 \cdots x_n \in H$. With the exception of Theorem 8.4, this paper considers throughout persistently $g$-optimal strategies for $g(h_n) = g(x_n)$, i.e., $g: X \to [0; +\infty[$.

For a function $g: X \to [-\infty; +\infty]$ we introduce the operators

$$P^a g(x) = \sum_{z \in X} p(z \mid x, a)\, g(z),$$

$$P g(x) = \sup\{P^a g(x) : a \in A(x)\},$$

$$T^a g(x) = r(x, a) + P^a g(x),$$

$$T g(x) = \sup\{T^a g(x) : a \in A(x)\},$$

which are assumed to be defined for $P g^+ < \infty$. We denote by $\mathfrak{B}$ the set of non-negative functions $g$ on $X$ satisfying $Pg < \infty$. It is well known that if $v_+ < \infty$, then $v$ ... and $v = Tv$.

For $g: X \to [-\infty; +\infty[$, $g^+ \in \mathfrak{B}$ and $Y \subseteq X$ we denote by $L_0(g, Y)$ the set of all non-negative functions $l$ on $X$ such that:

(i) $l(x) = 0$ for $x \in Y$,

(ii) $l(x) > 0$ and $l(x) \ge \max\{g(x), Pl(x)\}$ for $x \in X \setminus Y$.

...
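The operators $P^a$, $P$, $T^a$, $T$ and the two conditions defining $L_0(g, Y)$ can be tried out numerically on a small example. The following is a minimal sketch, not taken from the paper: the three-state model, the action names, the functions `g` and `l`, and the helper names `P_a`, `P`, `T_a`, `T`, `in_L0` are all hypothetical, and the state and control sets are assumed finite so that every supremum reduces to a maximum.

```python
from typing import Dict, List, Set, Tuple

State = int
Action = str

# Hypothetical toy model (an assumption, not from the paper).
# A[x] is the finite control set A(x); state 2 is absorbing with zero payoff.
A: Dict[State, List[Action]] = {0: ["stay", "go"], 1: ["go"], 2: ["stay"]}

# p[(x, a)][z] is the transition probability p(z | x, a).
p: Dict[Tuple[State, Action], Dict[State, float]] = {
    (0, "stay"): {0: 1.0},
    (0, "go"): {1: 0.5, 2: 0.5},
    (1, "go"): {2: 1.0},
    (2, "stay"): {2: 1.0},
}

# r[(x, a)] is the one-step payoff r(x, a).
r: Dict[Tuple[State, Action], float] = {
    (0, "stay"): 0.0,
    (0, "go"): 1.0,
    (1, "go"): 2.0,
    (2, "stay"): 0.0,
}


def P_a(g: Dict[State, float], x: State, a: Action) -> float:
    """P^a g(x) = sum over z of p(z | x, a) g(z)."""
    return sum(prob * g[z] for z, prob in p[(x, a)].items())


def P(g: Dict[State, float], x: State) -> float:
    """P g(x) = sup over a in A(x) of P^a g(x); a max here since A(x) is finite."""
    return max(P_a(g, x, a) for a in A[x])


def T_a(g: Dict[State, float], x: State, a: Action) -> float:
    """T^a g(x) = r(x, a) + P^a g(x)."""
    return r[(x, a)] + P_a(g, x, a)


def T(g: Dict[State, float], x: State) -> float:
    """T g(x) = sup over a in A(x) of T^a g(x); a max here since A(x) is finite."""
    return max(T_a(g, x, a) for a in A[x])


def in_L0(l: Dict[State, float], g: Dict[State, float], Y: Set[State]) -> bool:
    """Check conditions (i) and (ii) of L_0(g, Y) for a non-negative function l."""
    for x in A:
        if x in Y:
            if l[x] != 0:  # (i): l vanishes on Y
                return False
        elif not (l[x] > 0 and l[x] >= max(g[x], P(l, x))):  # (ii)
            return False
    return True


# Iterate T starting from zero; in this toy model (non-negative payoffs,
# absorbing zero-payoff state) the iterates stabilise after two steps.
v: Dict[State, float] = {x: 0.0 for x in A}
for _ in range(10):
    v = {x: T(v, x) for x in A}

g = {0: 0.1, 1: 0.1, 2: 0.1}
l_good = {0: 1.0, 1: 1.0, 2: 0.0}
l_bad = {0: 0.05, 1: 1.0, 2: 0.0}  # violates l(0) >= g(0)
```

In this toy model the limit `v` satisfies `T(v, x) == v[x]` in every state, which mirrors the relation $v = Tv$ quoted above, and `in_L0(l_good, g, {2})` holds while `in_L0(l_bad, g, {2})` does not.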

This note was uploaded on 12/06/2011 for the course MATH 101 taught by Professor Eugenea.feinberg during the Fall '11 term at State University of New York.
