THEORY PROBAB. APPL.
Vol. 32, No.
Translated from Russian Journal

SUFFICIENT CLASSES OF STRATEGIES IN DISCRETE DYNAMIC PROGRAMMING. II. LOCALLY STATIONARY STRATEGIES*

E. A. FAINBERG
(Translated by Merle Ellis)

6. The main results. This paper is a continuation of [1]. Throughout we examine a homogeneous controlled Markov model $\{X, A(\cdot), p, r\}$ with discrete time, countable state space $X$, sets of controls $A(x)$, $x \in X$, transition function $p$, and payoff function $r$. For an initial state $x \in X$ and a strategy $\pi \in \Pi$, where $\Pi$ is the set of all strategies, the value of the criterion $w^{\pi}(x)$ is the expectation of the total payoff over the infinite horizon. The price of the model is denoted by $v$, and the price of the class of stationary strategies $S$ is denoted by $s$. If the payoff function is replaced by its negative part $r^{-}$, then the value of the criterion is denoted by $w_{-}$; the corresponding prices are denoted by $v_{-}$ and $s_{-}$. The price $v_{+}$ is defined similarly when $r$ is replaced by $r^{+}$ (according to [2], $v_{+} = s_{+}$). As before, we assume fulfilled the general convergence condition (4.7), which according to [3], [4] is equivalent to $v_{+} < \infty$ (if in some relation for functions the argument is omitted, this means that the relation holds for all values of the argument). Since the price of the model coincides with the price of the class of nonrandomized strategies (this follows from Corollary 4.3), throughout what follows we understand by $\Pi$ the set of nonrandomized strategies.

For a strategy $\pi \in \Pi$ and $h_n = x_0 a_0 \cdots x_n a_n \in H_n = (X \times A)^{n+1}$, $n = 0, 1, \ldots$, we define the strategy $h_n\pi$ obtained from $\pi$ when the control is performed from time $n+1$ onward and prior to this time the sequence of states and controls $h_n$ was observed; i.e., $h_n\pi = (\sigma_0, \sigma_1, \ldots)$, where for any prehistory $h_i' = x_0' a_0' \cdots x_i' \in H_i' = (X \times A)^i \times X$, $i = 0, 1, \ldots$,

$$\sigma_i(h_i') = \pi_{n+i+1}(h_n h_i'),$$

with $h_n h_i' = x_0 a_0 \cdots x_n a_n\, x_0' a_0' \cdots x_i'$. By definition it is assumed that $h_{-1} = \varnothing$ and $h_{-1}\pi = \pi$. For $\pi \in \Pi$ and $h_n \in H_n'$, $n = 0, 1, \ldots$, we put $w^{\pi}(h_n) = w^{\bar h_{n-1}\pi}(x_n)$, where $\bar h_{n-1}$ is the projection of $h_n$ onto $H_{n-1}$, i.e., $h_n = \bar h_{n-1} x_n$.
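The history-shift construction above can be illustrated with a minimal sketch, not taken from the paper: a nonrandomized strategy is modeled as a function of the time step and the observed prehistory, and the shifted strategy $h_n\pi$ simply relabels time and prepends the observed history. The function name `shift_strategy` and the tuple encoding of histories are illustrative assumptions.

```python
def shift_strategy(pi, h_n):
    """Given pi as a function pi(t, prehistory) -> action, standing in for the
    sequence (pi_0, pi_1, ...), and a history h_n = (x_0, a_0, ..., x_n, a_n),
    return the shifted strategy sigma with
        sigma_i(h_i') = pi_{n+i+1}(h_n h_i')."""
    n = len(h_n) // 2 - 1          # h_n contains n + 1 (state, action) pairs
    def sigma(i, h_i):
        # Concatenate the fixed prefix h_n with the new prehistory h_i
        # and consult the original strategy at the shifted time n + i + 1.
        return pi(n + i + 1, h_n + h_i)
    return sigma

# Illustration with a toy strategy that just returns the current time step:
pi = lambda t, h: t
sigma = shift_strategy(pi, ("x0", 0, "x1", 1))   # here n = 1
print(sigma(0, ("y0",)))   # pi applied at time n + 0 + 1 = 2, so prints 2
```

The point of the toy strategy is only to make the time shift visible; any map from prehistories to controls would do.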
Thus $w^{\pi}(h_n)$ is the expected total return under the strategy $\pi$ from step $n$ under the condition of the prehistory $h_n$. Let $g\colon H \to [0; +\infty[$, where $H = \bigcup_{n=0}^{\infty} H_n'$. The strategy $\pi$ is said to be persistently $g$-optimal if $w^{\pi}(h_n) \ge v(x_n) - g(h_n)$ for all $h_n = x_0 a_0 \cdots x_n \in H$. With the exception of Theorem 8.4, this paper considers throughout persistently $g$-optimal strategies for $g(h_n) = g(x_n)$, i.e., $g\colon X \to [0; +\infty[$.

For a function $g\colon X \to [-\infty; +\infty]$ we introduce the operators

$$P^a g(x) = \sum_{z \in X} p(z \mid x, a)\, g(z),$$
$$P g(x) = \sup\{P^a g(x) : a \in A(x)\},$$
$$T^a g(x) = r(x, a) + P^a g(x),$$
$$T g(x) = \sup\{T^a g(x) : a \in A(x)\},$$

which are assumed to be defined when $P g^{+} < \infty$. We denote by $\Phi$ the set of nonnegative functions $g$ on $X$ satisfying $P g < \infty$. It is well known that if $v_{+} < \infty$, then $v < \infty$ and $v = Tv$.

For $g\colon X \to [-\infty; +\infty[$ with $g^{+} \in \Phi$ and $Y \subseteq X$ we denote by $L_0(g, Y)$ the set of all nonnegative functions $l$ on $X$ such that: (i) $l(x) = 0$ for $x \in Y$; (ii) $l(x) > 0$ and $l(x) \ge \max\{g(x), P l(x)\}$ for $x \in X \setminus Y$.

* Received by the editors February 22, 1984.
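The operators $P^a$, $P$, $T^a$, $T$ can be sketched on a finite toy model. This is a hedged illustration under assumed data structures (dictionaries for the transition function $p$, payoff $r$, and control sets $A$), not code from the paper; on a finite model the suprema become maxima.

```python
def P_a(p, g, x, a):
    # P^a g(x) = sum over z of p(z | x, a) g(z)
    return sum(prob * g[z] for z, prob in p[(x, a)].items())

def T_a(p, r, g, x, a):
    # T^a g(x) = r(x, a) + P^a g(x)
    return r[(x, a)] + P_a(p, g, x, a)

def T(p, r, A, g, x):
    # T g(x) = sup { T^a g(x) : a in A(x) }  (a max, since A(x) is finite here)
    return max(T_a(p, r, g, x, a) for a in A[x])

# Two-state toy model: action "stay" keeps the state, "swap" flips it.
p = {("s", "stay"): {"s": 1.0}, ("s", "swap"): {"t": 1.0},
     ("t", "stay"): {"t": 1.0}, ("t", "swap"): {"s": 1.0}}
r = {("s", "stay"): 0.0, ("s", "swap"): 1.0,
     ("t", "stay"): 2.0, ("t", "swap"): 0.0}
A = {"s": ["stay", "swap"], "t": ["stay", "swap"]}
g = {"s": 0.0, "t": 5.0}

print(T(p, r, A, g, "s"))   # max(0 + 0, 1 + 5) = 6.0
```

Iterating $T$ on such a finite model is ordinary value iteration; the paper's setting differs in that $X$ is countable and the suprema need not be attained, which is why $T g$ is only assumed defined when $P g^{+} < \infty$.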