THEORY OF PROBABILITY AND ITS APPLICATIONS, Volume XXVII, 1982

NONRANDOMIZED MARKOV AND SEMI-MARKOV STRATEGIES IN DYNAMIC PROGRAMMING

E. A. FAINBERG

(Translated by W. U. Sirk)

1. Introduction

Consider a nonhomogeneous controllable Markov model with a total reward criterion, discrete time, an infinite horizon, and Borel spaces of states and controls. Let a strategy $\pi$ and an initial measure $\mu$ be given. In this paper the following two statements are proved:

(a) (Theorem 3) For any $K < +\infty$ there exists a nonrandomized Markov strategy $\varphi$ such that

$$
w(\mu,\varphi) \ge
\begin{cases}
w(\mu,\pi), & \text{if } w(\mu,\pi) < +\infty,\\
K, & \text{if } w(\mu,\pi) = +\infty.
\end{cases}
\tag{1}
$$

(b) (Theorem 4) For any measurable function $K(x) < +\infty$ defined on a set $X_0$ of initial states, there exists a nonrandomized semi-Markov strategy $\varphi'$ such that, for any $x \in X_0$,

$$
w(x,\varphi') \ge
\begin{cases}
w(x,\pi), & \text{if } w(x,\pi) < +\infty,\\
K(x), & \text{if } w(x,\pi) = +\infty.
\end{cases}
\tag{2}
$$

The quantities $w(\mu,\pi)$ and $w(x,\pi)$ are the expected total rewards under the strategy $\pi$ with initial measure $\mu$ and with initial state $x$, respectively.

Controllable Markov models with Borel state spaces, together with the problem of whether such models admit Markov and semi-Markov strategies that majorize arbitrary strategies, were first studied by Blackwell [1], [2]. These investigations were continued by Strauch [3], who considered three cases: positive (P) and negative (N) dynamic programming, and dynamic programming with discounting (D). For the cases D and N, one of the fundamental results of [3, Theorem 4.3] is the existence of nonrandomized Markov strategies $\varphi$ and semi-Markov strategies $\varphi'$ such that $w(\mu,\varphi) \ge w(\mu,\pi)$ and $w(x,\varphi') \ge w(x,\pi)$ for all initial states $x$. In all three cases, D, N and P, it was assumed in [3] that $w(\mu,\pi) < +\infty$ for all $\mu$ and $\pi$; consequently, the constant $K$ and the function $K(x)$ did not arise. For the case P (cf. [3, Theorem 4.4]), the existence of nonrandomized Markov strategies $\varphi$ and semi-Markov strategies $\varphi'$ such that $w(\mu,\varphi) \ge w(\mu,\pi) - \varepsilon$ and $w(x,\varphi') \ge w(x,\pi) - \varepsilon$ for all initial states $x$ was proved for every $\varepsilon > 0$. It was pointed out in [3] that it was not known whether this last result holds for $\varepsilon = 0$. (We note that in [3] the problem was formulated under the assumption that the initial measure is concentrated at a single point. The case of an arbitrary initial measure $\mu$, first considered by Hinderer [4], introduces no additional difficulties.)

Homogeneous models were considered in [1]–[3]. The concept of nonhomogeneous controllable models arose from the investigations [5]–[7]. In [4], [8] and [9] a considerable part of the results of [1]–[3] was extended to nonhomogeneous models, with a broader class of income functions being treated in [4] and [9] than in [1]–[3]. Under weak conditions, the result on the existence, in the nonhomogeneous case, of a nonrandomized Markov strategy that majorizes an arbitrary strategy is presented in [9, Chap. 5, §1, Statement II]. Also there, for the case $w(\mu,\pi) < +\infty$, the question...
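As a hedged illustration of the majorization in statement (a), the Python sketch below works in a much simpler setting than the paper's, and that setting is an assumption of the example: a finite horizon, finitely many states and actions, finite rewards (so $w(\mu,\pi) < +\infty$), and a randomized Markov strategy $\pi$ in place of an arbitrary one. In this toy case backward induction yields an optimal nonrandomized Markov strategy $\varphi$, hence $w(\mu,\varphi) \ge w(\mu,\pi)$. This optimality argument is not the paper's method; in the infinite-horizon Borel models discussed above, optimal strategies need not exist. The helper names `random_model`, `evaluate_markov` and `backward_induction` are hypothetical.

```python
import numpy as np

# Toy finite-horizon model (an assumption for illustration only; the paper
# treats an infinite horizon and Borel spaces). Index conventions:
#   P[t][a][s, s2]  = probability of moving from s to s2 under action a at step t
#   r[t][s, a]      = reward for action a in state s at step t
#   policy[t][s, a] = probability that a Markov policy picks a in s at step t


def random_model(seed, S=4, A=3, T=5):
    """Hypothetical helper: random transitions, rewards, a randomized Markov pi, and mu."""
    rng = np.random.default_rng(seed)
    P = [[rng.dirichlet(np.ones(S), size=S) for _ in range(A)] for _ in range(T)]
    r = [rng.normal(size=(S, A)) for _ in range(T)]
    pi = [rng.dirichlet(np.ones(A), size=S) for _ in range(T)]
    mu = rng.dirichlet(np.ones(S))
    return P, r, pi, mu


def evaluate_markov(P, r, policy, mu):
    """Expected total reward w(mu, policy) of a (possibly randomized) Markov policy."""
    dist = np.asarray(mu, dtype=float)
    total = 0.0
    for t in range(len(r)):
        occ = dist[:, None] * policy[t]  # joint law of (state, action) at step t
        total += float(np.sum(occ * r[t]))
        # push the state-action law forward through the transition matrices
        dist = sum(occ[:, a] @ P[t][a] for a in range(len(P[t])))
    return total


def backward_induction(P, r):
    """Return a nonrandomized Markov policy (one-hot rows) and its step-0 values."""
    T = len(r)
    S, A = r[0].shape
    v = np.zeros(S)
    policy = [None] * T
    for t in reversed(range(T)):
        # q[s, a] = r_t(s, a) + sum_{s2} P_t(s2 | s, a) * v(s2)
        q = r[t] + np.stack([P[t][a] @ v for a in range(A)], axis=1)
        best = np.argmax(q, axis=1)
        policy[t] = np.eye(A)[best]  # deterministic action in each state
        v = q[np.arange(S), best]
    return policy, v


P, r, pi, mu = random_model(seed=0)
phi, v0 = backward_induction(P, r)
w_pi = evaluate_markov(P, r, pi, mu)
w_phi = evaluate_markov(P, r, phi, mu)
print(w_phi >= w_pi - 1e-9)  # True: the nonrandomized phi majorizes pi
```

Because all rewards in this sketch are finite, the case $w(\mu,\pi) = +\infty$, which the constant $K$ in (1) addresses, cannot arise here.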