THEORY OF PROBABILITY AND ITS APPLICATIONS
Volume XXV, 1980

AN ε-OPTIMAL CONTROL OF A FINITE MARKOV CHAIN WITH AN AVERAGE REWARD CRITERION

E. A. FAINBERG

(Translated by A. R. Kraiman)

1. A controlled Markov process with discrete time is considered. The aim of the control is to maximize the average reward per step. Two criteria are studied: the lower limit of the average reward per step (Criterion 1) and the upper limit of the average reward per step (Criterion 2). In A. N. Shiryaev's paper [1], a summary of theorems on the existence of optimal and ε-optimal policies, depending on the properties of the state and control sets, was given in the form of a table. This table is presented below (for Criteria 1 and 2), taking into account the results of work in recent years. The state set is denoted by X, and the control set in state x by A_x. The results of our paper are indicated in squares III and IV (§3). In [5], [7] examples were cited showing that in Case III a randomized stationary ε-optimal policy may not exist. In Case V the uniform boundedness from above of the reward function is assumed. The proof for this case coincides with proofs in [11] (Lemma 4.1 and Theorem 8.1). This proof is also valid for processes which are inhomogeneous in time (see the definition in [5]). In Case V the question of the existence, for Criterion 2, of a nonrandomized semi-Markov (p, ε)-optimal policy remains open. In Case II the assumption is made that the reward functions are upper semicontinuous and the transition probabilities are continuous in a ∈ A_x. In this paper, for the proof of the assertions marked by the number III, the result of [8] (square II of the table) is extended to the case when the reward functions can take the value −∞. In a series of papers ([5], [12]–[17], and others), sufficient conditions are investigated for the existence of stationary optimal and ε-optimal policies in Cases II, IV and V. In [18] an example is cited showing that in Case IV there may not exist a randomized stationary ε-optimal policy.
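The two criteria of Section 1 can be written out explicitly. A sketch in standard average-reward notation (the value symbols w and w̄ and the expectation notation are my own choices; the paper's §2 supplies the reward q_x(a) and the process): for a policy π and initial state x,

```latex
% Criterion 1: lower limit of the average reward per step
w(x,\pi) \;=\; \liminf_{n\to\infty}\, \frac{1}{n}\,
    \mathbf{E}_x^{\pi} \sum_{t=1}^{n} q_{x_t}(a_t),

% Criterion 2: upper limit of the average reward per step
\overline{w}(x,\pi) \;=\; \limsup_{n\to\infty}\, \frac{1}{n}\,
    \mathbf{E}_x^{\pi} \sum_{t=1}^{n} q_{x_t}(a_t).
```

A policy is then ε-optimal for a criterion if, at every initial state, its value under that criterion is within ε of the supremum over all policies.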
2. A controlled Markov process (CMP) Z is defined by {X, A_x, q_x(a), p_x(z|a)}, where X is the state set; the control set in state x is denoted by A_x; q_x(a) is the reward received in state x with control a; and p_x(z|a) is the transition probability over one step from x to z with control a, where x, z ∈ X and a ∈ A_x. It is assumed that X = {1, 2, ..., s} is finite, the sets A_x are Borel subsets of a Polish (complete separable metric) space, and the reward functions q_x(a) and transition functions p_x(z|a) ...

TABLE
Summary of Theorems on the Existence of Optimal and ε-Optimal Policies

Control sets A_x: Finite | Countable | Borel subset of a Polish space | ...
State set X: Finite | ...
With X finite, A_x finite: there exists a stationary optimal policy [2]-[4] (also see [5], [6]).
With A_x compact subsets of a Polish space: 1. There exists a stationary ε-optimal policy [7], [8]. ...
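To make the tuple {X, A_x, q_x(a), p_x(z|a)} and the per-step average reward concrete, here is a minimal simulation sketch in Python. The two-state example, the dictionaries `q` and `p`, and the function name `average_reward` are all illustrative assumptions, not from the paper:

```python
import random

# Illustrative two-state CMP (my own example, not from the paper):
# X = {0, 1}; controls per state; q[x][a] is the one-step reward;
# p[x][a] is the transition distribution over next states.
q = {0: {"stay": 1.0, "go": 0.0},
     1: {"stay": 2.0, "go": 0.0}}
p = {0: {"stay": [1.0, 0.0], "go": [0.0, 1.0]},
     1: {"stay": [0.0, 1.0], "go": [1.0, 0.0]}}

def average_reward(policy, x0, n, seed=0):
    """Estimate the n-step average reward (1/n) * sum of q_{x_t}(a_t)
    under a stationary deterministic policy: state -> control."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(n):
        a = policy[x]          # stationary: control depends only on x
        total += q[x][a]       # collect the one-step reward
        x = rng.choices([0, 1], weights=p[x][a])[0]  # sample next state
    return total / n
```

For the policy {0: "go", 1: "stay"}, the chain moves to state 1 once and then collects reward 2 forever, so the n-step average tends to 2; the lower and upper limits of exactly this quantity are what Criteria 1 and 2 evaluate.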
This note was uploaded on 12/06/2011 for the course MATH 101, taught by Professor Eugene A. Feinberg, during the Fall 2011 term at the State University of New York.