THEORY OF PROBABILITY AND ITS APPLICATIONS, Vol. XXV, 1980


AN ε-OPTIMAL CONTROL OF A FINITE MARKOV CHAIN WITH AN AVERAGE REWARD CRITERION

E. A. FAINBERG

(Translated by A. R. Kraiman)

1. A controlled Markov process with discrete time is considered. The aim of the control is to maximize the average reward over one step. Two criteria are studied: the lower limit of the average reward over one step (Criterion 1) and the upper limit of the average reward over one step (Criterion 2).

In A. N. Shiryaev's paper [1] a summary of theorems on the existence of optimal and ε-optimal policies, depending on the properties of the state and control sets, was given in the form of a table. This table is presented below (for Criteria 1 and 2), taking into account the results of work in recent years. The state set is denoted by X, and the control set in state x by A_x. The results of our paper are indicated in squares III and IV (Section 3).

In [5], [7] examples were cited showing that in Case III a randomized stationary ε-optimal policy may not exist.

In Case V the uniform boundedness from above of the reward function is assumed. The proof for this case coincides with the proofs in [11] (Lemma 4.1 and Theorem 8.1). This proof is also valid for processes which are inhomogeneous in time (see the definition in [5]). In Case V the question of the existence, for Criterion 2, of a nonrandomized semi-Markov (p, ε)-optimal policy is not clear.

In Case II the assumption is made that the reward functions are upper semicontinuous and the transition probabilities are continuous in a ∈ A_x. In this paper, for the proof of the assertions marked by the number III, the result of [8] (square II of the table) is extended to the case when the reward functions can take on a value equal to -∞.

In a series of papers ([5], [12]-[17], and others), sufficient conditions are investigated for the existence of stationary optimal and ε-optimal policies in Cases II, IV and V. In [18] an example is cited showing that in Case IV there may not exist a randomized stationary ε-optimal policy.
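A numerical illustration (ours, not from the paper) of why Criteria 1 and 2 can genuinely differ: for a reward stream built from alternating blocks of 0's and 1's whose lengths double, the running average reward per step oscillates forever, so its lower limit (Criterion 1) and upper limit (Criterion 2) disagree. The function and variable names below are our own.

```python
def running_averages(rewards):
    """Cesaro averages (1/n) * sum_{t<n} r_t for n = 1, ..., len(rewards)."""
    avgs, total = [], 0.0
    for n, r in enumerate(rewards, start=1):
        total += r
        avgs.append(total / n)
    return avgs

# Blocks of lengths 1, 2, 4, 8, ... alternating between reward 0 and reward 1.
rewards = []
value, length = 0, 1
while len(rewards) < 100_000:
    rewards.extend([value] * length)
    value, length = 1 - value, 2 * length

avgs = running_averages(rewards[:100_000])
tail = avgs[10_000:]
# The averages keep swinging between roughly 1/3 (hit at the end of each
# 0-block) and roughly 2/3 (hit at the end of each 1-block), so the lower
# and upper limits of the average reward differ.
print(min(tail), max(tail))
```

Along a trajectory of a controlled process such a reward stream can be produced by a suitable nonstationary policy, which is why the two criteria must be distinguished.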
2. A controlled Markov process (CMP) Z is defined by {X, A, q_x(a), p_x(z|a)}, where X is the state set; the control set in state x is denoted by A_x; q_x(a) is the reward received in state x with control a; and p_x(z|a) is the transition probability over one step from x to z with control a, x, z ∈ X, a ∈ A_x. It is assumed that X = {1, 2, ..., s} is finite, the sets A_x are Borel subsets of a Polish (complete separable metric) space, and the reward functions q_x(a) and transition functions p_x(z|a) ...

TABLE
Summary of Theorems on the Existence of Optimal and ε-Optimal Policies
Columns (state set X): Finite; Countable; Borel subset of a Polish space.
Rows (control sets A_x):
  Finite: There exists a stationary optimal policy [2]-[4] (also see [5], [6]). ...
  Compact subsets of a Polish space: 1. There exists a stationary ε-optimal policy [7], [8]. ...
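The finite-state objects defined above can be sketched concretely. Below is a minimal illustration (the data and function names are ours, not the paper's) with s = 2 states and two controls per state: for a stationary policy f, the induced chain has transition matrix P_f and reward vector r_f, and when P_f is irreducible the long-run average reward per step equals the stationary distribution π applied to r_f, so Criteria 1 and 2 coincide for such a policy.

```python
import numpy as np

# A toy CMP: q[x][a] is the one-step reward, p[x][a][z] the transition
# probability from x to z under control a (our own illustrative numbers).
q = np.array([[1.0, 0.0],
              [0.0, 2.0]])
p = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.3, 0.7]]])

def average_reward(f, n_iter=10_000):
    """Long-run average reward per step of stationary policy f = (f(0), f(1))."""
    s = len(f)
    P = np.array([p[x][f[x]] for x in range(s)])   # induced transition matrix
    r = np.array([q[x][f[x]] for x in range(s)])   # induced reward vector
    pi = np.full(s, 1.0 / s)
    for _ in range(n_iter):                        # power iteration for the
        pi = pi @ P                                # stationary distribution
    return float(pi @ r)

# With X and every A_x finite, the stationary policies can be enumerated
# directly (square I of the table: a stationary optimal policy exists).
best = max([(a0, a1) for a0 in range(2) for a1 in range(2)],
           key=average_reward)
print(best, average_reward(best))
```

The exhaustive enumeration is only feasible because both X and the A_x are finite here; the paper's cases III and IV concern exactly the situations where the control sets are richer and such brute force is unavailable.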