THEORY OF PROBABILITY AND ITS APPLICATIONS
Volume XXIII, Number 2, 1978

THE EXISTENCE OF A STATIONARY ε-OPTIMAL POLICY FOR A FINITE MARKOV CHAIN

E. A. FAINBERG

(Translated by K. Durr)

In this paper we investigate the problem of optimal control of a Markov chain with a finite number of states when the control sets are compact in the metric space. The goal of the control is to maximize the average reward per unit step.

For the case of finite control and state sets the existence of a stationary optimal policy was proved in [1] and [2]. In [3]–[5] it was proved that for a controlled Markov process with finite state space, compact control sets and continuous reward and transition functions there may not exist an optimal policy.

In this paper it is proved that if the state space is finite, the control sets are compact, the transition functions are continuous and the reward functions are upper semicontinuous, then for any positive ε there exists a stationary ε-optimal policy. By the average reward one can understand here the lower as well as the upper limit of the average reward per unit step. For the case of the lower limit the existence of the stationary ε-optimal policy was proved in [4].

Examples in [3] and [4] show that if the above restrictions on the control sets, the transition functions and the reward functions are not satisfied, there may not exist a stationary ε-optimal policy for some positive ε. If the state space is not finite then, as the example in [6] shows, there may not be a stationary ε-optimal policy even in the case of finite control sets. Observe that if the number of states is two, then, according to [7], under the assumptions made in this paper there exists a stationary optimal policy.

In [7]–[9] sufficient conditions were studied for the existence of stationary optimal policies, imposing certain additional (in relation to the requirements of the present paper) restrictions on the control sets.
In [8] it was proved that for compact convex control sets coinciding with the sets of transition probabilities and concave continuous reward functions there exists a stationary optimal policy if any stationary policy defines an ergodic Markov chain without transient states. In [9] it was shown that under the condition derived in [8] it is sufficient to require that not any but rather that at least one stationary policy define an ergodic Markov chain without transient states.

In [7] two sufficient conditions were given. These conditions consist in adding to the assumptions of the present paper one of the following restrictions: (i) any stationary policy defines an ergodic Markov chain with one ergodic class and possibly with transient states; (ii) for each state the set of transition probabilities contains a finite number of extreme points.

1. Basic Definitions

Let X = {1, 2, ..., s} be the state space. For each state x ∈ X let there be given a control set A_x ...
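To make the optimality criterion concrete, the following minimal sketch (my own illustration, not from the paper; the two-state chain and its numbers are hypothetical) computes the average reward per unit step of a fixed stationary policy. Such a policy induces a transition matrix P and a reward vector r on the finite state space, and for an ergodic chain the limit of the average reward equals π·r, where π is the stationary distribution:

```python
def stationary_distribution(P, iters=10_000):
    """Approximate the stationary distribution pi (pi P = pi) by power iteration.
    Assumes the chain defined by P is ergodic, so the iteration converges."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical two-state example: a stationary policy fixing one action per
# state induces this transition matrix and reward vector on X = {0, 1}.
P = [[0.9, 0.1],
     [0.4, 0.6]]
r = [1.0, 0.0]

pi = stationary_distribution(P)           # here pi = (0.8, 0.2)
avg_reward = sum(p * ri for p, ri in zip(pi, r))
print(avg_reward)                         # average reward per unit step: 0.8
```

For this ergodic chain the lower and upper limits of the average reward coincide; the distinction between the two criteria discussed in the paper matters only when the Cesàro averages fail to converge.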
This note was uploaded on 12/06/2011 for the course MATH 101 taught by Professor Eugene A. Feinberg during the Fall '11 term at State University of New York.