Volume XXVII THEORY OF PROBABILITY AND ITS APPLICATIONS 1982 Number 3

CONTROLLED MARKOV PROCESSES WITH ARBITRARY NUMERICAL CRITERIA

E. A. FAINBERG

(Translated by W. U. Sirk)

1. Introduction

In the theory of controlled Markov processes with discrete time we study, as a rule, controlled processes either with the total reward criterion or with criteria for mean reward per unit time. The theory of controlled Markov processes with Borel state and control spaces in the case of the total reward criterion was developed by D. Blackwell , and R. Strauch. In - the basic results of the investigations - were extended to nonhomogeneous models, the foundations of the theory of which were laid in -. The study of controlled processes with mean reward criteria was started at the end of the 1950s. The first fundamental results were obtained in the publications by R. Howard , C. Derman , , O. V. Viskov and A. N. Shiryaev .

In addition, there exist publications (for example, -) in which controlled processes with other numerical criteria are investigated. There also exist a number of works in which the value of the criterion constitutes a finite-dimensional or infinite-dimensional vector (for example, -), or the value of the criterion is not computed, but a rule is given according to which one strategy is preferable to others (for example, -).

In connection with the existing variety of criteria and methods of their investigation, the problem arises of developing general methods for the investigation of all or individual groups of criteria. One such group of criteria, the so-called expected utility criteria, was studied in , -. In this case the criterion is the expectation of a functional specified on the trajectory space of the process. The total reward criterion is a particular case of the expected utility criterion. When expected utility criteria are investigated, additional conditions to those of - guaranteeing existence of optimal strategies are imposed as a rule on the model.
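As a rough illustration of the distinction between expected utility criteria and mean reward criteria, the formulas below give one possible way of writing them out. The notation is an assumption made here for illustration and is not taken from the paper: ω = (x_0, a_0, x_1, a_1, ...) denotes a trajectory of states and controls, P_μ^π the strategic measure generated by a strategy π and an initial measure μ, E_μ^π the corresponding expectation, U a measurable functional on the trajectory space, and r_t a one-step reward function.

```latex
% Assumed notation (illustrative, not from the source):
%   \omega = (x_0, a_0, x_1, a_1, \dots)  trajectory of states and controls
%   P_\mu^\pi, E_\mu^\pi                  strategic measure and its expectation
%   U                                     measurable functional on trajectories
%   r_t                                   one-step reward at time t

% Expected utility criterion: expectation of a functional of the trajectory.
\[
  w(\pi) \;=\; E_\mu^\pi\, U(\omega) \;=\; \int U(\omega)\, P_\mu^\pi(d\omega)
\]

% Total reward criterion: the special case in which U is the sum of
% one-step rewards (assuming the sum and its expectation are well defined).
\[
  U(\omega) \;=\; \sum_{t=0}^{\infty} r_t(x_t, a_t)
\]

% Mean reward per unit time, written here in one common form (lower limit
% of averaged expected rewards); this particular form is an assumption.
\[
  w(\pi) \;=\; \liminf_{N \to \infty} \frac{1}{N}\,
  E_\mu^\pi \sum_{t=0}^{N-1} r_t(x_t, a_t)
\]
```

In this sketch the first two expressions depend on the strategic measure only through the integral of a single fixed functional. The third takes a limit of expectations, and since a limit and an expectation cannot in general be interchanged, it need not reduce to the expectation of any one functional of the trajectory.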
Regrettably, criteria of mean reward per single step are not expected utility criteria.

In the present paper we consider nonhomogeneous Borel models with discrete time and nonbounded horizon. We investigate arbitrary numerical criteria, i.e., criteria the values of which are given by numerical functions defined on the space of strategic measures. We introduce three properties of a criterion: measurability, convexity and decomposability (Definitions 2.1-2.3). We establish that from these properties of a criterion follows the existence of nonrandomized strategies and nonrandomized Markov strategies, while in the case of a specified initial measure there follows existence of nonrandomized Markov strategies that are close to optimal strategies. Thus, in the case of a particular criterion, for the proof of existence of the strategies mentioned above, it is sufficient to verify that the criterion possesses certain properties....