# THEORY OF PROBABILITY AND ITS APPLICATIONS, Vol. XXV, 1980


AN ε-OPTIMAL CONTROL OF A FINITE MARKOV CHAIN WITH AN AVERAGE REWARD CRITERION

E. A. FAINBERG

(Translated by A. R. Kraiman)

1. A controlled Markov process with discrete time is considered. The aim of the control is to maximize the average reward over one step. Two criteria are studied: the lower limit of the average reward over one step (Criterion 1) and the upper limit of the average reward over one step (Criterion 2).

In A. N. Shiryaev's paper [1] a summary of theorems on the existence of optimal and ε-optimal policies, depending on the properties of the state and control sets, was given in the form of a table. This table is presented below (for Criteria 1 and 2), taking into account the results of work in recent years. The state set is denoted by X, and the control set in state x by A_x. The results of our paper are indicated in squares III and IV (§3).

In [5], [7] examples were cited showing that in Case III a randomized stationary ε-optimal policy may not exist. In Case V the uniform boundedness from above of the reward function is assumed. The proof for this case coincides with the proofs in [11] (Lemma 4.1 and Theorem 8.1). This proof is also valid for processes which are inhomogeneous in time (see the definition in [5]). In Case V the question of the existence, for Criterion 2, of a nonrandomized semi-Markov (p, ε)-optimal policy remains open. In Case II it is assumed that the reward functions are upper semicontinuous and the transition probabilities are continuous in a ∈ A_x.

In this paper, for the proof of the assertions marked by the number III, the result of [8] (square II of the table) is extended to the case when the reward functions can take the value −∞. In a series of papers ([5], [12]–[17], and others) sufficient conditions are investigated for the existence of stationary optimal and ε-optimal policies in Cases II, IV, V.
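The distinction between Criterion 1 (lower limit) and Criterion 2 (upper limit) matters because the running averages of a reward stream need not converge. A minimal sketch, not from the paper: the reward sequence below (alternating blocks of 1s and 0s of doubling length) is purely illustrative, and shows the two limits taking different values.

```python
# Illustrative sketch: Criteria 1 and 2 are the liminf and limsup of the
# partial averages S_n / n of the reward stream. The block sequence here
# is an invented example, not one from the paper.

def running_averages(rewards):
    """Return the partial averages S_n / n, n = 1, 2, ..."""
    total = 0.0
    out = []
    for n, r in enumerate(rewards, start=1):
        total += r
        out.append(total / n)
    return out

# Rewards in alternating blocks of 1s and 0s with doubling lengths:
# 1, 0,0, 1,1,1,1, 0,...,0 (8 zeros), ...  The partial averages then
# oscillate forever between roughly 1/3 and 2/3.
rewards = []
block, value = 1, 1
while len(rewards) < 4096:
    rewards.extend([value] * block)
    block *= 2
    value = 1 - value

avgs = running_averages(rewards)
tail = avgs[1024:]  # late partial averages, past the transient
print(min(tail), max(tail))  # ~1/3 (Criterion 1) vs ~2/3 (Criterion 2)
```

Under Criterion 1 a policy is judged by its worst-case long-run average (here ≈ 1/3), under Criterion 2 by its best-case one (here ≈ 2/3), so the two criteria can rank policies differently.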
In [18] an example is cited showing that in Case IV there may not exist a randomized stationary ε-optimal policy.

2. A controlled Markov process (CMP) Z is defined by {X, A_x, q_x(a), p_x(z|a)}, where X is the state set; the control set in state x is denoted by A_x; q_x(a) is the reward received in state x under control a; and p_x(z|a) is the transition probability over one step from x to z under control a, with x, z ∈ X, a ∈ A_x. It is assumed that X = {1, 2, ..., s} is finite, the sets A_x are Borel subsets of a Polish (complete separable metric) space, and the reward functions q_x(a) and transition functions p_x(z|a) …
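The tuple {X, A_x, q_x(a), p_x(z|a)} can be made concrete with a toy finite example. The following sketch is an assumption-laden illustration (the two-state chain, the control names, and the policy are all invented, not taken from the paper); it computes the long-run average reward of one stationary policy via the stationary distribution of the induced Markov chain.

```python
# Hedged sketch of a finite CMP {X, A_x, q_x(a), p_x(z|a)}.
# The two-state example and all names below are illustrative only.
import numpy as np

X = [0, 1]                                    # state set, s = 2
A = {0: ["stay", "go"], 1: ["stay"]}          # control sets A_x
q = {(0, "stay"): 1.0, (0, "go"): 0.0,        # rewards q_x(a)
     (1, "stay"): 2.0}
p = {                                         # transition rows p_x(.|a)
    (0, "stay"): [1.0, 0.0],
    (0, "go"):   [0.0, 1.0],
    (1, "stay"): [0.5, 0.5],
}

# A stationary policy fixes one control per state; it induces an ordinary
# Markov chain with transition matrix P and reward vector r.
policy = {0: "go", 1: "stay"}
P = np.array([p[(x, policy[x])] for x in X])
r = np.array([q[(x, policy[x])] for x in X])

# Power iteration for the stationary distribution mu of the induced chain
# (the chain here is irreducible and aperiodic, so this converges).
mu = np.full(len(X), 1.0 / len(X))
for _ in range(10_000):
    mu = mu @ P

print(mu @ r)  # long-run average reward of this policy
```

For this invented chain the stationary distribution is (1/3, 2/3), so the average reward is 2 · 2/3 = 4/3; the running averages converge, so Criteria 1 and 2 agree for a fixed stationary policy on an irreducible aperiodic chain.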

TABLE
Summary of Theorems on the Existence of Optimal and ε-Optimal Policies

| A_x \ X | Finite | Countable | Borel subset of a Polish space |
| --- | --- | --- | --- |
| Finite | There exists a stationary optimal policy [2]–[4] (also see [5], [6]). | … | … |
| Compact subsets of a Polish space | 1. There exists a stationary ε-optimal policy [7], [8]. | … | … |

