Baykal Gursoy - Proceedings of the 38th Conference on...

TuM05 15:00
Proceedings of the 38th Conference on Decision & Control
Phoenix, Arizona USA, December 1999
0-7803-5250-5/99/$10.00 © 1999 IEEE    576

UNDISCOUNTED TWO-PERSON ZERO-SUM COMMUNICATING STOCHASTIC GAMES

M. Baykal-Gursoy
Department of IE
Rutgers, The State University of New Jersey
gursoy@rci.rutgers.edu

and

Z. M. Avsar
Dept. of Operations Planning and Control
Eindhoven University of Technology
Z.M.Avsar@tm.tue.nl

Abstract

Consider two-person zero-sum communicating stochastic games with finite state and finite action spaces under the long-run average payoff criterion. A communicating game is irreducible on a restricted strategy space where every pair of actions is taken with positive probability. The proposed approach applies Hoffman and Karp's algorithm for irreducible games successively over a growing sequence of restricted strategy spaces until, for any ε > 0, an ε-optimal stationary policy pair is obtained. This algorithm converges for games that have optimal strategies with a value independent of the initial state.

1 Introduction

Two-person zero-sum communicating stochastic games are studied under the average payoff criterion. Stochastic games are played sequentially. At each epoch, the game visits one of finitely many states and the players play a matrix game that depends on the current state. In particular, after observing the current state, each player takes one of finitely many actions that are available in that state. An instantaneous payoff made by player II to player I and the transition probabilities to the state visited in the next epoch are all determined as a function of the current state and the action pair taken by the players. At every state, each player is aware of his opponent's available actions, and of the payoff amounts and transition probabilities corresponding to the alternative action pairs. Depending on the payoff criterion, the objective of player I is to maximize his reward whereas player II aims to minimize his loss.

The amount of payoff on which the players agree is called the value of the game and the corresponding strategies are called the equilibrium strategies. The optimal strategies of the players are such that a player cannot improve his payoff by a unilateral change in his strategy. Optimal strategies are known to be located in the class of behavior strategies [1]. Stationary strategies form a subclass of behavior strategies and depend only on the current state. A stationary strategy that assigns only one action at each state is called a pure strategy.

Just as in Markov chains, games display ergodic structures. A stochastic game is called irreducible (unichain) if the Markov chain generated under every pure strategy pair is irreducible (unichain). A stochastic game is called communicating if, for each origin-destination pair of states (i, j), there exists at least one pure strategy pair that makes destination j accessible from origin i.
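The two structural definitions above can be checked mechanically on a small example. The following Python fragment is a minimal sketch and is not taken from the paper. It assumes the transition law is stored as nested lists `P[s][a][b][t]`, the probability of moving from state `s` to state `t` when player I takes action `a` and player II takes action `b`; this layout and all function names are illustrative choices. The communicating test relies on the observation that j is accessible from i under some pure strategy pair exactly when j can be reached from i in the directed graph that has an edge s -> t whenever some action pair at s gives t positive probability. The irreducibility test simply enumerates every pure strategy pair, which is only practical for very small games.

```python
from itertools import product


def reachable(adj, start):
    """Return the set of states reachable from `start` in the graph `adj`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected(adj):
    """True if every state can reach every other state."""
    n = len(adj)
    return all(len(reachable(adj, s)) == n for s in range(n))


def is_communicating(P):
    """Check the communicating property via the union graph over action pairs."""
    n = len(P)
    adj = [
        {t for row in P[s] for dist in row for t in range(n) if dist[t] > 0}
        for s in range(n)
    ]
    return strongly_connected(adj)


def is_irreducible(P):
    """Check that the chain under EVERY pure strategy pair is irreducible."""
    n = len(P)
    # A pure strategy pair fixes one action pair (a, b) in each state.
    choices = [
        [(a, b) for a in range(len(P[s])) for b in range(len(P[s][a]))]
        for s in range(n)
    ]
    for pair in product(*choices):
        adj = [
            {t for t in range(n) if P[s][a][b][t] > 0}
            for s, (a, b) in enumerate(pair)
        ]
        if not strongly_connected(adj):
            return False
    return True


# Hypothetical two-state game: player I has two actions ("stay", "move"),
# player II has a single action, so only player I influences transitions.
P_example = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],  # state 0: stay in 0, or move to 1
    [[[0.0, 1.0]], [[1.0, 0.0]]],  # state 1: stay in 1, or move to 0
]

print(is_communicating(P_example))  # some pure pair links each pair of states
print(is_irreducible(P_example))    # "stay" everywhere traps the chain
```

In this hypothetical game the first check succeeds and the second fails, since the pure strategy that always stays never leaves the initial state. Games of this kind, communicating but not irreducible, are the setting the abstract addresses.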