TuM05 15:00
Proceedings of the 38th Conference on Decision & Control
Phoenix, Arizona USA, December 1999
UNDISCOUNTED TWO-PERSON ZERO-SUM COMMUNICATING STOCHASTIC GAMES
M. Baykal-Gursoy
Department of IE
Rutgers, The State University of New Jersey
gursoy@rci.rutgers.edu
Abstract
Consider two-person zero-sum communicating stochastic games with
finite state and finite action spaces under the long-run average
payoff criterion. A communicating game is irreducible on a restricted
strategy space where every pair of actions is taken with positive
probability. The proposed approach applies Hoffman and Karp's
algorithm for irreducible games successively over a sequence of
restricted strategy spaces that gets larger until an ε-optimal
stationary policy pair is obtained for any ε > 0. This algorithm is
convergent for the games that have optimal strategies with a value
independent of the initial state.
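The claim that a communicating game becomes irreducible once every action pair has positive probability can be illustrated on a small case. The following is only a sketch under assumptions that are not in the paper: the two-state game `P`, the helper names `induced_chain` and `is_irreducible`, and the nested-list layout `P[s][a][b][t]` are all hypothetical choices made for illustration.

```python
from itertools import product

def induced_chain(P, f, g):
    """Transition matrix of the Markov chain under stationary strategies f, g.

    P[s][a][b][t]: probability of moving from state s to t under action pair (a, b)
    f[s][a], g[s][b]: probabilities with which players I and II pick a and b in s
    """
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for a, b in product(range(len(f[s])), range(len(g[s]))):
            w = f[s][a] * g[s][b]
            for t in range(n):
                M[s][t] += w * P[s][a][b][t]
    return M

def is_irreducible(M):
    """True if every state reaches every other state through positive entries."""
    n = len(M)
    for start in range(n):
        seen, stack = {start}, [start]
        while stack:
            s = stack.pop()
            for t in range(n):
                if M[s][t] > 0 and t not in seen:
                    seen.add(t)
                    stack.append(t)
        if len(seen) < n:
            return False
    return True

# Hypothetical 2-state game with 2 actions per player in each state.
# In state 0 only the pair (1, 1) leaves the state; in state 1 only (0, 0) does.
P = [
    [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]],
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],
]
pure = [[1.0, 0.0], [1.0, 0.0]]    # always action 0 (a pure strategy)
mixed = [[0.5, 0.5], [0.5, 0.5]]   # every action taken with positive probability

chain_pure = induced_chain(P, pure, pure)
chain_mixed = induced_chain(P, mixed, mixed)
```

In this example the chain under the pure pair is trapped in state 0, while the chain under the fully mixed pair is irreducible, which is the property the restricted strategy space relies on.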
1 Introduction
Two-person zero-sum communicating stochastic games are studied under
the average payoff criterion. Stochastic games are played
sequentially. At each epoch, the game visits one of finitely many
states and the players play a matrix game that depends on the current
state. In particular, after observing the current state, each player
takes one of finitely many actions that are available in that state.
An instantaneous payoff made by player II to player I and the
transition probabilities to the state visited in the next epoch are
all determined as a function of the current state and the action pair
taken by the players. At every state, each player is aware of his
opponent's available actions, and of the payoff amounts and transition
probabilities corresponding to the alternative action pairs. Depending
on the payoff criterion, the objective of player I is to maximize his
reward whereas player II aims to minimize his loss. The amount of
payoff on which the players agree is called the value of the game and
the corresponding strategies are called the equilibrium strategies.
The optimal strategies of the players are such that a player cannot
make his payoff better by a unilateral change in his strategy. Optimal
strategies are known to be located in the class of behavior strategies
[1]. A subclass of behavior strategies is called stationary
strategies. Stationary strategies depend only on the current state.
0-7803-5250-5/99/$10.00 © 1999 IEEE 576
and
Z. M. Avsar
Dept. of Operations Planning and Control
Eindhoven University of Technology
Z.M.Avsar@tm.tue.nl
A stationary strategy that assigns only one action at each state is
called a pure strategy. Just as in Markov chains, games display
ergodic structures. A stochastic game is called irreducible (unichain)
if the Markov chain generated under every pure strategy pair is
irreducible (unichain). A stochastic game is called communicating if
for each origin-destination pair of states (i, j) there exists at
least one pure strategy pair that makes destination j accessible from
origin i.
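The communicating property defined above can be tested with a plain graph search. The sketch below is an illustration, not the authors' procedure: it assumes a hypothetical nested-list layout `P[s][a][b][t]` for the transition probabilities and a hypothetical helper name `is_communicating`. It draws an edge from s to t whenever some action pair moves s to t with positive probability. Because a shortest path visits each state at most once, one pure strategy pair can select the required action pair at every state along the path, so reachability in this graph matches the definition.

```python
def is_communicating(P):
    """Check the communicating property for transition data P[s][a][b][t]."""
    n = len(P)
    # Edge s -> t exists if some action pair moves s to t with positive probability.
    edges = [
        {t for rows in P[s] for dist in rows for t in range(n) if dist[t] > 0}
        for s in range(n)
    ]
    for origin in range(n):
        seen, stack = {origin}, [origin]
        while stack:
            s = stack.pop()
            for t in edges[s] - seen:
                seen.add(t)
                stack.append(t)
        if len(seen) < n:
            return False
    return True

# Hypothetical game in which each state can be left under some action pair.
P_comm = [
    [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]],
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],
]
# Hypothetical game in which state 0 is absorbing under every action pair.
P_absorbing = [
    [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]],
    [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]],
]

result_comm = is_communicating(P_comm)
result_absorbing = is_communicating(P_absorbing)
```

The first game is communicating even though it is not irreducible: under the pure pair that always plays action 0, state 1 is never reached from state 0.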