CMSC 498T, Game Theory
6b. Stochastic Games
Dana Nau, University of Maryland
Updated 11/2/11

Motivation

So far we've assumed that actions have deterministic outcomes, but some actions may have stochastic outcomes.

Model for single-agent environments: a stochastic system, also called a Markov Decision Process (MDP). It is a 5-tuple (Q, A, P, R, C):
• Q = finite set of states
• A = finite set of actions
• P(q, a, q′) = Pr[go to state q′ if we execute a in state q]
  • ∑_{q′ ∈ Q} P(q, a, q′) = 1
• R(q) = reward for being in state q
• C(q, a) = cost of performing action a in state q
• We still get the same expressive power if we use only one of R and C

There are many different algorithms for solving MDPs.

Modeling Games as MDPs

If we know the opponents' strategies, we can model a game as an MDP in which the cost of each action is 0.

Example: the infinitely repeated Prisoner's Dilemma. Suppose you're agent 1, and agent 2's strategy is:
• Cooperate if you cooperated on the last move
• Defect with probability 0.9 if you defected on the last move

For the name of each node, use the action profile that led to it. The usual Prisoner's Dilemma rewards: R(CC) = 3; R(CD) = 0; R(DC) = 5; R(DD) = 1.

[Figure: the resulting MDP. States CC, CD, DC, DD; agent 1's actions C and D label the edges, with transition probabilities 1.0, 0.9, and 0.1 determined by agent 2's strategy.]

Markov Games

A stochastic (or Markov) game is a 5-tuple (Q, N, A, P, R):
• Q is a set of states
• N = {1, …, n} is a set of agents
• A = A1 × ··· × An, where each Ai is the set of possible actions for agent i
• P(q, a1, …, an, q′) = probability that the action profile (a1, …, an) will take us from state q to state q′
• R = R1 × ··· × Rn, where Ri(q, a1, …, an) is agent i's reward function

At each state q, there's a normal-form game and a stochastic state transition:
• Each agent i chooses an action ai, and the agents' utilities are determined by their
  reward functions and the action profile
• The next state q′ depends stochastically on the action profile

Example

The infinitely repeated Prisoner's Dilemma is a stochastic game:
• Q = {q}
• N = {1, 2}
• A1 = A2 = {C, D}
• P(q, a1, a2, q) = 1 for all a1 and a2
• R1(q, C, C) = 3; R2(q, C, C) = 3
• R1(q, C, D) = 0; R2(q, C, D) = 5
• R1(q, D, C) = 5; R2(q, D, C) = 0
• R1(q, D, D) = 1; R2(q, D, D) = 1

[Figure: a single state q with four self-loops, one for each action profile (C,C), (C,D), (D,C), (D,D).]

In general:
• An MDP is a stochastic game with only 1 player
• A repeated game is a stochastic game with only 1 state

Histories

Before, a history was just a sequence of actions. We didn't need to include the states, because each sequence of actions led to just one possible state. But now we have:
• action profiles rather than individual actions
• several possible outcomes for each action profile ...
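As a concrete sketch, the stochastic-game 5-tuple (Q, N, A, P, R) for the repeated Prisoner's Dilemma can be written down directly. Python and all the names here are my own choices for illustration; the structure follows the definition above.

```python
# A minimal encoding of the stochastic game (Q, N, A, P, R) for the
# infinitely repeated Prisoner's Dilemma: one state, two agents,
# actions C (cooperate) and D (defect).
Q = ["q"]                       # a single state
N = [1, 2]                      # two agents
A = {1: ["C", "D"], 2: ["C", "D"]}

def P(q, a1, a2, q_next):
    """Transition probability: with only one state, every profile loops back."""
    return 1.0 if (q == "q" and q_next == "q") else 0.0

# R[i][(a1, a2)] = agent i's reward for that action profile in state q
R = {
    1: {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1},
    2: {("C", "C"): 3, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1},
}

# Sanity check: for every action profile, transitions form a distribution
for a1 in A[1]:
    for a2 in A[2]:
        assert sum(P("q", a1, a2, qn) for qn in Q) == 1.0
```

Note the symmetry of the payoffs: agent 1's reward for (C, D) equals agent 2's reward for (D, C), as in any symmetric normal-form game.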
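The slides note that many algorithms exist for solving MDPs without naming one. As an illustration, here is value iteration (one standard choice, not specified in the slides) applied to the earlier MDP where agent 2 cooperates if agent 1 cooperated last move and defects with probability 0.9 otherwise. The discount factor 0.9 is my assumption; states are named by the action profile that led to them, as on the slide.

```python
# Value iteration for the MDP of the "Modeling Games as MDPs" slide.
# State name = (my last action, opponent's last action).
# GAMMA = 0.9 is an assumed discount factor, not given in the slides.
GAMMA = 0.9
STATES = ["CC", "CD", "DC", "DD"]
R = {"CC": 3, "CD": 0, "DC": 5, "DD": 1}   # agent 1's reward per state

def transitions(q, a):
    """Distribution over next states if agent 1 plays a in state q."""
    if q[0] == "C":                         # I cooperated last move -> opponent plays C
        return {a + "C": 1.0}
    return {a + "D": 0.9, a + "C": 0.1}     # I defected -> opponent defects w.p. 0.9

V = {q: 0.0 for q in STATES}
for _ in range(500):                        # iterate the Bellman update to convergence
    V = {q: R[q] + GAMMA * max(
             sum(p * V[qn] for qn, p in transitions(q, a).items())
             for a in "CD")
         for q in STATES}

policy = {q: max("CD", key=lambda a, q=q: sum(p * V[qn]
                 for qn, p in transitions(q, a).items()))
          for q in STATES}
# Converges to V["CC"] = 30, V["CD"] = 27; always cooperating is optimal here.
```

Against this particular opponent and discount factor, defecting buys one reward of 5 but pushes the chain into the low-reward D-states, so the optimal policy cooperates in every state.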