
# lecture6b - CMSC 498T Game Theory 6b: Stochastic Games (Dana Nau)


CMSC 498T, Game Theory
6b. Stochastic Games
Dana Nau, University of Maryland
Updated 11/2/11

## Motivation

We've assumed that actions have deterministic outcomes, but some actions may have stochastic outcomes.

Model for single-agent environments: a *stochastic system*, also called a Markov Decision Process (MDP). It is a 5-tuple (Q, A, P, R, C):

- Q = finite set of states
- A = finite set of actions
- P(q, a, q′) = Pr[go to state q′ if we execute a in state q], with Σ_{q′∈Q} P(q, a, q′) = 1
- R(q) = reward for being in state q
- C(q, a) = cost of performing action a in state q
- We still get the same expressive power if we use only one of R and C

There are many different algorithms for solving MDPs.
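The 5-tuple above can be written out directly in code. This is a minimal sketch using plain dicts; the two-state grid of states and actions is a made-up toy example, not from the lecture.

```python
# MDP as the 5-tuple (Q, A, P, R, C) from the definition above.
# The concrete states/actions here are illustrative assumptions.

Q = {"s0", "s1"}                      # finite set of states
A = {"stay", "move"}                  # finite set of actions

# P[(q, a)][q'] = Pr[go to state q' if we execute a in state q]
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s0": 0.2, "s1": 0.8},   # a stochastic outcome
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.7, "s1": 0.3},
}

R = {"s0": 0.0, "s1": 1.0}            # reward for being in a state
C = {(q, a): 0.0 for q in Q for a in A}   # cost of action a in state q

# The constraint from the slide: for each (q, a), probabilities sum to 1.
for (q, a), dist in P.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```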
## Modeling Games as MDPs

If we know the opponents' strategies, we can model a game as an MDP in which the cost of each action is 0.

Example: the infinitely repeated Prisoner's Dilemma. Suppose you're agent 1, and suppose agent 2's strategy is:

- Cooperate if you cooperated on the last move
- Defect with probability 0.9 if you defected on the last move

For the name of each node, use the action profile that led to it. The usual Prisoner's Dilemma rewards: R(CC) = 3; R(CD) = 0; R(DC) = 5; R(DD) = 1.

[Figure: state-transition diagram over the nodes CC, CD, DC, DD, with agent 1's actions C and D labeling edges of probability 1.0, 0.9, or 0.1.]

## Markov Games

A stochastic (or Markov) game is a 5-tuple (Q, N, A, P, R):

- Q is a set of states
- N = {1, …, n} is a set of agents
- A = A_1 × ··· × A_n, where each A_i is the set of possible actions for agent i
- P(q, a_1, …, a_n, q′) = probability that the action profile (a_1, …, a_n) will take us from state q to state q′
- R = R_1 × ··· × R_n, where R_i(q, a_1, …, a_n) is i's reward function

At each state q, there's a normal-form game and a stochastic state transition:

- Each agent i chooses an action a_i, and the agents' utilities are determined by their reward functions and the action profile
- The next state, q′, depends stochastically on the action profile
## Example

The infinitely repeated Prisoner's Dilemma is a stochastic game:

- Q = {q}
- N = {1, 2}
- A_1 = A_2 = {C, D}
- P(q, a_1, a_2, q) = 1 for all a_1 and a_2
- R_1(q, C, C) = 3; R_2(q, C, C) = 3
- R_1(q, C, D) = 0; R_2(q, C, D) = 5
- R_1(q, D, C) = 5; R_2(q, D, C) = 0
- R_1(q, D, D) = 1; R_2(q, D, D) = 1

In general:

- An MDP is a stochastic game with only 1 player
- A repeated game is a stochastic game with only 1 state

[Figure: a single state q with self-loop edges labeled (C,C), (C,D), (D,C), (D,D).]
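Writing this example out directly (a sketch, using dicts as before) makes the one-state structure explicit: the transition function always stays in q, which is exactly the "repeated game = stochastic game with one state" observation above.

```python
# The repeated Prisoner's Dilemma as a stochastic game, written out
# from the bullets above: one state q, two agents, actions {C, D}.

Q = {"q"}
N = (1, 2)
A1 = A2 = ("C", "D")

def P(q, a1, a2, q_next):
    """P(q, a1, a2, q) = 1 for all a1, a2: with one state we always stay."""
    return 1.0 if q == q_next == "q" else 0.0

# R[i][(a1, a2)] = agent i's reward in state q
R = {
    1: {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1},
    2: {("C", "C"): 3, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1},
}

assert all(P("q", a1, a2, "q") == 1.0 for a1 in A1 for a2 in A2)
```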

## Histories

Before, a history was just a sequence of actions; we didn't need to include the states, because each sequence of actions led to just one possible state. But now we have:

- action profiles rather than individual actions
- several possible outcomes for each action profile

Thus a history is a sequence h^t = (q^0, a^0, q^1, a^1, …, a^{t−1}, q^t), where:

- q^k is the state at stage k
- a^k is the action profile at stage k
- t is the number of stages in the history
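The definition above can be sketched as a sampler that interleaves states and action profiles. The uniform-random strategies and the particular transition function passed in below are illustrative assumptions, not from the slide.

```python
# Sample a history h^t = (q^0, a^0, q^1, a^1, ..., a^{t-1}, q^t)
# from a stochastic game, alternating states and action profiles.

import random

def sample_history(q0, action_sets, transition, t, rng=random.Random(0)):
    """Return [q0, a0, q1, a1, ..., a_{t-1}, q_t] with t stages."""
    history, q = [q0], q0
    for _ in range(t):
        # a^k: each agent picks uniformly at random (an assumed strategy)
        profile = tuple(rng.choice(acts) for acts in action_sets)
        # q^{k+1} depends stochastically on the action profile
        dist = transition(q, profile)          # dict: next state -> prob
        q = rng.choices(list(dist), weights=list(dist.values()))[0]
        history += [profile, q]
    return history

# Repeated PD: one state, so every history stays in "q".
h = sample_history("q", [("C", "D"), ("C", "D")],
                   lambda q, a: {"q": 1.0}, t=3)
assert len(h) == 2 * 3 + 1 and h[0] == h[-1] == "q"
```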