CMSC 498T, Game Theory
6b. Stochastic Games
Dana Nau, University of Maryland
Updated 11/2/11

Motivation

So far we've assumed that actions have deterministic outcomes, but some actions may have stochastic outcomes. The model for single-agent environments is a stochastic system, also called a Markov Decision Process (MDP): a 5-tuple (Q, A, P, R, C), where

- Q = finite set of states
- A = finite set of actions
- P(q, a, q′) = Pr[go to state q′ if we execute a in state q], with Σq′∈Q P(q, a, q′) = 1 for every q and a
- R(q) = reward for being in state q
- C(q, a) = cost of performing action a in state q

We still get the same expressive power if we use only one of R and C. There are many different algorithms for solving MDPs; one is sketched below.
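As one concrete illustration of "solving" this 5-tuple, here is a minimal value-iteration sketch. Everything in it (the function name, the dict encodings of P, R, and C, and the discounted infinite-horizon objective with factor gamma) is an assumption made for illustration, not something taken from the slides.

    # Value iteration for an MDP (Q, A, P, R, C) as defined above.
    # Assumed encodings: P[q][a] is a dict {q2: prob}, R[q] is a number,
    # C[(q, a)] is a number. Assumes a discounted objective with factor gamma.
    def value_iteration(Q, A, P, R, C, gamma=0.9, eps=1e-6):
        V = {q: 0.0 for q in Q}
        while True:
            # Bellman backup: state reward, minus action cost, plus the
            # discounted expected value of the successor state.
            V_new = {
                q: R[q] + max(
                    -C[(q, a)] + gamma * sum(p * V[q2] for q2, p in P[q][a].items())
                    for a in A
                )
                for q in Q
            }
            if max(abs(V_new[q] - V[q]) for q in Q) < eps:
                return V_new
            V = V_new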
Modeling Games as MDPs

If we know the opponents' strategies, we can model a game as an MDP in which the cost of each action is 0.

Example: the infinitely repeated Prisoner's Dilemma. Suppose you're agent 1, and agent 2's strategy is:

- Cooperate if you cooperated on the last move.
- Defect with probability 0.9 if you defected on the last move.

Name each node by the action profile that led to it, and use the usual Prisoner's Dilemma rewards: R(CC) = 3, R(CD) = 0, R(DC) = 5, R(DD) = 1.

[Figure: state-transition diagram over the states CC, CD, DC, DD. From CC and CD, agent 1's action C leads to CC and D leads to DC, each with probability 1.0; from DC and DD, action C leads to CD with probability 0.9 or CC with probability 0.1, and action D leads to DD with probability 0.9 or DC with probability 0.1.]
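To tie the two sections together, here is that MDP written out in the encoding assumed by the value_iteration sketch above. The transition probabilities follow from agent 2's stated strategy; the variable names are mine.

    # The repeated-PD-vs-fixed-opponent MDP. States are named by the action
    # profile that led to them (agent 1's action first); agent 1 is the sole
    # decision maker, since agent 2's strategy is fixed.
    Q = ['CC', 'CD', 'DC', 'DD']
    A = ['C', 'D']
    R = {'CC': 3, 'CD': 0, 'DC': 5, 'DD': 1}
    C = {(q, a): 0 for q in Q for a in A}   # action costs are 0, per the slides

    P = {}
    for q in Q:
        if q[0] == 'C':
            # Agent 1 cooperated last move, so agent 2 cooperates for sure.
            P[q] = {'C': {'CC': 1.0}, 'D': {'DC': 1.0}}
        else:
            # Agent 1 defected last move, so agent 2 defects w.p. 0.9.
            P[q] = {'C': {'CD': 0.9, 'CC': 0.1},
                    'D': {'DD': 0.9, 'DC': 0.1}}

    V = value_iteration(Q, A, P, R, C)

V then gives agent 1's discounted value of starting in each of the four states.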
Markov Games

A stochastic (or Markov) game is a 5-tuple (Q, N, A, P, R), where

- Q is a set of states;
- N = {1, …, n} is a set of agents;
- A = A1 × … × An, where each Ai is the set of possible actions for agent i;
- P(q, a1, …, an, q′) is the probability that the action profile (a1, …, an) takes us from state q to state q′;
- R = R1 × … × Rn, where Ri(q, a1, …, an) is agent i's reward function.

At each state q there is a normal-form game and a stochastic state transition: each agent i chooses an action ai, the agents' utilities are determined by their reward functions and the action profile, and the next state q′ depends stochastically on the action profile.
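The definition transcribes directly into a data structure. The sketch below is an assumed representation (names and types are mine, not the slides'): P maps a state and an action profile to a distribution over next states, and R holds one reward function per agent.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    Action = str
    Profile = Tuple[Action, ...]   # one action per agent: (a1, ..., an)

    @dataclass
    class StochasticGame:
        Q: List[str]                                     # states
        N: List[int]                                     # agents 1..n
        A: List[List[Action]]                            # A[i] = agent i+1's actions
        P: Dict[Tuple[str, Profile], Dict[str, float]]   # (q, profile) -> {q2: prob}
        R: List[Callable[[str, Profile], float]]         # R[i](q, profile) = reward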
Example

The infinitely repeated Prisoner's Dilemma is a stochastic game with:

- Q = {q}
- N = {1, 2}
- A1 = A2 = {C, D}
- P(q, a1, a2, q) = 1 for all a1 and a2
- R1(q, C, C) = 3; R2(q, C, C) = 3
- R1(q, C, D) = 0; R2(q, C, D) = 5
- R1(q, D, C) = 5; R2(q, D, C) = 0
- R1(q, D, D) = 1; R2(q, D, D) = 1

[Figure: the single state q, with self-loops labeled (C,C), (C,D), (D,C), and (D,D).]

In general:
- An MDP is a stochastic game with only 1 player.
- A repeated game is a stochastic game with only 1 state.
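For concreteness, here is the example plugged into the StochasticGame sketch above (again, the encoding is an assumption):

    # Infinitely repeated Prisoner's Dilemma as a one-state stochastic game.
    pd_payoff = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
                 ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

    pd = StochasticGame(
        Q=['q'],                 # one state: a repeated game
        N=[1, 2],
        A=[['C', 'D'], ['C', 'D']],
        # Every action profile loops back to the single state q.
        P={('q', prof): {'q': 1.0} for prof in pd_payoff},
        R=[lambda q, prof: pd_payoff[prof][0],   # agent 1's reward
           lambda q, prof: pd_payoff[prof][1]],  # agent 2's reward
    )

The two "in general" remarks correspond to len(pd.N) == 1 (an MDP) and len(pd.Q) == 1 (a repeated game, as here).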
Histories

Before, a history was just a sequence of actions; we didn't need to include the states, because each sequence of actions led to just one possible state. But now we have action profiles rather than individual actions, and several possible outcomes for each action profile…
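The source text is cut off at this point. The natural continuation (my inference, not the slide text): because each action profile can lead to several next states, a history must now record the states visited alongside the action profiles, e.g.:

    # A history now records visited states alongside action profiles:
    # (q0, a0, q1, a1, ..., qt) rather than just (a0, a1, ...).
    History = List[Tuple[str, Profile]]   # [(q0, a0), (q1, a1), ...]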