Lecture ??: 04/23/2007

Recall:
General comments / criticisms about "classical" game theory:

1. Requires hyper-rational players (over the years, people have tried a variety of approaches to bounded rationality).
2. Games can have multiple Nash equilibria. How to pick one (or a few) as the "solution" to the game? Approach has been to refine equilibria: identify "sketchy" ones and throw those out.
Huge industry in game theory. Just one example (of many): R. Selten's trembling-hand perfect equilibrium. Saw in this example [payoff matrix not preserved]:

Claim: (M, N) is "sketchy". If the column player were to play a mixed strategy (1-ε)N + εY, where ε is small, then M is no longer a best reply for Player 1. This equilibrium (i.e., (M, N)) fails Selten's criterion: it is not trembling-hand perfect.

Other refinements:
o Proper equilibria (a subset of trembling-hand perfect equilibria) (Myerson)
o In "dynamic games" (e.g., repeated or extensive-form games), there are sequential equilibria and subgame-perfect equilibria.
3. "Learning how to play" is not built into classical game theory (cf. Nash's thesis quote about populations, the tâtonnement process, etc.). Especially relevant for large games, where players need to learn not only how to play but also what the structure of the game is.

Theory of repeated games is one approach to the issues that (3) raises; it is also of independent interest. Idea of a repeated game:
o Same "base game" (standard one-shot strategic-form game)
o Play it repeatedly with the same opponents (finitely or infinitely many times, or a random number of times)

Example: IPD (iterated prisoner's dilemma)
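The repeated-game idea above (one fixed base game, played again and again by the same players) can be sketched in Python. This is a minimal sketch and not part of the lecture: the payoff numbers, the names `play_repeated` and `random_horizon`, and the two sample strategies are all assumptions chosen for illustration.

```python
import random

# Hypothetical prisoner's-dilemma payoffs (player 1, player 2).
# These particular numbers are an assumption for illustration.
BASE_GAME = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_defect(own_history, other_history):
    """Sample strategy: ignore the history and play D."""
    return "D"


def tit_for_tat(own_history, other_history):
    """Sample strategy: cooperate first, then copy the opponent's last action."""
    return other_history[-1] if other_history else "C"


def play_repeated(strategy1, strategy2, rounds):
    """Play the same base game `rounds` times and return the total payoffs."""
    h1, h2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        a1 = strategy1(h1, h2)
        a2 = strategy2(h2, h1)
        u1, u2 = BASE_GAME[(a1, a2)]
        total1 += u1
        total2 += u2
        h1.append(a1)
        h2.append(a2)
    return total1, total2


def random_horizon(continue_prob, rng):
    """Draw a random number of rounds: after each round, play continues
    with probability `continue_prob`."""
    rounds = 1
    while rng.random() < continue_prob:
        rounds += 1
    return rounds


print(play_repeated(tit_for_tat, always_defect, 10))  # (9, 14)

n = random_horizon(0.9, random.Random(0))
print(n, play_repeated(tit_for_tat, tit_for_tat, n))
```

A strategy here is a function of the two histories, which is what separates the repeated game from a sequence of unrelated one-shot games: what a player does at a given stage may depend on everything that has happened so far.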
Nb: in books the discussion of IPD is not about "evolutionary game theory"; it's about using evolutionary computation to find "decent" strategies for IPD.

Consider twice-repeated PD:
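Base game: [payoff matrix not preserved; the payoffs used below are 3 each for (C, C), 1 each for (D, D), and 5 for playing D against C]

Hyper-rational player: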
Thinks: "I can't affect the past, only what's to come. On the second play, therefore, no matter what happened on the first play, I'll play D." Hence, both players play D on round 2. So, on round 1, they're facing this one-shot game: [the base game with 1 added to every payoff, because each player knows he'll get 1 on round 2]. This is another PD, and both will play D. Hence the only NE for the 2-time PD is "both defect on both stages."
The reasoning technique above is called backward induction. => For any finite, fixed # of stages, the only Nash eq. outcome for IPD is "everybody always defects."
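The backward-induction argument for the twice-repeated PD can be checked mechanically. The sketch below is only an illustration: the payoffs 3, 1 and 5 are the ones the notes use, the payoff 0 for playing C against D is an assumption, and the helper `pure_nash` (a hypothetical name) looks at pure strategies only.

```python
from itertools import product

ACTIONS = ("C", "D")

# Payoffs (player 1, player 2). The 0 for cooperating against a defector
# is an assumption; the notes' own matrix did not survive.
BASE = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def pure_nash(game):
    """Return all pure-strategy Nash equilibria of a two-player game."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = game[(a1, a2)]
        # Each action must be a best reply to the other player's action.
        best1 = all(u1 >= game[(b, a2)][0] for b in ACTIONS)
        best2 = all(u2 >= game[(a1, b)][1] for b in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


# Round 2: the past cannot be changed, so the players face the base game.
stage2 = pure_nash(BASE)
continuation = BASE[stage2[0]]  # what each player expects to get in round 2

# Round 1: the base game with the round-2 payoff added to every entry.
reduced = {
    profile: (u1 + continuation[0], u2 + continuation[1])
    for profile, (u1, u2) in BASE.items()
}
stage1 = pure_nash(reduced)

print(stage2)  # [('D', 'D')]
print(stage1)  # [('D', 'D')]
```

Adding the same constant to every payoff leaves all best replies unchanged, which is why the round-1 game is "another PD". The same step can be repeated for any fixed number of stages.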
Observation: as the number of stages grows, the number of strategies grows astronomically. Consider 2 stages: each player has these strategies:
o Choice of action at stage 1
o A mapping [there are 2^4 = 16 of these] from possible outcomes of stage 1 into choices for stage 2.
o 2^5 = 32 total strategies for each player.

More stages:
o Huge # of options
o Higher "rationality requirements" on players
o Backward-induction arguments are "questionable" from a practicality standpoint (similar to the curse of dimensionality in dynamic programming)

Turns out: for infinitely repeated PD (with appropriately defined payoffs), there exist Nash equilibria in which, if both players adhere to the equilibrium strategies, the resulting sequence of outcomes is "everybody cooperates in every round" (i.e., play C always).

Two examples:
1. Both players play σ*, defined as follows:
   Phase 1: Play C; if the other player plays C, continue in Phase 1. If the other player plays D, enter Phase 2.
   Phase 2: Play D forever. This is the "unrelenting punishment" equilibrium.
   *If both players follow this strategy, the game stays in Phase 1 forever.

2. Both players play σ_p*, defined as follows:
   Phase 1: Play C; if the other player plays D, enter Phase 2 (as in σ*).
   Phase 2: Play D for one step, then return to Phase 1.
   *If both players adhere, the game never enters Phase 2.

Why are these Nash equilibria? Assume time-averaged payoffs, i.e.,

$\pi_i = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i(a_t)$

where u_i(a_t) is player i's base-game payoff from the actions played at stage t.

Look at (1): If the other player plays σ* and I play σ*, then I get payoff 3. If I deviate from σ* (i.e., if at some point I play D), then my payoff is 1: from then on the other player plays D, so I get at most 1 per round, and everything before my deviation gets washed out in the average. Bottom line: σ* is a best reply to σ* => (σ*, σ*) is NE.

Look at (2): If the other player plays σ_p* and so do I, my payoff is 3. If I deviate finitely often from σ_p*, I'll still get 3. The only way I could conceivably increase my payoff is to play D infinitely often. But every time I play D, the other player answers with D in the next round and then returns to C [diagram only partly preserved: "C C D C"]. So 5 + 1 is the max I could get for those two rounds, the same as the 3 + 3 I could have gotten by staying with C. Hence, can't improve long-term average by deviating => (σ_p*, σ_p*) is NE.

Subgame-perfect: at any stage, what players play thereafter is a Nash equilibrium for the "subgame" beginning at that stage.

Comment: σ* in (1) is also subgame-perfect.

Back to itemized list: recall that repeated games theory was one approach to the "learning" issue.
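The best-reply arguments for examples (1) and (2) can be spot-checked numerically. The sketch below approximates the time-averaged payoff with a long finite run and tries a few deviations; it is an illustration under stated assumptions, not a proof. The payoffs 3, 1 and 5 come from the notes, the payoff 0 for playing C against D is an assumption, and the class names `Grim`, `OneStep` and `Script` are hypothetical.

```python
# My payoff as (my action, other player's action). The 0 is an assumption.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


class Grim:
    """Strategy (1): play C until the other player plays D, then D forever."""

    def __init__(self):
        self.punishing = False

    def act(self):
        return "D" if self.punishing else "C"

    def observe(self, other_action):
        if other_action == "D":
            self.punishing = True


class OneStep:
    """Strategy (2): answer a D with one round of D, then return to Phase 1."""

    def __init__(self):
        self.punishing = False

    def act(self):
        return "D" if self.punishing else "C"

    def observe(self, other_action):
        # Phase 2 lasts exactly one round; Phase 1 moves to Phase 2 after a D.
        self.punishing = (not self.punishing) and other_action == "D"


class Script:
    """A deviator whose action depends only on the round number."""

    def __init__(self, rule):
        self.rule = rule
        self.t = 0

    def act(self):
        return self.rule(self.t)

    def observe(self, other_action):
        self.t += 1


def average_payoff(me, other, rounds):
    """My average payoff per round over a finite run of `rounds` stages."""
    total = 0
    for _ in range(rounds):
        a, b = me.act(), other.act()
        total += PAYOFF[(a, b)]
        me.observe(b)
        other.observe(a)
    return total / rounds


N = 10_000  # finite stand-in for the limit in the time average
stay_grim = average_payoff(Grim(), Grim(), N)
deviate_grim = average_payoff(Script(lambda t: "D" if t >= 5 else "C"), Grim(), N)
stay_one = average_payoff(OneStep(), OneStep(), N)
deviate_twice = average_payoff(
    Script(lambda t: "D" if t in (3, 10) else "C"), OneStep(), N
)
always_d = average_payoff(Script(lambda t: "D"), OneStep(), N)

print(stay_grim, deviate_grim)            # about 3 versus about 1
print(stay_one, deviate_twice, always_d)  # all about 3
```

Against strategy (1), a single defection drags the long-run average down toward 1. Against strategy (2), even defecting in every round only alternates 5 and 1, which averages to the same 3 as cooperating throughout.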
However, the repeated-games approach is unsatisfying, backward induction especially. Game theorists hunger(ed) for an approach to games involving "forward induction." So, "rational game theory" developed along these lines over the