Lecture ??: 04/23/2007

Recall:
General comments / criticisms about "classical" game theory:
1. Requires hyper-rational players (over the years, people have tried a variety of approaches to bounded rationality).
2. Games can have multiple Nash equilibria: how do we pick one (or a few) as the "solution" to the game? The approach has been to refine equilibria, identifying "sketchy" ones and throwing those out.
There is a huge industry of refinements in game theory. Just one example (of many): R. Selten's trembling-hand perfect equilibrium. Saw in this example:
Claim: (M, N) is "sketchy": if the column player were to play a mixed strategy (1 − ε)N + εY, where ε is small, then M is no longer a best reply for Player 1. This equilibrium (i.e., (M, N)) fails Selten's criterion; it is not trembling-hand perfect.
Other refinements:
- Proper equilibria, a subset of trembling-hand perfect (Myerson)
- In "dynamic games" (e.g., repeated or extensive-form games), sequential equilibria and subgame-perfect equilibria.
3. "Learning how to play" is not built into classical game theory (cf. the quote from Nash's thesis about populations, the tâtonnement process, etc.). This is especially relevant for large games, where players need to learn not only how to play but what the structure of the game is.
The theory of repeated games is one approach to the issues that (3) raises; it is also of independent interest. Idea of a repeated game:
- Same "base game" (a standard one-shot strategic-form game)
- Play it repeatedly with the same opponents (finitely or infinitely many times, or a random number of times)
Example: IPD (iterated prisoner's dilemma)
Nb: in the books, the discussion of the IPD is not about "evolutionary game theory"; it's about using evolutionary computation to find "decent" strategies for the IPD.
Consider the twice-repeated PD:
Base game (standard PD payoffs, row player's payoff listed first):

            C       D
      C   3, 3    0, 5
      D   5, 0    1, 1

Hyper-rational player thinks: "I can't affect the past, only what's to come. On the second play, therefore, no matter what happened on the first play, play D." Hence, both players play D on round 2. So, on round 1, they're facing this one-shot game [because each player knows he'll get 1 on round 2]:

            C       D
      C   4, 4    1, 6
      D   6, 1    2, 2

This is another PD, and both will play D; hence the only NE for the 2-time PD is "both defect on both stages."
The reasoning technique above is called backward induction. ⇒ For any finite, fixed number of stages, the only Nash equilibrium for the IPD is "everybody always defects."
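The backward-induction argument can be checked mechanically. Below is a minimal sketch, assuming the standard PD payoff matrix (3/0/5/1); the function names are illustrative. It solves the n-stage repeated PD from the last stage backward: at each stage the continuation value is a constant added to every cell, so the stage game stays a PD and (D, D) is its unique equilibrium.

```python
# Backward induction for the finitely repeated Prisoner's Dilemma.
# Assumed standard PD payoffs: (C,C)->3,3  (C,D)->0,5  (D,C)->5,0  (D,D)->1,1.
PD = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}
ACTIONS = ('C', 'D')

def stage_nash(cont):
    """Pure Nash equilibria of the one-shot game with a constant
    continuation payoff `cont` (same for both, by symmetry) added on."""
    eqs = []
    for a in ACTIONS:
        for b in ACTIONS:
            u1 = PD[(a, b)][0] + cont
            u2 = PD[(a, b)][1] + cont
            best1 = max(PD[(x, b)][0] + cont for x in ACTIONS)
            best2 = max(PD[(a, y)][1] + cont for y in ACTIONS)
            if u1 == best1 and u2 == best2:
                eqs.append((a, b))
    return eqs

def backward_induction(n_stages):
    """Solve the n-stage repeated PD backward; return the per-stage
    equilibrium action profiles, last stage first."""
    cont = 0          # value of the (empty) continuation after the last stage
    play = []
    for _ in range(n_stages):
        profile = stage_nash(cont)[0]   # unique here: (D, D)
        play.append(profile)
        cont += PD[profile][0]          # both earn the same on (D, D)
    return play

print(backward_induction(3))  # [('D', 'D'), ('D', 'D'), ('D', 'D')]
```

Adding the constant continuation value never changes which action is a best reply, which is exactly why the argument unravels to all-D at every stage.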
Observation: as the number of stages grows, the number of strategies grows astronomically. Consider 2 stages: each player has these strategies:
- A choice of action at stage 1
- A mapping [there are 2^4 = 16 of these] from the possible outcomes of stage 1 into choices for stage 2
- So 2 × 2^4 = 2^5 = 32 total strategies for each player.
More stages:
- huge number of options
- higher "rationality requirements" on players
- backward-induction arguments are "questionable" from a practicality standpoint (similar to the curse of dimensionality in dynamic programming)
It turns out: for the infinitely-repeated PD (with appropriately defined payoffs), there exist Nash equilibria where, if both players adhere to the equilibrium strategies, the resulting sequence of outcomes is "everybody cooperates in every round" (i.e., play C always).
Two examples:
1. Both players play σ*, defined as follows:
   Phase 1: Play C. If the other player plays C, continue in Phase 1; if the other player plays D, enter Phase 2.
   Phase 2: Play D forever. This is the "unrelenting punishment" equilibrium.
   *If both players follow this strategy, the game stays in Phase 1 forever.
2. Both players play σ_p, defined as follows:
   Phase 1: Play C. If the other player plays D, enter Phase 2.
   Phase 2: Play D for one step, then return to Phase 1.
   *If both players adhere, they never enter Phase 2.
Why are these Nash equilibria? Assume time-averaged payoffs, i.e.,
   payoff = lim_{T→∞} (1/T) Σ_{t=1}^{T} (stage-t payoff).
Look at (1): If the other player plays σ* and I play σ*, then I get payoff 3. If I deviate from σ* (i.e., if at some point I play D), then my payoff is 1 (everything before my deviation gets washed out). Bottom line: σ* is a best reply to σ* ⇒ (σ*, σ*) is a NE.
Look at (2): If the other player plays σ_p and so do I, my payoff is 3. If I deviate finitely often from σ_p, I'll still get 3. The only way I could conceivably increase my payoff is to play D infinitely often. But every time I play D, the game locally looks like this:

   me:        ... C  C  D  D  ...
   opponent:  ... C  C  C  D  ...

[5 + 1 is the max I could get for those two stages, the same as the 3 + 3 I could have gotten by staying with C.] Hence, I can't improve my long-term average by deviating ⇒ (σ_p, σ_p) is a NE.
Subgame-perfect: at any stage, what the players play thereafter is a Nash equilibrium for the "subgame" beginning at that stage. Comment: σ* in (1) is also subgame-perfect.
Back to the itemized list: recall that the theory of repeated games was one approach to the "learning" issue.
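The payoff claims for σ* and σ_p can be checked by simulation. A sketch, again assuming the standard PD payoffs; the deviation patterns (`deviator`, `defect_sometimes`) are hypothetical strategies chosen just to illustrate the two arguments:

```python
# Time-averaged payoffs in the infinitely repeated PD (long finite truncation).
# grim            ~ sigma*  (unrelenting punishment)
# one_step_punish ~ sigma_p (punish one round, then return to C)
PD = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def grim(my_hist, opp_hist):
    # Phase 2 (D forever) once the opponent has ever defected.
    return 'D' if 'D' in opp_hist else 'C'

def one_step_punish(my_hist, opp_hist):
    # Punish a defection with a single D, then return to C.
    return 'D' if opp_hist and opp_hist[-1] == 'D' else 'C'

def deviator(my_hist, opp_hist):
    # Hypothetical deviation: cooperate for 10 rounds, then defect forever.
    return 'C' if len(my_hist) < 10 else 'D'

def defect_sometimes(my_hist, opp_hist):
    # Hypothetical deviation: defect on every 10th round, else cooperate.
    return 'D' if len(my_hist) % 10 == 9 else 'C'

def average_payoff(strat1, strat2, rounds=10000):
    """Player 1's average payoff when strat1 plays strat2 for `rounds` stages."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        a, b = strat1(h1, h2), strat2(h2, h1)
        total += PD[(a, b)][0]
        h1.append(a); h2.append(b)
    return total / rounds

print(average_payoff(grim, grim))                      # 3.0: stays in Phase 1
print(average_payoff(deviator, grim))                  # ≈ 1.0: punished forever
print(average_payoff(defect_sometimes, one_step_punish))  # ≈ 2.9: each D caps at 5+1
```

The numbers match the argument in the notes: against σ*, any defection drives the average down to 1; against σ_p, each D earns at most 5 + 1 over two rounds, so infinitely many deviations can't beat the cooperative average of 3.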
However, it's unsatisfying, the reliance on backward induction especially. Game theorists hunger(ed) for an approach to games involving "forward induction." So, "rational game theory" developed along these lines over the years...
Spring '07, DELCHAMPS, Algorithms