Lecture ??: 04/23/2007

Recall: General comments / criticisms about "classical" game theory.

1. Requires hyper-rational players. (Over the years, people have tried a variety of approaches to bounded rationality.)

2. Games can have multiple Nash equilibria. How do we pick one (or a few) as the "solution" to the game? The approach has been to refine equilibria: identify "sketchy" ones and throw them out. This is a huge industry in game theory. Just one example (of many): R. Selten's trembling-hand perfect equilibrium. Saw in this example:

   Claim: (M, N) is "sketchy": if the column player were to play a mixed strategy (1-e)N + eY, where e is small, then M is no longer a best reply for Player 1. This equilibrium, i.e. (M, N), fails Selten's criterion; it is not trembling-hand perfect.

   Other refinements:
   - Proper equilibria, a subset of trembling-hand perfect (Myerson).
   - In "dynamic games" (e.g. repeated or extensive-form games), there are sequential equilibria and subgame-perfect equilibria.

3. "Learning how to play" is not built in to classical game theory (cf. Nash's thesis quote about populations, the tâtonnement process, etc.). This is especially relevant for large games, where players need to learn not only how to play but also what the structure of the game is.

The theory of repeated games is one approach to the issues that (3) raises; it is also of independent interest.

Idea of a repeated game:
- Same "base game" (a standard one-shot strategic-form game).
- Play it repeatedly against the same opponents (finitely many times, infinitely many times, or a random number of times).

Example: IPD (iterated prisoner's dilemma). NB: in the books, the discussion of IPD is not about "evolutionary game theory"; it is about using evolutionary computation to find "decent" strategies for IPD.

Consider the twice-repeated PD. Base game (the standard PD payoffs, consistent with the values 3, 1, and 5 used below):

                C        D
        C    (3, 3)   (0, 5)
        D    (5, 0)   (1, 1)

Hyper-rational player thinks: "I can't affect the past, only what's to come. On the second play, therefore, no matter what happened on the first play, I'll play D." Hence, both players play D on round 2.
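The final-round step of this reasoning can be checked mechanically: D strictly dominates C in the one-shot base game, so a hyper-rational player defects on the last play regardless of history. A minimal sketch, assuming the standard PD payoff values above (the 0 for being defected on is an assumption consistent with the 3, 1, and 5 used in these notes):

```python
# Check that D strictly dominates C in the (assumed) one-shot PD base game,
# which is why a hyper-rational player plays D on the final round.

# Row player's payoff: PAYOFF[(my_action, their_action)]
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

# Against every possible opponent action, D pays strictly more than C.
for their_action in ("C", "D"):
    assert PAYOFF[("D", their_action)] > PAYOFF[("C", their_action)]

print("D strictly dominates C in the one-shot game")
```

Because this dominance holds whatever the opponent does, the stage-2 choice is D no matter what happened at stage 1, which is exactly the premise backward induction needs.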
So, on round 1, they're facing this one-shot game: the base game with 1 added to every payoff [because each player knows he'll get 1 on round 2]. This is another PD, and both will play D. Hence the only NE for the 2-time PD is "both defect on both stages." The reasoning technique above is called backward induction.

Fact: For any finite, fixed number of stages, the only Nash equilibrium for the IPD is "everybody always defects."

Observation: as the number of stages grows, the number of strategies grows astronomically. Consider 2 stages; each player has these strategies:
- A choice of action at stage 1.
- A mapping (there are 2^4 = 16 of these) from the possible outcomes of stage 1 into choices for stage 2.
- So 2 * 16 = 2^5 = 32 total strategies for each player.

More stages:
- Huge number of options.
- Higher "rationality requirements" on the players.
- Backward-induction arguments are "questionable" from a practicality standpoint (similar to the curse of dimensionality in dynamic programming).

Turns out: for the infinitely-repeated PD (with appropriately defined payoffs), there exist Nash equilibria wherein, if both players adhere to the equilibrium strategies, the sequence of outcomes that results is "everybody cooperates in every round" (i.e., play C always).

Two examples:

1. Both players play σ*, defined as follows:
   Phase 1: Play C. If the other player plays C, continue in Phase 1; if the other player plays D, enter Phase 2.
   Phase 2: Play D forever. This is the "unrelenting punishment" equilibrium.
   If both players follow this strategy, the game stays in Phase 1 forever.

2. Both players play σ**, defined as follows:
   Phase 1: Play C. If the other player plays D, enter Phase 2.
   Phase 2: Play D for one step, then return to Phase 1.
   If both players adhere, they never enter Phase 2.

Why are these Nash equilibria? Assume time-averaged payoffs, i.e.

    payoff = lim_{T -> ∞} (1/T) * Σ_{t=1}^{T} (stage payoff at round t)

Look at (1): If the other player plays σ* and I play σ*, then I get payoff 3. If I deviate from σ* (i.e., if at some point I play D), then my payoff is 1 (everything before my deviation gets washed out in the average). Bottom line: σ* is a best reply to σ*, so (σ*, σ*) is a NE.

Look at (2): If the other player plays σ** and so do I, my payoff is 3.
If I deviate finitely often from σ**, I'll still get 3. The only way I could conceivably increase my payoff is to play D infinitely often. But every time I play D, the game looks like this:

    Me:     ... C  D  ?  C ...
    Other:  ... C  C  D  C ...

[5 + 1 = 6 is the max I could get for the first two steps of the episode: 5 for defecting against C, then at most 1 against the punishing D. That is the same as the 3 + 3 I could have gotten by staying with C.] Hence, I can't improve my long-term average by deviating, so (σ**, σ**) is a NE.

Subgame-perfect: at any stage, what the players play thereafter is a Nash equilibrium for the "subgame" beginning at that stage.

Comment: σ* in (1) is also subgame-perfect.

Back to the itemized list: recall that repeated-games theory was one approach to the "learning" issue. However, it is unsatisfying (backward induction, especially). Game theorists hunger(ed) for an approach to games involving "forward induction." So "rational game theory" developed along these lines over the years... ...