# Lecture12Notes - C&O 355 Mathematical Programming Fall 2010...

C&O 355: Mathematical Programming, Fall 2010
Lecture 12 Notes
Nicholas Harvey
http://www.math.uwaterloo.ca/~harvey/

## 1 Zero-Sum Games

Let $M$ be any $m \times n$ real matrix, which we use as the payoff matrix for a two-player, zero-sum game. Von Neumann's theorem states that

$$\max_x \min_y \; x^T M y \;=\; \min_y \max_x \; x^T M y,$$

where the max and min are over distributions $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$. Recall that "distribution" means that $x \geq 0$ and $\sum_{i=1}^m x_i = 1$. Consequently, there exist distributions $x^* \in \mathbb{R}^m$ and $y^* \in \mathbb{R}^n$ such that

$$\max_x \min_y \; x^T M y \;=\; x^{*T} M y^* \;=\; \min_y \max_x \; x^T M y. \tag{1}$$

This quantity is called the value of the game and is denoted by $v$.

**Observation 1.** For any fixed $x$, we have $\min_y x^T M y \leq v$. (In particular, $x^T M y^* \leq v$.) Similarly, for any particular $y$, we have $\max_x x^T M y \geq v$. (In particular, $x^{*T} M y \geq v$.)

**Observation 2.** For any fixed $x$, there is a $y$ achieving $\min_y x^T M y$ such that $y$ has only one non-zero coordinate (which must have value 1). Such a $y$ corresponds to Bob choosing a single action, rather than a randomized choice of actions.

Fix any desired error $\delta \in (0, 1)$. We will give a method to find distributions $\hat{x}$ and $\hat{y}$ such that

$$\left| \min_y \hat{x}^T M y - v \right| \leq \delta \quad \text{and} \quad \left| \max_x x^T M \hat{y} - v \right| \leq \delta. \tag{2}$$

Due to Observation 1, we see that (2) is equivalent to

$$\min_y \hat{x}^T M y \geq v - \delta \quad \text{and} \quad \max_x x^T M \hat{y} \leq v + \delta. \tag{3}$$

In other words, if Alice plays according to distribution $\hat{x}$, then no matter how Bob plays, she is guaranteed a payoff of at least $v - \delta$. Conversely, if Bob plays according to distribution $\hat{y}$, then no matter how Alice plays, he is guaranteed to pay her at most $v + \delta$.

## 2 The Multiplicative Weights Update Method

By scaling, we may assume that $M_{i,j} \in [-1, 1]$ for every $i, j$. Set $\epsilon = \delta/3$. Alice will assign some "weights" to each of her actions, then simulate the game by herself for $T = (\ln m)/\epsilon^2$ rounds, modifying her weights between each round. These weights are essentially a probability distribution, except they …
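As a minimal sketch of how Observations 1 and 2 can be used in practice, the Python below evaluates Alice's guarantee $\min_y x^T M y$ by scanning Bob's single actions (Observation 2), and evaluates Bob's guarantee $\max_x x^T M y$ by scanning Alice's single actions (the symmetric argument, which the notes do not state explicitly). By Observation 1 the first quantity is at most $v$ and the second is at least $v$, so if they differ by at most $\delta$, then condition (3) holds, even without knowing $v$. The function names and the matching-pennies matrix are illustrative assumptions, not part of the lecture.

```python
def alice_guarantee(M, x):
    """Return min over distributions y of x^T M y.

    By Observation 2 it suffices to check Bob's single actions,
    i.e. the entries of the row vector x^T M.
    """
    m, n = len(M), len(M[0])
    column_payoffs = [sum(x[i] * M[i][j] for i in range(m)) for j in range(n)]
    return min(column_payoffs)


def bob_guarantee(M, y):
    """Return max over distributions x of x^T M y.

    Symmetric to Observation 2 (an assumption not spelled out in the
    notes): it suffices to check Alice's single actions, i.e. the
    entries of the column vector M y.
    """
    m, n = len(M), len(M[0])
    row_payoffs = [sum(M[i][j] * y[j] for j in range(n)) for i in range(m)]
    return max(row_payoffs)


def is_delta_approximate(M, x_hat, y_hat, delta):
    """Sufficient check for condition (3) that does not need v.

    Observation 1 gives lower <= v <= upper, so upper - lower <= delta
    implies lower >= v - delta and upper <= v + delta.
    """
    lower = alice_guarantee(M, x_hat)
    upper = bob_guarantee(M, y_hat)
    return upper - lower <= delta


# Hypothetical example: matching pennies, whose entries lie in [-1, 1].
M = [[1, -1],
     [-1, 1]]
uniform = [0.5, 0.5]
skewed = [0.75, 0.25]

print(alice_guarantee(M, uniform), bob_guarantee(M, uniform))
print(alice_guarantee(M, skewed))
print(is_delta_approximate(M, uniform, uniform, 0.1))
print(is_delta_approximate(M, skewed, uniform, 0.1))
```

The check is only sufficient: a pair $(\hat{x}, \hat{y})$ can satisfy (3) while the gap between the two guarantees is as large as $2\delta$.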
