SP11 cs188 lecture 8 -- MDPs 6PP

CS 188: Artificial Intelligence, Spring 2011
Lecture 8: Games, MDPs
2/14/2010
Pieter Abbeel, UC Berkeley
Many slides adapted from Dan Klein

Outline
- Zero-sum deterministic two-player games
  - Minimax
  - Evaluation functions for non-terminal states
  - Alpha-Beta pruning
- Stochastic games
  - Single player: expectimax
  - Two player: expectiminimax
- Non-zero sum
- Markov decision processes (MDPs)

Minimax Example
[Figure: minimax game tree. Node values as extracted, layout lost: 3 12 8 5 2 3 2 14 4 1]

Speeding Up Game Tree Search
- Evaluation functions for non-terminal states
- Pruning: do not search parts of the tree
  - Alpha-Beta pruning does so without losing accuracy: O(b^d) → O(b^(d/2))

Pruning
[Figure: pruned game tree. Node values as extracted, layout lost: 3 12 8 2 14 5 2]

Alpha-Beta Pruning
- General configuration
  - We're computing the MIN-VALUE at n
  - We're looping over n's children
  - n's value estimate is dropping
  - α is the best value that MAX can get at any choice point along the current path
  - If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
  - Define β similarly for MIN
[Figure: alternating MAX / MIN / MAX / MIN levels, with α marked at a MAX node and n at a MIN node below it]
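The alpha-beta configuration described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own pseudocode: the nested-list tree encoding, the function name `alphabeta`, the `visited` bookkeeping list, and the leaf values are all assumptions made for the example.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of `node` with alpha-beta pruning.

    Assumed encoding: a leaf is a number, an internal node is a list of
    children, and MAX and MIN levels alternate starting with MAX at the root.
    `visited` (optional) records the leaves that were actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node

    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            if value >= beta:        # MIN above already has a better option
                return value         # prune the remaining children
            alpha = max(alpha, value)
        return value

    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        if value <= alpha:           # n became worse than alpha: MAX avoids it
            return value             # prune the remaining children
        beta = min(beta, value)
    return value

# Hypothetical tree: a MAX root over three MIN nodes with three leaves each.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
root_value = alphabeta(tree, visited=seen)
print(root_value)  # 3
print(seen)        # [3, 12, 8, 2, 14, 5, 2]
```

In this run the leaves 4 and 6 are never evaluated: once the second MIN node sees a 2, its value can only fall further below α = 3, so MAX will not choose it.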
Alpha-Beta Pruning Example
- α is MAX's best alternative here or above
- β is MIN's best alternative here or above
[Figure: game tree annotated with α/β values during the search. Node values as extracted, layout lost: 12 5 1 3 2 8 14 8 3 2 1 3]
[Annotations as extracted: α=-∞ β=+∞ (three times); α=-∞ β=3 (four times); α=8 β=3; α=3 β=+∞ (four times); α=3 β=2; α=3 β=+∞; α=3 β=14; α=3 β=5; α=3 β=1]
[Legend: Starting α/β; Raising α; Lowering β; Raising α]

Alpha-Beta Pseudocode
[Figure: pseudocode not recoverable from the extracted text; only the symbols β and v survive]

Alpha-Beta Pruning Properties
- This pruning has no effect on the final result at the root
- Values of intermediate nodes might be wrong!
- Good child ordering improves the effectiveness of pruning
  - Heuristic: order by evaluation function or based on a previous search
- With "perfect ordering":
  - Time complexity drops to O(b^(m/2))
  - Doubles solvable depth!
  - Full search of, e.g., chess is still hopeless…
- This is a simple example of metareasoning (computing about what to compute)

Expectimax Search Trees
- What if we don't know what the result of an action will be? E.g.,
  - In solitaire, the next card is unknown
  - In minesweeper, the mine locations are unknown
  - In Pacman, the ghosts act randomly
- Can do expectimax search to maximize average score
  - Chance nodes are like min nodes, except the outcome is uncertain
  - Calculate expected utilities
  - Max nodes as in minimax search
  - Chance nodes take the average of their children's values
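The expectimax idea above (max nodes as in minimax, chance nodes taking an average) can be sketched as follows. This is an illustrative sketch under stated assumptions: the nested-list tree encoding, the function name `expectimax`, the strict alternation of MAX and chance levels, and the uniform outcome probabilities at chance nodes are all choices made for the example, and the leaf values are hypothetical.

```python
def expectimax(node, is_max=True):
    """Expectimax value of `node`.

    Assumed encoding: a leaf is a number, an internal node is a list of
    children, and MAX and chance levels alternate starting with MAX.
    Chance nodes are assumed to pick each child with equal probability.
    """
    if not isinstance(node, list):
        return float(node)

    values = [expectimax(child, not is_max) for child in node]
    if is_max:
        return max(values)            # MAX picks the best expected utility
    return sum(values) / len(values)  # chance node: uniform average

# Hypothetical tree: a MAX root over three chance nodes.
tree = [[3, 12, 9], [2, 4, 6], [15, 6, 0]]
print(expectimax(tree))  # 8.0
```

The three chance nodes average to 8.0, 4.0 and 7.0, so MAX picks the first action. With non-uniform outcomes, the average would become a probability-weighted sum instead.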

This note was uploaded on 08/26/2011 for the course CS 188 taught by Professor Staff during the Spring '08 term at University of California, Berkeley.
