

Search in Games
CPS 170
Ron Parr

Why Study Games?
•  Many human activities can be modeled as games
   –  Negotiations
   –  Bidding
   –  TCP/IP
   –  Military confrontations
   –  Pursuit/Evasion
•  Games are used to train the mind
   –  Human game-playing, animal play-fighting

Why Are Games Good for AI?
•  Games typically have concise rules
•  Well-defined starting and end points
•  Sensing and effecting are simplified
   –  Not true for sports games
   –  See RoboCup
•  Games are fun!
•  Downside: Getting taken seriously (not)
   –  See robo search and rescue

Some History of Games in AI
•  Computer games have been around almost as long as computers (perhaps longer)
   –  Chess: Turing (and others) in the 1950s
   –  Checkers: Samuel, 1950s learning program
•  Usually start with naïve optimism
•  Follow with naïve pessimism
•  Simon: Predicted a computer chess champion by 1967
•  Many, e.g., Kasparov, predicted that a computer would never be champion

Games Today
•  Computers perform at champion level
   –  Backgammon, Checkers (solved), Chess, Othello
•  Computers perform well
   –  Bridge, poker
•  Computers still do badly (but recent breakthroughs show promise)
   –  Go, Hex

Simple Game Setup
•  Most commonly, we study games that are:
   –  2 player
   –  Alternating
   –  Zero-sum
   –  Perfect information
•  Examples: Checkers, chess, backgammon
•  Assumptions can be relaxed at some expense
•  Economics studies the case where the number of agents is very large
   –  Individual actions don't change the dynamics

Zero Sum Games
•  Assign values to different outcomes
•  Win = 1, Loss = -1
•  With zero-sum games, every gain comes at the other player's expense
•  Sum of both players' scores must be 0
•  Are any games truly zero sum?
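The Simple Game Setup and Zero Sum Games slides can be made concrete with a toy game. The sketch below is a minimal illustration under stated assumptions: the stone-taking game, the `State` class, and the helpers `successors`, `is_terminal`, and `utility` are hypothetical names chosen here, not taken from the slides. It shows a 2-player, alternating, perfect-information game whose terminal scores (Win = 1, Loss = -1) always sum to 0.

```python
from dataclasses import dataclass

# Hypothetical toy game (not from the slides): players alternately remove
# 1 or 2 stones from a pile; whoever takes the last stone wins.
# It is 2-player, alternating, zero-sum, and has perfect information.


@dataclass(frozen=True)
class State:
    stones: int   # stones left in the pile
    to_move: str  # "MAX" or "MIN": whose turn it is


def other(player: str) -> str:
    return "MIN" if player == "MAX" else "MAX"


def successors(state: State) -> list:
    """All states reachable in one move (take 1 or 2 stones)."""
    return [State(state.stones - k, other(state.to_move))
            for k in (1, 2) if k <= state.stones]


def is_terminal(state: State) -> bool:
    return state.stones == 0


def utility(state: State, player: str) -> int:
    """Win = 1, Loss = -1. The player who just moved took the last stone."""
    assert is_terminal(state)
    winner = other(state.to_move)
    return 1 if player == winner else -1


end = State(0, "MIN")  # Max just took the last stone
print(utility(end, "MAX"), utility(end, "MIN"))  # 1 -1
print(utility(end, "MAX") + utility(end, "MIN"))  # 0: zero sum
```

Every terminal state assigns opposite scores to the two players, so one player's gain is exactly the other's loss.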
Characterizing Games
•  Two-player alternating move games are very much like search
   –  Initial state
   –  Successor function
   –  Terminal test
   –  Objective function (heuristic function)
•  Unlike search
   –  Terminal states are often a large set
   –  Full search to terminal states is usually impossible

Game Trees
[Figure: partial tic-tac-toe game tree, with successive levels labeled Player 1 and Player 2]

Game Trees (abstracted)
[Figure: abstract game tree; a Max node with actions A1, A2, A3 leading to Min nodes, whose actions A11, A12, A21, A22, A31, A32 lead to terminal nodes]

Minimax
•  Max player tries to maximize his return
•  Min player tries to minimize Max's return (equivalently, to maximize his own)
•  This is optimal for both (assuming zero sum)

$$\mathrm{minimax}(n_{\max}) = \max_{s \in \mathrm{successors}(n)} \mathrm{minimax}(s)$$
$$\mathrm{minimax}(n_{\min}) = \min_{s \in \mathrm{successors}(n)} \mathrm{minimax}(s)$$

Minimax Values
[Figure: the abstract tree with minimax values backed up from the terminal nodes; the Max root has value 3]

Minimax Properties
•  Minimax can be run depth first (b = branching factor, m = maximum depth)
   –  Time O(b^m)
   –  Space O(bm)
•  Assumes that opponent plays optimally
•  Based on a worst-case analysis
•  What if this is incorrect?

Minimax in the Real World
•  Search trees are too big
•  Alternating turns double the depth of the search
   –  2 ply = 1 full turn
•  Branching factors are too high
   –  Chess: 35
   –  Go: 361
•  Full search from start to end never terminates in non-trivial games

Evaluation Functions
•  Like heuristic functions
•  Try to estimate the value of a node without expanding all the way to termination
•  Using evaluation functions
   –  Do a depth-limited search
   –  Treat evaluation function as if it were terminal
•  What's wrong with this?
•  How do you pick the depth?
•  How do you manage your time?
•  Iterative deepening, quiescence

Desiderata for Evaluation Functions
•  Would like to put the same ordering on nodes (even if values aren't totally right)
•  Is this a reasonable thing to ask for?
•  What if you have a perfect evaluation function?
•  How are evaluation functions made in practice?
   –  Buckets
   –  Linear combinations
      •  Chess pieces (material)
      •  Board control (positional, strategic)

Search Control Issues
•  Horizon effects
   –  Something interesting is just beyond the horizon?
   –  How do you know?
•  When to generate more nodes?
•  If you selectively extend your frontier, how do you decide where?
•  If you have a fixed amount of total game time, how do you allocate this?

Pruning
•  The most important search control method is figuring out which nodes you don't need to expand
•  Use the fact that we are doing a worst-case analysis to our advantage
   –  Max player cuts off search when he knows Min player can force a provably bad outcome
   –  Min player cuts off search when he knows Max can force a provably good (for Max) outcome

Alpha-beta pruning
[Figure: alpha-beta pruning on the Minimax Values example tree (Max root value 3)]

How to prune
•  We still do (bounded) DFS
•  Expand at least one path to the "bottom"
•  If current node is max node, and min can force a lower value, then prune siblings
•  If current node is min node, and max can force a higher value, then prune siblings

Max node pruning
[Figure: pruning below a Max node]

Implementing alpha-beta
alpha = value of best alternative available to max player
beta = value of best alternative available to min player
Call max_value(root, -∞, +∞) at the root.

max_value(state, alpha, beta)
    if cutoff(state) then return eval(state)
    for each s in successors(state) do
        alpha = max(alpha, min_value(s, alpha, beta))
        if alpha >= beta then return beta
    end
    return alpha

min_value(state, alpha, beta)
    if cutoff(state) then return eval(state)
    for each s in successors(state) do
        beta = min(beta, max_value(s, alpha, beta))
        if beta <= alpha then return alpha
    end
    return beta

Amazing facts about alpha-beta
•  Empirically, with good move ordering, alpha-beta reduces the effective branching factor from b to roughly √b for many problems (halving the exponent in b^m)
•  Effectively doubles search horizon
•  Alpha-beta makes the difference between novice and expert computer players

What About Probabilities?
[Figure: game tree with Max nodes, chance nodes (outcome probabilities 0.5/0.5, 0.6/0.4, 0.9/0.1), and Min nodes]

Expectiminimax
•  n random outcomes per chance node
•  O(b^m n^m) time

$$\mathrm{eminimax}(n_{\max}) = \max_{s \in \mathrm{successors}(n)} \mathrm{eminimax}(s)$$
$$\mathrm{eminimax}(n_{\min}) = \min_{s \in \mathrm{successors}(n)} \mathrm{eminimax}(s)$$
$$\mathrm{eminimax}(n_{\mathrm{chance}}) = \sum_{s \in \mathrm{successors}(n)} \mathrm{eminimax}(s)\, p(s)$$

Expectiminimax is nasty
•  High branching factor
•  Randomness makes evaluation functions difficult
   –  Hard to predict many steps into the future
   –  Values tend to smear together
   –  Preserving order is not sufficient
•  Pruning is problematic
   –  Need to prune based upon a bound on an expectation
   –  Need a priori bounds on the evaluation function

Multiplayer Games
•  Things sort-of generalize, but can get complicated
•  Maintain vector of possible values for each player at each node
•  Might assume that each player acts greedily, but what's wrong with this?
•  Correct treatment requires the full machinery of game theory

Conclusions
•  Game tree search is a special kind of search
•  Rely heavily on heuristic evaluation functions
•  Alpha-beta is a big win
•  Most successful players use alpha-beta
•  Final thought: Tradeoff between search effort and evaluation function effort
•  When is it better to invest in your evaluation function?
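The minimax recurrences on the Minimax slide translate almost directly into a depth-first recursive function. This is a minimal sketch that assumes a hypothetical encoding of the game tree as nested Python lists (a leaf holds the terminal value for Max) and hypothetical leaf values; it is not the example from the Minimax Values figure.

```python
from typing import List, Union

# Illustrative tree encoding (an assumption, not from the slides): a leaf is a
# number giving the terminal value for Max; an inner node is a list of child
# subtrees. Levels alternate Max / Min, starting with Max at the root.
Tree = Union[int, float, List["Tree"]]


def minimax(node: Tree, is_max: bool = True) -> float:
    """Depth-first minimax: max over successors at Max nodes, min at Min nodes."""
    if not isinstance(node, list):       # terminal node
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# Hypothetical leaf values, not the ones in the slide's figure.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3: the Min nodes back up 3, 2, 2 and Max picks 3
```

The recursion holds only one root-to-leaf path plus the sibling values along it, which matches the O(bm) space bound on the Minimax Properties slide; it still visits every leaf, hence O(b^m) time.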
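The Evaluation Functions slide describes a depth-limited search that treats the evaluation function as if it were terminal, with iterative deepening as one way to manage time. The sketch below assumes a hypothetical stone-taking game (players remove 1 or 2 stones; taking the last stone wins), redefined here so the block is self-contained, plus a deliberately uninformative `evaluate` that always returns 0. The helpers `depth_limited`, `best_move`, and `iterative_deepening` are hypothetical names; quiescence search is not shown.

```python
import time

# Hypothetical toy game, kept self-contained: players alternately remove 1 or
# 2 stones; whoever takes the last stone wins. State = (stones, max_to_move).


def successors(state):
    stones, max_to_move = state
    return [(stones - k, not max_to_move) for k in (1, 2) if k <= stones]


def evaluate(state):
    """Hypothetical (deliberately weak) evaluation function: 0 = 'no idea'."""
    return 0


def depth_limited(state, depth):
    """Minimax that treats evaluate() as if it were terminal at depth 0."""
    stones, max_to_move = state
    if stones == 0:
        # True terminal: whoever just moved took the last stone and won.
        return -1 if max_to_move else 1
    if depth == 0:
        return evaluate(state)
    values = [depth_limited(s, depth - 1) for s in successors(state)]
    return max(values) if max_to_move else min(values)


def best_move(state, depth):
    """Max's successor with the highest depth-limited value (first on ties)."""
    return max(successors(state), key=lambda s: depth_limited(s, depth - 1))


def iterative_deepening(state, time_budget_s, max_depth=30):
    """Search to depth 1, 2, 3, ... and keep the move from the deepest
    iteration started before the deadline. The deadline is only checked
    between iterations, so the last iteration may overrun it."""
    deadline = time.monotonic() + time_budget_s
    choice = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        choice = best_move(state, depth)
    return choice


print(depth_limited((3, True), 1))   # 0: the loss lies beyond the horizon
print(depth_limited((3, True), 10))  # -1: a deeper search sees it
print(iterative_deepening((4, True), time_budget_s=0.5))  # (3, False)
```

The first two printed values illustrate one answer to "What's wrong with this?": at depth 1 a lost position for Max looks neutral because the loss sits just beyond the search horizon.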
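For the "Linear combinations" bullet under evaluation functions, an evaluation can be written as a weighted sum of features, eval(s) = Σ_i w_i f_i(s). In this sketch the piece values are common textbook conventions rather than numbers from the slides, and the mobility number (standing in for a "board control" feature) and the weights are hypothetical.

```python
# Conventional textbook material values (an assumption: the slides give none).
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material(white_counts, black_counts):
    """Material feature: value-weighted piece-count difference, White's view."""
    return sum(value * (white_counts.get(p, 0) - black_counts.get(p, 0))
               for p, value in PIECE_VALUES.items())


def linear_eval(features, weights):
    """Evaluation as a linear combination of features: sum_i w_i * f_i."""
    if len(features) != len(weights):
        raise ValueError("need one weight per feature")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical position: White is a knight up; Black has 5 more legal moves
# (mobility, standing in for a "board control" feature).
white = {"P": 8, "N": 2, "B": 2, "R": 2, "Q": 1}
black = {"P": 8, "N": 1, "B": 2, "R": 2, "Q": 1}
features = [material(white, black), -5]
weights = [1.0, 0.1]  # hypothetical weights; in practice tuned or learned
print(linear_eval(features, weights))  # 2.5
```

A linear form is cheap to compute at every leaf of a depth-limited search, which is one reason it is popular; the cost is that interactions between features are ignored.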
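The alpha-beta pseudocode on the Implementing alpha-beta slide runs in Python with little change. This sketch keeps the slide's structure (returning beta or alpha when a cutoff happens) but assumes a hypothetical nested-list tree, takes "is a leaf" as `cutoff`, and uses the leaf value as `eval`. The `stats` counter and the `alpha_beta` wrapper are additions, there only to show how many leaves are examined.

```python
import math

# Sketch of the slide's max_value/min_value pseudocode, using an assumed tree
# encoding: a leaf is a number (its eval for Max) and an inner node is a list
# of children, alternating Max/Min from the root. Here cutoff(state) is simply
# "is a leaf"; stats["leaves"] counts how many leaves get evaluated.


def max_value(node, alpha, beta, stats):
    if not isinstance(node, list):            # cutoff(state)
        stats["leaves"] += 1
        return node                           # eval(state)
    for child in node:
        alpha = max(alpha, min_value(child, alpha, beta, stats))
        if alpha >= beta:
            return beta                       # Min will avoid this node: prune
    return alpha


def min_value(node, alpha, beta, stats):
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    for child in node:
        beta = min(beta, max_value(child, alpha, beta, stats))
        if beta <= alpha:
            return alpha                      # Max has a better option: prune
    return beta


def alpha_beta(tree):
    """Root call with the widest window; returns (value, leaves evaluated)."""
    stats = {"leaves": 0}
    value = max_value(tree, -math.inf, math.inf, stats)
    return value, stats["leaves"]


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # hypothetical leaf values
print(alpha_beta(tree))  # (3, 7): same value as minimax, 7 of 9 leaves examined
```

Here alpha-beta returns the minimax value 3 while skipping two of the nine leaves; how much it saves depends heavily on the order in which successors are tried.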
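For the Multiplayer Games slide, one simple reading of "maintain a vector of values" with greedy players is to back up, at each node, the child vector that is best for the player who moves there (this idea is sometimes called max^n). The sketch below works under that assumption, with a hypothetical `Node` class and `greedy_backup` function; as the slide says, a correct treatment needs game theory, and this is not that.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumed encoding: a leaf is a tuple of payoffs, one entry per player; an
# inner node records which player moves there and its children.
Payoffs = Tuple[float, ...]


@dataclass
class Node:
    player: int                  # index of the player who moves at this node
    children: List["Tree"]


Tree = Union[Node, Payoffs]


def greedy_backup(tree: Tree) -> Payoffs:
    """Back up value vectors: the mover picks the child vector that is best
    for itself (the first such child on ties)."""
    if not isinstance(tree, Node):
        return tree
    vectors = [greedy_backup(c) for c in tree.children]
    return max(vectors, key=lambda v: v[tree.player])


# Hypothetical three-player tree.
left = Node(1, [(1, 2, 3), (4, 1, 2)])    # player 1 picks (1, 2, 3)
right = Node(2, [(6, 1, 2), (7, 0, 1)])   # player 2 picks (6, 1, 2)
print(greedy_backup(Node(0, [left, right])))  # (6, 1, 2)

tie = Node(1, [(0, 5, 0), (9, 5, 0)])     # player 1 is indifferent
print(greedy_backup(tie))                          # (0, 5, 0)
print(greedy_backup(Node(1, tie.children[::-1])))  # (9, 5, 0)
```

The last two lines show one fragility of the greedy assumption: when the mover is indifferent, the backed-up vector (and so the other players' prospects) depends on an arbitrary tie-break.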

This note was uploaded on 02/17/2012 for the course COMPSCI 170 taught by Professor Parr during the Spring '11 term at Duke.
