CS 61B: Lecture 18
Friday, October 8, 2010

Game tree search
How could we design a program that plays Tic Tac Toe? The standard technique searches for the best moves by using a game tree, which looks much like a family tree. A game tree is not a data structure; it is the structure of a sequence of recursive method calls.
The ancestor at the top of the tree is the current grid, accompanied by a record of whose turn it is. The children of the ancestor form the set of possible grids that result from considering each legal move. These children have children of their own, which represent each of the opponent's possible countermoves.
The following pseudocode computes a grid's score and the best move from that grid (which determines the grid's score). A Best object holds a record of the best move and its score.

public class Grid {
  public Best chooseMove(boolean side) {
    Best myBest = new Best();   // My best move
    Best reply;                 // Opponent's best reply

    if (the current Grid is full or has a win) {
      return a Best with the Grid's score, no move;
    }
    if (side == COMPUTER) {
      myBest.score = -2;
    } else {
      myBest.score = 2;
    }
    for (each legal move m) {
      perform move m;           // Modifies "this" Grid
      reply = chooseMove(! side);
      undo move m;              // Restores "this" Grid
      if (((side == COMPUTER) && (reply.score > myBest.score)) ||
          ((side == HUMAN) && (reply.score < myBest.score))) {
        myBest.move = m;
        myBest.score = reply.score;
      }
    }
    return myBest;
  }
}

[Figure: game tree evaluated by minimax. The top-level grid (Computer Move, MAX) has three children (Human Move, MIN), each with children of its own; each grid's score is printed to its left.]
Why is myBest.score initially set to −2 (for the computer) or 2 (for the human)? Every real score lies between −1 and 1, so the first move investigated always beats the out-of-range initial value, which guarantees that at least one legal move is assigned to myBest. This is a simple, albeit not the most transparent, way to make sure that chooseMove always accepts the first move investigated.
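The claim that a game tree is only a pattern of recursive calls can be checked directly. The sketch below is a minimal illustration under assumptions not taken from the notes: a hypothetical helper named countGames, and a board stored as a char[9] in row-major order holding 'X', 'O', or '.' for an empty square. It visits every grid of the Tic Tac Toe game tree without allocating a single tree node; moves are performed and undone in place on one shared array.

```java
// Walks the whole Tic Tac Toe game tree using recursion alone; no tree
// object is built. Hypothetical encoding: char[9], row-major, 'X', 'O', '.'.
final class GameTreeCount {
    // The eight lines of three squares: rows, columns, diagonals.
    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    static boolean hasWin(char[] b) {
        for (int[] l : LINES) {
            if (b[l[0]] != '.' && b[l[0]] == b[l[1]] && b[l[0]] == b[l[2]]) {
                return true;
            }
        }
        return false;
    }

    static boolean isFull(char[] b) {
        for (char c : b) {
            if (c == '.') {
                return false;
            }
        }
        return true;
    }

    // Returns the number of finished games (leaves) below this grid.
    // One invocation per grid; at most ten frames are ever on the stack.
    static long countGames(char[] b, char toMove) {
        if (hasWin(b) || isFull(b)) {
            return 1;                       // a leaf of the game tree
        }
        long total = 0;
        char next = (toMove == 'X') ? 'O' : 'X';
        for (int m = 0; m < 9; m++) {
            if (b[m] != '.') {
                continue;                   // square taken: not a legal move
            }
            b[m] = toMove;                  // perform move m
            total += countGames(b, next);
            b[m] = '.';                     // undo move m
        }
        return total;
    }
}
```

Calling countGames on the empty grid with 'X' to move enumerates every way a game can end, yet the only memory used besides the board is one stack frame per level of the tree.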
Each grid in the game tree above represents one invocation of chooseMove. The children of each grid represent recursive calls, which are executed in order from left to right. While the call for any particular grid is executing, the calls for that grid and its ancestors (parent, grandparent, etc.) are stack frames on the program stack.
If we assume that both players are infinitely intelligent and will always choose the best move, then we can determine the computer's best move using the game tree. Each legal grid (coupled with information about whose turn it is) is assigned a numerical score that indicates how optimistic we are about winning. For concreteness, give a grid a score of 1 if the computer is guaranteed a win (assuming it plays perfectly), a score of −1 if the computer's opponent is guaranteed a win, and a score of 0 if perfect players will draw. (It's better to give these constants names like COMPUTER_WIN, HUMAN_WIN, and DRAW, but these names are too long for the figures herein.)
How do we score a given grid? It’s easy if the game is over.
Suppose the computer is playing X. If there are three X’s in a row,
assign the grid a 1; for three collinear O’s, assign it a −1. If the
grid has no empty squares left, and nobody has won, assign it a 0.
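The scoring rule for a finished game can be sketched in Java as follows. This is a minimal sketch, assuming the computer plays X and a hypothetical board encoding (char[9], row-major, '.' for an empty square); it uses the constant names suggested above and returns null when the game is not over, since such grids need the minimax search instead.

```java
// Scores a finished Tic Tac Toe grid, assuming the computer plays X.
// Hypothetical encoding: char[9], row-major, 'X', 'O', or '.' for empty.
final class GridScore {
    static final int COMPUTER_WIN = 1;
    static final int DRAW = 0;
    static final int HUMAN_WIN = -1;

    // Rows, columns, and diagonals: every way to get three in a row.
    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    // Returns 'X' or 'O' if that player has three collinear marks, else '.'.
    static char winner(char[] b) {
        for (int[] l : LINES) {
            char c = b[l[0]];
            if (c != '.' && c == b[l[1]] && c == b[l[2]]) {
                return c;
            }
        }
        return '.';
    }

    static boolean isFull(char[] b) {
        for (char c : b) {
            if (c == '.') {
                return false;
            }
        }
        return true;
    }

    // COMPUTER_WIN, HUMAN_WIN, or DRAW for a finished game; null otherwise.
    static Integer terminalScore(char[] b) {
        char w = winner(b);
        if (w == 'X') {
            return COMPUTER_WIN;
        }
        if (w == 'O') {
            return HUMAN_WIN;
        }
        if (isFull(b)) {
            return DRAW;
        }
        return null;
    }
}
```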
The score of any other grid is computed by a minimax algorithm. We consider each possible move, and determine the child
grid that each move creates. We assign a score to each child grid by
calling the minimax algorithm recursively. Then we assign a score
to the current (parent) grid. If it’s the computer’s turn, we choose
the move that yields the maximum score, and assign the same score
to the current grid. If it’s the opponent’s turn, we assume that the
opponent plays perfectly. So we choose the move that yields the
minimum score, and assign that score to the current grid.
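To make the minimax rule concrete, here is a minimal runnable translation of the chooseMove pseudocode into Java. It is a sketch under assumptions that are not part of the notes: the computer plays X, the grid is a char[9] board ('.' for empty), a move is the index 0..8 of the square filled, and −1 stands for "no move". The finalScore helper is a hypothetical stand-in for "the current Grid is full or has a win".

```java
// A runnable translation of the chooseMove pseudocode for Tic Tac Toe.
// Assumptions: the computer plays X; board is char[9], row-major,
// 'X', 'O', or '.'; a move is the index (0..8) of the square filled.
final class Grid {
    static final boolean COMPUTER = true;
    static final boolean HUMAN = false;

    static final class Best {
        int move = -1;   // -1 stands for "no move"
        int score;
    }

    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    final char[] cells;

    Grid(String s) {
        cells = s.toCharArray();
    }

    // Score of a finished game (1, -1, or 0), or null if play continues.
    Integer finalScore() {
        for (int[] l : LINES) {
            char c = cells[l[0]];
            if (c != '.' && c == cells[l[1]] && c == cells[l[2]]) {
                return c == 'X' ? 1 : -1;
            }
        }
        for (char c : cells) {
            if (c == '.') {
                return null;
            }
        }
        return 0;
    }

    Best chooseMove(boolean side) {
        Best myBest = new Best();   // My best move
        Best reply;                 // Opponent's best reply

        Integer done = finalScore();
        if (done != null) {         // full grid or a win: score, no move
            myBest.score = done;
            return myBest;
        }
        myBest.score = (side == COMPUTER) ? -2 : 2;    // out-of-range start
        for (int m = 0; m < 9; m++) {
            if (cells[m] != '.') {
                continue;                               // not a legal move
            }
            cells[m] = (side == COMPUTER) ? 'X' : 'O';  // perform move m
            reply = chooseMove(!side);
            cells[m] = '.';                             // undo move m
            if ((side == COMPUTER && reply.score > myBest.score)
                    || (side == HUMAN && reply.score < myBest.score)) {
                myBest.move = m;                        // MAX keeps the largest,
                myBest.score = reply.score;             // MIN the smallest score
            }
        }
        return myBest;
    }
}
```

For example, new Grid("XX.OO....").chooseMove(Grid.COMPUTER) should report move 2 with score 1, because X completes the top row there.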
Minimax is recursive, and the figure above illustrates the recursive method calls that minimax executes when evaluating several grids' scores. Again, no tree data structure is created! At any one point in time, the program stack represents one path down the tree (but the root of the tree is at the bottom of the stack). Each grid's score is printed to its left. Because the top-level grid has a score of 1, the computer is guaranteed a win, which is obtained by choosing the second of the three possible moves.

Simple pruning
If, at any grid in the game tree, a player discovers a guaranteed winning move, there's no reason to continue to search for a better move. Hence, we can save time by pruning away some recursive calls. The previous example now looks like this.
[Figure: the same game tree with simple pruning (Computer Move, MAX; Human Move, MIN). Once a player finds a guaranteed winning move, its remaining moves are marked "No need to continue" and are never searched.]
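A minimal Java sketch of simple pruning, under the same assumed char[9] board as a plain minimax search (computer plays X, '.' for empty). The simplePrune flag and the calls counter are hypothetical additions for comparison only: with the flag off the method is plain minimax; with it on, a player stops scanning moves as soon as one of them guarantees a win for that player.

```java
// chooseMove with optional simple pruning, assuming the computer plays X on
// a hypothetical char[9] board ('X', 'O', '.' for empty, row-major).
final class PrunedGrid {
    static final boolean COMPUTER = true;
    static final boolean HUMAN = false;
    static final int COMPUTER_WIN = 1;
    static final int HUMAN_WIN = -1;

    static final class Best {
        int move = -1;   // square 0..8; -1 means "no move"
        int score;
    }

    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    final char[] cells;
    long calls = 0;      // number of chooseMove invocations (grids visited)

    PrunedGrid(String s) {
        cells = s.toCharArray();
    }

    // 1 or -1 for a win, 0 for a full grid, null if the game goes on.
    Integer finalScore() {
        for (int[] l : LINES) {
            char c = cells[l[0]];
            if (c != '.' && c == cells[l[1]] && c == cells[l[2]]) {
                return c == 'X' ? COMPUTER_WIN : HUMAN_WIN;
            }
        }
        for (char c : cells) {
            if (c == '.') {
                return null;
            }
        }
        return 0;
    }

    Best chooseMove(boolean side, boolean simplePrune) {
        calls++;
        Best myBest = new Best();
        Integer done = finalScore();
        if (done != null) {
            myBest.score = done;
            return myBest;
        }
        myBest.score = (side == COMPUTER) ? -2 : 2;
        int myWin = (side == COMPUTER) ? COMPUTER_WIN : HUMAN_WIN;
        for (int m = 0; m < 9; m++) {
            if (cells[m] != '.') {
                continue;
            }
            cells[m] = (side == COMPUTER) ? 'X' : 'O';   // perform move m
            Best reply = chooseMove(!side, simplePrune);
            cells[m] = '.';                              // undo move m
            if ((side == COMPUTER && reply.score > myBest.score)
                    || (side == HUMAN && reply.score < myBest.score)) {
                myBest.move = m;
                myBest.score = reply.score;
            }
            if (simplePrune && myBest.score == myWin) {
                break;   // guaranteed win: no reason to search further
            }
        }
        return myBest;
    }
}
```

Because a guaranteed win is the best score a player can ever get, breaking out early never changes the score or the chosen move; it only reduces the number of grids visited.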
[Figure: alpha-beta pruning on a game tree whose levels are Computer Move (MAX), Human Move (MIN), and Computer Move (MAX). The top grid's moves lead to grids A, B, and C, and X is a child of B. Each grid is annotated with the values of [α, β] when chooseMove is called and the values of <α, β> when chooseMove ends; pruned subtrees are marked "No need to continue".]
Alpha-beta pruning
A more aggressive pruning technique (which subsumes simple pruning) is based on the following observation. (See the figure above.) Suppose the computer has discovered a move (A) that guarantees a draw, and is investigating an alternative move (B). The computer begins to consider all the moves its opponent could make from grid B. It discovers that if the opponent makes move X in response, the opponent can force a draw.
It would be a waste of the computer's time to continue investigating moves the opponent could make from grid B. Since the opponent can, at the very least, force a draw, move B is no better for the computer than move A, though it might be worse. The computer should simply go on and consider move C.
To turn this insight into an algorithm, we pass two additional parameters to the chooseMove() method: α and β. The parameter α is a score that the computer knows with certainty it can achieve; for instance, if α = 0, then the computer knows it can force a draw, and is only interested in searching for moves guaranteed to do better. Conversely, β is a score that the opponent knows with certainty it can hold the computer to: the opponent can force a score of β or lower. For any grid, we maintain values of α and β based on our current knowledge of the best moves discovered thus far. If β becomes less than or equal to α, then further investigation of the current grid is useless.
In the figure above, for instance, grid A has a score of 0. The top grid is a MAX grid, so the computer cannot get lower than zero (worse than a draw); it sets α = 0 at the topmost grid and uses the parameters [α, β] = [0, 2] to begin investigating grid B.
Minimax computes that grid X has a score of 0. Because the computer recognizes that its opponent can force a draw from grid B (which is a MIN grid), it sets β = 0 for grid B. Because the parameters [α, β] = [0, 0] have met in the middle, there is no point in investigating other children of grid B. We never compute the exact score of B, but we know it is less than or equal to zero (because B is a MIN grid). Hence, move B cannot be better than move A, so we go on to move C.
After the computer considers move C's first child, C's remaining child is likewise pruned.
The following pseudocode formalizes game tree search with alpha-beta pruning. Note that α only changes during a computer (MAX) move, and β only changes during an opponent (MIN) move. The top-level chooseMove invocation should be called with [α, β] = [−2, 2] to make sure that some move is always selected.

public Best chooseMove(boolean side, int alpha, int beta) {
  Best myBest = new Best();   // My best move
  Best reply;                 // Opponent's best reply

  if (the current Grid is full or has a win) {
    return a Best with the Grid's score, no move;
  }
  if (side == COMPUTER) {
    myBest.score = alpha;
  } else {
    myBest.score = beta;
  }
  for (each legal move m) {
    perform move m;                            // Modifies "this" Grid
    reply = chooseMove(! side, alpha, beta);
    undo move m;                               // Restores "this" Grid
    if ((side == COMPUTER) && (reply.score > myBest.score)) {
      myBest.move = m;
      myBest.score = reply.score;
      alpha = reply.score;
    } else if ((side == HUMAN) && (reply.score < myBest.score)) {
      myBest.move = m;
      myBest.score = reply.score;
      beta = reply.score;
    }
    if (alpha >= beta) { return myBest; }
  }
  return myBest;
}

Combinatorially huge game trees
Game trees grow exponentially with tree depth. Even with every technique at our disposal, we cannot hope to explore the entire tree for a game of chess. Hence, game tree search typically limits its recursion to a specified depth. Positions at the maximum depth are evaluated not by recursive search, but by less accurate heuristics called evaluation functions. Evaluation functions compute numerical estimates of the computer's optimism. These scores are not just −1, 0, or 1, but can take on a continuous range between (for example) −1.0 and 1.0. Even with this infinite-valued (rather than three-valued) scoring system, alpha-beta search still works correctly; in fact, it is at its best in such circumstances.
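Combining the alpha-beta pseudocode with a depth limit gives a search like the sketch below. It follows the structure of chooseMove(side, alpha, beta) but switches to double scores and adds a depth parameter; both changes, the char[9] board ('.' for empty, computer plays X), and the evaluate() heuristic (lines still open to X minus lines still open to O, divided by 9) are illustrative assumptions, not an evaluation function from the notes. Finished games keep their exact scores of 1.0, −1.0, and 0.0, while heuristic estimates stay strictly between −1 and 1.

```java
// Depth-limited alpha-beta search following chooseMove(side, alpha, beta).
// Assumptions: computer plays X; board is char[9] with 'X', 'O', or '.';
// evaluate() is a hypothetical heuristic used only at the depth limit.
final class AlphaBetaGrid {
    static final boolean COMPUTER = true;
    static final boolean HUMAN = false;

    static final class Best {
        int move = -1;       // square 0..8; -1 means "no move"
        double score;
    }

    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    final char[] cells;

    AlphaBetaGrid(String s) {
        cells = s.toCharArray();
    }

    // Returns 'X' or 'O' for three in a row, otherwise '.'.
    char winner() {
        for (int[] l : LINES) {
            char c = cells[l[0]];
            if (c != '.' && c == cells[l[1]] && c == cells[l[2]]) {
                return c;
            }
        }
        return '.';
    }

    boolean isFull() {
        for (char c : cells) {
            if (c == '.') {
                return false;
            }
        }
        return true;
    }

    // Hypothetical evaluation function: lines still open to X minus lines
    // still open to O, divided by 9 so the estimate stays inside (-1, 1).
    double evaluate() {
        int open = 0;
        for (int[] l : LINES) {
            boolean hasX = false;
            boolean hasO = false;
            for (int i : l) {
                hasX |= cells[i] == 'X';
                hasO |= cells[i] == 'O';
            }
            if (!hasO) open++;   // X could still complete this line
            if (!hasX) open--;   // O could still complete this line
        }
        return open / 9.0;
    }

    Best chooseMove(boolean side, double alpha, double beta, int depth) {
        Best myBest = new Best();
        char w = winner();
        if (w != '.' || isFull()) {          // game over: exact score, no move
            myBest.score = (w == 'X') ? 1.0 : (w == 'O') ? -1.0 : 0.0;
            return myBest;
        }
        if (depth == 0) {                    // depth limit: estimate, no move
            myBest.score = evaluate();
            return myBest;
        }
        myBest.score = (side == COMPUTER) ? alpha : beta;
        for (int m = 0; m < 9; m++) {
            if (cells[m] != '.') {
                continue;
            }
            cells[m] = (side == COMPUTER) ? 'X' : 'O';       // perform move m
            Best reply = chooseMove(!side, alpha, beta, depth - 1);
            cells[m] = '.';                                  // undo move m
            if (side == COMPUTER && reply.score > myBest.score) {
                myBest.move = m;
                myBest.score = reply.score;
                alpha = reply.score;
            } else if (side == HUMAN && reply.score < myBest.score) {
                myBest.move = m;
                myBest.score = reply.score;
                beta = reply.score;
            }
            if (alpha >= beta) {
                return myBest;                               // prune the rest
            }
        }
        return myBest;
    }
}
```

As in the notes, the top-level call uses [α, β] = [−2, 2], for example new AlphaBetaGrid(".........").chooseMove(AlphaBetaGrid.COMPUTER, -2, 2, 3). With a depth of 9 the limit is never reached in Tic Tac Toe, so the result is the exact minimax score.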
This note was uploaded on 01/10/2012 for the course CS 61B taught by Professor Canny during the Fall '01 term at University of California, Berkeley.