An Analysis of UCT in Multi-Player Games

Nathan R. Sturtevant
Department of Computing Science, University of Alberta, Edmonton, AB, Canada, T6G 2E8
[email protected]

Abstract. The UCT algorithm has been exceedingly popular for Go, a two-player game, significantly increasing the playing strength of Go programs in a very short time. This paper provides an analysis of the UCT algorithm in multi-player games, showing that UCT, when run in a multi-player game, computes a mixed-strategy equilibrium, as opposed to max^n, which computes a pure-strategy equilibrium. We analyze the performance of UCT in several known domains and show that it performs as well as or better than existing algorithms.

1 Introduction

Monte-Carlo methods have become popular in the game of Go over the last few years, and even more so with the introduction of the UCT algorithm. Go is probably the best-known two-player game in which computer players are still significantly weaker than humans. UCT works particularly well in Go for several reasons. First, in Go it is difficult to evaluate states in the middle of a game, but UCT only evaluates end-game states, which is relatively easy. Second, the game of Go converges for random play, meaning that it is not very difficult to get to an end-game state.

Multi-player games are also difficult for computers to play well. First, it is more difficult to prune in multi-player games, meaning that normal search algorithms are less effective at obtaining deep lookahead. While alpha-beta pruning reduces the size of a game tree from $O(b^d)$ to $O(b^{d/2})$ for branching factor $b$ and depth $d$, the best techniques in multi-player games only reduce the size of the game tree to $O(b^{\frac{n-1}{n}d})$, where $n$ is the number of players in the game. A second reason multi-player games are difficult is opponent modeling.
In two-player zero-sum games, opponent modeling has never been shown to be necessary for high-quality play, while in multi-player games opponent modeling is a necessity for robust play versus unknown opponents in some domains.

As a result, it is worth investigating UCT to see how it performs in multi-player games. We first present a theoretical analysis, in which we show that UCT computes a mixed-strategy equilibrium in multi-player games and discuss the implications of this. Then, we analyze UCT’s performance in a variety of domains, showing that it performs as well as or better than the best previous approaches.

[Figure: game tree with Player 1 at the root and Player 2 at nodes (a) and (b); leaf payoffs (3, 7, 2), (5, 3, 4), (2, 5, 5), (6, 5, 1); backed-up values (a) = (3, 7, 2), (b) = (6, 5, 1), root = (6, 5, 1).]
Fig. 1. A sample max^n tree

2 Background

The max^n algorithm was developed to play multi-player games. Max^n searches a game tree and finds a strategy which is in equilibrium. That is, if all players were to use this strategy, no player could unilaterally gain by changing their strategy. In every perfect information extensive form game (e.g., tree search) there is guaranteed to be at least one pure-strategy equilibrium, that is, one...
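
The max^n backup described in the Background section can be illustrated with a minimal sketch: at each internal node, the player to move picks the child whose payoff tuple is best in that player's own component. The names `Node`, `leaf`, `maxn`, and `later_wins_ties` are hypothetical and do not come from the paper. The tree shape (Player 1 at the root, Player 2 at nodes (a) and (b), two leaves under each) is an assumption read off the payoffs listed for Fig. 1, and the tie-breaking rule is likewise an assumption, chosen because it reproduces the values shown in the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Payoff = Tuple[int, ...]


@dataclass
class Node:
    # 0-based index of the player to move; unused at a leaf.
    player: Optional[int] = None
    # Payoff tuple (one entry per player); set only at a leaf.
    payoff: Optional[Payoff] = None
    children: List["Node"] = field(default_factory=list)


def leaf(*payoff: int) -> Node:
    """Hypothetical helper: build a leaf holding one payoff per player."""
    return Node(payoff=tuple(payoff))


def maxn(node: Node, later_wins_ties: bool = True) -> Payoff:
    """Return the max^n value of `node`.

    The player to move keeps the child value that maximizes that
    player's own component. Ties go to the later child when
    `later_wins_ties` is True (an assumed rule), else to the earlier one.
    """
    if not node.children:
        return node.payoff
    best: Optional[Payoff] = None
    for child in node.children:
        value = maxn(child, later_wins_ties)
        if best is None or value[node.player] > best[node.player]:
            best = value
        elif later_wins_ties and value[node.player] == best[node.player]:
            best = value
    return best


# Assumed reading of the Fig. 1 tree: Player 2 (index 1) moves at (a) and (b),
# Player 1 (index 0) moves at the root.
a = Node(player=1, children=[leaf(3, 7, 2), leaf(5, 3, 4)])
b = Node(player=1, children=[leaf(2, 5, 5), leaf(6, 5, 1)])
root = Node(player=0, children=[a, b])

print(maxn(a), maxn(b), maxn(root))
print(maxn(root, later_wins_ties=False))
```

At node (b) Player 2 is indifferent between payoffs of 5 and 5, so the value backed up to the root depends on how the tie is broken: preferring the later child yields (6, 5, 1) at the root, while preferring the earlier child yields (3, 7, 2). Under these assumptions, both outcomes are equilibria of the same tree.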
This note was uploaded on 10/23/2011 for the course ENCS ENCS5 taught by Professor Abdelsalam during the Spring '10 term at Birzeit University.