gomoku solution (1994)


Searching for Solutions in Games and Artificial Intelligence

For Petra and Cindy

THESIS

submitted to obtain the degree of doctor at the Rijksuniversiteit Limburg in Maastricht, on the authority of the Rector Magnificus, Prof. dr. H. Philipsen, in accordance with the decision of the College van Dekanen (Board of Deans), to be defended in public on Friday 23 September 1994 at 14:00 by Louis Victor Allis

Supervisor (promotor): Prof. dr. H.J. van den Herik

Members of the assessment committee:
Prof. dr. P.T.W. Hudson (chair)
Prof. dr. ir. J.L.G. Dietz
Prof. dr. ir. W.L. van der Poel (Technische Universiteit Delft)
Prof. dr. S.H. Tijs
Dr. E. Wattel (Vrije Universiteit Amsterdam)

CIP data, Koninklijke Bibliotheek, The Hague
Allis, L. Victor
Searching for Solutions in Games and Artificial Intelligence / L. Victor Allis ; [ill. by the author]. - [S.l. : s.n.] (Wageningen : Ponsen & Looijen). - Ill.
Thesis Maastricht. - With references. - With summary in Dutch
ISBN 90-9007488-0
NUGI 855
Subject headings: artificial intelligence / games / search.

Cover design: Rob Ferwerda

Contents

List of Tables
List of Figures
Preface

1 Introduction
  1.1 Speculations and AI
  1.2 Identifying the obstacles
  1.3 Uncovering hidden obstacles
  1.4 The problem statement
  1.5 Solving games
  1.6 Thesis outline

2 Proof-Number Search
  2.1 Knowledge representation and search
  2.2 Pn-search: the algorithm
    2.2.1 The AND/OR-tree model
    2.2.2 Main assumptions of pn-search
    2.2.3 Informal description of pn-search
    2.2.4 The pn-search algorithm
  2.3 Enhancements
    2.3.1 Reducing memory requirements
    2.3.2 Reducing execution time
    2.3.3 Applying domain-specific knowledge
    2.3.4 Transpositions
  2.4 Results
    2.4.1 Introduction
    2.4.2 The rules of awari
    2.4.3 Tournament programs
    2.4.4 The algorithms compared
    2.4.5 Comparing the performances
    2.4.6 Test positions
    2.4.7 Results
    2.4.8 Conclusions
  2.5 Related algorithms
    2.5.1 Conspiracy-number search
    2.5.2 SSS*
    2.5.3 B*
    2.5.4 A*

3 Dependency-Based Search
  3.1 Introduction
  3.2 The double-letter puzzle
  3.3 A formal framework for db-search
    3.3.1 States and operators
    3.3.2 Paths
    3.3.3 Key classes
    3.3.4 Traversing Uk
    3.3.5 Summary
  3.4 Informal description of db-search
  3.5 Algorithms
  3.6 Test results
  3.7 Applicability

4 Qubic
  4.1 Background
  4.2 Rules and strategies
    4.2.1 Rules
    4.2.2 Threats and threat sequences
    4.2.3 Cube types and automorphisms
  4.3 Applying db-search
    4.3.1 A single-agent search in qubic
    4.3.2 A db-search framework for qubic
    4.3.3 Qubic-specific enhancements to db-search
  4.4 Applying pn-search
    4.4.1 Qubic as an AND/OR tree
    4.4.2 Enhancements
  4.5 Solving qubic
    4.5.1 Subdividing the game tree
    4.5.2 Statistics
    4.5.3 Comparison with Patashnik
    4.5.4 Reliability

5 Go-Moku
  5.1 Background
  5.2 Rules and strategies
    5.2.1 Rules
    5.2.2 Threats and threat trees
    5.2.3 Human strategies
  5.3 Applying db-search
    5.3.1 A single-agent search in go-moku
    5.3.2 A db-search framework for go-moku
    5.3.3 Go-moku specific enhancements to db-search
    5.3.4 Heuristically improving the efficiency of db-search
    5.3.5 Additional requirements for standard go-moku
  5.4 Applying pn-search
    5.4.1 Go-moku as an AND/OR tree
    5.4.2 Enhancements
  5.5 Solving go-moku
    5.5.1 Victoria's I/O
    5.5.2 Subdividing the game tree
    5.5.3 Statistics
    5.5.4 Reliability

6 Which Games Will Survive?
  6.1 Scope
  6.2 Game properties
    6.2.1 Perfect information
    6.2.2 Convergence
    6.2.3 Sudden death
    6.2.4 Complexity
  6.3 The games of the Olympic List
    6.3.1 Qubic
    6.3.2 Connect-Four
    6.3.3 Go-moku
    6.3.4 Nine men's morris
    6.3.5 Awari
    6.3.6 Othello
    6.3.7 Checkers
    6.3.8 Draughts
    6.3.9 Chess
    6.3.10 Chinese chess
    6.3.11 Renju
    6.3.12 Go
    6.3.13 Scrabble
    6.3.14 Backgammon
    6.3.15 Bridge
  6.4 Reviewing the problem statement
    6.4.1 The research questions
    6.4.2 The problem statement
  6.5 Predictions
    6.5.1 Future playing strength
    6.5.2 The future of games

A Domain-specific solution to DLP
Summary
Samenvatting (summary in Dutch)
Curriculum Vitae
Bibliography
Index

List of Tables

2.1 Pn-search algorithm.
2.2 Most-proving node selection algorithm.
2.3 Proof and disproof numbers calculation algorithm.
2.4 Node-development algorithm.
2.5 Ancestor-updating algorithm.
2.6 Pn-search algorithm (with current node).
2.7 Ancestor updating algorithm (enhanced).
2.8 Give-away chess results.
2.9 Number of times an algorithm performed best of all.
2.10 Comparing pairs of algorithms on easy positions.
2.11 Comparing pairs of algorithms on hard positions.
2.12 Test figures per algorithm.
2.13 Positions per group, per grouping algorithm.
2.14 Positions per group, grouped by all four algorithms.
3.1 Symbols used in db-search framework
3.2 Main db-search algorithm.
3.3 Dependency-stage algorithm.
3.4 Dependent-children algorithm.
3.5 Combination-level algorithm.
3.6 Algorithm to find combinations of nodes.
4.1 Number of positions in the qubic solution
4.2 Number of positions in qubic solution per depth.
5.1 Nodes per tree depth in go-moku solutions.
6.1 Predictions for the Olympic Games in the year 2000

List of Figures

1.1 The interdependencies of chapters
2.1 AND/OR tree with proof numbers.
2.2 AND/OR tree with disproof numbers.
2.3 AND/OR tree with most-proving node R.
2.4 AND/OR DAG with practical solution.
2.5 Cyclic AND/OR graph.
2.6 Tree version of the graph of figure 2.5.
2.7 A position with legal moves A1, C 4 2, D19 7, E 4 and F 2 4.
2.8 1: B 1 f 1 wins. After 1: E 1? f 1 south must play 2: F 1:
2.9 Comparison based on grouping by α-β
2.10 Comparison based on grouping by transpositions
2.11 Comparison based on grouping by basic pn
2.12 Comparison based on grouping by stones pn
2.13 Comparison based on grouping using all four algorithms.
3.1 Search graph after 1st dependency stage for theorem aaccadd.
3.2 Search graph after 1st combination stage for theorem aaccadd.
3.3 Search graph after 2nd dependency stage for theorem aaccadd.
3.4 Complete dependency-based search graph for theorem aaccadd.
3.5 Tree size per algorithm applied to the double-letter puzzle.
4.1 Three types of groups in qubic.
4.2 An 11-ply winning threat sequence.
4.3 The 12 two-ply moves.
4.4 Cube numbers on the qubic board.
4.5 A deep winning line.
5.1 Threats in go-moku.
5.2 Complicated threats.
5.3 Winning threat variations
5.4 White defending with multiple-stone replies
5.5 White refutes a potential winning threat sequence.
5.6 Global refutation of all potential winning lines.
5.7 Black threatens to win by moves 1 through 7.
5.8 Replies to the threat sequence of figure 5.7
5.9 Deep variations
6.1 Estimated game complexities.
A.1 Solution to instance aabdcbbdcaa of DLP.

Preface

The research presented in this thesis would have been impossible without the help of many persons, whom I want to recognize here.

First of all, I would like to thank Jaap van den Herik for being my teacher. Jaap created an environment generously providing an abundance of learning opportunities. His efforts have been manifold, notably those aimed at teaching me how to write up scientific research as reflected in this thesis. Still, any mistakes remaining are my own.

Administrative complications unfortunately prevented my two auxiliary thesis advisors, Jonathan Schaeffer and Bob Herschberg, from due mention for their essential efforts. I would like to redress the balance.

Jonathan Schaeffer's influence on this thesis has several facets. His work on cn-search forms the foundation on which pn-search has been built.
During that process, his constant interest in pn-search has led to an increased understanding of the strengths and weaknesses of the algorithm. Furthermore, his comments on earlier versions of this thesis have led to major improvements, most notably in chapter 3.

Bob Herschberg has scrutinized many draft versions of this thesis. The ensuing comments and explanations, regarding all different levels of the art of writing, can best be compared with a chess Grand Master introducing a young player to the many intricacies of the game. Bob's efforts have not only greatly improved the thesis at countless points, his guidance has shown me that much remains to be learned. For guiding me along this path, arduous as it may have been, I owe Bob Herschberg sincere thanks.

Besides my three thesis advisors, I want to thank Maarten van der Meulen for the research we did together. His contribution to the development of pn-search has been indispensable. I also want to thank my room mate Dennis Breuker for always being available to discuss new ideas and for all the games he beat me at over lunch. I would like to thank Matty Huntjens for creating order in the chaos of my experiments and Loek Schoenmaker for creating the X-interface for our go-moku program. I want to thank Patrick Schoo for our collaboration on the qubic program. Many thanks also go to Barney Pell. Our email discussions, as well as the times we met in person have been a source of inspiration.

I would like to thank my colleagues at the Department of Computer Science of the University of Limburg, for making me feel at home. Furthermore, I would like to thank my colleagues from the AI-group at the Vrije Universiteit, who enabled me to finish this thesis. Moreover, I would like to thank the Foundation for Computer Science Research in the Netherlands (SION) and the Netherlands Organization for Scientific Research (NWO) for their financial support.

Besides the efforts of my thesis advisors, the final version of this dissertation has benefited from valuable suggestions by several people: Ingo Althöfer, Barney Pell, Loek Schoenmaker, Mark Willems and the members of the assessment committee (beoordelingscommissie).

Finally, I want to thank my family and friends for their stimulating interest in my research. Most important of all, however, has been the continuous support and stimulation of my wife Petra and my daughter Cindy.

Victor Allis
Boukoul, July 1994

Chapter 1

Introduction

In this thesis "intelligent" games are investigated from the perspective of Artificial Intelligence (AI) research. In this chapter the relevance of such investigations is discussed, leading to the formulation of a problem statement.

1.1 Speculations and AI

All through history, mankind has been fascinated by the thought of creating machines to perform the most difficult of tasks. Men of every era have dreamt of and speculated about achievements beyond the scope of the technology of their time. Yet, when confronted with a machine performing tasks at an unexplained high level, many willingly believed that science and technology had made it possible, instead of doubting the genuineness of the machine's results. For example:

In 1769, Wolfgang von Kempelen demonstrated his chess-playing automaton, the Turk, to the world (Carroll, 1975). It was the first machine to create the illusion of having mental abilities: playing chess at a high level. Among its successes was a victory over the Prussian king Frederick the Great. For many years, large numbers of people believed that the Turk was a true thinking machine, even though the technology of the 18th century did not hint at how such a machine could have been created. For exactly that reason, many others believed that the Turk had to be a fraud. Nevertheless, the secret of the small human chess-player hidden inside the Turk was well-kept until 1834.

With the creation of modern computers, the field of Artificial Intelligence emerged as a new focal point for speculations. Some of these speculations have been made by scientists working within the field, while others have been made by laymen, such as those working in the motion-picture industry. For instance, movies such as 2001: A Space Odyssey, Star Wars and War Games feature computers (resp. HAL, R2-D2 and C-3PO, and WOPR) which seem to have minds of their own. The impact of these truly artificially intelligent entities, fictitious as they may be, on the perception of AI research by the public at large is considerable.

Predictions presented by leading scientists in the field reinforce the image created by movies and science-fiction authors. As an example we refer to the Inaugural Lecture delivered by Van den Herik in which he raised the question whether computers will be able to decide issues of law (Van den Herik, 1991). Irrespective of Van den Herik's estimation of several hundreds of years necessary to create an artificial judge, the spin-off of such speeches in terms of nation-wide coverage by newspapers, radio and television strengthens the general public's idea that the creation of artificially intelligent entities is within close range.

It is important to distinguish clearly between the state-of-the-art in AI and speculations concerning future achievements. We present three well-known examples of progress in AI, each of which has led to unjustified speculations:

1. Newell et al. (1957) created the General Problem Solver, a new control metaphor for representing and solving problems. The name of their system led to speculations concerning the creation of a truly general problem solver. More than three decades later AI has not produced anything near such a goal.

2. In 1959, Samuel created his learning checkers[1] program which won a game against a human master player (Samuel, 1959; Samuel, 1967). From this single game, it has been wrongly concluded by many that an artificial master checkers player had been created, while some even believed that the game of checkers had been "solved" (Schaeffer et al., 1991). Samuel's work on learning is classical within AI but only recently have programs begun to compete with the best human checkers players (Schaeffer et al., 1992).

3. The medical diagnostic expert system MYCIN determines the infectious agent in a patient's blood, and specifies a treatment for the infection (Shortliffe, 1976; Buchanan and Shortliffe, 1984). Despite the promise created by successes such as MYCIN, the development of expert systems has been hindered by many problems, such as the knowledge-acquisition bottleneck (Feigenbaum, 1979). Speculations regarding machines replacing doctors of medicine so far lack a scientific basis.

[1] In this thesis we shall use the name checkers for the game played on an 8 × 8 board, which is called checkers in the United States of America, and draughts in Great Britain. We reserve the term draughts for the game played on a 10 × 10 board.

The three examples illustrate that AI research in the last decades of the twentieth century is not directly involved in creating true intelligence. Instead, many of the stumbling blocks on the road to such a goal are now themselves the main subject of investigation. Only when these obstacles are removed may we start looking for the goal implicit in the name of the field.

1.2 Identifying the obstacles

It is believed by many scientists that the main hurdle to be cleared when creating artificial experts in practical domains is common-sense knowledge (Marr, 1977). Where humans are extraordinarily well equipped to acquire common-sense knowledge with their five senses, computers are deficient in this area.
Despite efforts in areas such as computer vision, robotics, speech processing etc., no computer program exists which exhibits even a basic understanding of the real world (Marr, 1977). This lack of knowledge severely handicaps computers in becoming experts in any real-world domain, such as medicine, law, manufacturing etc.

A direct consequence is the failure in dealing with natural languages. In conversations between human beings many things are left unsaid without hindering the participants. The gaps are filled by common-sense knowledge and sentences are interpreted within the context of our world view. Many AI researchers thus believe that common-sense knowledge is a vital ingredient for natural language processing (Charniak, 1978).

Another area where nature has been generous to humans is learning. Humans continuously learn from their experiences, much unlike computer programs. Whereas learning is an automatic built-in feature of infants, it is difficult to realize in computer programs, despite the efforts spent on machine learning (Michalski et al., 1983; Michalski et al., 1986).

The lack of common-sense knowledge and of learning have a large impact on what computers can and cannot do. Besides these known obstacles, we may wonder whether other, hidden obstacles hinder progress in AI. For instance, some argue that intuition is a human quality which cannot be implemented (De Groot, 1965), while others believe that intuition is simply a name for rule-based behavior where the rules are not accessible to consciousness (Michie, 1982). Thus, while some consider intuition to be unattainable for computers, others stress that to implement intuition, all we need to do is to uncover the rules at its basis.

In general, it is of interest to know as many of the main obstacles hindering progress in AI as possible. It remains in dispute whether intuition should be regarded as such.

1.3 Uncovering hidden obstacles

Some new obstacles for AI research may become visible only after we have successfully dealt with the obstacles apparent today. Others may be discovered by concentrating on a set of domains where known obstacles play no role of importance, such as the domain of games.

Many games, such as chess, checkers, go and bridge possess the property that they create a micro world (Van den Herik, 1983), in which common-sense knowledge and natural languages are not relevant. Instead, a small set of rules determines all possible states within the micro world. And yet, in most of these games, humans are (still) superior to their artificial counterparts. The game of go is a striking example: today's strongest go programs have reached a mere novice level.

By investigating a game, we envision two possible outcomes. If we achieve a playing strength sufficient to defeat the best human players, analysis of the means which led to this improvement may uncover new AI techniques. If the playing strength keeps falling short, even after prolonged attempts, of that of the best human players, a better understanding of the problems inherent in playing the game at a high level may be acquired.

We remark that the possibility remains that the results do not lead to progress (i.e., to new AI techniques or a better understanding of the inherent problems). In the first case, the improvement may be due to entirely domain-specific techniques which cannot be generalized to AI techniques (Dreyfus, 1980). In the second case, we may find that we have difficulty in isolating the problems from our failed attempts.

Although a lack of progress may occur in some cases, by investigating a representative set of games in this way the probability increases that new AI techniques are developed or insight into problems hindering progress is obtained. If similar problems are found in several different games, it may help us to uncover obstacles which are likely to exist in real-world domains as well.
It could also lead to an understanding of the restrictions of the techniques applied. We list two examples of this last phenomenon.

After the rapid increase in playing strength of computer chess programs in the seventies and eighties, it was suggested that an increase of the search depth by an extra ply (i.e., a move by one player), was equivalent to an increase in playing strength of approximately 200 Elo points (Thompson, 1982). Now that progress in playing strength has slowed down, investigations in the relation between search depth and playing strength for checkers indicate that the added strength per ply diminishes for deeper searches (Schaeffer, 1993b). Furthermore, positions have occurred in tournament games where a search of 60 ply would be necessary to stand up against human knowledge (Schaeffer, 1993b). Because such searches are by far out of reach of current technology, it has become clear that added knowledge is a vital ingredient to world-champion level checkers and chess programs.

In the early days of AI research, many new weak methods (i.e., using little domain-specific knowledge) were demonstrated to succeed on toy problems (Winston, 1992). It was believed that through deeper search the results on toy problems could be extrapolated to real-world problems. This has proved to be more difficult than anticipated. Using sufficient domain knowledge, state spaces can be reduced such that problems become solvable. However, when vital knowledge is excluded the explosion of possibilities makes many such problems intractable.

We postulate that when investigating sufficiently complex games with the goal of outperforming human beings, success is likely to yield new AI techniques as their products, while failure presents a better understanding of problems and obstacles encountered. This observation is the basis of the problem statement presented in the next section.

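As a numerical aside to the first example above, the estimate of roughly 200 Elo points per extra ply can be translated into expected match results. The sketch below uses the standard Elo logistic expected-score formula, which the thesis itself does not state; it is assumed here that the quoted Elo points refer to that scale. The function name expected_score is illustrative. The linear gain of 200 points per ply is only the older rule of thumb, which the text reports to diminish for deeper searches.

```python
def expected_score(rating_diff):
    """Expected score (win = 1, draw = 0.5, loss = 0) of the higher-rated
    side under the standard Elo logistic model (an assumption, see lead-in)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    for extra_plies in range(4):
        # Rule of thumb quoted in the text: about 200 Elo points per extra ply.
        diff = 200 * extra_plies
        print(f"{extra_plies} extra ply: +{diff} Elo, "
              f"expected score {expected_score(diff):.3f}")
```

Under this model a 200-point advantage corresponds to an expected score of about 0.76 per game, which gives a sense of how large the claimed gain per ply was.
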
1.4 The problem statement

In this thesis, we consider games which have the following five properties. Examples of games which have these properties include chess, checkers, go and bridge.

1. Two-player. Most games are two-player games, as opposed to zero-player games (e.g., Conway's life (Berlekamp et al., 1982)), one-player games (e.g., the 15-puzzle (Korf, 1985), Rubik's cube and peg solitaire (Beasley, 1985)) and multi-player games (e.g., poker and diplomacy (Hall and Loeb, 1992)).

2. Zero-sum. These are games where one player's loss is the other player's gain. The prisoners' dilemma (Hofstadter, 1985) when considered as a game is not zero-sum.

3. Non-trivial. A best playing strategy should not be trivially establishable through enumeration or mathematical analysis. Examples of trivial games are tic-tac-toe and nim.

4. Well-known. These are games which have been played by large numbers of people, resulting in the game being known in several countries. This excludes many mathematical games, and obscure variations on well-known games (such as give-away chess).

5. Requiring skill. Some games serve mainly as a pastime, not requiring much skill. The more experienced player has no real advantage in those games, except maybe against novices (examples are many simple card games played by children). The games included here should exhibit a strong relation between skill and winning chances. Such a relation also exists in some games which are influenced by a chance element, such as backgammon and bridge, which are thus included.

The first two properties (two-player and zero-sum) are selected to ensure that cooperation between players can be excluded from the investigations. The third property (non-trivial) is necessary for us to have something to investigate. The last two properties (well-known and requiring skill) ensure that the results of our investigations can be checked (for instance against strong human players) and evaluated.

To be more specific, we list the set of games played at the Computer Olympiads which fulfill all these criteria (Levy and Beal, 1989; Levy and Beal, 1991; Van den Herik and Allis, 1992). This list of games will henceforth be called the Olympic List.

  awari, backgammon, bridge, chess, Chinese chess, checkers, connect-four, draughts, go, go-moku, nine men's morris, othello, qubic, renju, scrabble.

We do not claim that the fifteen games of the Olympic List are the only games satisfying the five properties listed above. However, as long as sufficient challenges exist for the listed games, there is no need to try to be complete.

We are now ready to present our problem statement, consisting of two questions.

Through an investigation of games of the Olympic List,
1. which new AI techniques can be developed and
2. which obstacles for AI research will emerge?

The goal of this thesis is to find an answer to these questions. To this end, we list below three detailed research questions, distinguishing between performance levels of systems which may be the result of investigating games of the Olympic List.

1. Which games can be solved (see section 1.5) and what techniques may contribute to the solution?

2. For which games can we create programs outperforming the best human players in the near future, and what techniques contribute to their performance?

3. In which games will humans continue to reign in the near future (say, at least the next decade) and what are the main obstacles to progress for computer programs?

Our attempts to answer these three questions have guided the research efforts described in this thesis. Before we give an outline of the thesis in section 1.6, we must clarify the term solved in relation to games. As there is no consensus about this term, we will give a definition in section 1.5.

1.5 Solving games

Stating that a game is solved usually indicates in common parlance that a property with regard to the outcome of the game has been determined.
Even for two-player zero-sum games with perfect information (see section 6.2), at least three different definitions could be meant, which we name ultra-weakly solved, weakly solved and strongly solved. The first two terms have been suggested by Paul Colley, while the third term has been suggested by Donald Michie.

ultra-weakly solved
  For the initial position(s), the game-theoretic value has been determined.

weakly solved
  For the initial position(s), a strategy has been determined to obtain at least the game-theoretic value of the game, for both players, under reasonable resources.

strongly solved
  For all legal positions, a strategy has been determined to obtain the game-theoretic value of the position, for both players, under reasonable resources.

We remark that the reasonable resources mentioned may be a subject of discussion. The size of the resources is meant only to give an approximate indication of the time and computing equipment allowed for reproducing a solving strategy. Without these restrictions, it could be argued that, for instance, chess could be weakly solved. As a strategy to solve chess, an α-β search through the full game tree suffices. The reasonable resources mentioned should typically allow the use of a state-of-the-art computer and several minutes of computation time per move.

The definition of ultra-weakly solved indicates that, at the start of the game, it is known what the outcome of the game would be with optimal play by both sides. It is not necessarily known how either player can achieve the optimal outcome. The game of hex, for instance, is known to be a first-player win on all diamond-shaped boards, although no constructive strategy has been determined. The game-theoretic value has been established by noting that the game does not permit draws and that having an extra move cannot be a disadvantage. Thus, since the first player does not need to lose, hex is a first-player win.
This reasoning has not (yet) led to a winning strategy for the first player, which makes it of little use to practical play.

It is well-known that tic-tac-toe is a game-theoretic draw. A player who has weakly solved tic-tac-toe only needs to be able to achieve a draw in every game she [2] plays. It is not necessary for her to win against a non-optimally playing opponent when she is given a winning opportunity.

[2] In contexts where the gender of a non-neutral third person is irrelevant, we will always use "she" and "her" to avoid the more cumbersome "s(he)" and "her/his".

The definition of strongly solved demands a strategy not just from the initial position(s), but from all legal positions. Thus, against a non-optimally playing opponent, each mistake must be capitalized upon. Examples of strongly-solved games are tic-tac-toe, nim (Knuth, 1969) and many chess endgames (Van den Herik and Herschberg, 1985; Thompson, 1986; Stiller, 1989).

An ordering exists between the three definitions. Any strongly-solved game is also weakly solved, while a weakly-solved game is also ultra-weakly solved. To see the latter, it suffices to play a single game from each initial position of the game, with both sides played by the system which solved the game. The outcome of such a game is guaranteed to be the best attainable by both players, equaling the game-theoretic value of the game.

In any domain for AI research, evaluation of the practical performance of the systems produced is essential. The natural performance test of a game-playing system is a match consisting of a large number of games against a rated opponent. When claiming that a program has solved a game, it seems reasonable to require the program to exhibit skill in such a match. A program which has ultra-weakly solved a game does not guarantee being capable of playing the game at all. A program which has weakly solved a game will at least draw every match it plays (while it plays both sides equally often).
Note, however, that for games where the program has shown the game to be a win for the stronger side, it is not guaranteed to exhibit any skill when playing the weaker side. The guaranteed performance level, i.e., ensuring that no single match is lost, is in our opinion sufficient to declare a game solved. In this thesis, we consider a game solved when it is at least weakly solved.

1.6 Thesis outline

In 1988, research performed for a Master's thesis (Allis, 1988) led to solving connect-four, published as (Uiterwijk et al., 1989a). Inspired by this result, we decided to start with the first research question, i.e., determining which other games of the Olympic List can be solved, and identifying techniques which contribute to their solution.

In particular, of the fourteen remaining games of the Olympic List (i.e., excluding connect-four), we have selected four which seemed eligible for solution. These games are awari, qubic, nine men's morris and go-moku. Awari and nine men's morris are selected for their relatively small state-space complexity (see chapter 6), while qubic and go-moku are selected since human experience indicates that the first player has an overwhelming advantage. As Ralph Gasser has been investigating nine men's morris concurrently with our research (Gasser, 1991), we have concentrated on awari, qubic and go-moku.

[Figure 1.1: The interdependencies of chapters. Chapters shown: 1. Introduction; 2. Proof-number search; 3. Dependency-based search; 4. Qubic; 5. Go-Moku; 6. Which Games will Survive?]

During investigation of these games, two new search techniques have been developed, viz. proof-number search (pn-search) and dependency-based search (db-search). While db-search forms the basis for solving qubic and go-moku, pn-search is an important contributing factor. Although our investigations showed that applying pn-search to awari leads to promising results, awari has not (yet) been solved.
The results of our investigation of the first research question are described in chapters 2 through 5. In chapter 6 the second and third research questions are investigated, leading to an evaluation of the problem statement.

The thesis is organized as follows. It consists of four parts, the first of which is this introduction.

The second part consists of chapters 2 (Proof-Number Search) and 3 (Dependency-Based Search), containing descriptions of the two search techniques developed in the course of this research. Both techniques are presented independently of their application to games. Chapters 2 and 3 can each be read independently of other parts of the thesis and are of special interest to those researchers who would like to apply the techniques to their own research domains.

The third part of the thesis consists of chapters 4 (Qubic) and 5 (Go-Moku), each describing the solution to the game under investigation. Although it is recommended to read chapter 2 before any of the game-specific chapters, proof-number search is not essential foreknowledge. Dependency-based search forms the basis for solving qubic and go-moku. It is, therefore, necessary to read chapter 3 before starting on chapters 4 and 5.

The fourth part of the thesis consists of chapter 6 (Which Games Will Survive?), in which all games of the Olympic List are investigated. For each game, we determine the value of four game properties, describe the state of the art in game-playing programs, list the techniques applied and the obstacles to progress. Next we evaluate our research with respect to the problem statement. Predictions regarding the future of games conclude the chapter. The fourth part of the thesis can be read independently of the second and third parts, although it is recommended that the reader first obtains some knowledge of the contents of these parts.

The interdependencies between the chapters are pictured in figure 1.1.
An arrow from chapter A to chapter B indicates that A is essential foreknowledge for B. A dashed arrow between chapters A and B indicates that it is recommended, but not essential, to read A before B.

Chapter 2

Proof-Number Search

2.1 Knowledge representation and search

Problem solving is one of the cornerstones of AI research. Within problem solving, we distinguish two subprocesses: choosing a knowledge representation and performing a search. We remark that the term knowledge representation is meant to include analysis, conceptualization and formalisation. A well-chosen representation may considerably reduce the amount of search needed to solve a problem, while a badly chosen representation may render solving a problem (virtually) impossible.

As an example we present the mu-puzzle (Hofstadter, 1979). A production system consisting of four rewriting rules generates theorems consisting of the letters m, i and u. In each production, x and y denote any string of letters.

1. xi → xiu
2. mx → mxx
3. xiiiy → xuy
4. xuuy → xy

The goal of the mu-puzzle is to determine whether mu is a theorem in the above system, given that mi is the only axiom.

In a first attempt to solve the puzzle, we represent a theorem simply by its string of letters. The rewriting rules are used to expand nodes of the search tree, where each node represents a theorem. We are now faced with a tree-search problem: to find a path of rewriting rules leading from the initial state mi to the goal state mu. A suitable tree-search algorithm is selected to perform the search, such as breadth-first search or depth-first search. To select a search algorithm, various criteria may be applied. For instance, breadth-first search guarantees that the first solution found is also the shortest solution (Nilsson, 1980). A disadvantage of breadth-first search is that it requires more working memory than an algorithm such as depth-first search (Nilsson, 1980).
Generally, each of the applicable search algorithms has its own advantages and disadvantages. In case no solution exists, these algorithms have the disadvantage that the search will not terminate, as the set of theorems is infinite.

Instead of concentrating on the selection of the best possible search algorithm, we may first try to optimize the chosen representation. For the mu-puzzle, a better representation involves an extra item of knowledge per theorem. This Boolean item, which we name IsTripleI, indicates whether the theorem's total number of i's is a multiple of three. We can now verify that each of the four rewriting rules creates new theorems with IsTripleI's value equal to that of the theorem it is created from. The observation that mi (false) and mu (true) have unequal IsTripleI values is sufficient to prove that mu is not a theorem.

In the mu-puzzle example, it was possible to eliminate all search by enhancing the representation of the puzzle. It illustrates that choosing a representation should have the highest priority when solving problems.

Choosing a knowledge representation in problem solving is mostly domain-specific. Even though general techniques (such as abstraction, here applied to the mu-puzzle) exist, their successful application remains the fruit of a thorough understanding of the domain under investigation.

For problems more complex than the mu-puzzle, a good representation generally does not eliminate all search; it merely reduces the size of the state space to, hopefully, manageable proportions. It is then important to select a search algorithm which will find a solution, if it exists, in an efficient manner. The efficient manner is to be understood here in a broad sense, including programming time, calculation time and the required amount of working memory. The weighting of these resources depends on the circumstances in which the problem has to be solved.
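The IsTripleI argument for the mu-puzzle can also be checked mechanically. The following Python sketch is an illustration only and not part of the original text: it enumerates theorems breadth-first from the axiom mi and checks that every one of them has the same IsTripleI value as mi. The function names and the length cap max_length are assumptions introduced here to keep the enumeration finite, so the run is a sanity check on the invariant rather than a proof.

```python
from collections import deque


def successors(theorem):
    """Apply the four rewriting rules of the mu-puzzle to one theorem."""
    result = set()
    if theorem.endswith("i"):                  # rule 1: xi -> xiu
        result.add(theorem + "u")
    if theorem.startswith("m"):                # rule 2: mx -> mxx
        result.add(theorem + theorem[1:])
    for k in range(len(theorem) - 2):          # rule 3: xiiiy -> xuy
        if theorem[k:k + 3] == "iii":
            result.add(theorem[:k] + "u" + theorem[k + 3:])
    for k in range(len(theorem) - 1):          # rule 4: xuuy -> xy
        if theorem[k:k + 2] == "uu":
            result.add(theorem[:k] + theorem[k + 2:])
    return result


def is_triple_i(theorem):
    """IsTripleI: is the number of i's a multiple of three?"""
    return theorem.count("i") % 3 == 0


def reachable(axiom="mi", max_length=10):
    """Breadth-first enumeration; max_length is an assumed cap, not from the text."""
    seen = {axiom}
    queue = deque([axiom])
    while queue:
        current = queue.popleft()
        for nxt in successors(current):
            if len(nxt) <= max_length and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


theorems = reachable()
# Every enumerated theorem shares the IsTripleI value of the axiom mi.
invariant_holds = all(is_triple_i(t) == is_triple_i("mi") for t in theorems)
```

Under these assumptions, invariant_holds evaluates to True and mu is absent from the enumerated theorems, which agrees with the invariant argument: mi has IsTripleI false, while mu has IsTripleI true.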
Thus, the domain-specific task of finding a suitable knowledge representation is performed in combination with the selection of a search algorithm well-suited for the state space.

In the course of a considerable number of years of research in AI, many different search algorithms have been developed. We distinguish between several categories of search problems, such as those represented by single-agent trees, AND/OR trees and game trees (Nilsson, 1971). While the category that a search problem belongs to restricts our choice of search algorithms, within each category several search algorithms exist, each with its own characteristics. These characteristics determine the scope of problems for which the algorithm may be preferred over the other algorithms within the same category. We remark that the division into search categories is not strict. An example relevant to this thesis is that two-valued game-tree searches can also be performed by search algorithms for AND/OR trees.

For the category of game trees, many different search algorithms have been developed. We name the best known algorithms and mention the type of search problems for which we believe they are best suited:

By far the best-known game-tree search algorithm is α-β search (Knuth and Moore, 1975). It is a directional (also known as depth-first) algorithm, having working-memory requirements linear in the depth of the tree investigated. Knuth and Moore (1975) have shown that α-β search achieves optimal efficiency on perfectly-ordered uniform trees. Application of iterative deepening to α-β search ensures for many application domains that strongly-ordered trees are traversed, resulting in close-to-optimal efficiency on uniform trees (Campbell and Marsland, 1983).

SSS* is a best-first search algorithm (Stockman, 1979). It will never investigate a node pruned by α-β search (Campbell and Marsland, 1983). The algorithm has two drawbacks.
First, as with all best-first search algorithms, the working-memory requirements are linear in the number of nodes created, thus exponential in the depth of the tree. However, recently variants requiring less working memory have been developed (Reinefeld, 1994). Second, the reduction in the number of nodes searched compared with iterative-deepening α-β search does not outweigh the cost of maintaining the search tree (or open list) in working memory for most practical applications. However, if the cost of heuristic evaluation is large compared to the cost of traversing the tree, or if obtaining a good ordering through iterative deepening for α-β search is difficult for the domain under investigation, SSS* may be an alternative to be preferred.

Another best-first search algorithm is B* (Berliner, 1979). It depends on the availability of reliable heuristic estimates for the upper and lower bounds on the value of internal nodes. For chess, the algorithm has been implemented in Hitech, but it remains unclear whether for this domain sufficiently accurate upper and lower bounds can be estimated to result in better move selection than by algorithms based on α-β search.

Conspiracy-number search (cn-search) (McAllester, 1988; Schaeffer, 1989) is a best-first search algorithm which determines the cardinality of the smallest sets of (terminal) nodes which must change their value in order to change the value of the root. Once this cardinality grows beyond a pre-specified bound, it is considered unlikely that the root value will change, and the search is terminated. Cn-search has shown its merits in tactical chess positions (Schaeffer, 1989), but has failed in a comparison with α-β search in a tournament chess program (Van der Meulen, 1990). Cn-search has as disadvantages the large amount of bookkeeping necessary at each node, and the subsequent amount of working memory required to perform the search.
One of the ideas underlying cn-search is that the distribution of the values over the leaf nodes of the tree, and the shape of the tree, should influence the selection of the next node to be investigated.

The last aspect of cn-search, using the shape of the tree to guide the search, has been singled out in proof-number search (pn-search), which can be seen as a successor to conspiracy-number search. In this chapter we present pn-search, which has the exploitation of non-uniformity as its main theme. Pn-search will be presented as an AND/OR tree search algorithm, even though all applications discussed in this thesis concern game trees.

We introduce in section 2.2 the pn-search algorithm for AND/OR trees. In section 2.3, several enhancements to the algorithm are presented. These include techniques to reduce execution time and usage of working memory, examples of the application of domain-specific knowledge, and a discussion regarding transpositions within pn-search. Results of applying pn-search to a practical domain, the game of awari, are presented in section 2.4, where its performance is compared with those of sophisticated implementations of α-β search. Finally, section 2.5 contains a discussion of related algorithms, analyzing the similarities and differences between pn-search and conspiracy-number search, SSS*, B* and A*. (A* (Hart et al., 1968), a single-agent search algorithm, has been included in this list because of its similarities with pn-search.)

2.2 Pn-search: the algorithm

In this section we introduce pn-search for AND/OR trees. First, in section 2.2.1 we define our tree model and a precise terminology for the remainder of the chapter. Then, the main assumptions of pn-search are described in section 2.2.2 and the notions of proof numbers and disproof numbers are introduced. Next, section 2.2.3 informally discusses the order in which the nodes of a pn-search tree should be created.
Finally, an algorithm in pseudocode for pn-search is presented in section 2.2.4.

2.2.1 The AND/OR-tree model

We define our tree model as follows. In the tree, there are two types of nodes: AND nodes and OR nodes. We assume that each node can be evaluated, leading to one of three values: false, true or unknown. Please note the difference between nodes which have not yet been evaluated (thus whose evaluation value is not yet known) and nodes which have been evaluated and obtained the value unknown.

Nodes with evaluation value unknown can be expanded. When a node J is expanded, a non-empty set of child nodes is created, each having J as parent node. A node which has been expanded is an internal node. There are three kinds of leaf nodes, i.e., nodes without children. First, a node evaluated to false or true is a terminal node. Second, a node which has evaluated to unknown is called a frontier node. Third, a node which has not yet been evaluated is also called a frontier node.

There are two tree-creation procedures, which we name immediate evaluation and delayed evaluation. When applying immediate evaluation each node in the tree is immediately evaluated upon creation. The tree is initialized by creating (and evaluating) the root. Then, as long as the tree has not been solved, at each step a frontier node is selected (which, since it has already been evaluated, must have value unknown), expanded and all its children are immediately evaluated. This process of expanding a node J and evaluating J's children is called developing node J. In case of delayed evaluation, each node is only evaluated when it is selected, instead of at creation. Thus, the tree is initialized by creating the root (without evaluation). Then, at each step a frontier node J is selected (which is guaranteed not to have been evaluated) and evaluated. If the evaluation value of J is unknown, J is expanded (without evaluating J's children).
Here the process of evaluating a node, possibly followed by its expansion, is also called developing node J.

We remark that the terms frontier node and developing each have a double meaning. However, once the tree-creating procedure has been specified, both terms are unique. This approach has been chosen so that pn-search can be explained independently of the tree-creation procedure.

The value of an expanded internal AND node A is determined as follows: if A has at least one child with value false, A also has value false; otherwise, if A has at least one child with value unknown, A has value unknown; otherwise A has value true. The value of an expanded internal OR node O is determined as follows: if O has at least one child with value true, O also has value true; otherwise, if O has at least one child with value unknown, O has value unknown; otherwise O has value false.

A tree is solved if the value of its root has been established as either true or false. A solved tree with value true is called proved, while a solved tree with value false is called disproved.

Throughout this chapter, we depict AND nodes by circles and OR nodes by squares in each of the figures. Furthermore, AND nodes can be recognized by the arcs linking their children, in accordance with standard conventions for depicting AND/OR trees.

2.2.2 Main assumptions of pn-search

Best-first search algorithms select a best node (according to some criterion) in the search tree, develop the node and then update such information as is necessary for the algorithm to continue. The distinguishing factor of each best-first search algorithm is the manner in which a node is characterized as 'best'. For pn-search we assume that we have no knowledge regarding a priori probable values of nodes, nor knowledge regarding correlations between node values, although this knowledge could be added to the program (see section 2.3.3).
Instead, only the position of a node in the tree and its possible contribution to solving the tree is considered.

First, we formulate the assumptions of pn-search, implying the above. Second, we present some definitions to aid in the description of pn-search. Third, using an example, we illustrate that some nodes are better in their contribution to solving the tree than others. Finally, we summarize our findings.

Assumptions

While searching AND/OR trees, we make the following two assumptions.

1. The probability distribution of values (true, false, unknown) for a frontier node is unknown.
2. The probability distribution of values (true, false, unknown) for a frontier node is equal throughout the tree.

Even though these assumptions mean that we cannot distinguish between two nodes by looking at them independently of their context, nevertheless their position in the tree may influence their expected contribution to solving the tree.

Definitions

When searching AND/OR trees, developing a single frontier node is often insufficient to solve the tree. In most cases, several frontier nodes must obtain the value true to prove the tree or the value false to disprove it. This observation is reflected in definitions 2.1 and 2.2.

Definition 2.1 For any AND/OR tree T, a set of frontier nodes S is a proof set if proving all nodes within S proves T.

Definition 2.2 For any AND/OR tree T, a set of frontier nodes S is a disproof set if disproving all nodes within S disproves T.

Since it will turn out that we shall use the cardinality of proof and disproof sets, these are given names in definitions 2.3 and 2.4.

Definition 2.3 For any AND/OR tree T, the proof number of T is defined as the cardinality of the smallest proof set of T.

Definition 2.4 For any AND/OR tree T, the disproof number of T is defined as the cardinality of the smallest disproof set of T.

Examples

To illustrate how the context can be used to distinguish between nodes, we have depicted an AND/OR tree in figure 2.1. With each node, we have associated the proof number of the subtree with that node as its root, as defined in definition 2.3.

All frontier nodes (E, F, I, L, M, N and P in figure 2.1) have proof number 1. This follows from the fact that only the node itself needs to obtain the value true to prove the whole subtree (consisting of only the node itself). A terminal node with value true (node K in figure 2.1) has proof number 0, since its value has already been proved. Terminal nodes with value false (node O in figure 2.1) have proof number ∞, since there is no finite set of nodes which can undo the fact that the node is disproved.

Internal AND nodes obtain the value true only if all their children are proved. Thus, internal AND nodes (B, D, G, H and J in figure 2.1) have proof numbers equal to the sum of the proof numbers of their children. For internal OR nodes it suffices to prove one of their children, in order to have the parent obtain the value true. Thus, for internal OR nodes (A and C in figure 2.1) we establish the proof number by taking the minimum of the proof numbers of their children.

[Figure 2.1: AND/OR tree with proof numbers. Proof numbers per node: A 1, B 2, C 1, D ∞, E 1, F 1, G 1, H 2, I 1, J ∞, K (true) 0, L 1, M 1, N 1, O (false) ∞, P 1.]

[Figure 2.2: AND/OR tree with disproof numbers. Disproof numbers per node: A 3, B 1, C 2, D 0, E 1, F 1, G 1, H 1, I 1, J 0, K (true) ∞, L 1, M 1, N 1, O (false) 0, P 1.]

Root A of the tree in figure 2.1 has proof number 1. This indicates that somewhere in the tree a frontier node exists, which, by obtaining the value true, would complete the proof of the tree. The path from the root to this frontier node can be found by examining the proof numbers. To prove the root (an OR node), it is sufficient to prove one of its children. Child C has the smallest proof number among the three children of A.
The frontier node we are looking for thus lies within subtree C. In the same way, node G is preferred over node H, since G's proof number is equal to 1, while H's proof number equals 2. To prove node G (an AND node), it is necessary to prove all its children. Child K has already been proved, thus only a proof of node L is needed, which is the frontier node we have been looking for.

We could now proceed and develop node L, in an attempt to prove the tree. Instead, we will first determine which nodes may contribute to a potential disproof. In figure 2.2 we have depicted the same tree as in figure 2.1. With each node, we have associated the disproof number of the subtree with that node as root, as defined in definition 2.4.

The disproof numbers behave analogously to proof numbers, interchanging the roles of AND nodes and OR nodes, and the cardinalities 0 and ∞. Thus, frontier nodes (E, F, I, L, M, N and P in figure 2.2) have disproof number 1. A terminal node with value false (node O in figure 2.2) has disproof number 0, since it is already disproved. Terminal nodes with value true (node K in figure 2.2) have disproof number ∞. Internal AND nodes (B, D, G, H and J in figure 2.2) have disproof numbers equal to the minimum of the disproof numbers of their children. Internal OR nodes (A and C in figure 2.2) have disproof numbers equal to the sum of the disproof numbers of their children.

Root A of the tree in figure 2.2 has disproof number 3. This means that at least 3 nodes must obtain the value false to disprove the tree. Analysis of the tree shows that it involves one of the nodes E and F, node L and one of the nodes M and N.

Summary

The previous paragraphs illustrate that proof numbers and disproof numbers can be used to find nodes within the smallest subset of frontier nodes in the tree which, by all obtaining the same value, solve the tree.
From the assumptions underlying pn-search it follows that the probability that all nodes in a proof set obtain the value true increases with decreasing cardinality of the proof set (except in the trivial cases that the probability of evaluation to true equals either 0 or 1). As a result the total number of node developments needed to solve a tree is (on the average) reduced by first focusing on potential solutions involving a small number of nodes (i.e. subtrees with small proof and/or disproof numbers), before trying to find solutions known to require a larger number of nodes. This expectation is the basis for the pn-search algorithm as described in the following sections.

2.2.3 Informal description of pn-search

Pn-search continuously tries to solve the tree by focusing on the potentially shortest solution, i.e., consisting of the least number of nodes. At each step of the search, a node which is part of the potentially shortest solution available is selected and developed. After the development of a node, its proof number and disproof number are established anew. Then, the proof and disproof numbers of its ancestors are updated. This process of selection, development and ancestor updating is repeated until either the tree is solved or we have run out of resources (time or working memory).

The main issue yet to be resolved is whether (1) to select a node in the smallest proof set, or (2) to select a node in the smallest disproof set. We will show in the following paragraphs that, maybe surprisingly, we can always do both at the same time. This results in the definition of a most-proving node as in definition 2.5.

Definition 2.5 For any AND/OR tree T, a most-proving node of T is a frontier node of T, which by obtaining the value true reduces T's proof number by 1, while by obtaining the value false reduces T's disproof number by 1.

Definition 2.5 assumes that within each unsolved tree T a frontier node exists, which is an element of the intersection of a smallest proof set and of a smallest disproof set of T. A stronger claim is that each pair consisting of a smallest proof set and a smallest disproof set has a non-empty intersection. We prove this stronger claim by induction.

Proof

Basis: For each frontier node $J$ the singleton set containing $J$ is both the only proof set and the only disproof set. The intersection of these two sets contains node $J$ and thus is not empty.

Induction step: Suppose that the claim has been proved for all children $J_1, \ldots, J_n$ of an internal AND node $J$. To disprove $J$, only one child needs to be disproved. Let $\mathrm{disp}(J_x)$ be any disproof set of $J_x$ which has minimal cardinality among all disproof sets of children of $J$. Then $\mathrm{disp}(J_x)$ is also a minimal disproof set of $J$. To prove $J$, all children must be proved. Let $\mathrm{prove}(J_i)$ $(1 \le i \le n)$ be arbitrary minimal proof sets for each of the children $J_i$. Then $\bigcup_{i=1}^{n} \mathrm{prove}(J_i)$ is a minimal proof set of $J$, which we name $\mathrm{prove}(J)$. Thus $\mathrm{disp}(J_x)$ is a minimal disproof set of $J$, and $\mathrm{prove}(J_x)$ is contained in a minimal proof set of $J$. As $\mathrm{disp}(J_x)$ and $\mathrm{prove}(J_x)$ are minimal disproof and proof sets of $J_x$, they have a non-empty intersection according to the induction hypothesis. Thus $\mathrm{disp}(J) = \mathrm{disp}(J_x)$ and $\mathrm{prove}(J)$ have a non-empty intersection. The proof for internal OR nodes proceeds analogously.

We conclude that there is no conflict of strategies between trying to prove or to disprove the tree: by repeatedly selecting a most-proving node, both strategies are executed simultaneously, without one strategy delaying the other.

How to select the most-proving node using proof and disproof numbers is illustrated with an example tree. Below each node of the tree depicted in figure 2.3, we have depicted its proof number and disproof number (in that order).
Thus, the least number of nodes which must be developed to prove the tree is 3. The same number of nodes is needed to disprove the tree.

[Figure 2.3: AND/OR tree with most-proving node R. Proof and disproof numbers per node (in that order): A 3,3; B 3,2; C 4,1; D 1,3; E 2,2; F 1,1; G 1,1; H 1,1; I 1,1; J 1,1; K 1,1; L 1,1; M 3,1; N 2,1; O 1,1; P 1,1; Q 1,1; R 1,1; S 1,1.]

First, let us analyze the effort necessary to disprove the tree. As node A is an OR node, it will only obtain value false if both children obtain value false. In other words, both children must be solved (with value false) to disprove the tree. Thus, in both subtrees frontier nodes exist which are part of the smallest disproof set of A.

Second, let us look at the least number of node developments needed to prove the tree. For an OR node it is sufficient to have one child with value true to prove the OR node. In other words, only one child needs to be solved (with value true) to prove the tree. The proof number of child B (3) is one less than the proof number of child C (4). Thus, all frontier nodes of a smallest proof set lie within subtree B. We conclude that all most-proving nodes lie within subtree B.

With respect to subtree B an analogous analysis applies. However, since node B is an AND node, the roles of proof number and disproof number are interchanged. Thus, to prove B, both its children must be proved. Therefore, in both subtrees D and E, frontier nodes exist which are part of the smallest proof set of B. To disprove B, it is sufficient to disprove one child. Node E has disproof number 2, one less than disproof number 3 of node D. Thus, all frontier nodes of a smallest disproof set lie within subtree E. We conclude that all most-proving nodes lie within subtree E.

The selection within OR node E is based on the proof numbers, as it was for node A: child N has proof number 2, against proof number 3 for child M, and thus subtree N is selected.
Within AND node N no preference exists on the basis of the disproof numbers, and both R and S are most-proving nodes according to definition 2.5. In such cases we will, somewhat arbitrarily, always select the leftmost child. Thus, R is selected to be developed.

To summarize, the selection of a most-proving node is based on proof numbers among the children of OR nodes and on disproof numbers among the children of AND nodes.

2.2.4 The pn-search algorithm

    procedure ProofNumberSearch(root)
      Evaluate(root)
      SetProofAndDisproofNumbers(root)
      while root.proof ≠ 0 and root.disproof ≠ 0 and ResourcesAvailable() do
        mostProvingNode := SelectMostProving(root)
        DevelopNode(mostProvingNode)
        UpdateAncestors(mostProvingNode)
      od
      if root.proof = 0 then root.value := true
      elseif root.disproof = 0 then root.value := false
      else root.value := unknown
    end

Table 2.1: Pn-search algorithm.

    function SelectMostProving(node)
      while node.expanded do
        case node.type of
          or :  i := 1
                while node.children[i].proof ≠ node.proof do i := i+1 od
          and : i := 1
                while node.children[i].disproof ≠ node.disproof do i := i+1 od
        esac
        node := node.children[i]
      od
      return node
    end

Table 2.2: Most-proving node selection algorithm.

In this section the algorithmic details of pn-search are presented in pseudocode, except for three domain-specific procedures and functions. In each of these three cases, the code for the implementation depends on the domain of investigation. The goal of each of these three, however, is domain-independent and has been specified below.

1. Evaluate(node). Assigns to node.value one of the values true, false and unknown.
2. GenerateAllChildren(node). Assigns to node.numberOfChildren the number of children of the node, and to node.children[1..node.numberOfChildren] (pointers to) the children themselves.
3. ResourcesAvailable(). Returns a Boolean value which indicates whether sufficient resources are available to continue the search.
   This function will typically test the availability of working memory.

The algorithm of table 2.1 encodes the main loop of pn-search. The root of the tree is created and evaluated. Then, at each iteration, a most-proving node is selected and developed, followed by updating the proof and disproof numbers of the most-proving node and its ancestors. The algorithm terminates when the tree is solved, or the program runs out of resources.

We remark that there is a choice between implementing immediate evaluation and delayed evaluation. The main difference between these two methods is the amount of information available within trees of the same size: with immediate evaluation, all nodes in the tree have been evaluated, while with delayed evaluation the frontier nodes have not been evaluated. Due to the extra information, under the same working-memory limitations, immediate evaluation is more often able to solve a tree than delayed evaluation. In rare circumstances, however, delayed evaluation may be preferable. Examples of these circumstances include trees with a large variance in the branching factor, and slow evaluation. We have chosen to implement immediate evaluation, as it is used in all our applications of pn-search to games. Thus, all frontier nodes in the tree have been evaluated.

    procedure SetProofAndDisproofNumbers(node)
      if node.expanded then
        case node.type of
          and :
            node.proof := Σ_{N ∈ Children(node)} N.proof
            node.disproof := Min_{N ∈ Children(node)} N.disproof
          or :
            node.proof := Min_{N ∈ Children(node)} N.proof
            node.disproof := Σ_{N ∈ Children(node)} N.disproof
        esac
      elseif node.evaluated then
        case node.value of
          false :
            node.proof := ∞
            node.disproof := 0
          true :
            node.proof := 0
            node.disproof := ∞
          unknown :
            node.proof := 1
            node.disproof := 1
        esac
      else
        node.proof := 1
        node.disproof := 1
      fi
    end

    Table 2.3: Proof and disproof numbers calculation algorithm.
The algorithm of table 2.2 encodes the selection of a most-proving node, in accordance with the description in section 2.2.3. Thus, at or nodes the child with lowest proof number is selected, while at and nodes the child with lowest disproof number is selected. In case of a tie between children, the leftmost child is selected. Selecting the child with minimal proof number (in an or node) or disproof number (in an and node) is equivalent to selecting a child with proof number or disproof number equal to its father's. We remark that in most applications children will not be ordered by their proof or disproof number, as the cost of updating the ordering may be prohibitive. If the children are unordered, selecting the leftmost child with equal proof or disproof number on the average reduces the selection time of the most-proving node by at least a factor two, compared with determining the minimum over all children. A detailed discussion of enhancements to the algorithm can be found in section 2.3.

The algorithm of table 2.3 encodes the calculation of proof and disproof numbers for a given node. It is a direct translation into pseudo-code of the case-by-case observations made in section 2.2.2. We remark that "Σ" in the algorithm indicates that the sum is calculated over all children, while "Min" indicates that the minimum over all children is calculated.

The algorithm of table 2.4 encodes the development of a node. As stated before, we have implemented immediate evaluation.

    procedure DevelopNode(node)
      GenerateAllChildren(node)
      for i := 1 to node.numberOfChildren do
        Evaluate(node.children[i])
        SetProofAndDisproofNumbers(node.children[i])
      od
    end

    Table 2.4: Node-development algorithm.

    procedure UpdateAncestors(node)
      while node ≠ nil do
        SetProofAndDisproofNumbers(node)
        node := node.parent
      od
    end

    Table 2.5: Ancestor-updating algorithm.
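The pseudocode of tables 2.1 to 2.5 can be put together into a small runnable program. The following is a minimal Python sketch, not the thesis implementation. It assumes that node types alternate strictly between or and and, that resources are bounded by a simple node count (max_nodes), and that the domain is supplied through two caller-provided functions, here called evaluate and successors. All of these names, the Node class layout, and the toy stone-taking game at the end are illustrative assumptions.

```python
import math

INF = math.inf


class Node:
    """One node of the and/or tree (field names are illustrative)."""

    def __init__(self, state, is_or, parent=None):
        self.state = state
        self.is_or = is_or        # True: or node, False: and node
        self.parent = parent
        self.children = []
        self.expanded = False
        self.value = None         # True, False, or None for unknown
        self.proof = 1
        self.disproof = 1


def set_numbers(node):
    """Table 2.3: derive the proof and disproof numbers of one node."""
    if node.expanded:
        proofs = [c.proof for c in node.children]
        disproofs = [c.disproof for c in node.children]
        if node.is_or:
            node.proof = min(proofs, default=INF)
            node.disproof = sum(disproofs)
        else:
            node.proof = sum(proofs)
            node.disproof = min(disproofs, default=INF)
    elif node.value is True:
        node.proof, node.disproof = 0, INF
    elif node.value is False:
        node.proof, node.disproof = INF, 0
    else:
        node.proof, node.disproof = 1, 1


def select_most_proving(node):
    """Table 2.2: follow the leftmost child whose number equals its parent's."""
    while node.expanded:
        if node.is_or:
            node = next(c for c in node.children if c.proof == node.proof)
        else:
            node = next(c for c in node.children if c.disproof == node.disproof)
    return node


def develop_node(node, evaluate, successors):
    """Table 2.4, with immediate evaluation of every new child."""
    for state in successors(node.state):
        child = Node(state, is_or=not node.is_or, parent=node)
        child.value = evaluate(state)
        set_numbers(child)
        node.children.append(child)
    node.expanded = True


def update_ancestors(node):
    """Table 2.5: recompute numbers from the developed node up to the root."""
    while node is not None:
        set_numbers(node)
        node = node.parent


def pn_search(root_state, evaluate, successors, max_nodes=100_000):
    """Table 2.1: returns True, False, or None when the node budget runs out."""
    root = Node(root_state, is_or=True)
    root.value = evaluate(root_state)
    set_numbers(root)
    created = 1
    while root.proof != 0 and root.disproof != 0 and created < max_nodes:
        most_proving = select_most_proving(root)
        develop_node(most_proving, evaluate, successors)
        created += len(most_proving.children)
        update_ancestors(most_proving)
    if root.proof == 0:
        return True
    if root.disproof == 0:
        return False
    return None


# Toy domain (hypothetical): players alternately take 1 or 2 stones from a
# pile; whoever takes the last stone wins. A state is (stones, root_to_move).
def evaluate(state):
    stones, root_to_move = state
    if stones == 0:
        return not root_to_move   # the player to move has already lost
    return None


def successors(state):
    stones, root_to_move = state
    return [(stones - k, not root_to_move) for k in (1, 2) if k <= stones]
```

In the toy game the player to move wins exactly when the pile size is not a multiple of three, so pn_search((4, True), evaluate, successors) proves the root while pn_search((3, True), evaluate, successors) disproves it.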
The algorithm of table 2.5 updates the proof and disproof numbers of the most-proving node and its ancestors. This is necessary to ensure that all nodes in the tree correctly reflect the new situation after the development of the most-proving node. Starting from the most-proving node, the tree is traversed in the direction of the root, updating the proof and disproof numbers of each ancestor. After the proof and disproof numbers of the root have been updated, the algorithm is terminated (indicated by the fact that the root has no parent). This concludes our formal description of pn-search.

2.3 Enhancements

In the previous section we have presented the pn-search algorithm. Several enhancements exist. Some of these should be applied in most practical circumstances, since the added performance outweighs the additional implementation effort. The advantage associated with the other enhancements depends on the domain of application. In section 2.3.1 we focus on enhancements reducing the amount of working memory needed to execute a search. Section 2.3.2 deals with reducing the execution time necessary to select the most-proving node and to update the proof and disproof numbers of the ancestors. The role of domain-specific knowledge when enhancing the algorithm is examined in section 2.3.3. Finally, transpositions are discussed in section 2.3.4.

2.3.1 Reducing memory requirements

Pn-search has working-memory requirements linear in the size (number of nodes) of the search tree. Depth-first search algorithms, such as α-β search, only require working memory linear in the depth of the search. As a result, working memory is a possible bottleneck when applying pn-search. We discuss two techniques to reduce memory requirements. The first technique is concerned with the removal of solved subtrees, while the second technique performs pn-search at two levels.

Deleting solved subtrees

A node in a pn-search tree may influence the search process in two ways:

1. it is (on the path to) the most-proving node;
2. its proof and disproof numbers influence the proof and disproof numbers of its parent.

Below, we show that solved subtrees do not influence the search process in either way, except that they may solve their parent immediately after they were solved themselves.

First, we show that a solved node is never on the path to the most-proving node. As long as the search is in progress the root is not solved. We thus start the selection of the most-proving node from an unsolved node. All unsolved nodes have finite proof and disproof numbers unequal to zero. Since at each step of the selection, a child is chosen with a proof or disproof number equal to that of its parent, each subsequent node must also be unsolved. We conclude that a solved node cannot be on the path to the most-proving node.

Second, we show that the proof and disproof numbers of a solved node either solve its parent, or do not influence its parent's values. A solved node with value true has proof number 0 and disproof number infinity. A parent or node is solved by this child, and immediately obtains the value true. A parent and node sums its children's proof numbers, to which the 0 does not contribute, while it minimizes its children's disproof numbers, to which infinity does not contribute. Only if this child were the last unsolved child is the and node solved, obtaining the value true. To a solved child with value false an analogous reasoning applies, with false and true, proof number and disproof number, and and node and or node interchanged.

We conclude that a solved subtree, once its parent has been updated, no longer influences the search, and thus may be removed. An efficient way to implement this enhancement in the SetProofAndDisproofNumbers() algorithm of table 2.3 is by deleting solved children when calculating the sum and minimum of the children's proof and disproof numbers.
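One way to fold the deletion of solved children into the number calculation is sketched below in Python. This is an illustrative sketch only: the Node class, is_solved and set_numbers_and_prune are hypothetical names, and freeing memory is modelled simply by dropping references from the children list. The numbers are computed first, so that a solved child can still solve its parent, and only afterwards are solved children discarded.

```python
import math

INF = math.inf


class Node:
    """Minimal internal-node record (illustrative, not from the thesis)."""

    def __init__(self, is_or, proof=1, disproof=1, children=None):
        self.is_or = is_or
        self.proof = proof
        self.disproof = disproof
        self.children = children if children is not None else []


def is_solved(node):
    return node.proof == 0 or node.disproof == 0


def set_numbers_and_prune(node):
    """Recompute an internal node's numbers, then drop solved children."""
    if is_solved(node) or not node.children:
        return  # solved nodes and frontier nodes are left untouched
    proofs = [c.proof for c in node.children]
    disproofs = [c.disproof for c in node.children]
    if node.is_or:
        node.proof, node.disproof = min(proofs), sum(disproofs)
    else:
        node.proof, node.disproof = sum(proofs), min(disproofs)
    if is_solved(node):
        node.children = []  # the whole subtree is no longer needed
    else:
        node.children = [c for c in node.children if not is_solved(c)]


# An and node with one proved child and two unsolved children.
and_node = Node(is_or=False, children=[
    Node(True, 0, INF), Node(True, 2, 3), Node(True, 1, 1)])
set_numbers_and_prune(and_node)
```

After the call, and_node keeps proof number 3 and disproof number 1 with only its two unsolved children left, and recomputing it again yields the same numbers, in line with the argument above.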
For a discussion of the expected gain of this technique, we refer to section 2.4.

Pn²-search

As a second technique to reduce memory requirements, we present a short description of a recent, so far unpublished, development in pn-search, named pn²-search. The algorithm has been developed in collaboration with Stef Keetman.

Pn²-search consists of two levels of pn-search. The first level consists of a pn-search (pn₁), which calls as evaluation of any node J a pn-search at the second level (pn₂), with a bound N on the maximum tree size. In pn²-search N is chosen to be the current size of the pn₁ search tree. The second level of pn-search is a standard pn-search, with a normal (either standard or domain-specific) evaluation. The result of pn₂ on node J is the value true or false in case pn₂ solved J, or the proof and disproof numbers of J, if J has not been solved. In the latter case, the proof and disproof numbers are used to initialize J in pn₁. After termination of pn₂, its tree is removed from memory.

We remark that several enhancements to pn²-search have been suggested to reduce the overhead associated with recreating deleted parts of the tree. One example of such an enhancement involves storing the M last pn₂ trees in a cache, instead of deleting them, as suggested by Schaeffer (1994). The gain achieved by such enhancements is a topic of future research.

Pn²-search has the following properties.

1. A search resulting in a pn₁ tree of size N has searched approximately ½N² nodes.
2. The memory requirements during the creation of a pn₁ tree of size N are approximately 2N nodes.
3. Implementing pn²-search requires only minor changes to an implementation of standard pn-search.

It has been established that the memory requirements of pn²-search are on the order of the square root of the number of nodes investigated.
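The square-root relation can be read off from properties 1 and 2, taking property 1 to say that about half of N squared nodes are searched. The symbols S (nodes searched) and M (nodes held in memory) are introduced here for illustration only and do not appear in the original text.

```latex
S \approx \tfrac{1}{2} N^{2}, \qquad M \approx 2N
\quad\Longrightarrow\quad
N \approx \sqrt{2S}, \qquad M \approx 2\sqrt{2S} = \sqrt{8S}.
```

Under these approximations the memory in use grows only with the square root of the number of nodes searched, up to a constant factor.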
Comparisons on awari and draughts have shown experimentally that pn²-search investigates on the average three times as many nodes as standard pn-search to solve the same problem. This factor of three is independent of problem size within the range investigated. Given the approximate constancy of this factor, it follows that in cases where pn-search is bounded by trees of 10^6 nodes, pn²-search, with the same memory resources, may usefully investigate 10^12 nodes. This conclusion can be extrapolated to even larger problems only when the factor of three suggested by the experiments remains constant. Whether it does, and whether the extrapolation therefore remains valid, is a topic for future research.

2.3.2 Reducing execution time

The main difference in execution time between a best-first search algorithm, such as pn-search, and a depth-first search algorithm, such as α-β search, is the number of node traversals necessary to select the most-proving node. The overhead specific to pn-search is the calculation of proof and disproof numbers at internal nodes, being linear in the number of node traversals. The enhancement presented in this section reduces the number of node traversals per selection of the most-proving node. We remark that the same enhancement can be and has been applied to conspiracy-number search (Klingbeil, 1989).

At each iteration of pn-search we traverse the tree starting at the root and ending at the most-proving node. After developing the most-proving node, we follow the same path backwards until we are at the root. The basis of the enhancement consists of two observations.

1. If the proof and disproof numbers of an ancestor do not change, the updating process can be terminated.
2. If a node J is on the path from the root to the most-proving node, and J's proof and disproof numbers are not changed by the updating process, J also lies on the path from the root to the next most-proving node.
From these two observations it follows that at each iteration a node exists where we can terminate the updating process, and start the next most-proving node selection. Such a node is called the current node, which is defined as follows.

Definition 2.6 For any and/or tree T, at any time during the execution of pn-search, the current node of T is defined as the ancestor of the previous most-proving node J, closest to J, which had no changes to its proof and disproof numbers caused by the development of J. Initially, the current node equals the root.

Enhancing the pn-search algorithm to use the notion of current node changes the algorithms for ProofNumberSearch and UpdateAncestors. The new algorithms are shown in tables 2.6 and 2.7.

The current-node enhancement reduces the number of node traversals per iteration from linear in the depth of the search tree to close to constant and should therefore be included in most practical implementations of pn-search.

We remark that at the cost of storing a most-proving node per subtree, the selection process can be changed into an instant most-proving node selection. Then, the most-proving nodes of the subtrees are updated during the updating of the proof and disproof numbers within the tree. Since the working memory is the main bottleneck in most applications, we feel that small gains in terms of processing speed do not warrant the extra space requirements.

2.3.3 Applying domain-specific knowledge

Two assumptions underlie the formulation of the pn-search algorithm. First, the probability distribution of expected values of frontier nodes is equal throughout the tree. Second, the distribution of probabilities over the three evaluation values (true, false, unknown) is unknown. These two assumptions describe a situation in which no domain-specific knowledge can be applied to guide the search through the tree.
In many practical domains, however, at least some knowledge is available. In this section we show how such knowledge can be applied to pn-search by altering the initialization of the proof and/or disproof numbers of frontier nodes.

We can view proof and disproof numbers as lower bounds on the effort necessary to solve a tree. So far, the effort has been measured in node developments. We consider three methods to use alternative measures of effort. First, we use the number of node evaluations as a measure of effort. Second, a domain-specific measure of effort is applied. Third, a function of the tree depth is used to influence the shape of the tree searched. Each method is illustrated using a particular game, being give-away chess, awari and go-moku, respectively. Finally, we review the three methods applied.

    procedure ProofNumberSearch(root)
      Evaluate(root)
      SetProofAndDisproofNumbers(root)
      currentNode := root
      while root.proof ≠ 0 and root.disproof ≠ 0 and ResourcesAvailable() do
        mostProvingNode := SelectMostProving(currentNode)
        DevelopNode(mostProvingNode)
        currentNode := UpdateAncestors(mostProvingNode)
      od
      if root.proof = 0 then root.value := true
      elseif root.disproof = 0 then root.value := false
      else root.value := unknown
      fi
    end

    Table 2.6: Pn-search algorithm (with current node).

    function UpdateAncestors(node)
      changed := true
      while node ≠ nil and changed do
        oldProof := node.proof
        oldDisproof := node.disproof
        SetProofAndDisproofNumbers(node)
        changed := (oldProof ≠ node.proof) or (oldDisproof ≠ node.disproof)
        previousNode := node
        node := node.parent
      od
      return previousNode
    end

    Table 2.7: Ancestor updating algorithm (enhanced).

Evaluations as a measure of effort

A node development, when using immediate evaluation, consists of expanding the node and evaluating each of its children. Thus, the amount of effort involved in a node development depends on the number of children. We will use as J's proof number the least number of node evaluations necessary to prove node J and as its disproof number the least number of node evaluations necessary to disprove J. Let us assume that J will have n children when expanded. J's proof and disproof numbers can be initialized using that knowledge, even before J is expanded. If J is an or node, only one child needs to evaluate to true to prove J, thus J's proof number equals 1. To disprove J, all n children must evaluate to false. J's disproof number is therefore initialized to n. For an and node, the proof number is initialized to n, while the disproof number becomes 1.

The advantage of using the number of evaluations as a measure of effort is that a distinction between frontier nodes can be made which is not present in standard pn-search. It allows pn-search to focus on frontier nodes with fewer children before developing frontier nodes with more children. It is expected that in this way pn-search will find solutions more quickly. Below we present results from applying this method to give-away chess.

Give-away chess is a variant of chess where a player wins as soon as she cannot make a legal move (i.e., she has no pieces left or her remaining pieces are blocked). The pieces move as in chess, with two exceptions:

1. the king has no special status and can be captured like any other piece;
2. a player is forced to make a capture move if she can (like in checkers and draughts).

Castling and en-passant capturing are extremely rare in give-away chess. To simplify our implementation task, we have omitted these types of moves, thus rendering them illegal. In collaboration with Barney Pell we created the give-away chess program Prove-away, solely based on pn-search.
A node evaluates to true if white is to move and has no legal moves, while it evaluates to false if black is to move and has no legal moves. All other nodes evaluate to unknown. Pn-search was implemented in two variants, one variant using the standard initialization, and the other one using node evaluations as measures of effort.

To enable Prove-away to play games against opponents, it selects its moves by performing pn-search with a predetermined bound on the number of nodes to be created. If the tree is not solved within that limit, the 1-ply nodes are inspected and the move leading to a node with the minimal ratio of proof and disproof numbers is selected. If the tree is proved within the limit, the move proving the tree is selected, ensuring a win for Prove-away. If the tree is disproved, the 1-ply node with the largest subtree is selected, speculating on the opponent not seeing her winning line.

Although we have no clear indication of the strength of Prove-away, it has beaten its human opponents in all but three of its games (out of several dozen). Most games are decided by Prove-away finding a winning line in which the opponent is forced at each move to capture one of the program's pieces, until the program runs out of moves and wins. The maximum depth of such lines in give-away chess is 32 ply (16 moves by the program and 16 captures by the opponent).

We conducted an experiment to compare the two variants of pn-search described above. During the experiment, Prove-away plays random games against itself. At each move in the game, both variants of pn-search (standard initialization and using evaluations as measure of effort) create a tree, with the current game position as root. As soon as one or both variants solve the tree, the game is terminated. If in a position neither variant solves the tree within 25,000 nodes, Prove-away plays a random legal move to continue the game.
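The two variants compared in this experiment differ only in how an unknown frontier node is initialized. A minimal Python sketch of both rules follows. The function names and the small example are hypothetical, and the number of children is assumed to be known before the node is expanded (in give-away chess, by counting legal moves). A frontier node without legal moves is terminal and evaluates to true or false, so it never reaches these functions.

```python
def standard_init(is_or_node, num_children):
    """Standard pn-search: every unknown frontier node starts at (1, 1)."""
    return 1, 1


def evaluation_count_init(is_or_node, num_children):
    """Effort measured in node evaluations, returned as (proof, disproof).

    An or node needs one child evaluating to true and all n children
    evaluating to false; for an and node the roles are interchanged.
    """
    if is_or_node:
        return 1, num_children
    return num_children, 1


# Two unknown and-type frontier nodes below an or node, with 10 and 3
# legal moves respectively (made-up figures).
legal_moves = {"X": 10, "Y": 3}
proof_numbers = {
    name: evaluation_count_init(False, n)[0] for name, n in legal_moves.items()
}
preferred = min(proof_numbers, key=proof_numbers.get)
```

With the standard initialization both frontier nodes would tie at proof number 1. With the evaluation-count initialization the or parent prefers Y, the node with fewer children, which matches the expected focus on frontier nodes with fewer children.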
A total of 30 games were played, which lasted on the average 5.6 ply (i.e., a little less than three moves by white and three moves by black). Three games were duplicates of other games, due to the fact that the program quickly proved that black wins after the opening moves 1. d2-d4, 1. d2-d3 or 1. e2-e4, and each of these moves was selected twice as opening move during the 30 games. In the following we disregard the three duplicate games.

The conditions of the experiment ensure that the final position of each random game has been proved a win for one of the players by at least one of the pn-search variants. In some games, both variants proved the win, while in others only the pn-search variant with the number of evaluations as the measure of effort succeeded. In none of the games did only standard pn-search solve the tree of the final position. To compare the performances of both algorithms, we reran the standard algorithm with unlimited working memory on the positions where that variant had not found the win within the limit of 25,000 nodes. The results of the experiment are presented in table 2.8.

                        standard          initialization by    improvement
                        initialization    no. of moves         factor
    developments             5928              2661                2.2
    nodes visited           62323              7838                8.0
    branching factor         10.5               2.9                3.6
    max tree size           48935              5988                8.2
    nodes per sec.            169               132                0.8

    Table 2.8: Give-away chess results.

Measured in number of node developments, the enhanced algorithm (using evaluations as measure of effort) gains a factor of a little over 2, while in number of nodes the improvement factor is almost 8. These numbers indicate that the enhanced algorithm develops nodes with, on average, a branching factor almost 4 times smaller (2.9 vs. 10.5). This clearly indicates that the selection of most-proving nodes is strongly influenced by the non-standard initialization. The average amount of working memory necessary to complete the search is specified in table 2.8 as the maximal tree in memory per search.
It is directly related to the total size of the tree created, resulting in an improvement by a factor 8. The extra time spent on counting the number of moves per terminal node slows the algorithm down approximately 20% per node, compared to the standard initialization. Thus, the overall gain in cpu time amounts to a factor of more than 6.

We conclude that using the number of node evaluations as a measure of effort to initialize the proof and disproof numbers may yield a significant reduction in node evaluations, node developments and cpu time.

Domain-specific measures of effort

In many domains, domain-specific properties exist which give an indication of the amount of effort involved in solving a position (i.e., in solving the and/or tree with the position as root). For instance, in othello solving a position with only a few empty squares is easier than solving a position with more empty squares. In draughts, it is simpler to solve a position if both players have only four men than if both players have ten men. In these cases, we could select as domain-specific measures of effort the number of moves to the end of the game (othello) or the number of men of the opponent to be captured (draughts and checkers). We illustrate the idea on the game awari.

In the initial awari position, there are 48 stones on the board. Both players move stones around, with the goal of capturing stones. The goal of awari is to capture more stones than your opponent. It follows that a player who has captured at least 25 out of the total of 48 stones wins the game (for a definition of the rules of awari, see section 2.4.2).

We use the number of stones a player needs to capture as the measure of effort. Let us assume that we would like to determine whether north can win, or whether south can obtain at least a draw. Let us furthermore assume that south has so far captured 11 stones, while north has collected 8 stones.
We build the tree from the perspective of south, thus proving the tree means showing that south can reach at least a draw. In the given position, south must capture at least another 13 stones to reach a draw, while north needs another 17 stones to obtain the 25 stones necessary for a win. These values, 13 and 17, are then used as proof and disproof numbers of the position.

In section 2.4.7 we present test results of applying pn-search to awari for both the standard initialization and the stone-based initialization as suggested here.

Depth-related measures of effort

By inspecting trees created by pn-search, we have found some occasions in which the shape of the tree indicated that much effort was spent on variations which were less likely to succeed quickly than some others. For instance, in mating problems in chess, where the weaker side was restricted to moving one piece between two squares, most variations had proof number one. As a result, variations where the attacker moved a single piece aimlessly over the board were searched very deeply. On one occasion, this resulted in a mate in 114 moves being found, while a mate in 4 moves existed.

Instead of putting a hard limit on the depth of the search, examining deep variations can be somewhat discouraged by initializing the proof and disproof numbers of a node using a function of the depth of the node. By assigning higher proof and disproof numbers to nodes deeper in the tree, it is expected that pn-search will create a somewhat shallower and broader tree. Analogously, by assigning smaller proof and disproof numbers to nodes deeper in the tree, pn-search is expected to create deeper and narrower trees. Inspection of trees created by pn-search with such alternative proof-and-disproof-numbers initializations shows that the average node depth is indeed influenced in accordance with these expectations.
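The stone-based and depth-related initializations described above can be sketched as two small Python functions. Both are illustrative assumptions rather than the thesis code: awari_init reproduces the arithmetic of the example (south needs 24 stones for a draw, north needs 25 for a win), and depth_init is one hypothetical linear form of "a function of the depth", with a made-up weight parameter.

```python
def awari_init(south_captured, north_captured, total_stones=48):
    """Stone-based (proof, disproof) numbers from south's perspective.

    Proving means south reaches at least a draw (half of the stones);
    disproving means north wins (more than half of the stones).
    """
    draw_target = total_stones // 2
    win_target = total_stones // 2 + 1
    proof = max(draw_target - south_captured, 0)
    disproof = max(win_target - north_captured, 0)
    return proof, disproof


def depth_init(depth, weight=1):
    """Depth-related numbers: deeper frontier nodes look more expensive.

    A positive weight is expected to favour shallower, broader trees.
    The linear form and the weight are hypothetical choices.
    """
    number = 1 + weight * depth
    return number, number


example = awari_init(south_captured=11, north_captured=8)
```

For the position in the text, example equals (13, 17). With weight set to 0, depth_init falls back to the standard initialization (1, 1) at every depth.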
Experiments on go-moku (see chapter 5), with each node's proof and disproof numbers initialized to the depth of the node measured in full moves, show that a somewhat broader, shallower tree is created, without losing pn-search's ability to find narrow, deep variations leading to a win. Comparisons on go-moku showed that this initialization was an improvement over the standard initialization. The depth-related initialization was used in the search which led to solving go-moku.

Despite this example, we do not have much ground for the assumption that such an initialization is an enhancement to pn-search for domains with behavior similar to go-moku. Furthermore, the evaluation function we developed for go-moku also influenced the success of the non-standard initialization. Although a linear function in the depth of the node worked well in go-moku, more complicated functions may be necessary for other domains. The strongest conclusion we are prepared to draw is that by using a function of the depth of the node, the shape of the tree can be somewhat influenced (either made broader and shallower, or narrower and deeper).

Reviewing the application of domain-specific knowledge

We have presented three ways in which domain-specific knowledge can be used to change the initialization of the proof and disproof numbers at frontier nodes. Although each of the three methods has been successful in improving the performance in a practical domain, some caution is in order, particularly with the second and third methods. While the use of non-standard proof-and-disproof-numbers initializations may seem useful to guide the search process, the underlying principles of pn-search are violated. Two examples of violated principles are: (1) the assumption that all frontier nodes are indistinguishable and (2) the assumption that the proof and disproof numbers are lower bounds on the effort required to solve the tree.
The positive influence of different initializations may at the same time result in negative effects. We have found that for some domains, such as othello, it is necessary to perform a large number of experiments to fine-tune the initialization process, akin to the process of fine-tuning evaluation functions in game-playing programs (Gnodde, 1993). We conclude that as yet we lack a proper understanding of the precise effects associated with knowledge-driven proof-and-disproof-numbers initializations.

2.3.4 Transpositions

The definition of pn-search depends on the graph searched being a tree. When determining the proof and disproof numbers of an internal node J, the cardinality of the smallest proof set and disproof set must be determined. In a tree, the subtrees rooted at the children of J are disjoint, ensuring that the cardinality of the smallest proof set and disproof set of J can be calculated from the cardinality of the smallest proof sets and disproof sets of the children.

In many domains, however, the same subtree may be encountered several times during the search, at different places in the tree. The standard pn-search algorithm will in such cases obtain an upper bound on the cardinality of the smallest proof and disproof sets, instead of the true proof and disproof numbers. Problems and solutions related to the problem of the common subtree (transpositions) in combination with pn-search have been investigated by Schijf (1993) and Schijf et al. (1994).

In the following, we briefly describe problems and practical solutions for transpositions in pn-search. We distinguish between directed acyclic graphs, abbreviated as dags, and directed cyclic graphs, abbreviated as dcgs. We remark that practical techniques for handling transpositions in game-playing programs using α-β search have been extensively described in the literature (Greenblatt et al., 1967).

Transpositions in DAGs

Transpositions resulting in dags necessarily occur in games where each move is a conversion, i.e. an irreversible alteration of the state of the game. In chess, captures and pawn moves are examples of conversions, while non-capture moves by a piece (except for castling, and castling-forbidding moves) are non-conversions. In connect-four, qubic and go-moku, each move is a conversion, as in all three games the number of stones on the board strictly increases.

As stated above, in a dag, addition of proof numbers or disproof numbers possibly overestimates the cardinality of the minimal set of nodes needed to solve the tree. Theoretically correct algorithms exist to establish the correct proof and disproof numbers at each node, but these are slow or use an inordinate amount of working memory, or both, thus barring practical application (Schijf, 1993).

[Figure 2.4: and/or dag with practical solution. Node labels (proof number, disproof number): A 2,2; B 2,1; C 2,1; D 1,1; E 1,1; F 1,1; G 1,1.]

A practical solution to this problem is to treat the dag as if it were a tree, thus calculating (incorrect) proof and disproof numbers of a node directly from its children. The main difference in the algorithm is that while updating ancestors, all parents of a node must be updated recursively.

In figure 2.4, a dag is depicted where proof and disproof numbers are calculated directly from their children. It can easily be shown that if node G is solved, root A obtains the same value as G. Thus, the proof and disproof numbers of A should equal 1. Furthermore, G should be the most-proving node. Thus, both numbers in the root are too high, and the selection mechanism incorrectly selects node D as the most-proving node. This example clearly indicates that the practical solution is no longer in accordance with the definitions of section 2.2.2.
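The overestimate caused by treating a dag as a tree can be reproduced on a very small example. The Python sketch below does not use the graph of figure 2.4, whose edges are not recoverable here. Instead it uses a hypothetical dag in which an and node A has two or children B and C that share a single frontier node G. All names are illustrative, and the brute-force search for the smallest proof set is only meant for tiny graphs.

```python
from itertools import combinations


class Node:
    """Graph node; a frontier node has no children (illustrative layout)."""

    def __init__(self, name, is_or, children=()):
        self.name = name
        self.is_or = is_or
        self.children = list(children)


def tree_style_proof_number(node):
    """Proof number computed from the children, as if the dag were a tree."""
    if not node.children:
        return 1  # unsolved frontier node
    values = [tree_style_proof_number(c) for c in node.children]
    return min(values) if node.is_or else sum(values)


def frontier_names(node, found=None):
    """Collect the names of all distinct frontier nodes below node."""
    found = set() if found is None else found
    if not node.children:
        found.add(node.name)
    for child in node.children:
        frontier_names(child, found)
    return found


def is_proved(node, proved_names):
    """True if proving exactly the named frontier nodes proves this node."""
    if not node.children:
        return node.name in proved_names
    results = [is_proved(c, proved_names) for c in node.children]
    return any(results) if node.is_or else all(results)


def smallest_proof_set_size(root):
    """Brute force: size of the smallest set of frontier nodes proving root."""
    names = sorted(frontier_names(root))
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            if is_proved(root, set(subset)):
                return size
    return None


g = Node("G", is_or=True)
b = Node("B", is_or=True, children=[g])
c = Node("C", is_or=True, children=[g])
a = Node("A", is_or=False, children=[b, c])
```

Here tree_style_proof_number(a) is 2, because G is counted once via B and once via C, while smallest_proof_set_size(a) is 1, since proving G alone proves A.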
Still, our experience with connect-four, qubic and go-moku shows that this practical pn-search algorithm for dags has advantages similar to those of standard pn-search.

Transpositions in DCGs

Transpositions resulting in dcgs appear in games where a series of non-conversion moves leads to a position which has occurred before. Special rules govern the continuation of games after repetitions, leading by complex regulations to game-specific outcomes. There is fascination in the diversity of these rules: in Chinese chess, some repetitions are illegal by the operation of complex rules; in go, any repetition is outlawed by the ko rule; in chess, finally, a repeated position can give rise to a claim of a draw from its third occurrence onwards.

[Figure 2.5: Cyclic and/or graph with nodes A to F; frontier nodes E and F are labeled 1,1.]

Figure 2.5 depicts a dcg; in figure 2.6 we have converted that graph into a tree. Each path in the tree terminates at a frontier node of the graph, or at a repetition of positions in the path. In this example we assume that a repetition evaluates to false.

The tree contains three duplicates of node D. Among these three, two have the value false, while one has proof number 2 and disproof number 1. The fact that the same node may have different proof and disproof numbers depending on the path it lies on forms the basis of the complexity of performing pn-search on dcgs. Node C has properties similar to node D. Moreover, we note that A's proof number (2) is less than the sum of the proof numbers of its children, as subtrees B and C have node E in common. The proof number at the root indicates that to prove the tree, both E and F must be proved. The disproof number 1 of A indicates that disproving either E or F disproves the tree.

The dependence of the proof and disproof numbers of a node on the path to that node forms the basis of the difficulties of cyclic transpositions. In Schijf (1993), a theoretically correct algorithm for pn-search on dcgs is described.
Unfortunately, its time and working-memory requirements are too costly to warrant practical application.

[Figure 2.6: Tree version of the graph of figure 2.5.]

Practical methods to apply pn-search to dcgs also exist. First, the practical algorithm for dags may be applied with one modification: only positions created after a conversion move are eligible to have more than one parent. As a result, some transpositions are investigated only once, while for others duplicates are created in the graph. Second, for each set of equivalent positions, at most two nodes are created: one for all paths in which the node occurs for the first time, and the second node when the node is its own ancestor. The second node is initialized to the value associated with a repetition of positions in the game under investigation. In this case, if a node is its own ancestor through at least one path, the repetition of positions is used to update the ancestors on all paths leading to the node, including those in which the node is not a repetition. Therefore, the search may incorrectly deduce that a node must have the value of a repetition of positions. Thus, if the value of the root is proved to equal the value assigned to repetitions of positions, the proof is not fully reliable. If the opposite value is proved, however, the proof is bound to be correct.

For a detailed description of these two practical algorithms for pn-search in dcgs, we refer to Schijf (1993). We believe that pn-search on directed cyclic graphs requires further investigation.

2.4 Results

2.4.1 Introduction

In this section we compare pn-search's performance with that of a sophisticated implementation of α-β search, by far the most commonly applied game-tree search algorithm in tournament programs for strategic games.
As a test domain, we have selected the game of awari, one of the games on the Olympic List. We have chosen awari for two main reasons. First, awari search trees contain non-uniformity, which makes them suitable for the application of pn-search. Second, all strong tournament programs competing in the Computer Olympiads selected their moves using sophisticated implementations of α-β search, establishing that awari search trees are suitable for application of α-β search.

It will be shown that, for the purpose of proving the game-theoretic value of a position in awari, pn-search outperforms α-β search by a wide margin. This proves that a category of search trees exists for which pn-search outperforms α-β. Further indications of pn-search's strengths can be found in chapters 4 and 5, where pn-search's contribution to solving qubic and go-moku is described.

This section is organized as follows. First, we present the rules of awari. Second, we give a description of the strongest existing awari programs, which presents evidence that our implementation of α-β search is competitive with α-β search implementations of other authors. Third, we describe in detail the implementations of pn-search and α-β search whose performances are compared. Fourth, it is explained how the nodes visited by both algorithms are counted, which is important due to the different nature of the algorithms. Fifth, we describe the set of awari positions to which the algorithms were applied. Finally, we present and analyze the empirical data.

2.4.2 The rules of awari

Awari is a two-player (south and north) zero-sum game with perfect information. It is one instance of a large family of games named mancala, of which some 1200 variants are known. The mancala games originate from Africa. Awari is mainly played in its western regions, such as Nigeria. For the game described here, the names wari or awele are also used (Deledicq and Popova, 1977).

Awari is played on a wooden board containing two rows of six pits.
Each player controls the row on her side of the board. South's pits (from left to right, as seen by south) are named A through F, while north's pits (from left to right, as seen by north) are named a through f. At the right-hand side of each row, an auxiliary pit is used to contain a player's captured stones. At the start of the game each pit (except the auxiliary pits) contains four stones, for a total of 48 stones on the board.

At each move, a player selects a non-empty pit X from her row. Starting with X's neighbor, she then sows all stones from X, one at a time, counterclockwise over the board (omitting the two auxiliary pits). If X contains sufficient stones to go around the board (12 stones or more), pit X is skipped and sowing continues. Thus, after the move, X will always be empty. Finally, captured stones, if any, are removed and stored in the auxiliary pit.

Stones are captured if the last stone sown lands in an enemy pit which after landing contains 2 or 3 stones. If such a capture is made, and the preceding pit contains 2 or 3 stones and the pit is an enemy pit, those stones are also captured. This procedure is successively repeated for the pits preceding and ends as soon as a pit is encountered containing a number of stones other than 2 or 3, or the end of the opposing row is reached.

A move is described by the name of the pit, followed by the number of stones sown (the name of the pit by itself defines the move, but such a notation is prone to error). The number of stones captured, if any, is indicated by the amount preceded by a "×". In figure 2.7 an example position is shown with south to move. Legal moves for south are: A1, C4×2, D19×7, E4 and F2×4.

                  North
          f   e   d   c   b   a
          0   0   1   3   1   1
     7                              5
          1   0   4  19   4   2
          A   B   C   D   E   F
              South (to move)

Figure 2.7: A position with legal moves A1, C4×2, D19×7, E4 and F2×4.

The goal of awari is to capture more stones than the opponent.
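The sowing and capturing rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the move generator of any program discussed in this chapter: the board is a hypothetical list of 12 pit counts, indices 0 to 5 for south's pits A to F and 6 to 11 for north's pits a to f in counterclockwise order, the auxiliary pits are left out, and the rule obliging a player to leave her opponent a reply is not modeled. The function name sow is hypothetical.

```python
SOUTH = range(0, 6)     # pits A..F
NORTH = range(6, 12)    # pits a..f, continuing counterclockwise after F


def sow(board, pit):
    """Play the move from `pit`; return (new_board, number_of_stones_captured)."""
    board = list(board)
    stones = board[pit]
    if stones == 0:
        raise ValueError("a move must start from a non-empty pit")
    board[pit] = 0
    pos = pit
    while stones > 0:
        pos = (pos + 1) % 12
        if pos == pit:
            continue                # the starting pit is skipped on each lap
        board[pos] += 1
        stones -= 1

    enemy = NORTH if pit in SOUTH else SOUTH
    captured = 0
    # Walk backwards from the last pit sown while the pits are enemy pits
    # holding 2 or 3 stones; leaving the enemy row ends the capture.
    while pos in enemy and board[pos] in (2, 3):
        captured += board[pos]
        board[pos] = 0
        pos -= 1
    return board, captured


# The position of figure 2.7, south to move.
FIGURE_2_7 = [1, 0, 4, 19, 4, 2,    # A..F
              1, 1, 3, 1, 0, 0]     # a..f
```

Applied to the position of figure 2.7, sow(FIGURE_2_7, 3) sows the 19 stones of pit D, ends in pit f and captures 7 stones, in agreement with the move D19×7 listed in the text.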
The game ends as soon as one of the players has collected 25 or more stones. Two other conditions exist which terminate the game. First, if a player is unable to move (i.e., all her pits are empty), the remaining stones are captured by her opponent. Second, if the same position is encountered for the third time, with the same player to move, the remaining stones on the board are evenly divided among the players. In all cases, after the end of the game, the winner is the player who captured the most stones. If both players capture 24 stones, the game is drawn.

                  North
          f   e   d   c   b   a
          1   0   0   0   0   0
    23                             22
          0   1   0   0   1   0
          A   B   C   D   E   F
              South (to move)

Figure 2.8: 1. B1 f1 wins. After 1. E1? f1 south must play 2. F1.

A last rule exists to prevent players from running out of moves early in the game. Whenever possible, a player is forced to choose a move such that her opponent is able to make a reply move. It is, however, not compulsory to look several moves ahead to ensure that the opponent will continue to be able to reply. For instance, figure 2.8 shows a position in which south by playing 1. B1 can deliberately create a position in which she is unable to offer north any stones on her next move. By doing so, south captures all three stones remaining on the board and wins the game. However, had she played 1. E1, then after 1. ... f1 she is forced to play 2. F1, leaving the game for the moment undecided (although after 2. ... a1 3. B1 b1 4. C1 c1 5. D1 d1 6. A1 e1 we are back at the initial position, giving south a second chance to play the winning move).

2.4.3 Tournament programs

Lithidion

In 1990 Maarten van der Meulen and the author constructed an awari-playing tournament program, named Lithidion (Greek for 'little stone'). Lithidion at the time consisted of an α-β search algorithm, and an endgame database containing the game-theoretic value of each awari position with 13 stones or fewer left on the board (Allis et al., 1991c). In 1991 Lithidion was enhanced with pn-search and a larger database (all positions with 17 stones or fewer). In 1992 Lithidion was further enhanced with an opening book. In describing Lithidion, we will concentrate on this last version.

The basis for Lithidion is its α-β search algorithm. Any position not in the opening book or the endgame database is searched with iterative-deepening α-β search. The evaluation function for leaf nodes is trivial: at each leaf node it is assumed that the players divide the remaining stones evenly. If, in the search tree, a position is encountered having 17 stones or fewer on the board, its exact value is retrieved from the endgame database. Thus, the value of the α-β search is based on a combination of crude guesses for some leaf nodes, and exact values for others (Beal, 1984). We remark that in awari the difference in the number of stones by which one wins is irrelevant. Therefore, the value retrieved from the endgame database is converted into −1 for losses, 0 for draws, and 1 for wins. Once the game has progressed to a position contained in the endgame database, no search is needed, and at each turn the best move is played instantly.

After a move has been selected by α-β search (typically based on an 18-to-20 ply search), pn-search is called to check the move. If a proof can be found that the selected move loses, the move is rejected, α-β is asked to select a new move, and the procedure is repeated. If all moves are proved losses, the first move selected is played, hoping for an error by the opponent. While the opponent is pondering on the position, Lithidion performs pn-searches on her potential moves looking for wins. In case the opponent selects a losing move, Lithidion uses the proof by pn-search to select its winning move.
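The interplay between the two searches at move-selection time can be summarized in a short Python sketch. It is a simplification under stated assumptions: instead of re-running α-β after each rejection, the hypothetical parameter ranked_moves lists the candidate moves in the order in which α-β would select them, and the hypothetical callback proves_loss stands in for a pn-search that returns True only when it finds a proof that the move loses.

```python
def choose_move(ranked_moves, proves_loss):
    """Return the first alpha-beta choice that pn-search cannot refute.

    ranked_moves -- candidate moves, best alpha-beta choice first (assumed input)
    proves_loss  -- callable(move) -> bool, True if the move is a proved loss
    """
    if not ranked_moves:
        raise ValueError("no legal moves")
    for move in ranked_moves:
        if not proves_loss(move):
            return move
    # Every move is a proved loss: play the first selected move anyway,
    # hoping for an error by the opponent.
    return ranked_moves[0]
```

For example, with candidates ["C4", "F2", "A1"] and a pn-search that refutes only "C4", the sketch returns "F2"; if all three are refuted, it falls back to "C4".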
The pn-search algorithm regards positions within the endgame database as terminal nodes, just as it treats positions where a player has no legal moves. All other positions are internal nodes.

In summary: pn-search is only used to prevent Lithidion from playing losing moves and to detect winning lines after erroneous moves by the opponent. All other moves are based on α-β search.

Opponents

Lithidion has played in three tournaments: the awari tournaments of the 2nd, 3rd and 4th Computer Olympiads (London 1990, Maastricht 1991 and London 1992). Lithidion won the gold medal each time. The tournaments of the 2nd and 3rd Olympiads have been described in Levy and Beal (1991) and Van den Herik and Allis (1992).

In 1990, Lithidion's only opponent, Marco, written by Remi Nierat, winner of the gold medal at the awari tournament of the 1st Computer Olympiad, lost all its games. Marco is based on human-expert knowledge of awari, shallow α-β searches (averaging fewer than 10 ply) and no endgame databases. In most games, both Marco and Lithidion had prospects of winning, until Lithidion's endgame database was reached. At that point Marco made one or more erroneous moves, leaving Lithidion with an easy win. In 1990, the main deciding factor was the endgame database (at that time, all positions of 13 stones or fewer).

In 1991, a new opponent appeared: MyProgram, written by Eric van Riet Paap. MyProgram had been created using the published description of Lithidion (Allis et al., 1991c). It contained a large endgame database (all positions of 16 stones or fewer), a fast implementation of α-β search including the singular-extension enhancement (Anantharaman et al., 1989) and the same evaluation function as Lithidion (see above). Lithidion defeated MyProgram by the smallest possible margin, with three wins, two losses and one draw. In at least one of the games, pn-search played a decisive role, finding a deep winning line in a position unclear to α-β search.
Given the small differences between the programs (a 17-stone database versus a 16-stone database, pn-search versus singular extensions, and MyProgram searching twice as many nodes per second), it is unclear what the exact impact of pn-search on the match has been.

In 1992, two new opponents appeared: Marvin and Juju. Juju turned out to be no competition for its two strong opponents and lost all its games. Marvin was created by Ralph Gasser with Lithidion as its example. The α-β search algorithms of Marvin and Lithidion performed almost equally well. Marvin's endgame database (20 stones), however, was much larger than Lithidion's (17 stones). A disadvantage to Marvin was that its database did not fit in ram memory. Each entry retrieved from the hard disc slowed down the α-β search. Two further disadvantages to Marvin were its lack of a pn-search implementation and of an opening book. As a later test indicated, the opening book was the decisive factor in this match, which Lithidion won by a score of 4-2. The test consisted of replaying the first game from the position where Lithidion had exited its opening book, with Marvin and Lithidion changing places. Marvin easily won the game, similarly to the way Lithidion had won the game during the tournament. Clearly, the opening book had provided Lithidion with a winning advantage.

Conclusion

We have given a description of the architecture of Lithidion, the role of α-β search in it, and the competition it faced. From this description we conclude that Lithidion's α-β search implementation has been thoroughly tested and has performed well in competition with strong opponents. We stress this point, since Lithidion's α-β search has been selected as the sparring partner for pn-search in our comparison tests on awari. Such a comparison is only valid if made against a sophisticated implementation, and we believe that practical evidence suggests that Lithidion's α-β search meets those requirements.
2.4.4 The algorithms compared

For our experiments, we have compared two variants of α-β search, and two variants of pn-search. We will use the following abbreviations for the four algorithms:

    α-β:            iterative-deepening α-β search without transposition tables.
    transposition:  iterative-deepening α-β search with transposition tables.
    basic pn:       pn-search with standard initialization.
    stones pn:      pn-search with initialization based on the number of stones to be captured.

The α-β algorithm has the following characteristics. At each node, moves are pre-ordered by capture size. The largest captures are evaluated first, since the resultant positions are most likely to hit the database. Another reason for processing captures first is that they are often good moves. An iterative-deepening search is performed with a depth increase of 1 per iteration. The result of each iteration is a value and a move ordering of the full principal variation. The search terminates as soon as the value of the position has reached −1 or +1, indicating that the value of the position has been determined.

The transposition algorithm is the same as α-β, except that it is extended with a transposition table of a quarter of a million entries. The transposition table is implemented as a hash table, with one entry per hash code. At each node in the search tree, we first examine whether the position is present in the transposition table. Then we investigate whether the depth to which it had previously been searched is at least as large as the current depth. If both conditions are met, the range of possible values stored in the entry is used to narrow the α-β window. If after updating α exceeds or equals β, the search returns to the node's parent. Otherwise, the search is continued with the narrowed window. After a node's value has been established, the results are stored in the transposition table.
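The table lookup just described can be sketched as follows. This is a minimal Python illustration, not the implementation used in the experiments: the names Entry, probe and TABLE_SIZE, the use of a dict indexed by hash code, and the representation of the stored range as a (lower, upper) pair are all assumptions made for the example.

```python
from dataclasses import dataclass

TABLE_SIZE = 250_000          # a quarter of a million entries


@dataclass
class Entry:
    key: int                  # code identifying the stored position
    depth: int                # depth to which the position was searched
    lower: int                # the value is known to be >= lower
    upper: int                # the value is known to be <= upper


def probe(table, key, depth, alpha, beta):
    """Return (alpha, beta, cutoff) after consulting the transposition table.

    `table` maps a hash code (key % TABLE_SIZE) to at most one Entry.
    """
    entry = table.get(key % TABLE_SIZE)
    if entry is None or entry.key != key or entry.depth < depth:
        return alpha, beta, False         # no usable information
    alpha = max(alpha, entry.lower)       # narrow the window from below
    beta = min(beta, entry.upper)         # and from above
    return alpha, beta, alpha >= beta     # cutoff: return to the parent
```

A cutoff value of True corresponds to the case in which α exceeds or equals β after updating; otherwise the caller continues the search with the narrowed window.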
If the value of the node is equal to the initial α or β, we only know that the node's value is less than or equal to α, or greater than or equal to β, respectively. Only if the value lies between α and β proper is the value reliable, so that it can be stored as the true outcome of the search to the given depth. Values −1 and 1 are treated separately, since these values are always indisputable. For those values, the searched depth is set to ∞ as well, since deeper searches cannot change a reliable value, making the information applicable to each following iteration. Collisions in the hash table are resolved in favor of the position which has been searched most deeply. We remark that unlike tournament chess programs, we store a full Gödel code per entry in the transposition table, ensuring that two different positions will not mistakenly be regarded as equal.

The transposition table is expected to be useful in awari in the middle and end games, when empty pits and pits containing single stones are common. A confirmation of this assumption will transpire from the results of our experiments presented in section 2.4.7.

Basic pn is the standard pn-search algorithm, enhanced with the technique which removes solved subtrees. Each frontier node is initialized to proof number 1 and disproof number 1.

Stones pn is equal to basic pn, except for the initialization of frontier nodes. Instead of proof and disproof numbers being initialized to 1, the number of stones still to be captured by a player to achieve her goal is used as the initialization, as explained in section 2.3.3. We remark that neither variant of pn-search uses transposition tables.

The α-β algorithm calculates approximately 10,000 nodes per second on a sun sparcstation 1+. The other three algorithms are roughly a factor two slower.
For transposition, storing and retrieving information from the transposition tables is responsible for the slowed-down performance, while the pn-search variants have as extra overhead the creation and deletion of nodes, as well as the calculation of the proof and disproof numbers.

2.4.5 Comparing the performances

When selecting a search algorithm for an application, the elapsed cpu time is an important selection criterion. However, experimental results on tree searches when measured in cpu time are difficult to generalize, due to implementation details. Instead, it is customary to compare the number of nodes visited. In this case, a careful analysis is needed to determine the fairest way to compare the number of nodes visited by α-β search and pn-search.

Let us consider the number of nodes visited by α-β iterative-deepening search. On the one hand, we could sum the number of nodes visited in each iteration. However, this would be unfair to α-β search, since a smaller number of iterations (e.g., by searching to even ply depths only) may result in almost the same ordering and thus reduce the number of nodes visited. On the other hand, we could just take the number of nodes visited in the last iteration. That would be unfair towards pn-search, as the last iteration does use the move ordering of previous iterations, and these searches should be included in the total node count somehow. Moreover, α-β search with transposition tables obtains many early cut-offs during the last iteration due to the solved subtrees stored in the transposition table.

Instead, we have chosen to count at iteration i only the nodes at depth i. Then the extra iterations are an asset to α-β search, without costing anything in terms of the number of nodes visited. Re-ordering of the moves may result in terminal nodes in a new iteration, which are not at the deepest level. These nodes are not counted at all.
This slight bias in favor of iterative-deepening α-β search does not significantly influence the results.

For pn-search, we simply count the total number of nodes created during the search.

2.4.6 Test positions

As mentioned in section 2.4.3, Lithidion has taken part in three awari tournaments of Computer Olympiads. In total, she played 23 games (5 against Marco and 6 each against MyProgram, Juju and Marvin), of which two games were identical, which can be explained as follows. Each of the five programs described in section 2.4.3 plays deterministically. Therefore, before the next game against the same opponent, a change should be made in the opening choice of the program to avoid losing in exactly the same way. Juju forgot to do so once, and lost two games in identical fashion.

In the 22 different games a total of 1707 positions have occurred (from the initial position to the position after the last move had been played). Of these there were 1599 unique positions, which have been selected as the initial test positions.

For each of the initial test positions, a search with all four algorithms was performed. Since an awari game has three possible outcomes (win, draw and loss) and pn-search is a two-valued search algorithm, the three outcomes must be divided into two sets. We arbitrarily chose to treat a draw as equivalent to a loss for the player to move. Each of the searches has one of three possible outcomes:

    1. The player to move has a proved win.
    2. The opponent has at least a draw.
    3. The search ran out of resources.

Not all test positions can be used to compare the performance of the four algorithms. First, positions with 17 stones or fewer are solved immediately by all four algorithms through a single database lookup. Second, positions too early in the game are likely to be unsolvable by all four algorithms. Therefore, we have selected the relevant positions from the 1599 initial positions as follows.
Each position has been investigated by all four algorithms with a resource limit of 500,000 nodes per position. If after 500,000 nodes the search had not succeeded, it was terminated. Using the outcome of the searches, the following selection was made. First, the 2 positions in which the game had just ended were discarded since all four algorithms solved the positions visiting only a single node. The reason why only 2 such positions were found out of 22 different games is that most games ended by resignation. Second, all positions with 17 stones or fewer (496 in total) were excluded. Third, all positions which were not solved by any of the algorithms (764 in total) were labeled unsolvable. The remaining 337 positions are named the final test positions.

We remark that in this way positions which are well suited for α-β search will be selected for the final test positions, as well as those positions well suited for pn-search. Thus, in our selection method of test positions there is no bias towards either of the algorithms.

Each of the algorithms which failed to solve one of the final test positions within the 500,000 nodes limit was given virtually unlimited resources to try again. In practice this meant a limit of a quarter billion nodes per position for α-β search, while for pn-search no final test position took more than about one and a half million nodes to solve.

2.4.7 Results

In this section we present the results of the comparison of the four algorithms described in section 2.4.4 on the 337 final test positions of section 2.4.6.

Each of the 337 final test positions was solved by basic pn and stones pn. Two positions were not solved by α-β within a quarter billion nodes, while there were two more positions which neither α-β nor transposition solved. In this section we have set the solution size of unsolved positions at a quarter billion, which is a lower bound on the number of nodes necessary to solve them. Although this results in a bias in favor of α-β search, it does not influence our conclusions and it allows us to include the positions in the test results. Removing the positions from the final test set would be particularly unfair towards pn-search, as it would ignore its finest results.

First, we present figures indicating how often one algorithm outperformed another, without paying attention to the exact difference in node counts. Second, we tabulate the total number of nodes visited by each of the four algorithms, and calculate averages. Third, we group test positions by size of solution, and graphically depict the average difference in performance of the search algorithms per group.

Outperforming the other algorithms

In this section, we are interested in whether one algorithm performed better on a specific test position than another algorithm, but ignore the size of the difference. In our results we have divided the set of positions into two halves: the easy and the hard positions. To this end, we have sorted the positions according to the minimum number of nodes in which a position was solved. As a result, the 169 positions which were solved by at least one algorithm in fewer than 3200 nodes were classified as easy positions, while the 168 positions with smallest solution larger than 3200 nodes were named the hard positions.

In table 2.9 we have listed for each algorithm how often it outperformed all other algorithms, separated for easy and hard positions. If two algorithms shared first place on a position, they were each awarded half a point.

              α-β    transposition    basic pn    stones pn
    easy       23         61             41           44
    hard        0         22             31½         114½

Table 2.9: Number of times an algorithm performed best of all.

As can be seen from table 2.9, at the easy positions there is hardly any difference between the α-β search algorithms (84 times best algorithm) and the pn-search algorithms (85 times best algorithm). For the hard positions the picture is entirely different: the pn-search variants are 146 times best, against just 22 times for the α-β search variants.

Table 2.10 shows per pair of algorithms how often one algorithm outperformed the other, on the easy positions. Each entry at row R and column C in the table indicates how often the algorithm heading row R found a solution more quickly than the algorithm heading column C. The same information for the hard positions is displayed in table 2.11.

                      α-β    transposition    basic pn    stones pn
    α-β                -          35             79          91½
    transposition     134          -             89         100
    basic pn           90         80              -          82
    stones pn          77½        69             87           -

Table 2.10: Comparing pairs of algorithms on easy positions.

                      α-β    transposition    basic pn    stones pn
    α-β                -           0              6           6
    transposition     168          -             25          24
    basic pn          162        143              -          42
    stones pn         162        144            126           -

Table 2.11: Comparing pairs of algorithms on hard positions.

Table 2.10 indicates that transposition wins against the other three variants, albeit with a small margin compared with the two pn-search variants (89 against 80 and 100 against 69).

Table 2.11 clearly indicates that α-β has the worst performance of all four algorithms. It loses in all cases against transposition, and only 6 times outperforms the pn-search variants. Transposition occasionally does better than the pn-search variants, but is outperformed in more than 85% of all hard positions. Between the pn-search variants, the initialization based on the stones to be captured seems to pay off, given the 126 against 42 win compared with the standard initialization.

Nodes visited

                      total nodes      average nodes    factor    tree size
    α-β               2,437,035,522      7,231,559       128.8
    transposition     1,285,839,816      3,815,548        68.0
    basic pn             28,214,875         83,723         1.5      42,767
    stones pn            18,918,032         56,136         1.0      25,505

Table 2.12: Test figures per algorithm.

In this section we concentrate on the number of nodes visited by each algorithm.
In table 2.12 the first column of results lists the total number of nodes visited on the 337 test positions, per algorithm, while the second column contains the average per position. In the third column, the factor difference between each algorithm's average and the best average is presented. For both pn-search variants we have also determined the maximum number of nodes present in memory during each search. The averages of these maxima have been listed in the last column of the table.

From table 2.12 a pattern similar to that seen in tables 2.10 and 2.11 becomes apparent: the pn-search variants perform best, with stones pn doing somewhat better than basic pn. With factors 68.0 and 128.8, both transposition and α-β are clearly outperformed.

The average maximum tree size in memory during the pn-searches, compared to the average solution size, indicates that removing solved subtrees during the search results in somewhat smaller memory requirements. Here approximately a factor 2 is gained. We remark that these figures only relate to solved positions. In searches which are not successful, the number of solved subtrees is smaller, rendering the technique less effective.

                      10^1  10^2  10^3  10^4  10^5  10^6  10^7  10^8  10^9
    α-β                18    47    40    45    57    61    36    26     7
    transposition      18    47    44    59    67    53    23    23     3
    basic pn           14    43    52    77    92    57     2
    stones pn          14    37    57    77   101    51

Table 2.13: Positions per group, per grouping algorithm.

Performance by size

Table 2.12 shows that pn-search is capable of outperforming α-β search by a large factor. The table does not indicate, however, to what extent the gain factor is related to the size of the search problems. Furthermore, we must realize that in the table the hard problems dominate the results.

Measuring the size of the search problems is not a straightforward task, since a position which is difficult to solve with α-β search may be rather simple for pn-search or vice versa.
Therefore, we have grouped the test positions in four different ways, each time according to one of the algorithms applied in our experiments. We describe the grouping process based on α-β. We have created groups for each power of 10. Thus, group i consists of all positions which were solved by α-β in more than 10^(i-1) nodes, and less than or equal to 10^i nodes. Within each group, the average number of nodes necessary to solve all positions in the group is calculated, for each of the four algorithms. These averages are then compared to see which algorithm performs best on positions of the size represented by the group.

In table 2.13 we have listed for each algorithm the number of positions per group, depending on the algorithm used as grouping criterion. These numbers indicate the size of each of the groups on which figures 2.9, 2.10, 2.11 and 2.12 are based.

Figures 2.9, 2.10, 2.11 and 2.12 contain the results per group, where the groups are created according to the solutions of α-β, transposition, basic pn and stones pn, respectively. For each figure, the numbers on the horizontal axis indicate the log10 of the size of the groups. The numbers on the vertical axis indicate the log2 of the factor difference between the algorithms.

[Figure 2.9: Comparison based on grouping by α-β. Horizontal axis: logarithmic problem size; vertical axis: relative logarithmic tree size; curves: α-β, transposition, pn basic, pn stones.]

[Figure 2.10: Comparison based on grouping by transpositions. Horizontal axis: logarithmic problem size; vertical axis: relative logarithmic tree size; curves: α-β, transposition, pn basic, pn stones.]

[Figure 2.11: Comparison based on grouping by basic pn. Horizontal axis: logarithmic problem size; vertical axis: relative logarithmic tree size; curves: α-β, transposition, pn basic, pn stones.]

[Figure 2.12: Comparison based on grouping by stones pn. Horizontal axis: logarithmic problem size; vertical axis: relative logarithmic tree size; curves: α-β, transposition, pn basic, pn stones.]

    10^1  10^2  10^3  10^4  10^5  10^6  10^7  10^8
     15    43    46    65    75    57    35     1

Table 2.14: Positions per group, grouped by all four algorithms.

In figures 2.9 and 2.10 we see that on small problems α-β search does somewhat better, while with increasing problem size, pn-search does better and better. For the largest problems, the gain factor is around 500. In figures 2.11 and 2.12, again pn-search does worse on the smallest problems and quickly starts doing better on increasing problem size. It is remarkable that the gain factor reduces when the problem size further increases. The cause of this phenomenon is described below.

In each figure the algorithm used as grouping criterion plays an important role. In the first few groups we find positions which were suitable for that type of algorithm, while in the last few groups the positions found were difficult to solve for the algorithm. It is thus to be expected that in the graphs the other algorithms will do somewhat worse in the first groups, while they do somewhat better on the last groups. This is exactly what can be seen in all four graphs.
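The grouping by powers of 10 and the logarithmic factor plotted on the vertical axis can be sketched in Python as follows. The function names group_of and log2_factor are hypothetical, and the sign convention of the factor (average of an algorithm divided by the average of the grouping algorithm, so that negative values mean fewer nodes than the grouping algorithm) is an assumption made for the example.

```python
import math


def group_of(nodes):
    """Return i such that 10**(i-1) < nodes <= 10**i (integer arithmetic only)."""
    i = 0
    while 10 ** i < nodes:
        i += 1
    return i


def log2_factor(average_nodes, reference_average):
    """log2 of the factor difference between two per-group averages."""
    return math.log2(average_nodes / reference_average)
```

For instance, a position solved in 3200 nodes falls in group 4, and a gain factor of around 500 corresponds to a distance of about 9 on the vertical axis, since log2(500) is roughly 8.97.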
In figures 2.9 and 2.10 pn-search outperforms α-β search starting from group 4, while in figures 2.11 and 2.12 pn-search is the better algorithm from group 2 onwards. Furthermore, in the first two graphs pn-search's gain factor towards the last few groups grows remarkably fast, while in the second two graphs, with pn-search as the grouping criterion, pn-search's advantage reduces in the last two groups. Thus, when looking at the groups for the hard problems, figures 2.9 and 2.10 are too flattering towards pn-search while figures 2.11 and 2.12 do not give pn-search full credit.

As a solution to this problem, we present one final graph. This time we have determined the size of a problem in a more elaborate way. For each solution by an algorithm, we determine the log10 of the number of nodes visited. For the four algorithms we then determine the average of these exponents and use it as group number. The number of positions per group has been tabulated in table 2.14. We average the logs since node counts tend to grow exponentially instead of linearly.

[Figure 2.13: Comparison based on grouping using all four algorithms. Horizontal axis: logarithmic problem size; vertical axis: relative logarithmic tree size; curves: α-β, transposition, pn basic, pn stones.]

The singleton last group has been deleted, and its position has been added to the second last group, making a total of 36 entries in that group. The graph produced by this grouping criterion is pictured in figure 2.13. The numbers on the axes have the same meaning as in figures 2.9, 2.10, 2.11 and 2.12. In it, the bias towards a single algorithm no longer exists. The figure confirms the suggestion from the previous four figures, that pn-search's gain factor, compared with α-β search, grows with increasing problem size.

2.4.8 Conclusions

In this section we have compared the behavior of two pn-search variants with two variants of α-β search. The comparisons lead to clear conclusions: pn-search significantly outperforms both variants of α-β search (cf. table 2.12) in proving game-theoretic values in awari. The gain factor difference between pn-search and the α-β variants tends to increase with increasing problem size (cf. figure 2.13).

We further conclude from table 2.12 that a domain-dependent initialization can be beneficial on awari, with the enhancement yielding a profit of about a factor 2. Moreover, the removal of solved subtrees in pn-search decreases the working memory requirements by a factor of about 2, in problems which are ultimately solved.

We believe that the success of pn-search on awari is due to the non-uniformity of the tree. Allis et al. (1991b) have attempted to measure the degree of non-uniformity necessary for pn-search to outperform alternative algorithms. The results of this section show that awari's non-uniformity warrants the selection of pn-search for proving game-theoretic values instead of α-β search variants. We tentatively conclude from these results that pn-search has contributed significantly to proving the game-theoretic values of other non-uniform trees, such as those of connect-four, qubic (see chapter 4) and go-moku (see chapter 5).

2.5 Related algorithms

In this chapter we have presented pn-search as an and/or tree search algorithm. Its roots, however, lie within the game-tree search algorithms. So far we have applied pn-search only to game trees, including awari, chess (Breuker et al., 1994), connect-four (Allis, 1988), give-away chess, go-moku (see chapter 5), othello (Gnodde, 1993) and qubic (see chapter 4). In our discussion of related algorithms we will therefore focus mainly on game-tree search algorithms. In this section, we discuss the relationships with conspiracy-number search, sss*, b* and a*, the latter being the only single-agent search algorithm in the list.

2.5.1 Conspiracy-number search

Conspiracy-number search (cn-search) is pn-search's direct ancestor.
Cn-search was developed in the middle of the 1980s by McAllester, and has received the attention of many researchers since then (McAllester, 1988; Klingbeil and Schaeffer, 1988; Klingbeil, 1989; Schaeffer, 1989; Van der Meulen, 1990; Allis et al., 1991b; Lister and Schaeffer, 1994). While pn-search focuses on the minimum number of nodes which must conspire to prove the value of a position, cn-search determines the minimum number of nodes which must conspire to change the value of a position. This main difference is more apparent when looking at the search tree: pn-search does not use a heuristic evaluation function to evaluate non-terminal nodes, while cn-search does.

Subtle differences between cn-search and pn-search can be identified by creating an instantiation of cn-search which resembles pn-search as much as possible. To do so, we define a three-valued evaluation function for cn-search, which returns -1 for a disproved node, 0 for a node with value unknown, and 1 for a proved node. In such a tree, the conspiracy numbers for -1 and 1 of a node correspond to the proof and disproof numbers of that node. These algorithms only differ in the manner in which the next node to be developed is selected, for which unpublished experiments on connect-four have shown that the selection mechanism of pn-search performs better than the selection mechanism of cn-search.

In cn-search, for any potential value $v$ of the evaluation function it is determined for each subtree how many nodes, say $N_v$, within the subtree must change their evaluation value to $v$, to change the value of the subtree to $v$. If $N_v$ for the root exceeds a certain limit, for all $v$ unequal to the current root value, cn-search assumes that the current root value is reliable and terminates the search. Schaeffer's implementation showed that cn-search could achieve good results in tactical chess positions (Schaeffer, 1989).
Unfortunately, experiments with tournament chess programs (Van der Meulen, 1990) have not been successful.

We remark that, despite pn-search's success in analyzing awari positions, we do not claim that pn-search is better suited than cn-search to perform well in a tournament chess program. Instead, we claim that the ideas behind cn-search, such as applied in pn-search, are better suited for proving values, than for determining the reliability of heuristic root values. Pn-search capitalizes on this suitability, concentrating on proving only. We do envision applications in tournament programs, as we have in our awari program. For instance, Breuker et al. (1994) have shown that pn-search may be an asset to chess programs, to prove quickly whether a mating sequence exists in a given chess position.

We conclude that cn-search and pn-search are closely related, with pn-search focusing on a different goal and being successful at it.

2.5.2 SSS*

With the availability of large internal memories, algorithms which store the search in working memory have become of practical interest. One of the earliest game-tree search algorithms which uses a stored tree is sss* (Stockman, 1979; Campbell and Marsland, 1983).

sss* and pn-search are both best-first search algorithms. At each step in the algorithm a node is selected according to a certain criterion and then developed. This process is repeated until the tree has been solved, or resources have run out.

An important similarity between sss* and pn-search is that neither algorithm uses a heuristic evaluation function for internal nodes. Only leaf nodes are assigned a value, either by a heuristic evaluation function or by reliable game knowledge.

The main difference between the algorithms is the criterion which determines the selection of the next node. Sss* selects a node purely based on the upper bound still achievable.
At any point during the search the node which has the highest possible upper bound is selected, while among equals the leftmost node in the tree is preferred. Pn-search does not use a range of terminal-node values. Instead, the set of possible terminal-node values is split into two. Solving the tree means determining in which of the two sets the true value lies. If the exact value from a larger range must be determined, pn-search should be called repeatedly, for instance by having pn-search be the discriminating function in a binary search. While pn-search does not use a range of values, it bases its selection on the proof and disproof numbers, implying that a node is tried which may be part of a solution with minimal effort.

A predecessor of pn-search, viz. αβ-cn search (Allis et al., 1991b), can be seen as a hybrid form of pn-search and sss*. It uses both a range of values and proof and disproof numbers (although these were named differently) to determine the next node to be developed. The main criterion is the range of possible values, like in sss*, while in case of a tie the proof and disproof numbers are used. It can be shown, however, that trees exist with solutions of only a few nodes, in which αβ-cn search could spend a long time in irrelevant subtrees (Allis et al., 1994). The solution to this problem consisted of reducing the impact of the range of values, while enlarging the role of the proof and disproof numbers. The result of this change has been the development of pn-search. For a comparison of sss* and αβ-cn search on random trees, see Allis et al. (1991b).

2.5.3 B*

B* is a best-first game-tree search algorithm introduced by Berliner (1979). It assumes that at each frontier node a special evaluation function returns a reliable lower and upper bound on the true value of the node.
After a node is expanded, the lower and upper bounds of a node are calculated by maximizing (or minimizing, depending on the node type) the lower and upper bounds of its children.

Let us assume that the root of the tree is a max node. Let us further assume that the root has two children A and B, with values in the intervals [0, 2] and [1, 3]. b*'s main goal is to determine the best move, without necessarily knowing the exact value of such a move. In our example, B is the most-promising child of the root. Before we can terminate the search, however, we should either prove that the upper bound (2) on A's interval can be reduced to a value below the lower bound of B, which currently equals 1, or we should prove that the lower bound of B can be raised to at least the value of A's upper bound. These two different strategies are called prove and disprove.

Focusing both on proving and disproving is a similarity with pn-search. However, a difference with pn-search is that there is no way to simultaneously work on both strategies. Thus, in b*, at each step first a choice must be made for one of the strategies, followed by the selection of a node. Of course, after each node expansion, a change of strategies may take place. Since b* does not assume that some nodes may change their bounds more easily than others, we suggest that the concept of proof and disproof numbers could be a useful addition to b*.

An important prerequisite of b* is the reliable evaluation function which determines the lower and upper bound per node. Such an evaluation function heavily depends on domain-specific knowledge, and may be a serious obstacle in many domains. If, however, the knowledge to create such a function is readily available, b* provides a sound mechanism to incorporate it to guide the search process. An alternative way to obtain these bounds through a small search has been described by Palay (1982). For pn-search such a clear mechanism has not yet been formulated.
In this respect b* has advantages over pn-search.

2.5.4 A*

A*, a single-agent search algorithm, has links with pn-search. A* is a best-first search algorithm, which uses an admissible evaluation function at each frontier node. Such a function calculates a lower bound on the total costs of the path from the root to a solution through that node. At each step a node with minimal lower bound on the solution costs is developed. a* thus guarantees finding an optimal solution (Hart et al., 1968; Hart et al., 1972; Nilsson, 1980).

Where a* concentrates on the cheapest overall solution, including the effort already spent (i.e., the cost of the path from root to frontier node), pn-search selects a node on the basis of the cheapest remaining solution, thus ignoring the contribution of already solved nodes and the path length from the root to the most-proving node. As a result, pn-search is not guaranteed to find the solution tree of minimal size.

Surprisingly, a small change to pn-search is sufficient to let it find the minimal solution tree. If, at each internal node, we add one to the proof number and disproof number as calculated from its children's proof and disproof numbers, then the proof number and disproof number at each node are a lower bound on the size of a solution tree for the node. We remark that proof and disproof numbers now can only increase, making some changes to the algorithm necessary. This algorithm, originating from discussions with Ingo Althöfer, has been named mst*, short for Minimal Solution-Tree search. Mst*, as a variant of pn-search, will be the subject of future research.

Chapter 3

Dependency-Based Search

3.1 Introduction

In section 2.1, we argued that choosing a representation and performing a search are two interacting subprocesses of problem solving. Better representations of a problem may result in smaller state spaces, and better search algorithms may traverse a given state space more efficiently.
While the game-tree search algorithm pn-search (chapter 2) focuses on the latter, the single-agent search algorithm dependency-based search (db-search), introduced in this chapter, focuses on the former.

Atomic vs. structured states

Search problems are often modeled by treating states as atomic entities. This means that two states are considered as either equal or different, without the option of a measure of similarity between states. As an alternative to atomic state representations, states can be structured, such as in strips (Fikes and Nilsson, 1971). In strips, each state is defined as a set of attributes. Each operator $f$ is specified by a precondition set, a delete set and an add set. In any state $A$ containing the attributes of the precondition set of $f$, $f$ can be applied, yielding a state $B$. $B$ consists of the attributes of $A$ with the attributes of the delete set of $f$ removed and with the attributes of the add set of $f$ added.

To see how a structured state representation may help in reducing the size of a state space, consider a production system $P$ consisting of 10 rewriting rules $r_0, r_1, \ldots, r_9$.

$r_0$: 0 $\to$ a
$r_1$: 1 $\to$ l
$r_2$: 2 $\to$ t
$r_3$: 3 $\to$ o
$r_4$: 4 $\to$ g
$r_5$: 5 $\to$ e
$r_6$: 6 $\to$ t
$r_7$: 7 $\to$ h
$r_8$: 8 $\to$ e
$r_9$: 9 $\to$ r

Furthermore, we consider production system $P'$, which contains the 10 rules of $P$ as well as the rule $r_{10}$.

$r_{10}$: altogether $\to$ goal

Rule $r_{10}$ states that the string altogether may be replaced by the string goal. For both $P$ and $P'$, we start with the initial string 0123456789. The goal of both $P$ and $P'$ is to create the string goal. Clearly, in $P$ there is no solution, while any order of applying rules $r_0$ to $r_9$, followed by the application of $r_{10}$, leads to the goal in $P'$.

First, let us represent $P$ using atomic states. The state space will consist of $2^{10} = 1024$ states, each representing a mixture of digits and lower-case letters.
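The size of the atomic state space can be checked by brute force. The following Python sketch is an illustration only and not part of the original text: the function names are hypothetical, states are plain strings, and a rule is assumed to be applicable wherever its left-hand side occurs as a substring.

```python
from collections import deque

# Rules r0..r9 map digit i to the i-th letter of "altogether"; r10 is the extra rule of P'.
DIGIT_RULES = [(str(i), ch) for i, ch in enumerate("altogether")]
GOAL_RULE = ("altogether", "goal")


def successors(state, rules):
    """Yield every string obtained by applying one rule at one position."""
    for lhs, rhs in rules:
        start = state.find(lhs)
        while start != -1:
            yield state[:start] + rhs + state[start + len(lhs):]
            start = state.find(lhs, start + 1)


def reachable(initial, rules):
    """Enumerate all atomic states reachable from the initial string."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state, rules):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


states_p = reachable("0123456789", DIGIT_RULES)
states_p_prime = reachable("0123456789", DIGIT_RULES + [GOAL_RULE])
print(len(states_p), "goal" in states_p, "goal" in states_p_prime)
```

Each of the ten positions independently holds either its digit or its letter, which is why the enumeration for $P$ stops at $2^{10}$ strings and never produces goal.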
The state space of $P'$ consists of the same 1024 states as $P$, with one additional state consisting of the string goal. Without the application of domain-specific knowledge, searching $P$ consists of traversing the full state space of 1024 states. The number of states visited in $P'$ depends on the search algorithm applied. Depth-first search visits the goal as 11th state, while breadth-first search visits the goal state as number 1025.

Second, let us represent $P$ and $P'$ using structured states. A possible structure consists of attributes of the form $a(i, z)$, where $i \in \{0, \ldots, 9\}$ and $z \in \{0, \ldots, 9, a, e, g, h, l, o, r, t\}$. An attribute $a(i, z)$ indicates that letter or digit $z$ occupies position $i$ in the string represented by the set of attributes. In $P'$ we have an additional attribute g representing the string goal.

The rule $r_0$ can now be represented by its precondition set $\{a(0, 0)\}$, its delete set $\{a(0, 0)\}$ and its add set $\{a(0, a)\}$. Similarly, rule $r_5$ is represented by its precondition set $\{a(5, 5)\}$, its delete set $\{a(5, 5)\}$ and its add set $\{a(5, e)\}$. The rule $r_{10}$ is represented by its precondition set

$\{a(0, a), a(1, l), a(2, t), a(3, o), a(4, g), a(5, e), a(6, t), a(7, h), a(8, e), a(9, r)\}$,

its delete set

$\{a(0, a), a(1, l), a(2, t), a(3, o), a(4, g), a(5, e), a(6, t), a(7, h), a(8, e), a(9, r)\}$

and its add set

$\{a(0, g), a(1, o), a(2, a), a(3, l)\}$.

The number of states in the state space, as well as the number of states visited by depth-first search and breadth-first search algorithms, are equivalent to the numbers found for atomic states.

The difference between the atomic and the structured state representations is that the structure of states provides us with a framework for reasoning about relations between states and operators (e.g., rewriting rules), without having to rely on domain-specific knowledge. As an example of such a relation between operators we state that any two rules $r_i$ and $r_j$, for $0 \le i < j \le 9$, are independent, meaning that in any state where both rules can be applied, changing the order of application does not influence the outcome.

Clearly, all relations which can be found by using structured state representations can also be found through a domain-specific analysis of the problem at hand. The advantage of a general framework using structured states as introduced in this chapter is that the analysis is performed once and for all for a category of problems.

In this chapter we define a framework, based on structured states and strips-like operators. Within the framework, a set of conditions has been identified which are sufficient to prove that a reduction of the state space can be performed without the loss of solutions in the state space. Conventional search algorithms cannot traverse the reduced state space but the db-search algorithm can. It is proved that db-search, introduced for the purpose, traverses exactly the reduced state space.

To give an indication of the amount of state-space reduction achieved by our framework, we once again look at the state space defined for production systems $P$ and $P'$. For $P$ the reduced state space consists of 11 elements (an initial state and 10 states representing the changes by rules $r_0, \ldots, r_9$). For $P'$ the reduced state space consists of 12 elements (one additional state representing goal). These numbers should be compared with the 1024 and 1025 found for the atomic-state representation.

Overview of the chapter

In section 3.2 we describe the double-letter puzzle (dlp), which is used as an example throughout the chapter. In section 3.3 we formally define a framework for a category of single-agent searches based on structured-state representations. Each definition in this section is illustrated by its application to dlp.
In section 3.4 db-search is described informally using the framework introduced in the previous section, by applying it to an instance of dlp. In section 3.5 we present algorithms in pseudo-code for db-search. In section 3.6 we compare the performances on dlp of db-search and depth-first search. Finally, in section 3.7 the scope of applicability of db-search is discussed. For practical results of db-search we refer to chapters 4 and 5.

3.2 The double-letter puzzle

The double-letter puzzle (dlp) is a production system consisting of an axiom and a set of 10 rewriting rules. The axiom is an element of $\{a, b, c, d, e\}^+$. The rewriting rules are listed below.

aa $\to$ e | b
bb $\to$ a | c
cc $\to$ b | d
dd $\to$ c | e
ee $\to$ d | a

The rewriting rules can be informally described as allowing any double occurrence of a letter to be replaced by a single instance of its alphabetical predecessor or successor in a circular alphabet.

We define the set of theorems of dlp as follows:

1. The axiom is a theorem.
2. If $x$ is a theorem and there exists a rewriting rule $r$ such that $x \xrightarrow{r} y$, then $y$ is a theorem.
3. There are no theorems except as defined by 1. and 2.

Each theorem of length 1 (i.e., a theorem consisting of a single letter) is called a solution to dlp.

Two solutions to instance aabdcbbdcaa of dlp are presented below. Each arrow is labeled with the rule applied.

aabdcbbdcaa $\xrightarrow{aa \to b}$ bbdcbbdcaa $\xrightarrow{bb \to a}$ adcbbdcaa $\xrightarrow{bb \to c}$ adccdcaa $\xrightarrow{cc \to d}$ adddcaa $\xrightarrow{dd \to c}$ adccaa $\xrightarrow{cc \to d}$ addaa $\xrightarrow{dd \to e}$ aeaa $\xrightarrow{aa \to e}$ aee $\xrightarrow{ee \to a}$ aa $\xrightarrow{aa \to b \mid e}$ b | e

aabdcbbdcaa $\xrightarrow{aa \to b}$ bbdcbbdcaa $\xrightarrow{bb \to c}$ cdcbbdcaa $\xrightarrow{bb \to c}$ cdccdcaa $\xrightarrow{cc \to d}$ cdddcaa $\xrightarrow{dd \to c}$ ccdcaa $\xrightarrow{cc \to d}$ ddcaa $\xrightarrow{dd \to c}$ ccaa $\xrightarrow{cc \to b}$ baa $\xrightarrow{aa \to b}$ bb $\xrightarrow{bb \to a \mid c}$ a | c

From the examples we see that a, b, c and e can be deduced. For a proof that d cannot be deduced from aabdcbbdcaa, we refer to appendix A.

3.3 A formal framework for db-search

In this section we define a formal framework for db-search.
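Before the formal definitions, the string-level rules of dlp from section 3.2 can be tried out with a short brute-force sketch. It is an illustration under stated assumptions, not the db-search algorithm: the names `RULES`, `rewrites` and `solutions` are hypothetical, and the search simply enumerates every theorem reachable from the axiom.

```python
# Each doubled letter may be replaced by its circular predecessor or successor.
RULES = {"a": "eb", "b": "ac", "c": "bd", "d": "ce", "e": "da"}


def rewrites(word):
    """Yield all words obtained from word by one rule application."""
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:
            for z in RULES[word[i]]:
                yield word[:i] + z + word[i + 2:]


def solutions(axiom):
    """Return the set of one-letter theorems derivable from the axiom."""
    seen = {axiom}
    stack = [axiom]
    while stack:
        word = stack.pop()
        for nxt in rewrites(word):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {w for w in seen if len(w) == 1}


print(sorted(solutions("aacc")))  # ['a', 'c']
```

Every rewrite shortens the word by one letter, so the enumeration always terminates. The axiom aacc used in the call is the instance on which the definitions of this section are illustrated.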
The framework is described in four steps. In section 3.3.1 we define states and operators. In section 3.3.2 we define paths through the state space and classes of equivalent paths. It is shown that conventional search algorithms traverse exactly the set of all paths. In section 3.3.3 key classes are defined. These form a subset of the classes of paths defined previously. It is shown that, under accurately defined circumstances, the set of all key classes is complete, meaning that each solution path is represented by a key class. In section 3.3.4 we define a meta-operator for traversing the state space defined by the set of all key classes. It is shown that the meta-operator is sound and complete, meaning that through application of the meta-operator exactly all key classes are visited. Finally, in section 3.3.5, we summarize the properties of our framework.

The description of the framework for db-search requires a large number of definitions. For reference purposes, we have listed the symbols used in this section and a short description of their meaning in table 3.1. Each definition in this section is illustrated by its application to the instance of dlp with axiom aacc.

Table 3.1: Symbols used in db-search framework

symbol                      description
$U$                         the set of all attributes
$U^s$                       the state space
$U^i$                       the set of all initial states
$U^g$                       the set of all goal states
$U^f$                       the set of all operators
$U^p$                       the set of all paths applicable to initial states
$U^k$                       the set of all key classes
$f$                         an operator
$f^{pre}$                   the precondition set of operator $f$
$f^{del}$                   the delete set of operator $f$
$f^{add}$                   the add set of operator $f$
$f(S)$                      the state reached when applying operator $f$ to $S$
supports                    $f_1$ supports $f_2$, $f_2$ depends on $f_1$
precedes                    $f_1$ precedes $f_2$
$f_{(p,q,r,z_1,z_2)}$       an operator in dlp
$P$                         a path
$PQ$                        the concatenation of paths $P$ and $Q$
$P \equiv Q$                paths $P$ and $Q$ are equivalent
$P = Q$                     $P$ and $Q$ are transpositions
$P(S)$                      the state resulting from applying path $P$ to state $S$
$[P]$                       the equivalence class of path $P$
$U^p/\equiv$                the set of equivalence classes of $U^p$
$key(P)$                    the key operator (last operator) of path $P$
$P \parallel Q$             the merge of paths $P$ and $Q$
$Par_f(P)$                  the set of parents of operator $f$ in path $P$
$Anc_f(P)$                  the set of ancestors of operator $f$ in path $P$
$Ax$                        the axiom state of dlp

3.3.1 States and operators

In this section we first define the set of attributes $U$ and the state space $U^s$. Then we define operators (consisting of a precondition set, a delete set and an add set) which map states onto other states, followed by the set of all operators $U^f$. Finally, we define the set $U^i$ of all initial states, and the set $U^g$ of all goal states.

Definition 3.1 Let $U$ be a set of attributes. Then the state space $U^s$ is defined as $2^U$, the power set of $U$.

We index the letters of the axiom in dlp from 0 to $n - 1$, where $n$ is the length of the axiom (i.e., 4 for dlp with axiom aacc). In the axiom, the first a has index 0, the second a has index 1, the first c has index 2 and the second c has index 3. Each letter in a theorem originates from a substring of the axiom. We represent a letter $z$ in a theorem by three values: the first and last index of the substring of the axiom $z$ originates from, and $z$ itself. If aab is produced from aacc, the letter b originates from the substring cc in the axiom, which has first index 2 and last index 3. Therefore, the b in aab is represented by $A(2, 3, b)$. The set of all attributes $U$ is specified as follows.
$U = \{A(i, j, z) \mid 0 \le i \le j \le 3 \wedge z \in \{a, b, c, d, e\}\}$

As the axiom will play a special role in many of the definitions of this section, we denote the state representing the axiom aacc by $Ax$. In accordance with definition 3.1, $Ax \in U^s$ is represented as follows.

$Ax = \{A(0, 0, a), A(1, 1, a), A(2, 2, c), A(3, 3, c)\}$

Definition 3.2 We define an operator $f$ as a 3-tuple $\langle f^{pre}, f^{del}, f^{add} \rangle$, with $f^{pre} \subseteq U$, $f^{add} \subseteq U$ and $f^{del} \subseteq f^{pre}$. The elements in the 3-tuple are named the precondition set, the delete set and the add set of $f$, respectively. Operator $f$ is a partial function $f: U^s \to U^s$, defined as $f(S) = (S \setminus f^{del}) \cup f^{add}$, for all $S \supseteq f^{pre}$.

Definition 3.2 states that an operator $f$ is applicable to each state containing all attributes in the precondition set of $f$. Applying operator $f$ to state $S$ yields a state $T$, by deleting the attributes of the delete set of $f$ from $S$ and adding to the result the attributes of the add set of $f$.

In dlp, two equal adjacent letters $z_1$ are replaced by $z_2$, which is either the successor or predecessor of $z_1$. The two $z_1$s originate from two adjacent substrings in the axiom. Let the first $z_1$ originate from the substring with start index $p$ and end index $q$, and let the second $z_1$ originate from the substring with start index $q + 1$ and end index $r$. Then, the indices $p$, $q$, and $r$, and the letters $z_1$ and $z_2$ are sufficient information to define an operator. In the following, we denote with $f_{(p,q,r,z_1,z_2)}$ the operator

$\langle \{A(p, q, z_1), A(q + 1, r, z_1)\}, \{A(p, q, z_1), A(q + 1, r, z_1)\}, \{A(p, r, z_2)\} \rangle$.

An example operator in dlp is

$f_{(0,0,1,a,b)} = \langle \{A(0, 0, a), A(1, 1, a)\}, \{A(0, 0, a), A(1, 1, a)\}, \{A(0, 1, b)\} \rangle$.

Applying $f_{(0,0,1,a,b)}$ to axiom state $Ax$ yields $\{A(0, 1, b), A(2, 2, c), A(3, 3, c)\}$.
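Definition 3.2 and the dlp operators translate almost directly into set operations. The sketch below is a minimal illustration, assuming attributes $A(i, j, z)$ are encoded as Python tuples `(i, j, z)` and states as frozensets; the names `Operator`, `apply_operator` and `dlp_operator` are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple

Attribute = Tuple[int, int, str]      # encodes A(i, j, z)
State = FrozenSet[Attribute]


class Operator(NamedTuple):
    pre: State      # precondition set
    delete: State   # delete set
    add: State      # add set


def apply_operator(op: Operator, state: State) -> Optional[State]:
    """Return (S minus del) union add, or None where the partial function is undefined."""
    if not op.pre <= state:
        return None
    return (state - op.delete) | op.add


def dlp_operator(p: int, q: int, r: int, z1: str, z2: str) -> Operator:
    """Build f_(p,q,r,z1,z2): two adjacent z1 attributes are replaced by one z2."""
    pre = frozenset({(p, q, z1), (q + 1, r, z1)})
    return Operator(pre, pre, frozenset({(p, r, z2)}))


AX: State = frozenset({(0, 0, "a"), (1, 1, "a"), (2, 2, "c"), (3, 3, "c")})
f_ab = dlp_operator(0, 0, 1, "a", "b")
result = apply_operator(f_ab, AX)
print(sorted(result))  # [(0, 1, 'b'), (2, 2, 'c'), (3, 3, 'c')]
```

Applying `f_ab` a second time returns `None`, because the attributes in its precondition set have been deleted from the state.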
Definition 3.3 The set of operators defined within a domain is denoted by $U^f$.

Using definition 3.3 we define for our instance of dlp the set of operators $U^f$ as

$\{f_{(p,q,r,z_1,z_2)} \mid 0 \le p \le q < r \le 3 \wedge z_1 \in \{a, b, c, d, e\} \wedge z_2 \in succpred(z_1)\}$

Here $succpred(z)$ denotes a set containing the circular alphabetical successor and predecessor of $z$.

Definition 3.4 We denote the set of initial states by $U^i$, with $U^i \subseteq U^s$. We denote the set of goal states by $U^g$, with $U^g \subseteq U^s$.

For our instance of dlp,

$U^i = \{\{A(0, 0, a), A(1, 1, a), A(2, 2, c), A(3, 3, c)\}\}$
$U^g = \{\{A(0, 3, a)\}, \{A(0, 3, b)\}, \{A(0, 3, c)\}, \{A(0, 3, d)\}, \{A(0, 3, e)\}\}$.

3.3.2 Paths

In this section we first define paths, which are just sequences of operators. We define the application of a path to a state $S$, as one by one applying the operators, starting from state $S$. Then solutions for a state $S$ are defined as the paths which, if applied to $S$, yield a superset of a goal state. We then define the extension of a path $P$, which is a path consisting of all operators of $P$, in the same order, plus one additional operator. An equivalence relation for paths is defined, which states that two paths are equivalent if one is a permutation of the other. Then, a notation for equivalence classes of paths is introduced. Finally, we describe the behavior of conventional search algorithms in terms of paths.

Definition 3.5 Any element $P$ of $(U^f)^*$ is a path. Let $P = (f_1, \ldots, f_n)$ be a path. Let concatenation of two paths $P$ and $Q$ be denoted by $PQ$. Then, $P$ is applicable to $S$ if (1) $P = ()$, the empty path, or (2) $P = (f)Q$ and $f(S)$ is defined and $Q$ is applicable to $f(S)$. If $P$ is applicable to $S$, then

$P(S) = f_n(f_{n-1}(\ldots(f_2(f_1(S)))\ldots))$.

For path $P = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$, applicable to the axiom state $Ax$, it follows from definition 3.5 that

$P(Ax) = f_{(0,1,3,b,c)}(f_{(2,2,3,c,b)}(f_{(0,0,1,a,b)}(Ax)))$
$= f_{(0,1,3,b,c)}(f_{(2,2,3,c,b)}(\{A(0, 1, b), A(2, 2, c), A(3, 3, c)\}))$
$= f_{(0,1,3,b,c)}(\{A(0, 1, b), A(2, 3, b)\})$
$= \{A(0, 3, c)\}$

Definition 3.6 The set of paths $U^p$ is defined as follows.

$U^p = \{P \mid S \in U^i \wedge P \text{ is applicable to } S\}$

It can be checked that for our instance of dlp with initial state $Ax$, $U^p$ (definition 3.6) consists of 17 paths.

$U^p = \{$
$()$,
$(f_{(0,0,1,a,b)})$, $(f_{(0,0,1,a,e)})$, $(f_{(2,2,3,c,b)})$, $(f_{(2,2,3,c,d)})$,
$(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)})$, $(f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)})$,
$(f_{(0,0,1,a,b)}, f_{(2,2,3,c,d)})$, $(f_{(2,2,3,c,d)}, f_{(0,0,1,a,b)})$,
$(f_{(0,0,1,a,e)}, f_{(2,2,3,c,b)})$, $(f_{(2,2,3,c,b)}, f_{(0,0,1,a,e)})$,
$(f_{(0,0,1,a,e)}, f_{(2,2,3,c,d)})$, $(f_{(2,2,3,c,d)}, f_{(0,0,1,a,e)})$,
$(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,a)})$, $(f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)}, f_{(0,1,3,b,a)})$,
$(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$, $(f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)}, f_{(0,1,3,b,c)})$
$\}$

Definition 3.7 Let $P = (f_1, \ldots, f_n)$ be a path applicable to $S$. We define the following terminology with respect to $P$.

1. $P$ is a solution for $S$, if $\exists x \in U^g: x \subseteq P(S)$.
2. A path $Q$ is an extension of $P$, if $Q = P(f)$, for some operator $f$.

We give examples for definition 3.7 using path $P = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$.

1. $P$ is a solution for axiom state $Ax$, because $P(Ax) = \{A(0, 3, c)\}$ and $\{A(0, 3, c)\} \in U^g$.
2. $P$ is an extension of path $(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)})$.

Definition 3.8 Let $P$ and $Q$ be paths. $P$ and $Q$ are equivalent, denoted by $P \equiv Q$, if $P$ is a permutation of $Q$.

An example of definition 3.8 from the set of paths in dlp applicable to $Ax$ is

$(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)}) \equiv (f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)}, f_{(0,1,3,b,c)})$

Definition 3.9 Let $P \in U^p$ be a path. We denote the set of all paths $Q \in U^p$ such that $P \equiv Q$ by $[P]$ (the equivalence class of $P$ modulo $\equiv$).
The set of all equivalence classes of $U^p$ modulo $\equiv$ is denoted by $U^p/\equiv$.

From definition 3.9 and the example after definition 3.6 it follows that for $P = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$,

$[P] = \{(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)}), (f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)}, f_{(0,1,3,b,c)})\}$.

We mention that in our instance of dlp $U^p/\equiv$ consists of 11 equivalence classes.

Traversing $U^p$

In this section we describe how a conventional tree search algorithm traverses $U^p$, as defined within our framework. As an example tree search algorithm, we discuss depth-first search (dfs).

Starting from initial state $Ax$, dfs traverses a tree such that each node $N$ represents a path $P$ applicable to initial state $Ax$. At node $N$, an operator $f$ of $U^f$ can be applied, if $f$ is applicable to $P(Ax)$. In other words, $f$ can be applied at node $N$, if $P(f)$ is applicable to $Ax$, i.e., $P(f) \in U^p$. Clearly, dfs will traverse a finite $U^p$ fully, unless terminated early.

A reduction of state space $U^p$ is applied in many practical domains. We say that $P = Q$ if $P(Ax) = Q(Ax)$. Thus, if $P = Q$, then $P(Ax)$ and $Q(Ax)$ are transpositions. From the definition of a path, it is clear that in such a case $P$ and $Q$ can be extended in exactly the same way. Thus, even though several paths may lead to the same state, the continuations from that state need to be investigated only once. Instead of traversing $U^p$, we may therefore restrict ourselves to traversing $U^p/=$. To do so, transposition tables are used to store the results of investigating the continuations starting at each node. Before investigating a node, it is checked whether the node has already been investigated (indicating that the node is a transposition) (Greenblatt et al., 1967).

We conclude that conventional tree search algorithms traverse the state space $U^p$, which may be reduced by investigating each transposition only once.
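The counts quoted in this section (17 paths and 11 equivalence classes for axiom aacc) can be reproduced with a small depth-first enumeration. This is a sketch under assumptions, not the author's code: operators are identified by their index tuple `(p, q, r, z1, z2)`, states are frozensets of `(i, j, z)` tuples, and two paths are put in the same class when their sorted operator lists coincide.

```python
from itertools import product

LETTERS = "abcde"


def succpred(z):
    """Circular alphabetical successor and predecessor of z."""
    i = LETTERS.index(z)
    return {LETTERS[(i + 1) % 5], LETTERS[(i - 1) % 5]}


def make_op(p, q, r, z1, z2):
    """Return (pre, del, add) for the dlp operator f_(p,q,r,z1,z2)."""
    pre = frozenset({(p, q, z1), (q + 1, r, z1)})
    return (pre, pre, frozenset({(p, r, z2)}))


# The operator set of the instance with axiom aacc (indices 0..3).
OPERATORS = {
    (p, q, r, z1, z2): make_op(p, q, r, z1, z2)
    for p, q, r in product(range(4), repeat=3) if p <= q < r
    for z1 in LETTERS for z2 in succpred(z1)
}

AX = frozenset({(0, 0, "a"), (1, 1, "a"), (2, 2, "c"), (3, 3, "c")})


def all_paths(state, path=()):
    """Depth-first enumeration of every path applicable to the initial state."""
    yield path
    for name, (pre, dele, add) in OPERATORS.items():
        if pre <= state:
            yield from all_paths((state - dele) | add, path + (name,))


paths = list(all_paths(AX))
classes = {tuple(sorted(path)) for path in paths}
print(len(paths), len(classes))  # 17 11
```

The enumeration visits one node per path, which mirrors the statement that dfs traverses a finite set of paths fully unless terminated early.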
3.3.3 Key classes

In this section we define the key operator of a path (which is just the last operator of the path), key classes (which are equivalence classes of paths where all paths have the same key operator), and the set of all key classes. We define monotonicity, which indicates that in the course of executing operators, an attribute can never be recreated after it has been deleted. We define singularity, which means that each goal state consists of a single attribute. Furthermore, we define redundant paths, which are extensions of solutions. Finally, we show that the set of all key classes is complete under the condition of monotonicity, singularity and the absence of redundancy. Completeness means that each solution in $U^p$ is an element of a key class.

Definition 3.10 Let $P = (f_1, \ldots, f_n)$ be a path applicable to $S$. The last operator of a non-empty path $P$ (i.e., $f_n$), is called the key operator of the path. Notation: $key(P) = f_n$.

For path $P = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$ we obtain from definition 3.10 that $key(P) = f_{(0,1,3,b,c)}$.

Definition 3.11 Let $C \in U^p/\equiv$ be a class. $C$ is a key class, if for all $P_i, P_j \in C$, $key(P_i) = key(P_j)$. The set of all key classes of $U^p/\equiv$ is denoted by $U^k$. The key of a key class $C$ is defined to equal the key of the paths in $C$ and is denoted by $key(C)$.

From definition 3.11 and the example after definition 3.9, it follows that for $P = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$, $[P]$ is a key class. For $Q = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)})$, $[Q]$ is not a key class, since $Q$ has key $f_{(2,2,3,c,b)}$, while $(f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)})$ has key $f_{(0,0,1,a,b)}$. We note that $U^k$ for our instance of dlp consists of 7 key classes.

Definition 3.12 A path $P = (f_1, \ldots, f_n)$ applicable to $S$ is monotonous for $S$ if

$\forall_{i \neq j}\; f_i^{add} \cap f_j^{add} = \emptyset \;\wedge\; \forall_i\; S \cap f_i^{add} = \emptyset$.

$U^p$ is monotonous if all paths in $U^p$ are monotonous for all $S \in U^i$.

In our instance of dlp, there are 17 paths. Investigation shows that each path $P$ is monotonous for $Ax$.
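The investigation mentioned above, and the count of 7 key classes, can be mechanised. The sketch below is illustrative only and rests on two assumptions: operators are represented directly as `(pre, del, add)` triples, and the class of the empty path is counted as a key class, which is what the total of 7 appears to require. Since no path in this instance repeats an operator, a class is identified by the set of its operators.

```python
from collections import defaultdict
from itertools import product

LETTERS = "abcde"
AX = frozenset({(0, 0, "a"), (1, 1, "a"), (2, 2, "c"), (3, 3, "c")})


def operators():
    """Yield (pre, del, add) for every dlp operator on an axiom of length 4."""
    for p, q, r in product(range(4), repeat=3):
        if p <= q < r:
            for i, z1 in enumerate(LETTERS):
                for z2 in (LETTERS[(i + 1) % 5], LETTERS[(i - 1) % 5]):
                    pre = frozenset({(p, q, z1), (q + 1, r, z1)})
                    yield (pre, pre, frozenset({(p, r, z2)}))


OPS = list(operators())


def all_paths(state, path=()):
    """Enumerate every path applicable to the given state, depth first."""
    yield path
    for op in OPS:
        pre, dele, add = op
        if pre <= state:
            yield from all_paths((state - dele) | add, path + (op,))


def is_monotonous(path, state):
    """Definition 3.12: add sets are pairwise disjoint and disjoint from the state."""
    adds = [add for _, _, add in path]
    pairwise = all(not (adds[i] & adds[j])
                   for i in range(len(adds)) for j in range(i + 1, len(adds)))
    return pairwise and all(not (state & add) for add in adds)


paths = list(all_paths(AX))
keys_per_class = defaultdict(set)
for path in paths:
    key = path[-1] if path else None   # key operator; None stands for the empty path
    keys_per_class[frozenset(path)].add(key)

# A class is a key class when all of its paths share the same key operator.
key_classes = [c for c, keys in keys_per_class.items() if len(keys) == 1]
print(all(is_monotonous(p, AX) for p in paths), len(key_classes))  # True 7
```

The four classes of length-two paths drop out because their two orderings end in different operators, leaving the empty class, the four one-operator classes and the two three-operator classes.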
From definition 3.12 it follows that $U^p$ is monotonous.

Definition 3.13 We say that $U^g$ is singular if each $S \in U^g$ consists of a single attribute, i.e., $|S| = 1$.

The $U^g$ defined for dlp,

$U^g = \{\{A(0, 3, a)\}, \{A(0, 3, b)\}, \{A(0, 3, c)\}, \{A(0, 3, d)\}, \{A(0, 3, e)\}\}$

is singular, according to definition 3.13.

Definition 3.14 A path $Q$ is redundant, if $Q$ is an extension of $P$, and $P$ is a solution for an initial state, or $P$ is redundant. $U^p$ is non-redundant, if no path in $U^p$ is redundant.

In dlp, there are no operators applicable to goal states. Therefore, there are no redundant paths in dlp, as defined in definition 3.14.

Completeness of $U^k$

In section 3.3.2 we have shown that conventional search algorithms traverse $U^p$. Through the equivalence relation $\equiv$, we have defined classes of paths, $U^p/\equiv$. Of these classes, the subset $U^k$ of key classes has been singled out. In this section we will show that to find all solutions in $U^p$, it is sufficient to consider only paths which are elements of key classes, thereby restricting the size of the state space. Our proof is based on the assumption that $U^p$ is monotonous and non-redundant, and that $U^g$ is singular.

Our proof consists of three steps. First, in lemma 3.1 we show that either all paths in a class are a solution, or none are. It follows that instead of focusing on paths, we need only to focus on classes of paths, thereby restricting our state space to $U^p/\equiv$. Second, in lemma 3.2 we show that each equivalence class containing a solution must be a key class. Third, in theorem 3.1 we combine these two results to show that it is sufficient to examine the set of all key classes $U^k$.

Lemma 3.1 Let $P$ and $Q$, paths applicable to $S$, be elements of $[P]$, for $P$ and $Q$ monotonous for $S$. Then $P(S) = Q(S)$.

Proof We assume without loss of generality that $P = (f_1, \ldots, f_n)$ for some natural $n$. Let $a \in P(S)$ be an attribute.
Then, because of monotonicity, $a$ is an element of exactly one of the following sets: $S, f_1^{add}, f_2^{add}, \ldots, f_n^{add}$. Now let us suppose that $a \in f_i^{del}$ for some $i$. Then from definition 3.2 it follows that $a \in f_i^{pre}$, restricting $a$ to membership of exactly one of the following sets: $S, f_1^{add}, f_2^{add}, \ldots, f_{i-1}^{add}$. But then, since $a \notin f_i(\ldots(f_1(S)))$, also $a \notin P(S)$. This contradicts our assumption that $a \in P(S)$. Thus, there is no $i \in \{1, \ldots, n\}$ such that $a \in f_i^{del}$. Since $Q$ is a permutation of $P$, $a \in Q(S)$ and $P(S) \subseteq Q(S)$. Analogously, $Q(S) \subseteq P(S)$. $\Box$

Lemma 3.2 Let $U_g$ be singular and let $U_p$ be monotonous and non-redundant. If $P$ is a solution applicable to $S$ then $[P]_\equiv$ is a key class.

Proof Let $Q \in [P]_\equiv$. We assume without loss of generality that $Q = (f_1, \ldots, f_n)$ for some natural $n$. Let $x$ be an attribute in an element of $U_g$. If $x \in f_p^{add}$ for some $p \in \{1, \ldots, n\}$, then $(f_1, \ldots, f_p)$ is a solution, since $U_g$ is singular. Since $U_p$ is non-redundant, $Q$ is non-redundant. Thus, $f_p$ must be the last operator (i.e., the key operator) of $Q$. As $f_p$ occurs in all paths in $[P]_\equiv$, it must be the key operator in each of these paths. Thus, $[P]_\equiv$ is a key class. $\Box$

Theorem 3.1 Let $U_p$ be monotonous and non-redundant and let $U_g$ be singular. Then $U_k$ is complete (i.e., each solution path in $U_p$ is an element of a class in $U_k$, and each class in $U_k$ consists either of only solutions or of no solutions).

Proof From lemma 3.1 it follows that either all paths in an equivalence class of $U_p/\equiv$ are solutions, or none are. From lemma 3.2 it follows that the equivalence class modulo $\equiv$ of any solution path is a key class. Thus, for any solution path, its equivalence class is a key class, of which each representative is a solution. Thus $U_k$ is complete. $\Box$

3.3.4 Traversing $U_k$

In this section we define two relations, to support and to precede, between operators. These relations create a partial order between operators in
monotonous paths. Using the partial order we can define the parents (operators which directly support or precede an operator) and ancestors (operators which directly or indirectly support or precede an operator). Last, we define the merge of a set of classes, which itself is a class. The merge of a set of classes consists of paths containing exactly the operators in the paths of the classes merged. Stated more simply, if we merge a class containing path $P$ with a class containing path $Q$, the merge contains all paths consisting of exactly the operators in $P$ and $Q$. Operators in both $P$ and $Q$ occur only once in the paths of the merge.

The purpose of these definitions is to create a meta-operator which is capable of traversing exactly $U_k$. We have shown in section 3.3.3 that $U_k$ is complete. Together with a proof that we have a meta-operator which traverses exactly $U_k$, we have shown that a restricted state space can be traversed without reduced efficacy. The definition of the meta-operator and the proof of its soundness (each application leads to a key class) and completeness (all key classes will be created by application of the meta-operator) follow the definitions in this section.

Definition 3.15 Let $f_1, f_2 \in U_f$. We define the two relations supports and precedes on $U_f \times U_f$ as follows.
1. $f_1$ supports $f_2 \iff f_1^{add} \cap f_2^{pre} \neq \emptyset$.
2. $f_1$ precedes $f_2 \iff f_1^{pre} \cap f_2^{del} \neq \emptyset$.

We remark that we will use both the phrases "$f_1$ supports $f_2$" and "$f_2$ depends on $f_1$" to describe the first relation.

We provide examples in our instance of dlp for the two relations of definition 3.15.
1. $f_{(0,0,1,a,b)}$ supports $f_{(0,1,3,b,a)}$, as $f_{(0,0,1,a,b)}^{add} \cap f_{(0,1,3,b,a)}^{pre} = \{A(0,1,b)\}$.
2. $f_{(0,0,1,a,b)}$ precedes $f_{(0,0,1,a,e)}$, as $f_{(0,0,1,a,b)}^{pre} \cap f_{(0,0,1,a,e)}^{del} = \{A(0,0,a), A(1,1,a)\}$. We remark that also $f_{(0,0,1,a,e)}$ precedes $f_{(0,0,1,a,b)}$, which shows that $f_{(0,0,1,a,b)}$ and $f_{(0,0,1,a,e)}$ cannot occur in the same monotonous path.

Definition 3.16 Let $P$ be a non-empty path applicable to $S$, and let $f$ be an operator in path $P$.
The set of parents of $f$ in $P$ is defined as follows.
$Par_f(P) = \{ f_i \mid f_i \in P \wedge (f_i \text{ supports } f \vee f_i \text{ precedes } f) \}$

Between $f_{(0,0,1,a,b)}$, $f_{(2,2,3,c,b)}$ and $f_{(0,1,3,b,c)}$ the following two relations hold: $f_{(0,0,1,a,b)}$ supports $f_{(0,1,3,b,c)}$ and $f_{(2,2,3,c,b)}$ supports $f_{(0,1,3,b,c)}$. Thus, for path $P = (f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}, f_{(0,1,3,b,c)})$, definition 3.16 states that $Par_{f_{(0,1,3,b,c)}}(P) = \{f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}\}$.

Definition 3.17 Let $P$ be a non-empty path applicable to $S$, and let $f$ be an operator in $P$. The set of ancestors of $f$ in $P$ is defined as follows.
$Anc_f(P) = \{f\} \cup \bigcup_{f_i \in Par_f(P)} Anc_{f_i}(P)$
Furthermore, a parent $f_i$ of $f$ is named a relevant parent if for all parents $f_j$ of $f$ with $f_i \neq f_j$, $f_i \notin Anc_{f_j}(P)$.

In each instance of dlp, each parent is a relevant parent. In our example instance of dlp, $Anc_f(P) = \{f\} \cup Par_f(P)$ for all paths $P$ and all operators $f$. In more complex instances of dlp, however, not all ancestors of $f$ as defined in definition 3.17 will be parents of $f$ (or $f$ itself).

Definition 3.18 Let $P_1, \ldots, P_n$ be paths applicable to $S$. Then the merge of $P_1, \ldots, P_n$, denoted by $P_1 \parallel \ldots \parallel P_n$, is defined as the set of all paths $Q$ applicable to $S$ such that $Q$ is a permutation of the set of all operators in the $P_i$. The merge of a set of classes $[P_i]_\equiv$ is defined as the merge of a set of representatives of the classes. Thus,
$[P_1]_\equiv \parallel \ldots \parallel [P_n]_\equiv = P_1 \parallel \ldots \parallel P_n.$

We present three examples of merges of paths, as defined in definition 3.18.
$(f_{(0,0,1,a,b)}) \parallel (f_{(2,2,3,c,b)}) = \{(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}), (f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)})\}$
$(f_{(0,0,1,a,b)}) \parallel (f_{(0,0,1,a,e)}) = \emptyset$
$(f_{(0,0,1,a,b)}) \parallel (f_{(2,2,3,c,b)}) = \{(f_{(0,0,1,a,b)})\} \parallel \{(f_{(2,2,3,c,b)})\}$

A meta-operator

So far, we have defined $U_k$ and proved its completeness under the assumptions of singularity, non-redundancy and monotonicity. For the remainder of this section we assume that these three conditions hold, unless stated otherwise.
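The relations of definition 3.15, the parents of definition 3.16 and the merge of definition 3.18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding of a dlp operator as precondition, delete and add sets (with the delete set equal to the precondition set) is inferred from the examples in the text, the names `Op`, `dlp_op`, `apply_path`, `parents` and `merge` are hypothetical, and `parents` excludes the operator itself so that the result matches the example after definition 3.16.

```python
from itertools import permutations
from typing import FrozenSet, NamedTuple, Tuple

Attr = Tuple[int, int, str]  # A(i, j, x): positions i..j have been rewritten to letter x


class Op(NamedTuple):
    pre: FrozenSet[Attr]
    delete: FrozenSet[Attr]
    add: FrozenSet[Attr]


def dlp_op(i, j, k, x, y):
    """Operator f_(i,j,k,x,y): consume A(i,j,x) and A(j+1,k,x), produce A(i,k,y)."""
    pre = frozenset({(i, j, x), (j + 1, k, x)})
    return Op(pre, pre, frozenset({(i, k, y)}))  # assumption: delete set == precondition set


def supports(f1, f2):
    return bool(f1.add & f2.pre)


def precedes(f1, f2):
    return bool(f1.pre & f2.delete)


def apply_path(path, state):
    """Return the state reached by the path, or None if some operator is not applicable."""
    s = set(state)
    for f in path:
        if not f.pre.issubset(s):
            return None
        s = (s - f.delete) | f.add
    return frozenset(s)


def parents(f, path):
    # Assumption: f itself is left out, as in the example after definition 3.16.
    return {g for g in path if g != f and (supports(g, f) or precedes(g, f))}


def merge(paths, state):
    """All applicable orderings of the operators occurring in the given paths."""
    ops = {f for p in paths for f in p}
    return {q for q in permutations(ops) if apply_path(q, state) is not None}


# Axiom aacc and the operators used in the examples of the text.
AX = frozenset({(0, 0, "a"), (1, 1, "a"), (2, 2, "c"), (3, 3, "c")})
f_ab = dlp_op(0, 0, 1, "a", "b")
f_ae = dlp_op(0, 0, 1, "a", "e")
f_cb = dlp_op(2, 2, 3, "c", "b")
f_ba = dlp_op(0, 1, 3, "b", "a")
f_bc = dlp_op(0, 1, 3, "b", "c")
```

With these definitions, `merge([(f_ab,), (f_cb,)], AX)` contains both orderings of the two operators, while `merge([(f_ab,), (f_ae,)], AX)` is empty because the two operators consume the same attributes.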
Traversing $U_k$ is not as straightforward as traversing $U_p$. For instance, $\{(f_{(0,0,1,a,b)})\}$ is a key class in the unsolvable instance aacaa of dlp, whose axiom is represented by
$\{A(0,0,a), A(1,1,a), A(2,2,c), A(3,3,a), A(4,4,a)\}.$
We can extend the only path in the key class to the paths $(f_{(0,0,1,a,b)}, f_{(3,3,4,a,b)})$ and $(f_{(0,0,1,a,b)}, f_{(3,3,4,a,e)})$. However, in both cases the equivalence classes of these paths are not key classes. Thus, extending elements of key classes may lead to paths which are not an element of a key class. We conclude that traversing $U_k$ involves more than just extending paths.

In this section we introduce the meta-operator $F(N, f)$, which is capable of traversing $U_k$. First, we define $F(N, f)$. Then we prove that each application of $F(N, f)$ in a graph where each node represents a key class creates only nodes representing key classes. Finally, we prove through induction that each key class is created through application of $F(N, f)$.

Definition of the meta-operator

We define meta-operator $F(N, f)$ in definition 3.19.

Definition 3.19 Let $N \subseteq U_k$, with $N = \{C_1, \ldots, C_n\}$, all $C_i \neq \emptyset$, and $n \geq 1$. Let $C_1 \parallel \ldots \parallel C_n = C$, with $C \neq \emptyset$. Let operator $f \in U_f$ be such that $\forall_{1 \le i \le n}\, (key(C_i) \text{ supports } f \vee key(C_i) \text{ precedes } f)$, and let $f$ be an extension to a path $P \in C$. We then say that $f$ is valid in $N$. $F(N, f)$ is applicable if and only if $f$ is valid in $N$ and there is no proper subset $M$ of $N$ such that $f$ is valid in $M$. If $F(N, f)$ is applicable, then $F(N, f) = [P\,(f)]_\equiv$. Furthermore, $F(\emptyset, f)$ is applicable if and only if $f(Ax)$ is defined. In those cases, $F(\emptyset, f) = \{(f)\}$.

An informal interpretation of $F(N, f)$ is as follows. Operator $f$ can only be applied to states containing all elements of $f^{pre}$. Each element $C_i$ of the set of key classes $N$ contributes one or more attributes of $f^{pre}$, implying that $f$ depends on or is preceded by the key operators of each $C_i$. If all operators in the $C_i$ can be combined without conflicts (i.e., the merge of all $C_i$ is not empty) and paths in the merge extended with $f$ are applicable, then $F(N, f)$ is applicable.

We give two examples. First, we look at instance aacc of dlp. Both $C_1 = \{(f_{(0,0,1,a,b)})\}$ and $C_2 = \{(f_{(2,2,3,c,b)})\}$ are key classes. Operator $f = f_{(0,1,3,b,a)}$, with $f^{pre} = \{A(0,1,b), A(2,3,b)\}$, depends on the keys of the paths of $C_1$ and $C_2$. Furthermore, $C_1 \parallel C_2 = \{(f_{(0,0,1,a,b)}, f_{(2,2,3,c,b)}), (f_{(2,2,3,c,b)}, f_{(0,0,1,a,b)})\}$. For both paths $Q_1$ and $Q_2$ in the merge, $Q_1\,(f)$ and $Q_2\,(f)$ are applicable to $Ax$. Finally, $f$ is valid in neither $\{C_1\}$ nor $\{C_2\}$. Thus, $F(\{C_1, C_2\}, f)$ is applicable.

Second, we look at the production system $P'$ of section 3.1. In $P'$, each of the applications of $r_0, \ldots, r_9$ results in a key class of one element, which we name $C_0$ through $C_9$. Rule $r_{10}$ depends on each of the applications of $r_0$ to $r_9$ having been executed. Thus, $F(\{C_0, \ldots, C_9\}, r_{10})$ is applicable and yields the solution of $P'$.

Soundness of the meta-operator

In theorem 3.2 we prove that each application of meta-operator $F(N, f)$ creates a key class. Before we give the proof of theorem 3.2, we prove lemmas 3.3 and 3.4.

Lemma 3.3 Let $P = (f_1, \ldots, f_n)$ be a path applicable to $S$. Let $f_i$ neither support nor precede $f_{i+1}$. Then $(f_1, \ldots, f_{i-1}, f_{i+1}, f_i, f_{i+2}, \ldots, f_n)$ is also a path applicable to $S$.

Proof Let $f_{i-1}(\ldots(f_1(S))\ldots) = T$. Then $f_i(T)$ is defined, and $f_i^{pre} \subseteq T$. Since $f_i$ does not support $f_{i+1}$, we know that $f_i^{add} \cap f_{i+1}^{pre} = \emptyset$. Thus, $f_{i+1}^{pre} \subseteq T$ and $f_{i+1}(T)$ is defined. Furthermore, that $f_i$ does not precede $f_{i+1}$ implies that $f_{i+1}^{del} \cap f_i^{pre} = \emptyset$. Therefore $f_i(f_{i+1}(T))$ is defined. Since $f_i(f_{i+1}(T)) = f_{i+1}(f_i(T))$ according to lemma 3.1, $(f_1, \ldots, f_{i-1}, f_{i+1}, f_i, f_{i+2}, \ldots, f_n)$ is a path. $\Box$

Lemma 3.4 Let $C$ be a key class with key $f_n$. Let $P = (f_1, \ldots, f_n)$ be a path in $C$. Then the following two statements are true.
1. $\forall_{1 \le i \le n-1}\, \exists_{j > i}\, (f_i \text{ supports } f_j \vee f_i \text{ precedes } f_j)$
2. $\forall_{1 \le i \le n-1}\, (f_n \text{ does not support } f_i \wedge f_n \text{ does not precede } f_i)$

Proof
1. Suppose that there exists an $f_p$, with $1 \le p \le n-1$, such that for all $j > p$, $f_p$ neither supports nor precedes $f_j$. Then, by repeated application of lemma 3.3, we can move $f_p$ to the end of $P$. However, this contradicts the assumption that $C$ is a key class. Thus, $\forall_{1 \le i \le n-1}\, \exists_{j > i}\, (f_i \text{ supports } f_j \vee f_i \text{ precedes } f_j)$.
2. Suppose that there exists an $f_p$, with $1 \le p \le n-1$, such that $f_n$ supports $f_p$. Then $f_n^{add} \cap f_p^{pre} \neq \emptyset$. Let $x \in f_n^{add} \cap f_p^{pre}$. Then, by monotonicity, $x \notin S$ and $x \notin f_i^{add}$ for all $i < n$. Thus, $f_p$ is only applicable if $p > n$, which is a contradiction. Thus, $f_n$ does not support $f_p$. Suppose that there exists an $f_p$, with $1 \le p \le n-1$, such that $f_n$ precedes $f_p$. Then $f_n^{pre} \cap f_p^{del} \neq \emptyset$. Let $x \in f_n^{pre} \cap f_p^{del}$. Then, by definition 3.2, $x \in f_p^{pre}$, and either $x \in S$ or $x \in f_i^{add}$ for exactly one $i < p$, but not both. And thus, $x \notin f_j^{add}$ for all $j \geq p$. Thus, $f_n$ is only applicable if $n < p$, which is a contradiction. Thus, $f_n$ does not precede $f_p$. Therefore, $\forall_{1 \le i \le n-1}\, (f_n \text{ does not support } f_i \wedge f_n \text{ does not precede } f_i)$. $\Box$

Theorem 3.2 If $F(N, f)$ is applicable, then $F(N, f)$ is a key class.

Proof Consider arbitrary $P \in F(N, f)$ and suppose that $key(P) \neq f$. Then either $key(P) = key(P_i)$ for some $P_i$ in a class in $N$, or $key(P)$ is a non-key operator $f_j$ in a path $P_i$ in some class in $N$. The first case leads to a contradiction, since $key(P_i)$ supports or precedes $f$ according to definition 3.19, which contradicts lemma 3.4. The second case also leads to a contradiction, since $[P_i]_\equiv$ is a key class, and from lemma 3.4 it follows that $f_j$ precedes or supports at least one operator $f_k$ in $P_i$ and thus cannot be the key in $P$. We conclude that the assumption $key(P) \neq f$ is invalid, thus $F(N, f)$ is a key class with key $f$. $\Box$
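The first example of definition 3.19 and the claim of theorem 3.2 can be checked by brute force on instance aacc. The sketch below is not the thesis's implementation. It assumes that an equivalence class consists of all applicable orderings of one set of operators, that the delete set of a dlp operator equals its precondition set (both inferred from the examples in the text), and that only non-empty proper subsets of $N$ need to be tested. The helper names `op`, `cls`, `linked`, `valid` and `meta` are hypothetical.

```python
from itertools import combinations, permutations


def op(i, j, k, x, y):
    """dlp operator as a (pre, delete, add) triple; pre == delete is an assumption."""
    pre = frozenset({(i, j, x), (j + 1, k, x)})
    return (pre, pre, frozenset({(i, k, y)}))


def applicable(path, state):
    s = set(state)
    for pre, delete, add in path:
        if not pre.issubset(s):
            return False
        s = (s - delete) | add
    return True


def cls(ops, state):
    """Assumed equivalence class: every applicable ordering of the given operators."""
    return {p for p in permutations(ops) if applicable(p, state)}


def linked(g, f):
    """g supports f or g precedes f (definition 3.15)."""
    return bool(g[2] & f[0]) or bool(g[0] & f[1])


def valid(classes, f, state):
    """f is valid in N: non-empty merge, every key linked to f, some extension applicable."""
    ops = {g for c in classes for p in c for g in p}
    merged = cls(ops, state)
    keys_ok = all(linked(p[-1], f) for c in classes for p in c)
    return bool(merged) and keys_ok and any(applicable(q + (f,), state) for q in merged)


def meta(classes, f, state):
    """F(N, f) for a non-empty list N of key classes, or None if it is not applicable."""
    if not valid(classes, f, state):
        return None
    for r in range(1, len(classes)):          # non-empty proper subsets only (assumption)
        if any(valid(list(m), f, state) for m in combinations(classes, r)):
            return None
    ops = {g for c in classes for p in c for g in p} | {f}
    return cls(ops, state)


AX = frozenset({(0, 0, "a"), (1, 1, "a"), (2, 2, "c"), (3, 3, "c")})  # axiom aacc
f_ab, f_cb, f_ba = op(0, 0, 1, "a", "b"), op(2, 2, 3, "c", "b"), op(0, 1, 3, "b", "a")
C1, C2 = cls({f_ab}, AX), cls({f_cb}, AX)
result = meta([C1, C2], f_ba, AX)
```

Here `result` holds the two orderings of the three operators that end in `f_ba`, so every path in the class has the same key, while `meta([C1], f_ba, AX)` and `meta([C2], f_ba, AX)` return `None`.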
Completeness of the meta-operator

In this section we prove by induction that each key class can be created through applications of $F(N, f)$, as formulated in theorem 3.3. Before we present the proof of theorem 3.3, we prove lemmas 3.5 and 3.6.

Lemma 3.5 Let $P$ be a path applicable to $S$, and $f \in P$. Then there is a path $Q$ applicable to $S$, such that
1. $Q$ consists of exactly the operators in $Anc_f(P)$.
2. $[Q]_\equiv$ is a key class, with key $f$.
We name $[Q]_\equiv$ the key class induced by $f$ in $P$.

Proof
1. Let $P = (f_1, \ldots, f_n)$ and let $Q$ be the path consisting of the operators in $Anc_f(P)$ in the same order as they appear in $P$. Now let us suppose that $Q$ is not applicable to $S$, i.e., there is an operator $f_i$ in $Q$ such that $f_i$ is not applicable. Then there is an attribute $x \in f_i^{pre}$ such that $x \in f_j^{add}$, while $f_j \notin Anc_f(P)$. However, then $f_j^{add} \cap f_i^{pre} \neq \emptyset$, and thus $f_j$ supports $f_i$, and thus $f_j \in Anc_f(P)$ if $f_i \in Anc_f(P)$. Thus, $Q$ is applicable to $S$.
2. By definition of $Anc_f(P)$, for each operator $f_i \in Anc_f(P)$ with $f_i \neq f$, there is an operator $f_j$ such that $f_i$ supports $f_j$ or $f_i$ precedes $f_j$. And thus, $f_i$ must occur before $f_j$ in any path containing both. Thus, only $f$ may be the last operator in a path containing all operators in $Anc_f(P)$. Therefore, $[Q]_\equiv$ is a key class with key $f$. $\Box$

Lemma 3.6 Let $C$ be a key class with key $f_n$ and let $P \in C$ be a path, with $P = (f_1, \ldots, f_n)$. Let $N$ be the set of relevant parents of $f_n$ in $P$. Then the merge $M$ of the key classes induced by the elements of $N$ is non-empty, and for each path $Q \in M$, $Q\,(f_n) \in [P]_\equiv$.

Proof Let $f_i \in P$ ($1 \le i \le n-1$) be the operator with highest index such that $f_i$ is not in any path of the key classes induced by the relevant parents of $f_n$. Since $P$ is a path in a key class, it follows from lemma 3.4 that there exists an $f_j$ such that $f_i$ supports $f_j$ or $f_i$ precedes $f_j$. If $f_j = f_n$, then $f_i$ is a parent of $f_n$ and by definition a relevant parent.
If $f_j \neq f_n$, then $f_i$ is in a path in the same key class as $f_j$ induced by a relevant parent of $f_n$. Thus, in both cases, $f_i$ is in a path in a key class induced by a relevant parent of $f_n$. From this contradiction, it follows that all $f_i \in P$ are in a path in a key class induced by a relevant parent of $f_n$. Thus, the merge of all these key classes contains at least the path $Q$ such that $Q\,(f_n) = P$. From this it follows immediately that for each path $Q \in M$, $Q\,(f_n) \in [P]_\equiv$. $\Box$

Theorem 3.3 For each non-empty key class $C \in U_k$, there is a set $N$ of key classes and an operator $f$, such that $F(N, f) = C$.

Proof Basis. Let $C = \{(f_1)\}$. Then by definition $F(\emptyset, f_1) = C$.
Induction step. We assume that each class $C$ consisting of paths with length less than $n$ is the result of an application of $F(N, f)$. Let $P = (f_1, \ldots, f_n)$ and let $C = [P]_\equiv$ be a key class, with key $f_n$. Let $R$ be the set of relevant parents of $f_n$ in $P$. Furthermore, let $N$ be the set of key classes induced by the elements of $R$ (cf. lemma 3.5). Then, from lemma 3.6 it follows that the merge $M$ of all paths in $N$ is non-empty, and that for each path $Q \in M$, $[Q\,(f_n)]_\equiv = C$. Thus, $F(N, f_n) = C$. $\Box$

3.3.5 Summary

In this section we have created a framework for db-search. We have shown that conventional search algorithms traverse the set of all paths $U_p$. The object of determining the state space traversed by conventional search algorithms was to create a standard for comparison with db-search.

Next we have defined the set of key classes $U_k$, which is a subset of the equivalence classes of $U_p$ modulo $\equiv$. We have proved that $U_k$ is complete, which means that all solutions in $U_p$ are elements of classes in $U_k$, under the conditions of monotonicity, non-redundancy and singularity. Thus, even though the cardinality of $U_k$ is not larger than that of $U_p$, and often (much) smaller, all solutions are present in the smaller state space.
Finally, we determined a meta-operator which can be used to traverse the smaller state space $U_k$. The meta-operator $F(N, f)$ was defined, and we have shown that it is both sound and complete. The former indicates that each application of the meta-operator yields an element of the reduced state space, while the latter indicates that each element of the reduced state space can be reached by application of the meta-operator.

Summarizing, we have succeeded in creating a framework which allows us to search a smaller state space, while being assured that the smaller state space contains all solutions of the original state space, and that the smaller state space is fully traversed. What remains to be done is to describe practical algorithms for applying the meta-operator in an efficient manner. This is the topic of the next section.

3.4 Informal description of db-search

In section 3.3.4 we have defined meta-operator $F(N, f)$. $F(N, f)$ can be applied to a set of nodes $N$ (each node representing a key class) and an operator $f$, under the three conditions that (1) the merge $M$ of all key classes in $N$ is non-empty, (2) the concatenation of a path $P$ in $M$ with $f$ is applicable to the initial state, and (3) for each of the key operators $f_i$ of the classes in $N$, $f_i$ supports $f$ or $f_i$ precedes $f$.

Clearly, trying all subsets $N$ of nodes of a tree $T$ as parameter for $F(N, f)$ has a search complexity exponential in the number of nodes of $T$. In such a case, searching $U_k$ may be more expensive than searching the larger set $U_p$ using a conventional search algorithm. The way in which db-search traverses the search graph is designed to limit the cost of applying $F(N, f)$ as much as possible. We present a short informal description of db-search, followed by an explanation of the application of db-search to an instance of dlp.

Db-search repeatedly executes levels, where each level consists of two stages.
In the first stage, named the dependency stage, only sets of nodes with cardinality 1 are selected for application of $F(N, f)$. If new eligible sets of nodes with cardinality 1 are created during a stage, $F(N, f)$ is applied to these sets as well. The dependency stage ends when $F(N, f)$ has been applied to all such sets.

In the second stage, called the combination stage, sets of nodes with larger cardinality are considered. A node $A$ created during the combination stage may not be an element of a set $N$ to which $F(N, f)$ is applied during the same stage. This ensures that the computationally expensive combination stage does not continue any longer than is strictly necessary.

We remark that during each combination stage of db-search, we only perform preparatory work for application of $F(N, f)$. We create a combination node $A$ for each set of nodes $N$ such that at least one $f$ exists allowing the execution of $F(N, f)$. During the dependency stage of the next level, $f$ will be executed from $A$. Thus, nodes created during the combination stage do not themselves represent elements of $U_k$, but are aids to a clear implementation. They correspond to the merge of the classes represented by the nodes in $N$.

In the following, we describe the application of db-search to instance aaccadd of dlp. Figure 3.1 shows the search graph after executing the first dependency stage for axiom aaccadd. In each child node we have capitalized the letter which has been created through the last applied operator.

[Figure 3.1: Search graph after 1st dependency stage for theorem aaccadd. Root aaccadd with children Eccadd, Bccadd, aaBadd, aaDadd, aaccaC and aaccaE.]

[Figure 3.2: Search graph after 1st combination stage for theorem aaccadd. The graph of figure 3.1 extended with combination node BBadd.]

In each of the 1-ply nodes of the tree four operators are applicable. However, none of these
correspond to an application of meta-operator $F(N, f)$, since the operator $f$ does not depend on the operator leading to the 1-ply node. To clarify this, we look at the node representing theorem Eccadd. The rules cc → b|d and dd → c|e are applicable and correspond to the operators $f_{(2,2,3,c,b)}$, $f_{(2,2,3,c,d)}$, $f_{(5,5,6,d,c)}$ and $f_{(5,5,6,d,e)}$. None of these operators depends on the operator $f_{(0,0,1,a,e)}$ which has led to the creation of this node. Therefore, the meta-operator is not applicable in node Eccadd.

Having finished the first dependency stage, we proceed with the first combination stage. In dlp, each precondition set of an operator consists of two attributes. As a result, during the combination stage only combinations of exactly two nodes need to be considered. Figure 3.2 shows the search graph for our instance of dlp after finishing the first level of db-search. It was created by examining all 15 combinations of two 1-ply nodes, to see if the combination of two nodes would lead to a valid application of the meta-operator. In one case it did, resulting in the creation of node BBadd. The operators which led to the creation of the parents of BBadd are $f_{(0,0,1,a,b)}$ and $f_{(2,2,3,c,b)}$. Depending on both these operators are $f_{(0,1,3,b,a)}$ and $f_{(0,1,3,b,c)}$. Thus, two operators are applicable in BBadd, for which reason the combination node representing theorem BBadd was created.

[Figure 3.3: Search graph after 2nd dependency stage for theorem aaccadd. The graph of figure 3.2 extended with nodes Aadd, Cadd, Bdd and Edd.]

Next, we execute the dependency stage of the second level of db-search. For this stage, we apply $F(N, f)$ to the combination node created in the first level. The application of $f_{(0,1,3,b,a)}$ and $f_{(0,1,3,b,c)}$ from the combination node leads to the creation of Aadd and Cadd. From Aadd we can apply two more
operators which depend on the operator leading to Aadd. Thus, a total of four nodes is added in the second dependency stage. Figure 3.3 shows the search graph after the second dependency stage.

For the second level of combination nodes, not all combinations of nodes in the tree need to be checked. Only combinations involving at least one node created during the second dependency stage need to be investigated. In our example this leads to a combination between second-level node Edd and first-level node aaccaE. Using the new combination node, the third level of nodes is created, again consisting of a dependency stage and a combination stage.

[Figure 3.4: Complete dependency-based search graph for theorem aaccadd. The graph of figure 3.3 extended with combination node EE and nodes D and A.]

The complete dependency-based search graph for theorem aaccadd is shown in figure 3.4. The graph consists of three dependency levels and two combination levels. The third combination level is empty, which terminates the search. From figure 3.4 we see that the instance of dlp with axiom aaccadd has two solutions: single-letter theorems a and d can be created.

3.5 Algorithms

In this section we present the db-search algorithms in pseudo-code. We remark that many implementation details have been omitted in the algorithms.

    procedure DbSearch()
      CreateRoot(root)
      level := 1
      while ResourcesAvailable() and TreeSizeIncreased() do
        AddDependencyStage(root)
        AddCombinationStage(root)
        level := level + 1
      od
    end

Table 3.2: Main db-search algorithm.

    procedure AddDependencyStage(node)
      if node ≠ nil then
        if level = node.level+1 and node.type in [Root, Combination] then
          AddDependentChildren(node)
        AddDependencyStage(node.child)
        AddDependencyStage(node.sibling)
    end

Table 3.3: Dependency-stage algorithm.

Table 3.2 shows the main loop of db-search.
Repeatedly, a level is created, consisting of a dependency stage and a combination stage, as described in section 3.4.

Table 3.3 shows the algorithm for creating the dependency stage. It is assumed that each node has a child pointer and a sibling pointer. The child pointer points to the first child of the node, while the child's sibling pointer points to the next child, etc. This assumption explains the recursive calls in AddDependencyStage(). In the graph, we distinguish between three types of nodes: Root, Combination and Dependency. A dependency stage is started only from combination nodes and, for the first level, from the root.

The algorithm of table 3.4 determines all operators dependent on a node and creates children for each eligible operator. The function Applicable() tests whether the selected operator and node form a pair of parameters which is eligible for application of the meta-operator $F(N, f)$.

    procedure AddDependentChildren(node)
      for operator in LegalOperators(node) do
        if Applicable(operator, node) then
          LinkNewChildToGraph(node, operator)
          AddDependentChildren(node.newChild)
      od
    end

Table 3.4: Dependent-children algorithm.

    procedure AddCombinationStage(node)
      if node ≠ nil then
        if node.type = Dependency and node.level = level then
          FindAllCombinationNodes(node, root)
        AddCombinationStage(node.child)
        AddCombinationStage(node.sibling)
    end

Table 3.5: Combination-level algorithm.

The second stage of each level of db-search consists of creating the combinations of independent paths. In our example algorithm (see table
3.5) we have assumed that each combination consists of exactly two nodes. In the double-letter puzzle and qubic, this is indeed the case. In go-moku, combinations of up to four nodes exist. Extending the algorithm to include combinations of three or more nodes is not difficult. A disadvantage is, however, that searching for combinations of $c$ nodes in a graph of size $N$ has a time complexity in the order of $N^c$. Domain-specific reductions of the complexity may often be possible. We have therefore refrained from presenting a general algorithm for combinations of other than two nodes.

The algorithm of table 3.6 finds a node in the graph for a selected partner. It is checked that the nodes are not in conflict, that its type is a dependency node, and that the combination of the two nodes allows at least one application of the meta-operator. This last condition is important to prevent the creation of a large number of useless combination nodes.

    procedure FindAllCombinationNodes(partner, node)
      if node ≠ nil then
        if NotInConflict(partner, node) then
          if node.type = Dependency then
            combination := Combine(partner, node)
            operators := DependingOn(combination)
            if operators ≠ nil then
              AddCombinationNode(node, combination)
          FindAllCombinationNodes(partner, node.child)
        FindAllCombinationNodes(partner, node.sibling)
    end

Table 3.6: Algorithm to find combinations of nodes.

3.6 Test results

Earlier, we stated that conventional search algorithms traverse $U_p$, while db-search traverses $U_k$. In this section we investigate through experiments on dlp the difference in cardinality between $U_p$ and $U_k$.

First, we describe the four algorithms used in the experiments. Second, we describe the set of test problems used for the experiments, as well as the conditions in which the experiments took place. Third, we present the results of the experiments.
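As a concrete reference point for the experiments that follow, the two-stage level structure of section 3.4 can be sketched for dlp in Python. This is a simplified illustration, not the implementation used in the thesis: nodes are kept in flat lists instead of the child and sibling pointers of tables 3.2 to 3.6, combinations are restricted to two nodes, operators are encoded as tuples `(i, j, k, x, y)`, and the delete set of an operator is assumed to equal its precondition set. All function names are hypothetical.

```python
from itertools import combinations

# Rewrite rules xx -> y as quoted in the text (aa -> b|e, cc -> b|d, dd -> c|e, ...).
SUCC = {"a": "be", "b": "ac", "c": "bd", "d": "ce", "e": "da"}


def added(op):
    i, j, k, x, y = op
    return (i, k, y)


def pre(op):
    i, j, k, x, y = op
    return {(i, j, x), (j + 1, k, x)}


def state_of(ops, axiom):
    """Apply a set of operators to the axiom, shortest span first; None on a conflict."""
    state = set(axiom)
    for op in sorted(ops, key=lambda o: (o[2] - o[0], o)):
        if not pre(op).issubset(state):
            return None
        state = (state - pre(op)) | {added(op)}
    return frozenset(state)


def ops_using(attr, state):
    """Operators applicable in `state` whose precondition contains `attr`."""
    i, j, x = attr
    found = []
    for (p, q, z) in state:
        if z != x:
            continue
        if p == j + 1:                      # partner attribute directly to the right
            found += [(i, j, q, x, y) for y in SUCC[x]]
        elif q == i - 1:                    # partner attribute directly to the left
            found += [(p, q, j, x, y) for y in SUCC[x]]
    return found


def db_search(word):
    """Return (single-letter theorems, number of dependency nodes, number of combination nodes)."""
    axiom = frozenset((i, i, ch) for i, ch in enumerate(word))
    dep, comb = [], []                      # dependency nodes: (ops, key, state, level)

    def extend(ops, state, candidates, level):
        for op in sorted(set(candidates)):
            child_ops = ops | {op}
            child_state = (state - pre(op)) | {added(op)}
            dep.append((child_ops, op, child_state, level))
            # Only operators depending on the new key are tried from the child.
            extend(child_ops, child_state, ops_using(added(op), child_state), level)

    level = 1
    starts = [(frozenset(), axiom, [op for a in axiom for op in ops_using(a, axiom)])]
    while starts:
        for ops, state, candidates in starts:           # dependency stage
            extend(ops, state, candidates, level)
        starts = []
        for n1, n2 in combinations(dep, 2):             # combination stage
            if level not in (n1[3], n2[3]):
                continue                                # pair was already examined earlier
            a1, a2 = added(n1[1]), added(n2[1])
            merged = state_of(n1[0] | n2[0], axiom)
            if merged is None or a1 not in merged or a2 not in merged:
                continue                                # the two nodes are in conflict
            both = [op for op in ops_using(a1, merged) if pre(op) == {a1, a2}]
            if both:
                comb.append((n1[0] | n2[0], a1, a2))
                starts.append((n1[0] | n2[0], merged, both))
        level += 1
    states = [axiom] + [n[2] for n in dep]
    solutions = {x for st in states if len(st) == 1 for (_, _, x) in st}
    return solutions, len(dep), len(comb)
```

For the axiom aaccadd of section 3.4, `db_search("aaccadd")` produces six, four and two dependency nodes in the three levels and one combination node after each of the first two levels, in line with figures 3.1 to 3.4.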
Selected algorithms

As a conventional tree-search algorithm for our experiments, we have selected dfs, of which we have implemented two variants: (1) without transposition tables (dfs-), and (2) with transposition tables (dfs+). Since we intend to run the algorithms in our experiments until the complete state space has been traversed, the performance of alternatives like breadth-first search is equivalent to the performance of dfs.

The other two implemented algorithms are the domain-specific algorithm triangle, presented in appendix A, and, of course, db-search. An advantage of db-search over triangle is that in cases where only few theorems can be deduced, db-search may search fewer nodes than the fixed number of entries needed for triangle. A disadvantage of our implementation of db-search is that we did not implement a transposition table. However, transpositions resulting from the order in which operators are executed are non-existent in $U_k$, as they are all part of the same key class. As a result, transpositions have only a minor influence on the performance of db-search on dlp.

Test problems

We have generated random instances of dlp. For each string length of 1 to 20, 100 strings were generated, for a total of 2000 axioms. For each of these 2000 axioms, all four algorithms were to run to completion. However, in order not to have extremely large state spaces dominate the results and to keep the required resources within practical limits, we have set limits for the state spaces examined by dfs+ and dfs-. We terminated dfs+ as soon as the tree size exceeded 100,000 nodes, while dfs- was terminated as soon as the tree size exceeded 1,000,000 nodes. Both triangle and db-search were run to completion on all selected test problems.

Results

The tree-size limit set for dfs+ terminated the search 26 times out of the 2000 runs. Only once did the early termination result in missing a solution.
For dfs- a million nodes was insufficient to complete the search in 129 of the 2000 runs. In 24 of these, at least one of the solutions was missed.

[Figure 3.5: Tree size per algorithm applied to the double-letter puzzle. Horizontal axis: axiom length (2 to 20). Vertical axis: log2 of nodes visited (0 to 20). Curves: DFS without transpositions, DFS with transpositions, db-search, TRIANGLE.]

Db-search's most difficult problem was dbdeabbaacccddaeecda, for which it needed 3934 nodes to determine that it has no solutions. Both variations of depth-first search did not complete the search on this axiom within their respective tree-size limits.

The average number of nodes visited by each algorithm is illustrated in figure 3.5. The horizontal axis is the axiom length, while the vertical axis is the log2 of the number of nodes created.

Up to strings of length 18, db-search outperforms triangle. For those strings, transpositions do not outweigh the gain db-search makes in terminating the search early if possible. Still, the time complexity of db-search, in particular in the combination stage of each level, is higher than for the domain-specific algorithm. Therefore, we do not claim that db-search outperforms triangle.

The trees traversed by both variants of dfs suffer from a combinatorial explosion. At theorem length 20 the average cardinality of $U_p$ (the size of the trees searched by dfs-) is more than 1200 times the average cardinality of $U_k$ (the size of the graphs searched by db-search). As can be seen from the size of the graph traversed by dfs+, transpositions are responsible for a factor 20. The more than 60 times smaller graph traversed by db-search compared to dfs+ indicates that db-search is far more efficient on dlp than conventional search algorithms.
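The difference between the two dfs variants can be made concrete with a minimal sketch. It assumes the string formulation of dlp (any two equal adjacent letters are replaced by one of two successor letters, as in the rules quoted in section 3.4), counts every node that is created, and omits the tree-size limits used in the experiments. The names `successors` and `dfs` are illustrative only.

```python
SUCC = {"a": "be", "b": "ac", "c": "bd", "d": "ce", "e": "da"}


def successors(word):
    """All strings reachable from `word` by one rewrite xx -> y."""
    out = []
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:
            out += [word[:i] + y + word[i + 2:] for y in SUCC[word[i]]]
    return out


def dfs(word, seen=None):
    """Return (nodes visited, single-letter theorems found).

    With seen=None this behaves like dfs- (no transposition table); passing a
    set makes it behave like dfs+, skipping strings that were expanded before.
    """
    if seen is not None:
        if word in seen:
            return 0, set()
        seen.add(word)
    nodes = 1
    found = {word} if len(word) == 1 else set()
    for child in successors(word):
        n, f = dfs(child, seen)
        nodes += n
        found |= f
    return nodes, found


nodes_minus, found_minus = dfs("aaccadd")          # dfs-: every path is walked
nodes_plus, found_plus = dfs("aaccadd", set())     # dfs+: each string expanded once
```

Both calls find the same single-letter theorems for aaccadd, and the variant with the transposition table never visits more nodes than the one without.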
In chapters 4 and 5 db-search has been applied to qubic and go-moku, resulting in significantly reduced state spaces, while no domain-specific algorithm has yet been developed which does the same.

3.7 Applicability

Db-search is a single-agent search algorithm. The main source of applications therefore lies within that area. In some games, such as qubic and go-moku, a restricted search concentrates on sequences of threatening moves only. If the opponent is constantly restricted to only a single reply, the state space is conceptually transformed into a single-agent state space. In those circumstances db-search may be applied to games. For details of such transformations on qubic and go-moku see chapters 4 and 5.

In section 3.3.3 we have proved that $U_k$ is complete if three conditions are met. While these conditions all hold for dlp, they do not hold fully in domains such as qubic and go-moku (i.e., after the transformation to a single-agent state space). As a result, $U_k$ may neither be sound nor complete. Searching a non-complete $U_k$ may still be favorable to searching $U_p$, if the size of $U_p$ prohibits full investigation. However, further research is necessary to understand the implications of applying db-search to such domains in general.

Chapter 4

Qubic

In chapters 2 and 3 two new search techniques, pn-search and db-search, were introduced. Pn-search attempts to use non-uniformity in and/or trees to traverse the state space more efficiently than the various conventional search algorithms. Db-search traverses a smaller graph than conventional search algorithms. Still, for a special class of problems it has been shown that the smaller graph is sound and complete. This means that each solution found by a conventional search algorithm will also be found by db-search.
Pn-search and db-search were developed during the investigation of several games: connect-four (Allis, 1988), awari (Allis et al., 1994), qubic (Allis and Schoo, 1992) and go-moku (Allis et al., 1993). The application of pn-search and db-search to qubic and go-moku is discussed in this and the next chapter. The purpose of these chapters is twofold:

1. to explain in detail how pn-search and db-search were applied to two combinatorially complex problems, and
2. to show that qubic and go-moku can be solved, thereby positively answering our first research question (cf. section 1.4) for two specific games.

At this point it is important to mention that qubic was solved more than a decade before we started our research. Oren Patashnik solved qubic in 1977 and his solution was confirmed by Ken Thompson (Patashnik, 1980). Our interest in qubic sprang from its potential as a test bed for go-moku, due to the similarity between these two games. While threat sequences (see section 4.2.2) play an important role in both games, threat sequences in go-moku are more complex than threat sequences in qubic.

As we were ignorant of Patashnik's work, there was the added challenge of solving the game. After we were informed of Patashnik's work by Ingo Althöfer and Ralph Gasser, we nevertheless decided to finish our work on the game. The experience gained has helped to solve go-moku, while it also provided the means for a comparison of db-search and pn-search with the search techniques applied by Patashnik.

The chapter is organized as follows. In section 4.1 we provide a background to the investigations in qubic. The rules of qubic and common strategies are presented in section 4.2. The application of db-search to qubic is described in section 4.3. The role of pn-search in the solution of qubic is described in section 4.4. The results of our investigations, as well as comparisons with the results of Patashnik, are presented in section 4.5.
4.1 Background

Among the games of the Olympic List, qubic is one of the lesser-known games. Despite its simple rules, qubic has a severe handicap: it is played on a three-dimensional board. Therefore, visualizing sequences of moves is a difficult task for human players, while most games end in a long sequence of threatening moves requiring careful analysis. Nevertheless, at least some strong human players exist, as is apparent from Patashnik (1980), who describes how qubic is solved using a combination of human expert knowledge and a standard search algorithm.

Patashnik assumed that qubic would be a first-player win. Therefore, to prove a win in a position with white (the first player) to move, only one winning move had to be selected. To prove a win in a position with black (the second player) to move, all moves had to lead to wins for white. Using a standard alpha-beta search, Patashnik created a tactical module which determined in a given position whether the player to move had a forced win. For each position in the search tree, it was determined whether the player to move had to make a forced move. Otherwise, if black was to move, for each legal black move a child position was created. If white was to move, a so-called strategic move had to be made. These moves were selected by hand by Patashnik. Using some 1500 hours of CPU time, and 2929 strategic moves, qubic was solved. The database with the solution tree has been checked by Ken Thompson, who confirmed Patashnik's results.

Our research in 1991 consisted of creating a tactical module based on db-search. Furthermore, instead of selecting strategic moves by hand, pn-search guided the search process. After the program was created we were informed that qubic had already been solved. Nevertheless, as qubic was not yet removed from the Computer Olympiad, we finished our solution in collaboration with Patrick Schoo.
Since then our understanding of db-search has improved, resulting in a new implementation of our qubic program. In this chapter we describe the 1993 implementation and its results, which differ somewhat from Allis and Schoo (1992).

In earlier publications (Allis and Schoo, 1992; Allis et al., 1993) we used the term threat-space search for the application of db-search to qubic and go-moku. In this text we only use the term db-search. We gladly acknowledge that both names were suggested by Barney Pell.

4.2 Rules and strategies

Qubic is a three-dimensional instance of a category of games of which well-known two-dimensional analogs are tic-tac-toe, go-moku and renju. First, we present the rules in section 4.2.1. Second, in section 4.2.2 we discuss the role of threats and threat sequences in qubic. Finally, we analyze the automorphisms (i.e., mappings of the playing board onto itself, such that all relevant properties of the board are preserved) of the qubic board and its two different types of cubes in section 4.2.3.

4.2.1 Rules

Qubic is played on a 4 × 4 × 4 cube, thus consisting of 64 small cubes. Players move alternately by occupying any empty cube. The game ends as soon as one of the players has occupied four consecutive cubes in a straight line (either in one, two or three dimensions). Such a set of four cubes in a straight line is called a group. There are 3 × 16 = 48 one-dimensional groups, 3 × 8 = 24 two-dimensional groups and 4 three-dimensional groups, for a total of 76 groups.

In figure 4.1 the three different types of groups are shown. Group a is one-dimensional, group b is two-dimensional, while group c is three-dimensional.

4.2.2 Threats and threat sequences

If a player has occupied three cubes in a group, with the fourth cube empty, she threatens to win at her next move. In such a position, the opponent is forced to refute the threat (unless she can win at her next move).

[Figure 4.1: Three types of groups in qubic.]

The game
is usually decided by a player creating a threat sequence ending in a double threat, which cannot be stopped by the opponent. In figure 4.2 an example winning threat sequence in a single plane is shown. White has occupied three cubes in the plane (in the corners), while black has played her moves elsewhere (i.e., in other planes). White now has an 11-ply winning threat sequence starting with moves 1 through 9 in figure 4.2. After move 9, white threatens to win at a and b, which cannot both be countered by black's next move.

[Figure 4.2: An 11-ply winning threat sequence.]

In general, a threat sequence may end in one of three possible ways. First, a double threat may be created, resulting in a win for the attacker. Second, the attacker may run out of threats. Third, the forced moves of the defender may result in her accidentally creating a threat of her own, and changing her role from defender to attacker.

If a threat sequence ends without success for the attacking player, she has normally exhausted most of her threat potential, reducing her winning chances. Therefore, early in the game, both players try to occupy cubes which increase their potential for creating threats, without actually executing those threats.

4.2.3 Cube types and automorphisms

The 64 cubes fall into two categories. The 8 corner cubes and 8 center cubes are named 7-cubes, as each is part of 7 groups (3 one-dimensional groups, 3 two-dimensional groups and 1 three-dimensional group). The other 48 cubes are called 4-cubes as they are part of four groups only (3 one-dimensional groups and 1 two-dimensional group).

The number of automorphisms in qubic is surprisingly high: 192. This can be explained as follows. By rotation, each of the six sides of the cube can be brought on top in four different ways, resulting in a total of 24 automorphisms by rotation. There are three more operations, each doubling the number of automorphisms.
First, reflection in a plane through the center of the cube. Second, turning the cube inside out, i.e., exchanging (in all three dimensions) the inner planes with the outer planes. Third, internal exchange, i.e., exchanging the inner planes in all three dimensions, while leaving the outer planes untouched.

Due to the automorphisms, there are only two distinct opening moves in qubic, one at any 7-cube, and one at any 4-cube. After white's first move at a 7-cube, black has 12 distinct answers, as presented in figure 4.3. Each of the empty 51 cubes in the figure can be mapped to at least one of the 12 black cubes, through at least one of the automorphisms of qubic.

[Figure 4.3: The 12 two-ply moves.]

4.3 Applying db-search

As mentioned before, threat sequences play a dominant role in qubic. Obviously, to play qubic well, it would be advantageous to have a module which determines whether a winning threat sequence exists. Our application of db-search to qubic is restricted to searching for winning threat sequences.

This section consists of three parts. First, in section 4.3.1 we describe how the adversary-agent state space, when restricted to threat sequences, can be transformed into a single-agent state space. Second, in section 4.3.2 we illustrate how the single-agent state space thus created for qubic fits in the framework for db-search presented in chapter 3. Third, in section 4.3.3 we discuss three properties of the single-agent state space for qubic which have not been included in the framework of section 4.3.2. For each of these properties there is an explanation of how our implementation of db-search handles them.

4.3.1 A single-agent search in qubic

Our description of the single-agent state space of threat sequences in qubic consists of a set of definitions, an interpretation of the definitions, and the transformation of the adversary-agent state space to a single-agent state space.
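The group and cube counts of section 4.2 (76 groups; 16 cubes lying in 7 groups and 48 cubes lying in 4 groups) can be checked by brute force. The following is a minimal sketch, not the thesis implementation; the cube numbering `16*z + 4*y + x` and all function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

N = 4  # edge length of the qubic board


def cube_index(x, y, z):
    # Assumed numbering: 16 cubes per plane, 4 cubes per row.
    return 16 * z + 4 * y + x


def all_groups():
    """Return every straight line of four cubes as (dimension, frozenset of indices)."""
    groups = set()
    for start in product(range(N), repeat=3):
        for step in product((-1, 0, 1), repeat=3):
            if step == (0, 0, 0):
                continue
            cells = [tuple(s + k * d for s, d in zip(start, step)) for k in range(N)]
            if all(0 <= c < N for cell in cells for c in cell):
                # The dimension of a group is the number of coordinates that change.
                dim = sum(1 for d in step if d != 0)
                # Each line is generated from both ends; the set removes the duplicate.
                groups.add((dim, frozenset(cube_index(*cell) for cell in cells)))
    return groups


GROUPS = all_groups()
by_dim = Counter(dim for dim, _ in GROUPS)
membership = Counter(cube for _, group in GROUPS for cube in group)
cube_types = Counter(membership.values())

print(len(GROUPS))        # 76
print(dict(by_dim))       # 48 one-, 24 two- and 4 three-dimensional groups
print(dict(cube_types))   # 16 cubes in 7 groups, 48 cubes in 4 groups
```

Enumerating lines from every start cube in every direction is wasteful but keeps the sketch short; on a 64-cube board the cost is negligible.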
Definitions

In the previous sections we informally introduced the concept of threats, threat sequences and winning threat sequences in qubic. These notions are defined in definitions 4.1, 4.2 and 4.3.

Definition 4.1 A threat in qubic is a move by the attacker leading to a position such that
1. The defender cannot win at her next move, and
2. The defender has at most one move stopping the attacker from winning at her next move.

If a threat leaves the defender without any moves to stop the attacker from winning at her next move, it is called a double threat, otherwise the threat is called a single threat.

Definition 4.2 A threat sequence $(a_1, d_1, a_2, d_2, \ldots, a_n, d_n)$, with $n \ge 1$, is any sequence of moves such that each $a_i$, $1 \le i \le n$, is a single threat, and each $d_i$ is the single response to $a_i$ which does not lose immediately.

Definition 4.3 A winning threat sequence in qubic is a sequence of moves $(a_1, d_1, \ldots, a_n, d_n, a_{n+1}, d_{n+1})$, such that $(a_1, d_1, \ldots, a_n, d_n)$ is a threat sequence, $a_{n+1}$ is a double threat and $d_{n+1}$ is any legal move.

Interpretation

Here we elaborate on the definitions presented above, interpreting them in the context of groups.

To win in qubic, a player must occupy all four cubes in a group. Thus, a player who occupies three cubes in a group, while the last cube is empty, threatens to win. According to definition 4.1, such a move is only a threat if the opponent has not obtained three cubes in a group herself. In other words, a threat consists of a local property for the attacker (i.e., the state of one specific group) and the global lack of a similar property for the defender (i.e., no group on the board having the property).

In a threat sequence, each attacker move occupies the third attacker cube in a group, while the fourth cube is empty. Each defender move occupies the fourth cube in that group.
In each case, the defender has no alternative move which wins immediately and, although the rules of qubic allow playing anywhere else, alternative moves are blunders as they would result in losing at the next move. In other words, a threat sequence consists of a sequence of moves where each attacker move is followed by its only non-blundering reply.

A winning threat sequence is a threat sequence followed by a double threat and any legal move. Since there are at least two places where the attacker threatens to win at the next move, and the defender cannot win herself immediately, all moves are equally bad. Therefore, any legal move may be selected.

Adversary-agent vs. single-agent

As we have seen, in threat sequences and winning threat sequences each move by the defender is implied by the previous attacker move. Therefore, we may conceptually merge these two moves into a single meta-move. If we examine the state space created by these meta-moves, it is no longer an adversary-agent state space, but instead a single-agent state space. For each meta-move, the attacker selects any of the possible threats in a position. If the threat is a single threat, the move by the opponent is implied by the previous move. If the threat is a double threat, all moves by the opponent are equally bad, and a random move may be selected to represent all possible moves. In both cases the defender has no real choice, effectively transforming the state space into a single-agent state space.

In the remainder of this section, we will only regard meta-moves, and assume that the attacker move and defender move in a meta-move are made at the same time.

4.3.2 A db-search framework for qubic

In this section, we describe a db-search framework for the single-agent state space of qubic. We mention that the framework only involves local properties, i.e., occupation of single groups, while ignoring global properties, i.e., possible counter threats of the defender. Global properties of a position will be handled in section 4.3.3. The terminology introduced in chapter 3 is used throughout this section.

Attributes

The set $U_a$ of all attributes is defined as follows.

$U_a = \{ C(i, x) \mid 0 \le i \le \mathit{size} - 1 \wedge x \in \{\text{attacker}, \text{defender}, \text{empty}\} \}$

Attribute $C(i, x)$ represents the fact that cube $i$ is occupied by the attacker, occupied by the defender, or empty. The constant $\mathit{size}$ equals the number of cubes on the playing board (i.e., $4^3 = 64$). It can easily be checked that $U_a$ has 192 elements.

Operators

The operator $f_{c_1,c_2,c_3,c_4}$ is defined as follows.

$f^{\mathrm{pre}}_{c_1,c_2,c_3,c_4} = \{ C(c_1, \text{attacker}), C(c_2, \text{attacker}), C(c_3, \text{empty}), C(c_4, \text{empty}) \}$
$f^{\mathrm{del}}_{c_1,c_2,c_3,c_4} = \{ C(c_3, \text{empty}), C(c_4, \text{empty}) \}$
$f^{\mathrm{add}}_{c_1,c_2,c_3,c_4} = \{ C(c_3, \text{attacker}), C(c_4, \text{defender}) \}$

The set $U_f$ of all operators is defined as follows.

$U_f = \{ f_{c_1,c_2,c_3,c_4} \mid \{c_1, c_2, c_3, c_4\} \text{ is a group} \}$

We remark here that a group is a set of four cubes which, if all occupied by one player, result in that player winning the game. In qubic there are 76 groups. For each group, the 4 elements can be ordered in 4! = 24 possible ways. Thus, there are 24 × 76 = 1824 operators in $U_f$. Since $c_1$ and $c_2$ can be exchanged without changing the operator, there are effectively 912 operators in $U_f$.

Initial state and goal states

The initial state consists of exactly 64 attributes, one per cube indicating the contents of the cube. Each qubic position which is to be checked for the existence of a winning threat sequence can serve as an initial state.

The set $U_g$ of goal states is independent of the initial state, and is defined as follows.

$U_g = \{ \{ C(c_1, \text{attacker}), C(c_2, \text{attacker}), C(c_3, \text{attacker}), C(c_4, \text{empty}) \} \mid \{c_1, c_2, c_3, c_4\} \text{ is a group} \}$

In other words, any state in which a group exists of which three cubes have been occupied by the attacker and the fourth cube is empty, is a goal state. We remark that each meta-move starts with a move by the attacker.
Therefore, a state as described here in the single-agent search ensures that in the adversary-agent search the attacker can win at her next move. $U_g$ has 304 elements and is not singular.

Properties of the qubic framework

The framework we have described above is monotonous. Furthermore, we can easily restrict ourselves to non-redundant paths. If $U_g$ were singular, our $U_k$ would be complete. We can create a singular $U_{g'} = \{\{G\}\}$, by defining a special goal attribute $G$ and operators which transform any element of $U_g$ into $G$, which would result in a complete $U_k$. A discussion of the completeness of $U_k$ would be premature, however, since we have ignored the global properties of qubic so far.

4.3.3 Qubic-specific enhancements to db-search

The db-search framework for qubic presented in the previous section focuses only on the local properties of threats. In this section we discuss the three global properties which need to be incorporated in db-search. Each property is described followed by the method of inclusion in db-search.

Defender four

In each winning threat sequence, both the attacker and defender occupy cubes. Even though the defender has no choices of which cubes to occupy, the attacker may, accidentally, force her to occupy all four cubes in a group. Such a group is named a defender four. If this happens, the threat sequence by the attacker has failed.

During the dependency stage of each level of db-search, it is easy to check after each meta-move $(a, d)$, consisting of attacker move $a$ and defender move $d$, whether $d$ has created a defender four. It is sufficient to investigate the 4 or 7 groups in which $d$ lies.

During the combination stage of each level of db-search, a defender four could be created by the merge of two or more paths. To detect such a defender four, all 76 groups must be investigated when creating a combination.
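The two defender-four tests described above, a local one over the groups through the defender's last move and a full one over all 76 groups, might look as follows. This is a minimal sketch under assumptions: positions are plain sets of cube numbers, cubes are numbered `16*z + 4*y + x`, and the helper names are hypothetical rather than taken from the thesis program.

```python
from itertools import product


def build_groups(n=4):
    """All straight lines of n cubes on an n x n x n board, as frozensets of indices."""
    groups = set()
    for start in product(range(n), repeat=3):
        for step in product((-1, 0, 1), repeat=3):
            if step == (0, 0, 0):
                continue
            cells = [tuple(s + k * d for s, d in zip(start, step)) for k in range(n)]
            if all(0 <= c < n for cell in cells for c in cell):
                groups.add(frozenset(n * n * z + n * y + x for x, y, z in cells))
    return groups


GROUPS = build_groups()
# Precompute, for every cube, the 4 or 7 groups that contain it.
GROUPS_THROUGH = {cube: [g for g in GROUPS if cube in g] for cube in range(64)}


def creates_defender_four(defender_cubes, d):
    """Dependency-stage test: does defender move d complete a defender four?

    Only the groups through d are inspected.
    """
    occupied = set(defender_cubes) | {d}
    return any(g <= occupied for g in GROUPS_THROUGH[d])


def has_defender_four(defender_cubes):
    """Combination-stage test: scan all 76 groups of a merged position."""
    occupied = set(defender_cubes)
    return any(g <= occupied for g in GROUPS)


print(creates_defender_four({0, 1, 2}, 3))   # True: row 0-1-2-3 is completed
print(creates_defender_four({0, 1, 2}, 7))   # False
print(has_defender_four({0, 5, 10, 15}))     # True: a diagonal in the first plane
```

The local test is cheaper because a new defender four must contain the cube just played; after a merge no single "last move" exists, which is why the full scan is needed there.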
We remark that the qubic-specific enhancements mentioned below render the dependency-stage test for defender fours superfluous and it has therefore been omitted in our implementation.

Closed defender three

Each meta-move results in a group containing three attacker cubes and one defender cube. Such a group is named a closed attacker three. Similarly, a closed defender three is a group containing three defender cubes and one attacker cube. A group where one player has occupied three cubes, while the fourth cube is empty, is named an open attacker three or open defender three.

Even though closed defender threes cannot be converted into a winning group, they may represent a subtle problem. If two paths in db-search are merged they may create one or more closed defender threes on the board. Let us assume that the three defender cubes are occupied during meta-moves $m_1$, $m_2$ and $m_3$, while the attacker cube is occupied during meta-move $m$. Furthermore, let us assume that a path $P$ in the merge exists, consisting of the following sequence of moves: $(m_1, m_2, m_3, m_4, m)$, where $m_4$ is any meta-move. Then, after move $m_3$, an open defender three exists. Clearly, the only way for the attacker to stop the open defender three is to immediately play move $m$. In $P$ move $m_4$ is played first, which means that meta-move $m_4$ erroneously ignores the option for the defender to win. We remark that (some of) the cubes in a closed defender three need not be part of a meta-move, but could be part of the initial state.

Summarizing, closed defender threes present a problem when the meta-move occupying the attacker cube is played later than immediately after the third defender cube has been occupied. In other words, an ordering exists between the set of meta-operators occupying the defender cubes in the closed defender three, and the operator occupying the attacker cube.
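One way to test whether a single path honors this ordering is to replay it and verify that every open defender three is answered by the very next attacker move. The sketch below is an illustration under assumptions, not the thesis implementation: a path is a list of `(attacker_cube, defender_cube)` meta-moves, positions are sets of cube numbers (`16*z + 4*y + x`), and the function names are hypothetical. It does not check that the attacker moves are themselves threats.

```python
from itertools import product


def build_groups(n=4):
    """All straight lines of n cubes on an n x n x n board, as frozensets of indices."""
    groups = set()
    for start in product(range(n), repeat=3):
        for step in product((-1, 0, 1), repeat=3):
            if step == (0, 0, 0):
                continue
            cells = [tuple(s + k * d for s, d in zip(start, step)) for k in range(n)]
            if all(0 <= c < n for cell in cells for c in cell):
                groups.add(frozenset(n * n * z + n * y + x for x, y, z in cells))
    return groups


GROUPS = build_groups()


def open_defender_cubes(attacker, defender):
    """Empty cubes that would complete a group already holding three defender cubes."""
    cubes = set()
    for g in GROUPS:
        empty = g - attacker - defender
        if len(g & defender) == 3 and len(empty) == 1:
            cubes |= empty
    return cubes


def honors_ordering(path, attacker=(), defender=()):
    """Replay meta-moves (a, d); fail if an open defender three is not closed at once."""
    attacker, defender = set(attacker), set(defender)
    for a, d in path:
        must_play = open_defender_cubes(attacker, defender)
        # Two or more such cubes cannot be countered; one must be taken by a.
        if must_play and must_play != {a}:
            return False
        attacker.add(a)
        defender.add(d)
    return True


# The defender obtains cubes 0, 1, 2; the attacker must then take cube 3 at once.
in_time = [(16, 0), (21, 1), (42, 2), (3, 48), (32, 63)]
too_late = [(16, 0), (21, 1), (42, 2), (32, 63), (3, 48)]
print(honors_ordering(in_time))    # True
print(honors_ordering(too_late))   # False
```

The second path corresponds to the sequence $(m_1, m_2, m_3, m_4, m)$ discussed above: the attacker cube is occupied one meta-move too late.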
During the dependency stage of db-search, to create a closed defender three, first an open defender three must be created, otherwise the closed defender three does not pose a problem. As these are monitored separately, we can safely ignore closed defender threes during the dependency stage.

During the combination stage, a merge may create one or more closed defender threes. Only paths in which the attacker cube for each closed defender three is occupied in time (i.e., not later than immediately after the third defender cube has been occupied) should be included in the merge. Determining whether a merge is non-empty may be time-consuming when fully incorporating the closed defender tests. Instead, we have implemented a simple and surprisingly effective heuristic. Previously, for each combination node (i.e., for each merge), a path representing the merge was selected randomly. The heuristic consists of investigating whether the selected path honors the ordering criteria imposed by the closed defender threes. If so, the merge is not empty. If not, the merge is assumed to be empty. Clearly, in this way valid merges may be rejected, but invalid merges are never wrongfully accepted.

To investigate the amount of error created through the use of this heuristic, we ran the program twice on a set of test positions. The first variant of the program contained the heuristic test, while the second variant did not test for closed defender threes at all. In less than 1% of the test positions did the second variant suggest a winning line, while the first variant failed to find any winning line, although several times the first variant suggested a different winning line. We remark that in the extra 1%, the suggested winning line may have been incorrect, due to defender threes, or may have been valid and have been accidentally rejected by the above heuristic.
A non-heuristic implementation for investigating closed defender threes is expected to yield only a small gain in efficacy while causing a significant decrease in efficiency. Such an implementation has therefore been omitted.

Open defender three

When a threat sequence contains an open defender three, the attacker must respond to that defender three immediately or lose at the next move.

During the dependency stage of db-search, only meta-moves are considered which depend on, or are preceded by, the node from which the meta-move is made. Therefore, during the dependency stage it is often not possible to counter a defender three. Instead, we solve the problem of open defender threes during the combination stage.

In standard db-search, to apply meta-operator $F(N, f)$, set $N$ must be a minimal set of key classes, such that $f$ depends on, or is preceded by, the key operator in each of those classes. We extend the application of meta-operator $F(N, f)$ as follows. Let $F(N, f)$ be applicable and let $P$ be an element in the merge of classes of $N$. Furthermore, we assume that $(x_1, \ldots, x_n)$ is the priority queue of empty cubes in open defender threes in $P(A)$, where $A$ is the initial state. Then, we define $F(N', f)$, with $N' = N \cup P_1$, to be applicable, if (1) the key operator $f_1$ of $P_1$ occupies $x_1$ with its attacker move, and (2) the merge of all elements in $N'$ is non-empty.

Using the extended meta-operator, we can create combinations of paths to counter open defender threes. Clearly, a combination should only be created if $F(N, f)$ is applicable and its priority queue of empty cubes in open defender threes is empty. In our db-search implementation for qubic we have implemented the extended meta-operator.

Summary of db-search enhancements

In this section we have introduced three qubic-specific enhancements to db-search. The main question yet to be answered is whether the state space searched by db-search with these qubic-specific enhancements is complete.
Of course, the heuristic applied to counter closed defender threes renders the state space incomplete, but as has been argued, only a marginal number of solutions are incorrectly rejected. Without proof we state that, except for the aforementioned heuristic, our implementation of db-search is complete. In other words, in each position where a winning threat sequence exists, db-search finds a winning threat sequence, unless the meta-moves within each winning threat sequence can be reordered such that a closed defender three is countered too late by the attacker.

4.4 Applying pn-search

To apply pn-search to qubic, we need to convert the qubic game tree into an AND/OR tree. This is described in section 4.4.1. The enhancements to basic pn-search adopted for our qubic implementation are described in section 4.4.2.

4.4.1 Qubic as an AND/OR tree

Proof-number search (as described in chapter 2) is an AND/OR-tree algorithm. To apply it to qubic, we represent positions where white is to move as OR nodes, and positions where black is to move as AND nodes. A win for white is represented by the value true, while a draw and a win for black are represented by the value false. Thus, proving the pn-search tree means that white can win in the root position, while disproving the pn-search tree means that black can achieve at least a draw.

At each OR node, white is to move. At such nodes, db-search with white as attacker is used as evaluation function. If db-search finds a winning threat sequence, the node is proved, otherwise it obtains the value unknown.

In AND nodes, black is to move. In such nodes, db-search with black as attacker is used as evaluation function. If a winning threat sequence is found, the node is disproved, otherwise the value of the node is unknown.
A node representing a position with all 64 cubes occupied, while neither player has created a winning configuration, is a draw and therefore has value false, without applying the evaluation function.

4.4.2 Enhancements

The above description explains how standard pn-search is applied to qubic. However, four enhancements have been added to speed up the search. The enhancements are discussed in this section.

Transpositions

A DAG is created instead of a tree, using the algorithm described in section 2.3.4. This ensures that if a position has already occurred in the DAG, or if a position is equivalent through automorphisms to another position in the DAG, the position is not investigated again.

Threatening moves by white

Pn-search favors subtrees in which the mobility (i.e., the number of choices available to a player) of one player is restricted, while the mobility of the other is enlarged. In qubic, this means that threatening moves are favored above all other moves, as they leave the opponent with just a single response. After a threatening move, and the forced response by the opponent, again threatening moves are favored above all other moves, and so on. Thus, pn-search automatically focuses on the space of threatening moves. This is undesirable for pn-search in qubic, since the evaluation function (db-search) will already have investigated whether a winning threat sequence exists. If such a sequence does not exist, the potential for threats should be increased, instead of decreased by executing them. Therefore, in our pn-search tree, we have restricted white to non-threatening moves, simply by omitting moves which create a threat in the move-generation module. For black, of course, all moves are investigated.

Heuristic (dis)proof number initialization

In chapter 2 we have suggested several methods to include some domain-specific knowledge in the initialization of proof and disproof numbers. Here we describe our qubic-specific initializations.
After expansion of an AND node (black to move), usually many nodes are proved immediately by db-search. Nodes in which black has just created a threat, however, are not proved immediately, because white is forced to counter the threat. A good estimate of the number of nodes which must still be proved at an AND node is the number of threatening moves black can make. Therefore, the proof number of an AND node is initialized to the number of threatening moves for black (with a minimum of 1), while the disproof number is initialized to 1.

After expansion of an OR node (white to move), usually several nodes are disproved immediately by db-search. Moves which create potential threats (named positional moves), however, are usually not disproved immediately. We determine the number of positional moves using the following heuristic. For a move $M$ we consider the set of groups $G$ which contain $M$, while not containing any black cubes. $M$ is named positional if $G$ consists of at least three groups, each containing zero or one white cubes (besides $M$), or at least two groups, each containing at least one white cube (besides $M$). At OR nodes, the disproof number is initialized to the number of positional moves for white (with a minimum of 1), while the proof number is set to 1.

Removing solved terminal nodes

In section 2.3.1 it was described how solved subtrees in a pn-search tree may be removed. Such a technique has disadvantages when applied to a DAG instead of a tree. Assume that a node $J$ has been solved and is subsequently removed from the DAG. If in another subtree a new instance of node $J$ is created, the work to solve $J$ will be duplicated. The decision of which solved nodes to remove may depend on the size of the working memory and the probability that this scenario will occur. Generally, nodes which have been solved with little effort may be removed with less cost than nodes which have been solved only after a large search.
We have decided to remove nodes from the DAG only if they were solved through evaluation. As evaluations of nodes require only a small amount of time, the reduced memory requirements were judged to outweigh the cost of re-evaluation for the terminal nodes which occur more than once in the search. In our experiments the memory requirements were thus reduced by approximately 70%.

4.5 Solving qubic

In this section we describe how we solved qubic. First, in section 4.5.1 we describe how we subdivided the game tree into 195 subtrees. Second, in section 4.5.2 we present statistics on solving each of the 195 subtrees. Third, we compare our results with those of Patashnik (1980) in section 4.5.3. Finally, in section 4.5.4 we discuss the reliability of our results.

4.5.1 Subdividing the game tree

In this section we explain why and how we subdivided the qubic game tree into 195 subtrees. First, we explain why this was necessary. Second, we show how we subdivided the game tree into four-ply subtrees. Third, we explain how each of the four-ply subtrees was investigated.

Necessity of subdividing the game tree

Before white can create a threat, she must have occupied two cubes in the same group. After the threat is executed by white and countered by black, white has three cubes in one group together with a black cube. To create a new threat she must have occupied at least one other cube. Thus, winning threat sequences can only be found in positions with at least six cubes (three white and three black) on the board. As we have seen in figure 4.2, in some positions with exactly three white cubes, winning threat sequences exist.

From the above, it follows that any evaluation by db-search of positions with 0 to 5 cubes occupied will return the value unknown. Furthermore, the number of children per qubic position at level $d$ equals $\mathit{size} - d$.
Therefore, the first 5 ply of the qubic game tree, using evaluation function db-search, has a uniform branching factor per level of the tree. Executing pn-search for the full game tree (with the root representing the empty board) will be ineffective, as pn-search relies on non-uniformity. For this reason, we decided to split the game tree into subtrees.

Selecting a minimal set of subtrees

The subtrees each represent positions 4-ply into the game. A depth of four was selected since it was deep enough to overcome the uniformity problem for pn-search mentioned above, while it required the selection of only 13 strategic moves by hand (i.e., one move for the initial position, and 12 moves in the twelve 2-ply positions of figure 4.3), thus leaving as much work to pn-search as possible.

Starting from the empty board, we suggested a move for white. Since there are only two distinct moves, one at a 4-cube, and one at a 7-cube, we selected the 7-cube move as white's best chance for winning. As shown in figure 4.3, black has 12 different first moves. Thus, at ply two we have 12 positions to solve. In each of these positions we suggested a move for white.

In Patashnik (1980), moves at 7-cubes were selected, such that the number of different resultant positions (after applying automorphisms) was as small as possible. There, 7 three-ply positions are presented. To obtain the 7 three-ply positions, in each of the 12 two-ply positions, white played in a one-dimensional group containing white's first move. Since white's first move at a 7-cube is an element of 3 one-dimensional groups, it is possible to select such a move with the extra constraint that black's first move is not an element of the same group. Using this approach, it turns out that there are eight different ways in which the 12 two-ply positions are reduced to 7 three-ply positions.
We represent a three-ply position by a three-tuple $\langle w_1\ w_2\ b_1 \rangle$, with $w_1$ and $w_2$ the cube numbers of the white stones, and $b_1$ the cube number of the black stone. The cube numbers for each of the 64 cubes of the qubic board are shown in figure 4.4.

Figure 4.4: Cube numbers on the qubic board.

The eight ways to create 7 three-ply positions are as follows.

$$\langle 0\ 3\ 12 \rangle \wedge \langle 0\ 3\ 21 \rangle \wedge \langle 0\ 3\ 60 \rangle \wedge \big( (\langle 0\ 3\ 5 \rangle \wedge \langle 0\ 3\ 29 \rangle) \vee (\langle 0\ 3\ 20 \rangle \wedge \langle 0\ 3\ 24 \rangle) \big) \wedge \big( (\langle 0\ 12\ 1 \rangle \wedge (\langle 0\ 3\ 28 \rangle \vee \langle 0\ 3\ 61 \rangle)) \vee (\langle 0\ 3\ 28 \rangle \wedge (\langle 0\ 3\ 1 \rangle \vee \langle 0\ 12\ 1 \rangle)) \big)$$

For each of these eight groups of 7 three-ply positions, we have created the set of all four-ply positions. Since there are 61 legal moves per position, initially 427 four-ply positions were created. After applying automorphisms, however, 195, 195, 217, 217, 226, 226, 241 or 241 positions remain, depending on the group of three-ply positions. The 3-ply position $\langle 0\ 3\ 1 \rangle$ looks bad for white, since black has blocked the potential white threats. Therefore, the group expanding to 195 four-ply positions of which $\langle 0\ 3\ 1 \rangle$ is an element is ignored. The remaining group of 7 three-ply positions expanding to 195 four-ply positions is listed below.

$\langle 0\ 3\ 12 \rangle$, $\langle 0\ 3\ 20 \rangle$, $\langle 0\ 3\ 21 \rangle$, $\langle 0\ 3\ 24 \rangle$, $\langle 0\ 3\ 60 \rangle$, $\langle 0\ 3\ 61 \rangle$, $\langle 0\ 12\ 1 \rangle$

The same group of three-ply positions was selected by Patashnik (1980).

Since each 7-cube move is also an element of three two-dimensional groups, we could instead try moves at 7-cubes in the same two-dimensional group as the first white move. Again, the 12 two-ply positions can be reduced to 7 three-ply positions, this time in seven different ways, all of which have been listed below.
$$\langle 0\ 15\ 51 \rangle \wedge \langle 0\ 15\ 21 \rangle \wedge \langle 0\ 60\ 3 \rangle \wedge \langle 0\ 51\ 6 \rangle \wedge \langle 0\ 51\ 5 \rangle \wedge \big( (\langle 0\ 15\ 1 \rangle \wedge (\langle 0\ 60\ 1 \rangle \vee \langle 0\ 60\ 7 \rangle)) \vee (\langle 0\ 60\ 7 \rangle \wedge (\langle 0\ 15\ 1 \rangle \vee \langle 0\ 60\ 1 \rangle)) \vee (\langle 0\ 60\ 1 \rangle \wedge (\langle 0\ 60\ 7 \rangle \vee \langle 0\ 51\ 7 \rangle \vee \langle 0\ 15\ 1 \rangle)) \big)$$

The number of four-ply positions grown from each of these groups is 219, 229, 229, 229, 229, 240 and 240. We have selected the same set of three-ply positions as Patashnik (1980), since it yields the smallest set of four-ply positions. This choice also allows us to compare his results with ours.

Investigating the subtrees

Pn-search has been applied to all but two of the 195 four-ply positions. The two remaining positions have the property that the two black stones lie within a group G1 which intersects the group G2 containing the two white stones. By playing at the intersection c of G1 and G2, either player can create a threat and counter a potential threat by the opponent at the same time. Therefore, move c is regarded as a strong move for white. However, in pn-search we explicitly forbid white to create threats. In these two positions, this heuristic deprives white of her best move, and allows black to gain counterplay. Therefore, in these two positions we played white's third move at c and countered the threat with black's third move before applying pn-search.

Tests on these two four-ply positions showed that one position was quickly solved through an alternative third move for white, while the pn-search for the other position was terminated after a dag of a million nodes had been created. In the latter case, these tests suggest that playing at intersection c may be white's only path to a win.

4.5.2 Statistics

In this section we present the statistics of running pn-search on the 195 positions described in the previous section. We distinguish between execution time, pn-search dag size, db-search evaluations and solution size. We also present an example winning line.

Execution time

All experiments were run on a sparcstation 2 at the Vrije Universiteit in Amsterdam. The machine has 128 Megabytes of internal memory, allowing pn-search trees of up to 1 million nodes to fit in internal memory, without slowing down the search by swapping to disk. The sparcstation 2 is estimated to have an execution speed of 28 mips. The cpu time needed for the 195 positions (193 four-ply positions and 2 six-ply positions) was 55,700 seconds, or roughly 15.5 hours.

Pn-search DAG size

The pn-search tree size is the number of nodes created during the search. Since no nodes are removed from the dag once created, this equals the size at termination. We remark that terminal nodes solved by db-search are not included in the dag, as described in section 4.4.2.

The smallest pn-search dag consisted of 884 nodes, while the largest consisted of 310,000 nodes, with the median at 4,000 nodes and the average at 10,000 nodes. Only one other dag was larger than 60,000 nodes, at 118,000. These two difficult positions are $\langle 0\ 3\ 1\ 7 \rangle$ (118,845 nodes) and $\langle 0\ 3\ 21\ 22 \rangle$ (310,424 nodes). (The positions are described by the two cubes containing white stones, followed by two cubes containing black stones.)

Db-search evaluations

A total of 3.5 million positions were evaluated with db-search for white to move, and 0.9 million positions for black to move. Comparing the total number of evaluations, 4.4 million, with the sum of the sizes of the pn-search dags created, 2.0 million, it follows that not creating nodes for the solved positions in the tree shrinks the tree to be held in memory by a factor of almost 3.2.

Each time db-search created a dag of 500 nodes or more it was reported. This occurred just 241 times, out of over 4 million evaluations. Among these, 31% were successful evaluations. The largest successful evaluation took 2,008 nodes, while the largest failed evaluation took 3,153 nodes. Creating the db-search dag of 3,153 nodes took less than 5 seconds cpu time.

Solution size

The solution tree we found for qubic consists of a set of positions with white to move, and a winning move for each of these positions. The number of positions at each depth of the tree is shown in table 4.1.

depth   positions
    0           1
    2          12
    4         195
    6        2000
    8        1426
   10        1074
   12         772
   14         573
   16         345
   18         142
   20          62
   22          36
   24           8
   26           4
   28           4
total        6654

Table 4.1: Number of positions in the qubic solution

A deep winning line

Our approach to solving qubic makes it difficult to determine the length of the winning line constituting optimal play by both sides. First, db-search does not search for the shortest winning threat sequence, but terminates as soon as any winning sequence is found. Second, pn-search does not search for the shallowest solution, but for one which reduces the work still to be done to complete the proof. Therefore, the 4 lines of depth 28, as shown in table 4.1, followed by the winning threat sequence found by db-search are not necessarily the longest lines with optimal play by both sides. Nevertheless, the winning line shown in figure 4.5, consisting of 33 ply, is one of these four. Below follows a short analysis of the game.

Figure 4.5: A deep winning line.

The first four ply consist of white and black occupying 7-cubes. White 5 comes somewhat as a surprise: white occupies a 4-cube to block the potential threat created by black. Black 6 similarly blocks white's potential threat. With white 7 two more potential threats are created, of which one is countered by black 8. White 9, again at a 4-cube, creates several opportunities for white to win through a threat sequence.
Black then starts creating threats up to black 28, each of which is followed by a forced move by white. White 29, countering black 28, regains the initiative for white by creating a threat. After black's forced reply, white creates a double threat with white 31 and wins with white 33.

We remark that while this may be the line of play where black postpones the end as long as possible, after white 9 all white had to do was counter threats created by black. The first time white had to select a move again, she had many options to win, of which white 31 is the simplest way. Therefore, from the point of view of human players, playing white in this line only requires skill up to white 9. We remark that other lines exist in the solution to qubic which require more strategic moves by white, although the winning line is shorter.

4.5.3 Comparison with Patashnik

In this section we compare our solution with that of Patashnik. This comparison is not meant to criticize Patashnik's work in any way. On the contrary: his ability to solve qubic in the late 1970s, constrained by the computing resources of that time, should be regarded as one of the more impressive achievements in games research. The goal of our comparison is only to obtain information on the performance of db-search and pn-search.

First, we compare the performance of pn-search in selecting strategic moves with that of Patashnik as a strong human player. Second, we compare the performance of db-search with that of the forced-sequence searcher used by Patashnik. Third, we summarize the results.

Pn-search vs. human expert

As stated in section 4.5.1, we have researched the same 195 four-ply positions as Patashnik (1980). Patashnik defines a strategic move as a non-obvious move for white, thus excluding moves suggested by the tactical search, and excluding forced moves for white when countering threats made by black.
To compare our results with Patashnik's we must exclude all forced moves for white from the 6,654 moves in our solution to qubic. The number of strategic moves per depth, for both Patashnik and our solution, are shown in table 4.2.

level   Patashnik   pn-search
    0           1           1
    2          12          12
    4         195         195
    6        1448        1960
    8         788         668
   10         309         248
   12         110         113
   14          51          41
   16          15          14
   18           0           2
total        2929        3254

Table 4.2: Number of positions in qubic solution per depth.

From table 4.2 it follows that Patashnik made 10% fewer strategic moves than pn-search. For Patashnik, making the strategic moves was a bottleneck in solving qubic, as each strategic move was made by hand. Consequently, minimizing the number of strategic moves was a major concern in his research. Therefore, we feel that pn-search, while not explicitly trying to minimize the number of strategic moves in our solution to qubic, has performed well.

Db-search vs. forced-sequence search

Before we can compare the total amount of cpu time spent by Patashnik with our results, we must allow for the different types of machines used. Although it is difficult to compare such vastly different machines, an expert indicated that if the performance had to be expressed in mips, his best estimate for the dec-10 would be between 2 and 3 mips (Witmans, 1994). Compared with the approximately 28 mips of the sparcstation 2, we assume that our machine was between 10 and 20 times faster than the hardware used by Patashnik. In our comparison we disregard the fact that today's computers are equipped with much larger memories than 15 years ago.

Our first comparison is based on the total solution time. Patashnik's solution took approximately 1500 hours, not counting time wasted on backtracking due to bad strategic moves, and computer failure. We compare this figure with our 15.5 hours of cpu time. Factoring out the difference in machine speed, our solution is between 5 and 10 times faster than the solution found by Patashnik.
As almost all cpu time is spent on searching forced sequences, for both Patashnik's solution and ours, this is a first indication that db-search may be 5 to 10 times more efficient than a conventional forced-sequence search as implemented by Patashnik.

Comparing the execution time of individual instances of Patashnik's forced-sequence search and our db-search is slightly more difficult. Patashnik remarks that typically his forced-sequence search took about two seconds, but occasionally as long as half an hour. He also remarks that if his strategic moves had been slightly worse, an uncontrollable combinatorial explosion would have occurred in some positions.

For a second comparison, we will assume an average of two seconds per forced-sequence search for Patashnik. To simplify matters for db-search, we assume that all 55,700 seconds of cpu time were spent on db-search evaluations (disregarding the time necessary to perform pn-search, to check for automorphisms, and to find transpositions in the pn-search dag). During this time over 4.4 million evaluations were performed, for an average of almost 80 evaluations per second. Given the difference in machine speed, we find that db-search is between 8 and 16 times faster than Patashnik's forced-sequence search.

As a third comparison, we look at the slowest evaluation of db-search (less than 5 seconds) and the slowest forced-sequence search of Patashnik (approximately 30 minutes). This difference implies a gain factor for db-search of 20 to 40 on the difficult positions.

Summarizing comparison with Patashnik

We conclude that applying expert knowledge (Patashnik) to solve qubic results in a marginally smaller solution compared to applying the knowledge-free search technique pn-search. On qubic, db-search performs between 5 and 40 times better than a conventional search algorithm. In our opinion, the qubic results illustrate the strengths of both pn-search and db-search.
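The three gain factors quoted in this section can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: it uses the figures stated in the text (1500 hours, 55,700 cpu seconds, 4.4 million evaluations, two seconds per forced-sequence search, 30 minutes versus 5 seconds) together with the assumed machine speed ratio of 10 to 20. The function and variable names are hypothetical and do not come from the original program.

```python
# Machine speed ratio assumed in the text: the sparcstation 2 is taken to be
# between 10 and 20 times faster than Patashnik's hardware.
SPEED_RATIO_LOW, SPEED_RATIO_HIGH = 10, 20


def hardware_adjusted_gain(raw_ratio):
    """Return (low, high) gain after dividing out the assumed speed ratio."""
    return raw_ratio / SPEED_RATIO_HIGH, raw_ratio / SPEED_RATIO_LOW


# 1. Total solution time: about 1500 hours versus 55,700 cpu seconds.
total_gain = hardware_adjusted_gain(1500 * 3600 / 55_700)

# 2. Average search: 2 seconds per forced-sequence search versus
#    55,700 seconds spread over 4.4 million db-search evaluations.
evaluations_per_second = 4_400_000 / 55_700
average_gain = hardware_adjusted_gain(2 * evaluations_per_second)

# 3. Slowest search: about 30 minutes versus (at most) 5 seconds.
#    Because 5 seconds is an upper bound, this gain is a lower bound.
slowest_gain = hardware_adjusted_gain(30 * 60 / 5)

for label, (low, high) in [
    ("total time", total_gain),
    ("average search", average_gain),
    ("slowest search", slowest_gain),
]:
    print(f"{label}: {low:.1f} to {high:.1f}")
```

Under these assumptions the first two ranges come out close to the quoted 5 to 10 and 8 to 16. The third comes out at 18 to 36, which the text rounds to 20 to 40, keeping in mind that the slowest db-search evaluation took less than 5 seconds.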
4.5.4 Reliability

There are many sayings concerning the number of errors made by programmers, among which one of the most famous is:

There is always one more bug.

These bugs may vary from uninitialized variables to serious programming-logic errors. For a program the size of our qubic implementation (over 6,000 lines of C code), there may thus be some doubts about the reliability of our results. In this section we present some measures taken to ensure their correctness.

The most complicated part of the program consists of the db-search implementation. During the implementation errors were made and corrected but, of course, ensuring that this code is error-free is a difficult task. Therefore, the products of db-search, viz. winning threat sequences, were independently examined for their correctness. Once a potential winning threat sequence was found, the program started from the initial search position and played the sequence move by move. After each move by the attacker it is investigated whether (1) the defender has a threat, and (2) the attacker has a threat at the cube suggested as next move for the defender. After the last move by the attacker, it is investigated whether a group of four cubes has indeed been occupied by the attacker. If any of these investigations show that db-search made an error, this is reported. No errors have been discovered in db-search during the process of solving qubic. We conclude that the product of the most complicated part of the program is independently verified.

The second most complicated part of the program consists of the pn-search implementation. Fortunately, pn-search has been implemented for several different games, ensuring that the chances of implementation errors are much lower than for new code. Still, the search process is too complicated to monitor fully, and thus errors may go unnoticed.
To examine our results, a successful pn-search produces a small database consisting of the positions in the solution tree. After we solved all 195 four-ply positions, we merged these databases. Next, we created a database-checking module. For each position in the database with white to move a successor should be contained in the database. For each position in the database with black to move, all successors are generated. A successor should either be contained in the database, or white should have a winning threat sequence as found by db-search. We have thus checked the database and found it complete.

Third, our solution is consistent with the results of Patashnik (1980), but arrived at independently.

In conclusion, we believe that our implementation may be regarded as reliable.

Chapter 5

Go-Moku

In this chapter we discuss the application of pn-search and db-search to go-moku. In the previous chapter we stated two goals for chapters 4 and 5, which we repeat here. The first goal is to explain in detail how pn-search and db-search have been applied to two combinatorially complex problems. The second goal is to show that qubic and go-moku can be solved, thereby positively answering our first research question (cf. section 1.4) for two specific games.

In several ways, qubic and go-moku are related games, with go-moku being the more complex one. The relationship between qubic and go-moku is expressed in the organization of this chapter: almost every section has a corresponding section in chapter 4. We mention this relationship for readers who are particularly interested in the application of db-search or pn-search. Comparing corresponding sections on qubic and go-moku may provide additional insight into these algorithms.

The chapter is organized as follows. In section 5.1 we provide a background to investigations in go-moku. The rules of go-moku and common strategies are presented in section 5.2.
The application of db-search to go-moku is described in section 5.3. The role of pn-search in the solution of go-moku is explained in section 5.4. The results of our investigations are presented in section 5.5.

5.1 Background

Among the games of the Olympic List, go-moku has the simplest rules: two players (black and white) alternate placing stones on a 15 × 15 square lattice with the goal of obtaining a line of exactly five consecutive stones of the player's color. While its roots lie in China and Japan, it is also popular in several countries of Europe and the former Soviet Union. Part of go-moku's popularity must be ascribed to the fact that it can be played with pencil and paper, allowing it to be played virtually everywhere (including classrooms) by virtually everyone (including bored students).

In Japan professional renju players (renju being a complicated variant of go-moku) have studied go-moku in detail and have stated that the player to move first (black) has an assured win (Sakata and Ikawa, 1981). These statements are sometimes accompanied by a list of main variations, such as the 32-page analysis in Sakata and Ikawa (1981). Close examination of these analyses reveals that in each position only a small number of white moves are analyzed. For example, after black's first move at the center of a 15 × 15 board, white has 35 distinct moves, of which 2 are adjacent to black's first move, ignoring symmetrically equivalent moves. In Sakata and Ikawa (1981) only the variations after the 2 moves adjacent to black's first move are discussed. As far as we know, prior to this work no complete proof of black's win in go-moku has been published.

Until this study, all go-moku programs have been defeated at least once or been in a lost position when playing black. As an example of the latter we mention the game between the go-moku 1991 world-champion program Vertex (black) and the program Polygon (white).
Vertex maneuvered itself into a position provably lost for black (Uiterwijk, 1992a). As an aside we note that Polygon played its first move non-adjacent to the first black stone, indicating that finding a win in such a variation may not be entirely obvious.

Summarizing, go-moku is assumed to be a first-player win but, as far as we know, no complete proof has been published nor has any go-moku program ever been shown to be unbeatable when playing black.

At this point we reiterate our remark of section 4.1. In earlier publications we have used the term threat-space search for the application of db-search to qubic and go-moku. In this text we only use the term db-search.

5.2 Rules and strategies

Go-moku is a two-player game, related to the well-known trivial game of tic-tac-toe. While in tic-tac-toe players must create a line of three consecutive markers of their color on a restricted 3 × 3 board, in go-moku players must create a line of five on a practically unrestricted lattice. Through the years, several variants of go-moku have been developed, which are described in detail in section 5.2.1. Next, threats and threat trees are discussed in section 5.2.2. Finally, in section 5.2.3 some insight is given into the way human go-moku experts think.

5.2.1 Rules

In go-moku, simple rules lead to a highly complex game, played on the 225 intersections of 15 horizontal and 15 vertical lines. Going from left to right the vertical lines are lettered from a to o; going from the bottom to the top the horizontal lines are numbered from 1 to 15. Two players, black and white, move in turn by placing a stone of their own color on an empty intersection, henceforth called a square. Black starts the game. The player who first makes a line of five consecutive stones of her color (horizontally, vertically or diagonally) wins the game. The stones once placed on the board during the game never move again nor can they be captured. If the board is completely filled, and no one has five-in-a-row, the game is drawn.

Go-moku variants

Many variants of go-moku exist; they all restrict the players in some sense, mainly reducing the advantage of black's first move. We mention four variants.

Non-standard boards: In the early days the game was played on a 19 × 19 board, since go boards have that size. Some people prefer to think of go-moku as being played on an infinite board. However, a larger board increases black's advantage (Sakata and Ikawa, 1981), which resulted in the standard board size of 15 × 15.

Free-style go-moku: An overline is a line of six or more consecutive stones of the same color. In this variant, an overline is regarded as a win.

Standard go-moku: In the variant of go-moku played most often today, an overline does not win (this restriction applies to both players). Only a line of exactly five stones is considered as a winning pattern.

Renju: A professional variant of go-moku is renju. White is not restricted in any way, e.g., an overline wins the game for white. However, black is not allowed to make an overline, nor a so-called double three or double four (cf. Sakata and Ikawa (1981)). If black makes any of these patterns, she is declared to be the loser. Renju is not a symmetric game: to play it well requires different strategies for black and for white. Even though black's advantage is severely reduced, she still seems to have the upper hand.

We have investigated both free-style go-moku and standard go-moku. We remark that in this chapter we discuss free-style go-moku unless it is explicitly stated otherwise.

Opening restrictions

In an attempt to make the game less unbalanced, opening restrictions have been imposed on black. We mention two such restrictions.

Professional go-moku: In professional go-moku, black is forced to make her first move in the center of the board. White must play her first move at one of the eight squares adjacent to black's first move.
Black's second move must be outside the set of 5 × 5 squares centered by black's first stone.

Professional renju: In professional renju, the game starts with two players, which are named temporary black and temporary white. Temporary black plays her first move at the center of the board, while temporary white plays her first move adjacent to the black stone on the board. Due to symmetry, there are only two distinct first moves for temporary white. For each of these two, there are 12 selected squares where temporary black is allowed to play her second move. Thus, there are 24 possible 3-ply sequences in this variant. Next, temporary white may choose between playing black or white for the remainder of the game. Temporary black automatically plays the other color. Then, white plays her second move. Finally, black selects two squares for her third move and gives white the choice between these two. From there, the game continues according to the rules of standard renju.

In our research we have investigated variants of go-moku without any opening restrictions.

5.2.2 Threats and threat trees

We describe the four types of threats in go-moku, followed by a discussion of threat trees and winning threat trees.

Figure 5.1: Threats in go-moku.
Threats

In go-moku a threat is an important notion; the main types have descriptive names:

the four (figure 5.1a) is defined as a line of five squares, of which the attacker has occupied any four, with the fifth square empty;

the straight four (figure 5.1b) is a line of six squares, of which the attacker has occupied the four center squares, while the two outer squares are empty;

the three (figure 5.1c and 5.1d) is either a line of seven squares of which the three center squares are occupied by the attacker and the remaining four squares are empty, or a line of six squares with three consecutive squares of the four center squares occupied by the attacker and the remaining three squares empty;

the broken three (figure 5.1e) is a line of six squares of which the attacker has occupied three non-consecutive squares of the four center squares, while the other three squares are empty.

A winning pattern, i.e., a line of five squares, all occupied by one player, is named a five.

If a player constructs a four, she threatens to win on the next move. Therefore, the threat must be countered immediately at the empty square of the four. If a straight four is constructed, the defender is too late, since there are two squares where the attacker can create a five at her next move (unless, of course, the defender has the opportunity to win at her next move).

Figure 5.2: Complicated threats.

With a three, the attacker threatens to create a straight four at her next move. Thus, even though the threat has a depth of two moves, it must be countered immediately. If an extension at both sides is possible (figure 5.1c), then there are two defensive moves: both directly adjacent to the attacking stones. If only one extension is possible then three defensive moves are available (figure 5.1d). Moreover, against a broken three, three defensive moves exist (figure 5.1e).
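The pattern definitions of the four, straight four, three and broken three can be made concrete with a small sketch. It is an illustration under stated assumptions, not the recognizer used in the thesis: a single board line is encoded as a string in which 'X' is an attacker stone, 'O' a defender stone and '.' an empty square, and the function name `threats_in_line` is hypothetical. The sketch only classifies patterns in one line; it does not generate the defensive replies.

```python
def threats_in_line(line, attacker="X", empty="."):
    """Return the set of basic threat names found in one board line.

    Every window of 5, 6 and 7 squares is tested against the pattern
    definitions of section 5.2.2. A straight four also contains fours,
    so both names are reported for such a line.
    """
    found = set()
    for start in range(len(line)):
        five = line[start:start + 5]
        six = line[start:start + 6]
        seven = line[start:start + 7]

        if len(five) == 5:
            if five.count(attacker) == 5:
                found.add("five")
            elif five.count(attacker) == 4 and five.count(empty) == 1:
                found.add("four")

        # Six-square patterns need both outer squares to be empty.
        if len(six) == 6 and six[0] == empty and six[5] == empty:
            centre = six[1:5]
            if centre.count(attacker) == 4:
                found.add("straight four")
            elif centre.count(attacker) == 3 and centre.count(empty) == 1:
                # The three stones are consecutive exactly when the single
                # empty centre square lies at one end of the centre.
                if centre[0] == empty or centre[3] == empty:
                    found.add("three")
                else:
                    found.add("broken three")

        # Seven-square three: three centre stones, four empty squares.
        if seven == empty * 2 + attacker * 3 + empty * 2:
            found.add("three")
    return found


print(sorted(threats_in_line("..XXX..")))   # ['three']
print(sorted(threats_in_line(".XX.X.")))    # ['broken three']
print(sorted(threats_in_line("OXXXX.")))    # ['four']
print(sorted(threats_in_line(".XXXX.")))    # ['four', 'straight four']
```

Scanning fixed-size windows keeps each definition visible as a separate test. A practical recognizer would more likely update the affected lines incrementally after each move, but that is a design choice outside the scope of this sketch.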
We remark that more complicated threats exist, which threaten to win in two or more moves. Three examples are shown in figure 5.2, in each of which black threatens to play at the intersection of the two lines of black stones. In figure 5.2a, black threatens to create a double four, in figure 5.2b, black threatens to create a four-three, and in figure 5.2c, black threatens to create a double three. Each of these is a winning pattern. White can counter the threats of figure 5.2 in 3, 4 and 5 possible ways, respectively.

In our research we have not included the patterns of figure 5.2 as threats for three reasons. First, the large number of defensive moves per threat does not combine well with our transformation of the winning threat-tree search to a single-agent search, as described in section 5.3.1. Second, recognizing threats which consist of a single line on the board can be performed more efficiently than recognizing threats which consist of combinations of lines. Third, the threats shown in figure 5.2 are only a small sample of the complete set of more complicated threat patterns, making inclusion of all possible threats of go-moku a complex task. In Uiterwijk (1992b) a program based on a large set of threat patterns is described.

Figure 5.3: Winning threat variations. (a) Fours only. (b) Threes only.

Threat trees

To win the game against any opposition a player needs to create a double threat (either two threes, two fours, or a three and a four). In most cases, several threats are executed before a double threat occurs. A tree in which each attacker move is a threat is called a threat tree. A threat tree leading to a (winning) double threat in each variation is called a winning threat tree. A variation in a winning threat tree is called a winning threat variation. Each threat in the tree forces the defender to play a move countering the threat.
Hence, the defender's possibilities are limited.

In figure 5.3a a position is shown in which black can win through a winning threat variation consisting of fours only. Since a four must be countered immediately, the whole sequence of moves is forced for white.

In figure 5.3b a position is shown in which black wins through a winning threat variation consisting of threes, twice interrupted by a white four. As mentioned earlier, white has at each turn a limited choice. During the play, she can create fours as is shown in figure 5.3b. Still, her loss is inevitable.

5.2.3 Human strategies

During the second and third Computer Olympiad (Levy and Beal, 1991; Van den Herik and Allis, 1992), we observed two human expert go-moku players (A. Nosovsky, 5th dan and N. Alexandrov, 5th dan). These Russian players are involved in two of the world's strongest go-moku playing programs (Vertex and Stone System).

While observing the experts, it became clear that they are able to find quickly sections on the board where a winning threat tree can be created, regardless of the number of threes which are part of the winning threat tree. The depth of these winning threat trees is typically in the range of 5 to 20 ply.

The way a human expert finds winning threat trees so quickly can be broken down into the following four steps.

1. A section of the board is chosen where the configuration of stones seems favorable for the attacking player. It is then decided whether enough attacking stones can collaborate making it useful to search for a winning threat tree. This decision is based on a "feeling", which comes from a long experience in judging patterns of stones (cf. De Groot (1965)).

2. Threats are considered, and in particular the threats related to other attacking stones already on the board. Defensive moves by the opponent are mostly disregarded.

3. As soon as a variation is found in which the attacker can combine her stones to form a double threat, it is investigated how the defender can refute the potential winning threat variation. Whenever the opponent has more than one defensive move, an examination is made to see whether the same threats work in all variations of the threat tree. Moreover, it is investigated whether the opponent can insert one or more fours, effectively neutralizing the attack.

4. If only a few variations of the tree do not lead to a win via the same threat variation, an examination is made to see whether the remaining positions can be won via other winning threat trees.

In practical play, a winning threat tree often consists of a single set of attacking moves applicable to each variation of the tree, independent of the defensive moves.

We remark that the size of the state space is considerably reduced by first searching for one side (the attacker). Only if a potential winning threat tree is found is the impact of defensive moves investigated. This approach is supported by the analyses given in Sakata and Ikawa (1981). When presenting a winning threat tree, they only provide the moves for the attacker, thus indicating that the set of attacking moves works irrespective of the defensive moves. Possible fours which the defender can create without refuting the threat tree can be neglected altogether.

In positions without winning threat trees, the moves to be played preferably increase the potential for creating threats or, whenever defensive moves are called for, the moves chosen will reduce the opponent's potential for creating threats. The human evaluation of the potential of a configuration is based on two aspects: (1) direct calculations of the possibilities (e.g., if the opponent does not answer in that section of the board) and (2) a so-called good shape (i.e., configurations of which it is known that stones collaborate well).
In section 5.3 we model the above thinking process in our application of db-search to go-moku.

5.3 Applying db-search

As mentioned before, threat trees play a dominant role in go-moku. To play go-moku well, it would be advantageous to have a module which determines whether a winning threat tree exists. Our application of db-search to go-moku is restricted to searching for winning threat trees. This section consists of five parts. First, in section 5.3.1 we describe how the adversary-agent state space, if restricted to a subset of all possible threat trees, can be transformed into a single-agent state space. Second, in section 5.3.2 we illustrate how the single-agent state space thus created for go-moku fits in the framework for db-search as presented in chapter 3. Third, in section 5.3.3 we discuss properties of the single-agent state space for go-moku which have not been included in the framework of section 5.3.2. For each of these properties it is explained how our implementation of db-search handles them. Fourth, in section 5.3.4 heuristics are described which lead to a significantly improved efficiency, at the cost of a slightly reduced efficacy. Fifth, in section 5.3.5 we describe the additional requirements necessary to apply db-search to standard go-moku instead of free-style go-moku.

5.3.1 A single-agent search in go-moku

Our description of the single-agent state space in go-moku consists of a set of definitions, an interpretation of the definitions, and the transformation of the adversary-agent state space to a single-agent state space.

Definitions

In the previous sections we have introduced the concept of threats, threat trees and winning threat trees. For our application of db-search to go-moku, we formally define the notions threat (definition 5.1), reply (definition 5.2), threat sequence (definition 5.3), potential winning threat sequence (definition 5.4) and winning threat sequence (definition 5.5).
Definition 5.1 A threat in go-moku is a move by the attacker creating a five, a straight four, a four, a three or a broken three. A five and a straight four are called double threats, while a four, three and broken three are called single threats. The squares related to a threat are the 5 (five and four), 6 (straight four, three, broken three) or 7 (three) squares in the line of the threat (cf. section 5.2.2).

Definition 5.2 A reply to a threat $T$ in go-moku is the set of defender moves $R$, such that each element of $R$ counters $T$. Against a five and a straight four, $R$ is empty; against a four, $R$ consists of one move; against a three, $R$ consists of two or three moves; and against a broken three, $R$ consists of three moves.

Definition 5.3 A threat sequence in go-moku is any sequence of moves $(a_1, d_1, a_2, d_2, \ldots, a_n, d_n)$, with $n \geq 1$, such that each $a_i$, $1 \leq i \leq n$, is a single threat, and each $d_i$ is the reply to $a_i$.

Definition 5.4 A potential winning threat sequence in go-moku is any sequence $(a_1, d_1, \ldots, a_n, d_n, a_{n+1}, d_{n+1})$, such that $(a_1, d_1, \ldots, a_n, d_n)$ is a threat sequence, $a_{n+1}$ is a double threat and $d_{n+1}$ is the reply to $a_{n+1}$.

Definition 5.5 A winning threat sequence in go-moku is a potential winning threat sequence $(a_1, d_1, \ldots, a_n, d_n, a_{n+1}, d_{n+1})$, for which it has been checked that the defender cannot counter the threat sequence by:
1. interjecting a sequence of threats the attacker must respond to, leading to a win for the defender;
2. interjecting a sequence of threats the attacker must respond to, leading to occupation of a square related to a threat $a_i$, before the defender has played the reply $d_i$.

Interpretation

Here we elaborate on the definitions presented above. Definition 5.1 defines threats in accordance with the definitions of section 5.2.2. The only difference is our inclusion of the five as a threat, and naming the straight four and the five double threats. The reason for doing so is explained below.
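As an illustration of definitions 5.1 to 5.4, the classification of threats and the shape of a potential winning threat sequence can be written down in a few lines of Python. This is only a sketch: the names `ThreatType`, `THREATS`, `is_threat_sequence` and `is_potential_winning` are our own and not part of the original implementation, and a sequence is represented simply by the names of its attacker moves, the replies being implied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThreatType:
    name: str
    double: bool        # True for double threats (five, straight four)
    reply_sizes: tuple  # possible sizes of the reply set R (definition 5.2)

THREATS = {t.name: t for t in (
    ThreatType("five", True, (0,)),
    ThreatType("straight four", True, (0,)),
    ThreatType("four", False, (1,)),
    ThreatType("three", False, (2, 3)),
    ThreatType("broken three", False, (3,)),
)}

def is_threat_sequence(names):
    """Definition 5.3: at least one threat, all of them single threats."""
    return len(names) >= 1 and all(not THREATS[n].double for n in names)

def is_potential_winning(names):
    """Definition 5.4: a threat sequence followed by one double threat."""
    return (len(names) >= 2
            and is_threat_sequence(names[:-1])
            and THREATS[names[-1]].double)

print(is_potential_winning(["three", "four", "straight four"]))
```

The check concerns the form of the sequence only; whether the threats can actually be played on a given board is a separate matter.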
When a double three is created, it is assumed that the defender counters one of them, allowing the attacker to convert the remaining three into a straight four at the next move. When a double four is created, it is assumed that the defender counters one of them, allowing the attacker to convert the remaining four into a five at the next move. When a four-three is created, depending on the threat countered by the defender, the attacker can create either a five or a straight four. Thus, we may recognize double threats one move after they appear in the form of straight fours or fives.

The definition of a reply forms a crucial step in our conversion of the adversary-agent state space of go-moku into a single-agent state space. Human strategies imply that often threat trees are found such that in each variation the same attacking moves are played. In other words, the choice between defensive moves in such threat trees is irrelevant. We convert these threat trees to threat sequences by allowing the defender to play all defensive moves as a single reply.

In figure 5.4, we have depicted such a winning threat sequence, consisting of four threats. After black 1, white has the three-move reply 2. After black 3, white has the two-move reply 4. After black 5, white has the three-move reply 6. Black 7 creates a straight four, to which the reply set is empty.

Clearly, in free-style go-moku, having extra stones on the board is never a disadvantage. Thus, if a variation wins for the attacker when the defender is allowed to play replies consisting of multiple stones, then the variation wins also if the defender is forced to select one stone from each multiple-stone reply. Positions exist in which the multiple-stone reply leads to counter play for the defender, while the attacker would win in all variations through the same attacking moves if the defender were restricted to playing one stone per reply, but these are rare.
A potential winning threat sequence as defined in definition 5.4 has investigated only local defensive moves, i.e., after each threat, it is assumed that the defender must immediately counter the threat. A winning threat sequence has also been checked for global defensive moves, i.e., the squares not related to the threat sequence have been investigated for their influence on the success of the threat sequence.

Figure 5.4: White defending with multiple-stone replies.

Adversary-agent vs. single-agent

As we have seen, in (winning) threat sequences, each reply by the defender is implied by the previous attacker move. Therefore, we may conceptually merge these two moves into a single meta-move. The state space created by these meta-moves is no longer an adversary-agent state space, but instead a single-agent state space. In the remainder of this section, when discussing meta-moves, we assume that the attacker move and defender move in a meta-move are made simultaneously.

5.3.2 A db-search framework for go-moku

In this section we define a db-search framework for the single-agent state space of go-moku, defined in the previous section. We mention that the framework only involves local defensive moves, while ignoring global defensive moves. Global defensive moves of a position will be discussed in section 5.3.3. The terminology introduced in chapter 3 is used throughout this chapter.

Notation

Lines of five, six and seven squares play an important role in go-moku. For notational purposes, we define the following sets.

$G_5 = \{\{s_1, s_2, \ldots, s_5\} \mid s_1, \ldots, s_5 \text{ form a line of five squares}\}$
$G_6 = \{\{s_1, s_2, \ldots, s_6\} \mid s_1, \ldots, s_6 \text{ form a line of six squares}\}$
$G_7 = \{\{s_1, s_2, \ldots, s_7\} \mid s_1, \ldots, s_7 \text{ form a line of seven squares}\}$

We mention that on a $15 \times 15$ board, $G_5$ has 572 elements, $G_6$ has 500 elements and $G_7$ has 432 elements.
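The sizes of these sets can be checked by brute-force enumeration. The following Python sketch (the function name `count_lines` is ours) counts, for each of the four line directions, the segments of a given length that fit on the board.

```python
def count_lines(length, size=15):
    """Count straight segments of `length` consecutive squares on a size x size board."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # row, column, two diagonals
    total = 0
    for dr, dc in directions:
        for r in range(size):
            for c in range(size):
                # The segment starts at (r, c); it fits if its last square is on the board.
                end_r = r + (length - 1) * dr
                end_c = c + (length - 1) * dc
                if 0 <= end_r < size and 0 <= end_c < size:
                    total += 1
    return total

for n in (5, 6, 7):
    print(n, count_lines(n))
```

Each segment is counted once, since only one of its two end squares is used as starting point for a given direction.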
Furthermore, we define a linear order on the squares of the go-moku board, such that $a1 < a2 < \ldots < a15 < b1 < \ldots < o15$. Clearly, the outer squares of a line are always minimal and maximal within the line, with respect to this ordering.

Attributes

The set $U_a$ of all attributes is defined as follows.

$U_a = \{S(i, x) \mid a1 \leq i \leq o15 \wedge x \in \{\bullet, \circ, \cdot\}\}$

Attribute $S(i, x)$ represents the fact that square $i$ is occupied by the attacker ($\bullet$), occupied by the defender ($\circ$), or empty ($\cdot$). It can easily be checked that $U_a$ has 675 elements.

Operators

The operator $f_{FI,g_5}$ (five), for $g_5 = \{s_1, \ldots, s_5\}$ and $g_5 \in G_5$, is defined as follows.

$f^{pre}_{FI,g_5} = \{S(s_1, \bullet), S(s_2, \bullet), S(s_3, \bullet), S(s_4, \bullet), S(s_5, \cdot)\}$
$f^{del}_{FI,g_5} = \{S(s_5, \cdot)\}$
$f^{add}_{FI,g_5} = \{S(s_5, \bullet)\}$

The operator $f_{SF,g_6}$ (straight four), for $g_6 = \{s_1, \ldots, s_6\}$ and $g_6 \in G_6$, and $s_1 < s_2, s_3, s_4, s_5 < s_6$, is defined as follows.

$f^{pre}_{SF,g_6} = \{S(s_1, \cdot), S(s_2, \bullet), S(s_3, \bullet), S(s_4, \bullet), S(s_5, \cdot), S(s_6, \cdot)\}$
$f^{del}_{SF,g_6} = \{S(s_5, \cdot)\}$
$f^{add}_{SF,g_6} = \{S(s_5, \bullet)\}$

The operator $f_{FO,g_5}$ (four), for $g_5 = \{s_1, \ldots, s_5\}$ and $g_5 \in G_5$, is defined as follows.

$f^{pre}_{FO,g_5} = \{S(s_1, \bullet), S(s_2, \bullet), S(s_3, \bullet), S(s_4, \cdot), S(s_5, \cdot)\}$
$f^{del}_{FO,g_5} = \{S(s_4, \cdot), S(s_5, \cdot)\}$
$f^{add}_{FO,g_5} = \{S(s_4, \bullet), S(s_5, \circ)\}$

The operator $f_{BT,g_6}$ (broken three), for $g_6 = \{s_1, \ldots, s_6\}$ and $g_6 \in G_6$, and $s_1 < s_2, s_3, s_4, s_5 < s_6$ and $s_4$ neither minimum nor maximum in $\{s_2, s_3, s_4, s_5\}$, is defined as follows.

$f^{pre}_{BT,g_6} = \{S(s_1, \cdot), S(s_2, \bullet), S(s_3, \bullet), S(s_4, \cdot), S(s_5, \cdot), S(s_6, \cdot)\}$
$f^{del}_{BT,g_6} = \{S(s_1, \cdot), S(s_4, \cdot), S(s_5, \cdot), S(s_6, \cdot)\}$
$f^{add}_{BT,g_6} = \{S(s_1, \circ), S(s_4, \circ), S(s_5, \bullet), S(s_6, \circ)\}$

The operator $f_{T2,g_7}$ (three with 2 reply moves), for $g_7 = \{s_1, \ldots, s_7\}$ and $g_7 \in G_7$, and $s_1 < s_2 < s_3, s_4, s_5 < s_6 < s_7$, is defined as follows.
$f^{pre}_{T2,g_7} = \{S(s_1, \cdot), S(s_2, \cdot), S(s_3, \bullet), S(s_4, \bullet), S(s_5, \cdot), S(s_6, \cdot), S(s_7, \cdot)\}$
$f^{del}_{T2,g_7} = \{S(s_2, \cdot), S(s_5, \cdot), S(s_6, \cdot)\}$
$f^{add}_{T2,g_7} = \{S(s_2, \circ), S(s_5, \bullet), S(s_6, \circ)\}$

The operator $f_{T3,g_6}$ (three with 3 reply moves), for $g_6 = \{s_1, \ldots, s_6\}$ and $g_6 \in G_6$, and $s_1 < s_2, s_3, s_4, s_5 < s_6$ and $s_2$ either minimum or maximum in $\{s_2, s_3, s_4, s_5\}$, is defined as follows.

$f^{pre}_{T3,g_6} = \{S(s_1, \cdot), S(s_2, \cdot), S(s_3, \bullet), S(s_4, \bullet), S(s_5, \cdot), S(s_6, \cdot)\}$
$f^{del}_{T3,g_6} = \{S(s_1, \cdot), S(s_2, \cdot), S(s_5, \cdot), S(s_6, \cdot)\}$
$f^{add}_{T3,g_6} = \{S(s_1, \circ), S(s_2, \circ), S(s_5, \bullet), S(s_6, \circ)\}$

The set $U_f$ of all operators is defined as follows.

$U_f = \{f_{FI,g_5} \mid g_5 \in G_5\} \cup \{f_{SF,g_6} \mid g_6 \in G_6\} \cup \{f_{FO,g_5} \mid g_5 \in G_5\} \cup \{f_{BT,g_6} \mid g_6 \in G_6\} \cup \{f_{T2,g_7} \mid g_7 \in G_7\} \cup \{f_{T3,g_6} \mid g_6 \in G_6\}$

We mention that on a $15 \times 15$ board, $U_f$ contains 3076 operators, of which each can be applied in more than one way, resulting in a total number of 23596 possible applications of operators.

Initial state and goal states

The initial state consists of exactly 225 attributes, one per square indicating the contents of the square. Each possible configuration of black, white and empty squares in which neither player has occupied a line of five can serve as initial state. The set $U_g$ of goal states is independent of the initial state, and is defined as follows.

$U_g = \{\{S(s_1, \bullet), S(s_2, \bullet), S(s_3, \bullet), S(s_4, \bullet), S(s_5, \bullet)\} \mid \{s_1, s_2, s_3, s_4, s_5\} \in G_5\} \cup \{\{S(s_1, \cdot), S(s_2, \bullet), S(s_3, \bullet), S(s_4, \bullet), S(s_5, \bullet), S(s_6, \cdot)\} \mid \{s_1, s_2, s_3, s_4, s_5, s_6\} \in G_6 \wedge s_1 < s_2, s_3, s_4, s_5 < s_6\}$

In other words, each state containing a five or straight four is a goal state. $U_g$ is not singular.

Properties of the go-moku framework

The framework we have described above is monotonous. Furthermore, we can easily restrict ourselves to non-redundant paths. If $U_g$ were singular, our $U_k$ would be complete.
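To make the precondition, delete and add sets of the operators above concrete, the following Python sketch applies the operator for a four to a small state. It is a minimal illustration, not the original implementation: the names `Operator`, `four` and `apply_op`, the square labels, and the codes `"A"`, `"D"` and `"."` for attacker, defender and empty are assumptions of ours. The last lines recompute the number of operators from the sizes of the line sets given earlier.

```python
from dataclasses import dataclass

ATT, DEF, EMPTY = "A", "D", "."  # assumed codes: attacker, defender, empty

@dataclass(frozen=True)
class Operator:
    pre: frozenset     # attributes required before the meta-move
    delete: frozenset  # attributes removed by the meta-move
    add: frozenset     # attributes added by the meta-move

def four(s1, s2, s3, s4, s5):
    """Operator 'four': the attacker plays s4, the forced reply is s5."""
    return Operator(
        pre=frozenset({(s1, ATT), (s2, ATT), (s3, ATT), (s4, EMPTY), (s5, EMPTY)}),
        delete=frozenset({(s4, EMPTY), (s5, EMPTY)}),
        add=frozenset({(s4, ATT), (s5, DEF)}),
    )

def apply_op(state, op):
    """Return the successor state, or None if the precondition does not hold."""
    if not op.pre <= state:
        return None
    return (state - op.delete) | op.add

# Operator count: five and four range over G5; straight four, broken three
# and three-with-3-replies over G6; three-with-2-replies over G7.
n_operators = 2 * 572 + 3 * 500 + 432

state = frozenset({("a1", ATT), ("a2", ATT), ("a3", ATT), ("a4", EMPTY), ("a5", EMPTY)})
print(sorted(apply_op(state, four("a1", "a2", "a3", "a4", "a5"))))
print(n_operators)
```

Since each operator only replaces empty-square attributes by stone attributes, a state never loses a stone, which is in line with the monotonicity of the framework.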
We can create a singular $U_{g'}$, by defining a special goal attribute $G$ and operators which transform any element of $U_g$ into $G$, which would result in a complete $U_k$. A discussion of the completeness of $U_k$ would be premature, however, since so far we have ignored global defensive moves.

5.3.3 Go-moku specific enhancements to db-search

The db-search framework for go-moku presented in the previous section focuses only on the local defensive moves. For those moves we defined replies such that each defender move was forced, allowing us to transform the search into a single-agent search. A search for global defensive strategies is only necessary to investigate whether a potential winning threat sequence is correct. Thus, given such a threat sequence, it should be investigated whether the defender has alternatives to the local reply to refute the threat sequence. To investigate the global defensive strategies, we perform single-agent searches, this time fixing the attacker choices. After each attacker move specified in the threat sequence, the resultant position is investigated for a global defensive strategy by the defender.

We describe the investigations in four steps. First, we define the threat categories, which play an important role in determining for each position the types of global defensive moves available. Second, we describe two ways in which global defensive moves may successfully counter a potential threat sequence. Third, we describe a set of parameters for db-search. Fourth, we describe how the module searching for winning threat sequences is composed of a series of db-searches.

Threat categories

The operators defined in section 5.3.2 can be divided in three categories. Category 0 consists of the five, category 1 of the straight four and four, and category 2 consists of the three and the broken three.
Using these categories we can state exactly what kind of global defensive moves may be interjected by the defender while countering a threat sequence. Against a threat from category $i$, only threats from categories $j$ can be used as global defensive moves, with $j < i$. Thus, against a five no global defensive moves exist, against a (straight) four only a five can serve as global defensive move, while against a three or broken three, fives, straight fours and fours may all serve as global defensive moves. The above relation between global defensive moves and threat categories can easily be verified by noting that each threat in category $i$ threatens to win in exactly $i$ moves.

Global defensive strategies

In section 5.3.1 we have listed two ways in which the defender may successfully counter a threat by interjecting global defensive moves. First, she may create a sequence of threats leading to a win. Second, she may create a sequence of threats leading to the occupation of a square in the threat sequence. Here we describe how db-search can be used to determine whether such a global defensive strategy exists. Our application of db-search for this purpose is such that we may erroneously decide that a defensive strategy exists, thus rejecting a winning threat sequence for the attacker, but that we will never overlook the existence of a defensive strategy.

To prevent confusion arising from the terms attacker and defender in this context, we assume here that player A has found a potential winning threat sequence, and we investigate whether player B has a global defensive strategy after move $a_i$ by A. Three remarks concerning the application of db-search to search for global defensive strategies for player B are in order.

1. The goal set $U_g$ for player B should be extended with singleton goals for occupying any square in threat $a_j$ or reply $d_j$, with $j \geq i$.
2.
If B finds a potential winning threat sequence (i.e., a global defensive strategy against the potential winning threat sequence of A), this threat sequence is not investigated for counter play of player A. Instead, in such a case we always assume that A's potential winning threat sequence has been refuted.
3. In the application of db-search for player B, only threats of categories less than the category of the threat played by A may be applied. Thus, in a db-search for player B, only threats having replies consisting of a single move are applied.

If we examine the description of db-search for B, we may find that the search is monotonous and contains no redundant paths. As argued before, $U_g$ can be easily transformed into a singular $U_{g'}$, without a conceptual difference in the resulting $U_k$. Since any sequence found for player B is accepted as refutation of the potential winning threat sequence of A, we claim that if application of db-search does not find a global defensive strategy, such a strategy does not exist for player B. We stress this point as it is a vital element in the process of solving go-moku: we must ensure that in no position we accept a threat sequence as winning, if the threat sequence could be refuted.

Parameters to db-search

Above, we have seen that db-search is used to find potential winning threat sequences as well as to investigate whether the defender has a global defensive strategy refuting a potential winning threat sequence. These searches are all performed by the same module, whose parameters are listed below.

1. The position to which db-search is to be applied.
2. The attacker, i.e., the player for whom a db-search is applied.
3. The goal squares, i.e., the set of squares, which, if one is occupied by the attacker, terminates the search.
4. The defensive check option. This is a Boolean value indicating whether a potential winning threat sequence should be investigated for counter play.
5. The maximum category, i.e., only threats of this category and lower categories may be applied.

Figure 5.5: White refutes a potential winning threat sequence.

The winning threat sequence module

Here we present a step by step description of the winning threat sequence module, with the aid of the position in figure 5.5. To find the winning threat sequence for black in the position before black 1 of figure 5.5, db-search may be called with (1) that position; (2) attacker black; (3) the empty set of goal squares; (4) the defensive check option at value true; (5) maximum category 2. If the potential winning threat sequence shown in figure 5.5 is found, db-search will be called five more times, after black 1, black 3, black 5, black 7 and black 9. The parameters to db-search after, for instance, black 1 are: (1) the position after black 1; (2) attacker white; (3) the set consisting of the 28 squares related to the threats black 1 (7 squares), black 3 (5 squares), black 5 (5 squares), black 7 (5 squares) and black 9 (6 squares); (4) the defensive check option at value false; (5) maximum category 1.

After black 5, which is of category 1, white can only use a defensive strategy involving threats of category 0, i.e., fives only. However, to create a five after black 5, white should have created several fours after black 1 (of category 2), followed by the local defensive reply white 2. Therefore, we need to try threats of category 0 after black 5, for all positions which could arise after sequences of fours by white, in earlier global defensive strategy searches. Indeed, if white, instead of playing 2 immediately after 1, interjects move a (followed by black's forced reply b) and move c (followed by black's forced d), then after white 2, black 3, white 4 and black 5, white can create a five at e.
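The five parameters of the module, and the way the parameters of a global defensive search are derived from an attacker threat, can be summarized in a short Python sketch. The names `DbSearchParams` and `defender_params` are hypothetical, the position is represented by a placeholder string, and the squares by arbitrary labels; only the structure of the call is illustrated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DbSearchParams:
    position: str            # placeholder for a board description
    attacker: str            # "black" or "white"
    goal_squares: frozenset  # occupying any one of these ends the search
    defensive_check: bool    # investigate found sequences for counter play?
    max_category: int        # only threats of this category or lower

def defender_params(position_after, attacker, threat_squares, threat_category):
    """Parameters of the global defensive search after one attacker threat.

    threat_squares lists, per threat not yet answered, the squares related to it.
    """
    defender = "white" if attacker == "black" else "black"
    goals = frozenset().union(*threat_squares)
    # The defender may only use threats of a lower category than the threat just played.
    return DbSearchParams(position_after, defender, goals, False, threat_category - 1)

# Initial search for black: no goal squares, defensive check on, all categories.
root = DbSearchParams("position before black 1", "black", frozenset(), True, 2)

# Search for white after a category 2 threat by black (illustrative square labels).
lines = [range(0, 7), range(10, 15), range(20, 25), range(30, 35), range(40, 46)]
after_1 = defender_params("position after black 1", "black", lines, 2)
print(after_1.attacker, len(after_1.goal_squares), after_1.max_category)
```

With five disjoint lines of 7, 5, 5, 5 and 6 squares the goal set contains 28 squares, as in the example of figure 5.5.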
Summarizing, to find the global defensive strategies, after each attacker move of category 2, a search for category 1 for the defender should be performed, while after each attacker move of category 1, a search for category 0 for the defender should be performed, from every position which could be reached by interjecting defender fours after previous threats of category 2 by the attacker.

5.3.4 Heuristically improving the efficiency of db-search

As we have argued before, the module which searches for winning threat sequences will only return a winning threat sequence if the winning threat sequence is guaranteed to lead to a win for the attacker. The opposite is not true: not all winning threat sequences will be found. This is caused by our acceptance of a global defensive strategy, without investigating whether the defensive strategy itself can be countered. In the context of winning threat trees our search is far from complete, as we only find winning threat sequences, i.e., threat trees in which each variation leads to a win through the same attacking moves, in the same order.

In this section we present three heuristics which significantly increase the efficiency of our winning threat sequence module, at the cost of another (small) reduction in efficacy. Each of the heuristics, if at all applicable, is not applied during searches for global defensive strategies, in order to ensure that all existing refutations of potential winning threat sequences are found.

Global refutation

Our first heuristic for increasing the efficiency of db-search is based on the existence of global refutations in some positions. A global refutation is a configuration on the board which refutes all winning threat sequences of the attacker. An artificial example is depicted in figure 5.6. Black to move has a large number of distinct potential winning lines at her disposal, each starting with a three. For instance, black 1 creates a double three immediately.
White 2, however, creates a double four, thus successfully countering the three created by black 1. Alternative lines for black, such as black a, black b and black c, again creating a double three, are all also refuted by white 2. Thus, while db-search, focusing on local defenses, finds many potential winning threat sequences, each of these is refuted by the search for global defensive strategies. Finding all several hundreds or thousands of potential winning threat sequences in such a position is clearly a waste of time.

Figure 5.6: Global refutation of all potential winning lines.

As a heuristic to recognize those positions, we check at the end of each db-search level the number of potential winning threat sequences investigated so far. If this number exceeds a preset threshold $T$, the search is terminated. Experiments showed that $T = 10$ leads to a largely increased efficiency, at a small cost in efficacy. We remark that while searching for global defender strategies, the first potential winning threat sequence found is accepted as refutation. The search is therefore not influenced by this heuristic.

Category reduction

The category reduction heuristic is designed for a special type of global refutations. Let us suppose that the defender has a threat $T_{c_1}$ of category $c_1$. If the attacker creates a threat $T_{c_2}$ of category $c_2$, then either (1) $c_2 < c_1$, or (2) $T_{c_2}$ should counter $T_{c_1}$, or (3) $T_{c_1}$ is a refutation of $T_{c_2}$. As the search for potential winning lines only considers local replies, countering $T_{c_1}$ by $T_{c_2}$ will only occur by accident. Ignoring the option that this may happen, we obtain the category reduction heuristic: if in a node $N$ of the db-search dag, the defender has a threat of category $c_1$, for each descendant of $N$ the attacker is restricted to threats of categories less than $c_1$. We remark that this heuristic is switched off while searching for global defender strategies.
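Both heuristics amount to simple tests, which the following Python sketch makes explicit. The function names `allowed_threats` and `should_stop` and the dictionary `CATEGORY` are our own; the categories themselves are those of section 5.3.3.

```python
CATEGORY = {"five": 0, "straight four": 1, "four": 1, "three": 2, "broken three": 2}

def allowed_threats(defender_threat=None, max_category=2):
    """Threat types the attacker may still apply under category reduction."""
    limit = max_category
    if defender_threat is not None:
        # Only categories strictly below that of the defender's threat remain.
        limit = min(limit, CATEGORY[defender_threat] - 1)
    return sorted(t for t, c in CATEGORY.items() if c <= limit)

def should_stop(num_investigated, threshold=10):
    """Global refutation heuristic: stop once more than `threshold`
    potential winning threat sequences have been investigated."""
    return num_investigated > threshold

print(allowed_threats("three"))
print(allowed_threats("four"))
print(should_stop(11))
```

If the defender has a three, the attacker is left with fives, straight fours and fours; if the defender has a four, only a five remains.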
Restricted threes

The definitions of operator $f_{T3,g_6}$ (three with 3 reply moves) and operator $f_{T2,g_7}$ (three with 2 reply moves) imply that if the latter is applicable, the former is too. While in most positions where both are applicable they are interchangeable, operator $f_{T2,g_7}$ is superior in that its reply consists only of 2 moves, thus diminishing the chances for counterplay. Only on rare occasions are both applicable, while only $f_{T3,g_6}$ leads to a winning threat sequence. To prevent the creation of threat sequences whose only difference is the occurrence of $f_{T3,g_6}$ instead of $f_{T2,g_7}$, we restrict application of $f_{T3,g_6}$ to lines where $f_{T2,g_7}$ is not applicable. We remark that while searching for global defender strategies, only threats of categories 0 and 1 are applicable. The search is therefore not influenced by this heuristic.

5.3.5 Additional requirements for standard go-moku

Standard go-moku differs from free-style go-moku in the value of overlines: an overline is a win in free-style go-moku, while it is not in standard go-moku. To apply our winning threat sequences module, as described in the previous sections, to standard go-moku, a few additional requirements are necessary. We discuss these requirements briefly. First, we introduce the concept of a line extension. Second, we describe how a line extension influences a db-search for potential winning threat sequences. Third, we describe the influence of line extensions on the search for global defensive strategies.

Extensions

For each line $g \in G_5$, a square $c$ is an extension of $g$, if $g \cup \{c\} \in G_6$. Similarly, for each line $g \in G_6$, a square $c$ is an extension of $g$, if $g \cup \{c\} \in G_7$. We mention that the extension of a line $g \in G_7$ is defined analogously, after the set $G_8$ has been defined. The extension set of a line $g$, i.e., the set of all
extensions of $g$, consists of 0, 1 or 2 elements, depending on the position of $g$ on the board, with respect to the board edge.

Line extensions and winning threat sequences

A winning threat sequence in standard go-moku must meet all the requirements for a winning threat sequence in free-style go-moku. An added requirement is that at the moment of execution of threat $a_i$, the squares in the extension set of $a_i$ must not be occupied by an attacker stone. An attacker stone may be placed at the extension of a threat in three distinct ways.

1. The stone was present in the initial position.
2. The stone is played while executing an earlier threat in the threat sequence.
3. The stone is played as forced response to a defender threat.

The first and second way of placing an attacker stone at a threat extension is checked during the db-search for potential winning threat sequences: an operator can only be applied if the extension squares are empty or occupied by the defender. During the combination stage of db-search, we ignore the occupation of extensions. Instead, after a potential winning threat sequence has been found, the extensions of all threats in the threat sequence are examined.

Line extensions and global defensive strategies

The third way of placing an attacker stone in a threat extension provides the defender with an extra global defensive strategy. This strategy fits as follows within the parameters provided to db-search. In addition to the set of goal squares provided for free-style go-moku, the set of extensions to the threats which have not yet been executed by the attacker is passed to db-search. A refutation of the potential winning threat sequence has been found, if one of the extensions has been occupied by the attacker (i.e., the player whose potential winning threat sequence is being examined).

Special attention must be paid to the multiple-stone replies. While having extra stones on the board does not harm a player in free-style go-moku, it may harm a player in standard go-moku. To ensure that each global defensive strategy is found, we perform the db-search for global defensive strategies
as a free-style go-moku search. Thus, a potential winning threat sequence in standard go-moku may be refuted through a sequence of defender threats containing overlines.

5.4 Applying pn-search

To apply pn-search to go-moku, we need to convert the go-moku game tree into an and/or tree. This is described in section 5.4.1. Furthermore, we describe the enhancements to basic pn-search adopted for our go-moku implementation in section 5.4.2.

5.4.1 Go-moku as an AND/OR tree

Pn-search (as described in chapter 2) is an and/or-tree algorithm. To apply it to go-moku, we represent positions where black is to move as or nodes, and positions where white is to move as and nodes. A win for black is represented by the value true, while a draw and a win for white are represented by the value false. Thus, proving the pn-search tree means that black can win in the root position, while disproving the pn-search tree means that white can achieve at least a draw.

In each or node, black is to move. As evaluation function at such a node, we apply db-search with black as attacker. If db-search finds a winning threat sequence, the node evaluates to true, otherwise to the value unknown. In each and node, white is to move. The same procedure as in or nodes is applied, this time with white as attacker. If a winning threat sequence is found, the node evaluates to false, otherwise to the value unknown. A node representing a position with all 225 squares occupied and neither player having a winning configuration is a draw, and therefore obtains value false, without applying db-search.

5.4.2 Enhancements

The above description explains how standard pn-search is applied to go-moku. However, five enhancements have been added to speed up the search. The enhancements are discussed in this section.

Transpositions

A dag is created instead of a tree, using the algorithm described in section 2.3.4. This ensures that if a position has already occurred in the dag, or if a position is equivalent through automorphisms to another position in the dag, the position is not investigated again. We test for the 8 standard automorphisms of a square board.

Restricting black's moves

In go-moku, the average branching factor is more than 200. Most of these moves are unrelated to the battle at the center of the board and should be ignored. However, since we want to prove the value of the root position, we cannot simply ignore moves using heuristic selection functions.

A large reduction of the branching factor at the or nodes can be made, however. Since we want to prove a win for black in the root position, it is sufficient to prove for each internal or node that (at least) one child leads to a win for black. For each internal and node all children must be proved. Using these properties, we may at each or node restrict black to, say, the N most-promising children, using a heuristic ordering function. If in the restricted game tree a proof of black's win is found, the same proof is valid in the full game tree.

In our investigations presented in section 5.5 we have restricted black in each or node to the 10 most-promising children. Before the ordering function is applied, we first restrict the set of all legal moves to the set of moves which counter the threats of the opponent, as described in the next section.
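The test for equivalence under the 8 automorphisms of the board, mentioned above under Transpositions, can be sketched in Python by mapping each position to a canonical representative. This is an illustration under our own assumptions, not the original code: a position is taken to be a set of `(row, column, colour)` triples with zero-based coordinates, and the names `symmetries` and `canonical` are hypothetical.

```python
def symmetries(stones, size=15):
    """Yield the 8 images of a position under the symmetries of the square board."""
    m = size - 1
    maps = [
        lambda r, c: (r, c),          # identity
        lambda r, c: (c, m - r),      # rotation by 90 degrees
        lambda r, c: (m - r, m - c),  # rotation by 180 degrees
        lambda r, c: (m - c, r),      # rotation by 270 degrees
        lambda r, c: (r, m - c),      # reflection in the vertical axis
        lambda r, c: (m - r, c),      # reflection in the horizontal axis
        lambda r, c: (c, r),          # reflection in the main diagonal
        lambda r, c: (m - c, m - r),  # reflection in the anti-diagonal
    ]
    for f in maps:
        yield frozenset(f(r, c) + (colour,) for r, c, colour in stones)

def canonical(stones, size=15):
    """Smallest image of the position; equal for all equivalent positions."""
    return min(tuple(sorted(img)) for img in symmetries(stones, size))

a = {(0, 1, "b"), (3, 4, "w")}
b = {(1, 14, "b"), (4, 11, "w")}  # the position a rotated by 90 degrees
print(canonical(a) == canonical(b))
```

Storing positions in the dag under their canonical form makes the lookup of a transposition a single table access.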
The heuristic ordering function used is rather simple: each square is assigned 4 points for each three with a two-stone reply, 3 points for each three with a three-stone reply, 2 points for each broken three, 2 points for each open two, which is defined as two black stones in the center of an otherwise empty line of 6, and 1 point for each broken two, which is defined as two black stones with a one-square gap in the center three squares of an otherwise empty line of 7. Among all children, the 10 children with the highest score are selected.

No points are given for the creation of a four. Creating a four is often only a strong move if it stops a threat of the opponent, or if it creates a winning threat sequence. Since a node is only expanded if no winning threat sequence exists and it is ensured that we select the 10 best moves among the moves which counter the existing threats of the opponent, there was no need to assign any points for creating fours.

Clearly, a thorough analysis of the strategic knowledge of experts would have led to a more refined move-ordering function. As we show in section 5.5, the function described here was sufficient for our purposes.

Figure 5.7: Black threatens to win by moves 1 through 7.

Related squares

As stated before, most of the approximately 200 legal moves per position are unrelated to the battle at the center of the board and should be ignored. Although we cannot ignore moves by white using heuristic selection functions, we may try to apply a winning threat sequence found as reply to one move to a large number of other moves. In this section we describe how this is done in a reliable way.

For each winning threat sequence of the attacker, we define the set of related squares as follows. An empty square c is related to a winning threat sequence in a given board position, if the threat sequence no longer wins, if c would have been occupied by the defender.
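The move-ordering function used at the or nodes, with the point values given earlier in this section, can be sketched in Python as follows. Recognizing the patterns on the board is assumed to be done elsewhere; the sketch only shows the scoring and the selection of the best children, and the names `POINTS`, `score` and `select_children` as well as the pattern labels are our own.

```python
POINTS = {
    "three_2": 4,       # three with a two-stone reply
    "three_3": 3,       # three with a three-stone reply
    "broken_three": 2,
    "open_two": 2,
    "broken_two": 1,
}

def score(patterns_created):
    """patterns_created maps a pattern name to the number of such patterns a move creates."""
    return sum(POINTS[name] * count for name, count in patterns_created.items())

def select_children(candidates, n=10):
    """candidates maps each move to its patterns; keep the n highest-scoring moves."""
    ranked = sorted(candidates, key=lambda mv: score(candidates[mv]), reverse=True)
    return ranked[:n]

moves = {"h8": {"broken_two": 1}, "g7": {"three_2": 1, "open_two": 2}, "a1": {}}
print(select_children(moves, n=2))
```

Fours are deliberately absent from the table of points, in accordance with the remarks above.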
Before we use the notion of related squares, we introduce the term implicit threat, for any position where a player threatens to win through a winning threat sequence. In figure 5.7 black threatens to win through the threat sequence consisting of black 1 through 7. Therefore, the position is an implicit threat for black.

Now let us suppose that we have algorithms to determine whether a position is an implicit threat, and that we can determine for each winning threat sequence the set of related squares. Given a position with white to move, which is an implicit threat for black, we determine the set of squares related to the winning threat sequence. Then, it follows directly from the definition of related squares that we may restrict white to these related squares. Clearly, by determining implicit threats and sets of related squares in an efficient way, we could speed up our search.

Figure 5.8: Replies to the threat sequence of figure 5.7. (a) Superset of related squares. (b) Related squares.

To determine an implicit threat, it suffices to make a null-move for the opponent (white in figure 5.7) and to apply db-search to find a winning threat sequence for black.

Determining the exact set R of squares related to a winning threat sequence is computationally expensive. Instead, we determine a superset S of the set of related squares. The set consists of all squares meeting one of the following two conditions.

1. The square is in one of the lines of the threats in the winning threat sequence.
2. The square may be used in any counter threat by the opponent, in any of the global defensive strategy searches performed to investigate the winning threat sequence.

Using db-search, S can be determined efficiently. Without proof we state that $R \subseteq S$. For empirical evidence of this claim we refer to section 5.5.4.
In figure 5.8a, we have shown the set S for the threat sequence of figure 5.7. The squares labeled a are part of the lines of the threats. The squares labeled b may, together with white stones on the board or the defensive moves in the threat sequence, form new defensive threats for white.

Iterated related squares

The related-squares concept can be used to reduce the set of white moves to be examined even further. After having determined the superset S of the set of related squares, an element s of S is selected. A white stone is placed at s, and the position is investigated with db-search. If no winning threat sequence is found, a child is added to the tree for s. Otherwise, the superset $S_1$ of squares related to the newly found winning threat sequence is determined. Only squares in $S \cap S_1$ need to be investigated further, since all moves at other squares lead to a win through one of the two winning threat sequences found so far. This procedure is repeated until all moves have been examined.

In figure 5.8b we have marked the set of squares for which child nodes are grown. Of the 35 related squares of figure 5.8a (set S), only 19 squares (set R) remain in figure 5.8b.

The null-move heuristic and the related-squares heuristic are applied for both players in the pn-search dag. For the attacker in the search (the player for which we select only 10 moves per node) we first determine the set of counter moves using the heuristics of this section, and then order the moves according to the move-ordering function of the previous section. Of course, if fewer than 10 counter moves exist, these are all selected.

The implicit-threat heuristic

The branching factor of go-moku is such that the search tree may quickly become intractable. To force black to select moves where white has a restricted number of moves, we evaluate a position which is not an implicit threat for black to false.
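The iterated related-squares procedure described earlier in this section can be sketched as a loop that keeps shrinking the candidate set. This is a minimal sketch under stated assumptions, not Victoria's code: squares are plain integers, and `db_search` is a hypothetical callback that, for a white stone placed at a square, returns the superset of squares related to a newly found winning threat sequence for black, or `None` when no such sequence is found.

```python
from typing import Callable, Iterable, List, Optional, Set

Square = int  # hypothetical square identifier


def iterated_related_squares(
    superset: Iterable[Square],
    db_search: Callable[[Square], Optional[Set[Square]]],
) -> List[Square]:
    """Return the squares for which a child node must be grown.

    `superset` plays the role of S. Each time a defence at s is refuted by a
    new winning threat sequence, the candidates are intersected with the
    superset of squares related to that sequence.
    """
    candidates = set(superset)
    children: List[Square] = []
    while candidates:
        s = min(candidates)          # any selection rule would do
        candidates.remove(s)
        related = db_search(s)
        if related is None:
            children.append(s)       # no refutation found: s becomes a child
        else:
            candidates &= related    # only squares in the intersection remain
    return children


def stub_db_search(s: Square) -> Optional[Set[Square]]:
    """Hypothetical stand-in for db-search on a toy instance."""
    refutations = {2: {3, 4}, 4: {9}}
    return refutations.get(s)


children = iterated_related_squares({1, 2, 3, 4, 5, 6}, stub_db_search)
```

In the toy run, squares 5 and 6 are never searched: the refutation of square 2 already covers them, which is the saving the procedure is meant to deliver.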
Only early in the game tree (i.e., when there are fewer than 9 stones on the board), if no black move leads to an implicit threat, is the above restriction lifted. We have found that no later than move 11 in the game, black can ensure that each move is an implicit threat. By enforcing this restriction, the size of the search tree is significantly reduced.

Heuristic (dis)proof number initialization

During our initial experiments, we have used the standard proof and disproof numbers initialization of 1 each. While studying the trees grown, it became apparent that pn-search tended to pursue some deep lines longer than desirable. This is mainly caused by continuously executing threats, without creating a potential for a winning threat sequence.

In qubic, as described in chapter 4, we chose to remove all threatening moves for the attacker from the search tree. We could safely do so, since our db-search implementation for qubic searched the full space of threatening sequences. The incompleteness of db-search in go-moku with respect to the space of all threat trees blocks a similar approach in go-moku.

Instead, we have opted to attach a small penalty to all deep lines. At each frontier node the proof and disproof numbers are initialized to the number of full moves made from the root position. Thus, at depth d, the proof and disproof numbers are initialized to $1 + \lfloor d/2 \rfloor$. This heuristic initialization ensures that forcing lines are not searched too deeply (before sufficient alternatives have been tried), without interfering with the essence of pn-search.

5.5 Solving go-moku

The program Victoria consists of the pn-search algorithm described in the previous section, using db-search as evaluation function. In this section we describe how Victoria solved both free-style go-moku and standard go-moku. First, we describe the I/O of Victoria. Second, it is explained how the game tree was split into several hundreds of subtrees.
Third, we present statistics regarding the search process. Finally, we discuss the reliability of our results.

5.5.1 Victoria's I/O

The input to Victoria consists of (1) a go-moku position, (2) the game variant (free-style go-moku or standard go-moku), (3) the player to move, and (4) the maximum tree size for pn-search.

The output of Victoria consists of (1) the value upon termination of pn-search (true, false, unknown) and (2) a database containing a record for each position in the solution tree.

The database returned by Victoria is empty unless the value true was returned. For each record in the database representing a position with black to move, at least one child position will also be represented in the database. For each record in the database representing a position with white to move, only child positions are represented in the database in which black does not have a winning threat sequence.

The database created by Victoria served two purposes. First, the merged database of all subtrees investigated should provide us with a solution tree for the full go-moku game tree. Second, the databases created by solved subtrees were used as transposition table for pn-searches. We have seen several occasions where a search of several hundreds of thousands of nodes without transposition tables was reduced to a mere few thousand nodes, by hitting the database early during the search.

5.5.2 Subdividing the game tree

We have divided the go-moku game tree into several hundreds of smaller problems. The main reason for doing this is that the size of the go-moku game tree is such that we could not solve it through a single pn-search, due to the limits imposed on pn-search by the size of our computer's working storage.

We remark that by splitting the game tree into subtrees, part of the solution process has been performed by hand. Most of these moves have been made with the aid of Sakata and Ikawa (1981), while others were suggested by the proof and disproof numbers of failed pn-searches. The number of black moves selected by hand (several hundreds) is less than one percent of the total number of black moves in the solution tree (many tens of thousands).

5.5.3 Statistics

In this section we present the statistics of running pn-search on go-moku. As mentioned before, we have subdivided the problem into several hundreds of subtrees, each of which was individually solved. Since each completed search extended the database of solved positions, the number of positions searched partly depends on the order in which the subproblems were solved.

Execution time

Our calculations were performed in parallel on 11 Sun SPARCstations of the Vrije Universiteit in Amsterdam. Each machine was equipped with 64 or 128 megabytes of internal memory, ensuring that pn-search trees of up to 1 million nodes would fit in internal memory, without slowing down the search by swapping to disk. The processor speed of the machines ranged from 16 to 28 MIPS.

Our processes could only run outside office hours. As a result, sometimes processes which had not finished at 8am were killed, and had to be restarted at 6pm. Still, over 150 CPU hours per day were available for solving go-moku.

In the figures below, we have not included CPU time spent on processes which were killed in the morning and restarted in the evening, nor have we included the CPU time spent on test runs during which we discovered bugs in our software (see also section 5.5.4). Thus, the time mentioned indicates the amount of time necessary to solve go-moku without interruptions, using the final version of Victoria.

Free-style go-moku was solved using 11.9 days of CPU time, while standard go-moku (thus banning wins through overlines) was solved with 15.1 days of CPU time.

Pn-search tree size

The summed size of all pn-search trees built during the calculations (again excluding terminated processes and runs of initial versions of the program) for free-style go-moku is 5.3 million nodes.
For standard go-moku, 6.3 million nodes were grown. Comparing these figures with the execution time necessary for the solutions, we see that both variations ran at the speed of approximately 5 nodes per second. The rejection of potential winning lines involving overlines resulted in the creation of a 20% larger search tree.

Db-search evaluations

For each internal node of the pn-search tree, 10-20 independent db-searches (excluding global defensive strategy searches) were performed on average, resulting in between 50 and 100 db-searches per CPU second. Multiplied by the total calculation time, the number of independent db-searches executed to solve go-moku lies between 50 million and 130 million.

Solution size

The solution tree found by Victoria for free-style go-moku is slightly smaller than the solution tree found for standard go-moku: 138,790 versus 153,284 database records. Comparing these numbers with the total size of the pn-search trees, we find that 1 out of every 40 nodes created participates in the solution. The deepest variation in both solution trees is 35 ply.

depth  free-style  standard    depth  free-style  standard
    0           1         1       18        1351      1885
    1           1         1       19        1094      1590
    2          35        35       20         710      1125
    3          35        35       21         594       954
    4        7227      7242       22         408       641
    5        6824      7251       23         327       506
    6       20859     22749       24         193       296
    7       20239     21078       25         154       241
    8       20686     22056       26          85       159
    9       20550     21898       27          74       128
   10        8959     10015       28          40        67
   11        8637      9570       29          35        54
   12        5246      6015       30           7        19
   13        4778      5492       31           7        18
   14        2999      3663       32           1         8
   15        2647      3282       33           1         6
   16        2173      2810       34           1         1
   17        1811      2392       35           1         1

Table 5.1: Nodes per tree depth in go-moku solutions.

In table 5.1 we have listed the number of nodes per depth for both solution trees. We remark that for each position with black to move, only one child position needs to be included. Due to transpositions, the number of nodes at each odd ply should therefore be less than or equal to the number of nodes at the preceding even ply.
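The figures of table 5.1 can be cross-checked mechanically. The following sketch transcribes the two columns of the table and verifies that they sum to the reported database sizes, and it lists the odd plies at which the node count exceeds that of the preceding even ply. The names `FREE_STYLE`, `STANDARD` and `odd_ply_violations` are ours, introduced only for this check.

```python
from typing import List

# Node counts per depth (0 through 35), transcribed from table 5.1.
FREE_STYLE = [
    1, 1, 35, 35, 7227, 6824, 20859, 20239, 20686, 20550, 8959, 8637,
    5246, 4778, 2999, 2647, 2173, 1811, 1351, 1094, 710, 594, 408, 327,
    193, 154, 85, 74, 40, 35, 7, 7, 1, 1, 1, 1,
]
STANDARD = [
    1, 1, 35, 35, 7242, 7251, 22749, 21078, 22056, 21898, 10015, 9570,
    6015, 5492, 3663, 3282, 2810, 2392, 1885, 1590, 1125, 954, 641, 506,
    296, 241, 159, 128, 67, 54, 19, 18, 8, 6, 1, 1,
]


def odd_ply_violations(counts: List[int]) -> List[int]:
    """Odd depths whose node count exceeds that of the preceding even depth."""
    return [d for d in range(1, len(counts), 2) if counts[d] > counts[d - 1]]


free_style_total = sum(FREE_STYLE)   # should match 138,790 database records
standard_total = sum(STANDARD)       # should match 153,284 database records
free_style_exceptions = odd_ply_violations(FREE_STYLE)
standard_exceptions = odd_ply_violations(STANDARD)
```

Both column sums agree with the database sizes quoted under "Solution size", and the free-style column satisfies the odd/even property at every depth.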
The only exception in the table, ply 5 for standard go-moku, is caused by the fact that we have included several options for black for some opening positions in our set of positions to be checked by pn-search.

Deep winning lines

The combination of db-search and pn-search makes it difficult to determine the maximin of go-moku (i.e., the length of the game after optimal play of both players). Neither db-search nor pn-search aims at finding the shortest winning paths, while the longest path found by the combination of the algorithms may well be different from the game leading to the single position at level 35 of the solution tree. Even though the games leading to positions at level 35 of the solution trees do not necessarily show optimal play of either side, we have depicted two of these games in figure 5.9.

Figure 5.9: Deep variations. (a) Free-style go-moku. (b) Standard go-moku.

5.5.4 Reliability

In section 4.5.4, we have explained the hazards of solving games through large computer programs. The same hazards mentioned there exist in go-moku in even greater form.

Our go-moku implementation consists of almost 20,000 lines of C code. Approximately half is dedicated to the X-interface created by Loek Schoenmaker, while the other half consists of db-search, pn-search, database look-up and database creation, automorphism management, etc. Errors in programs this size are virtually unavoidable. Many errors have been created and corrected during implementation and testing of the program, but there is no guarantee that all bugs have been found.

A further source of error is the complexity of the calculation process. We used 11 SPARCstations in parallel to solve each of the several hundreds of subtrees. These 11 SPARCstations created their own databases when solving a position, while they all used one large database as transposition table. After solving a position, the transposition table should be extended with the newly created small databases. A locking mechanism was created to ensure that no databases would be corrupted. Still, computers going down at critical moments introduced the possibility that data would get lost. This, in fact, has happened during our calculations.

To ensure the completeness of the solution found, we have created a module which examines the final database created. For each position with black to move a successor position must be present in the database. For each position with white to move, for each legal move either a winning threat sequence must exist, or the successor position must be present in the database. The only common element with the solving process is db-search. Thus, an error in db-search may go unnoticed, while all other parts, including pn-search and the related-squares generator, are eliminated from the checking process.

Using the database-checking module, we have both located missing database parts, due to computers failing at critical moments and human error, and have found an error in our related-squares generator. The final investigations, however, for both the free-style go-moku and standard go-moku variants were successful.

The correctness of our db-search implementation is based on meticulously testing all possible types of counterplay, including intricate ways in which the opponent forces the attacker, after a sequence of fours, to occupy an extension square of a threat in the threat sequence. After the final database creation, which was checked and accepted by the database-checking module, no errors have been found in this part of the program. Therefore, go-moku should be considered a solved game.

Chapter 6

Which Games Will Survive?
6.1 Scope

In chapters 2 and 3, we presented two new search techniques which have been applied to qubic and go-moku in chapters 4 and 5, thereby partly answering our first research question (see section 1.4). In this chapter, we broaden our scope to all three research questions and the problem statement.

The chapter consists of four parts. First, in section 6.2, we define four properties of games. These are perfect information, convergence, sudden death, and complexity. Second, in section 6.3, we discuss four aspects of each of the games of the Olympic List.

1. The relation between the game and the four game properties introduced in section 6.2.
2. The state of the art in game-playing programs.
3. Techniques currently applied.
4. Obstacles to progress.

Third, in section 6.4 we review our three research questions on the basis of the discussion of individual games presented in section 6.3, leading to a review of the problem statement. Finally, in section 6.5, we speculate about the future playing strength of computer game-playing programs, as well as about the future of thinking games in our society.

For the rules of the games discussed in this chapter, we refer to Levy and Beal (1989), Levy and Beal (1991) and Van den Herik and Allis (1992).

6.2 Game properties

In this section we define four properties of games, viz. perfect information (section 6.2.1), convergence (section 6.2.2), sudden death (section 6.2.3) and complexity (section 6.2.4).

6.2.1 Perfect information

The perfect-information property divides the set of games into two disjoint subsets: the set of perfect-information games and the set of imperfect-information games. In a perfect-information game, all players, at any time during the game, have access to all information defining the game state and its possible continuations. Any game which is not a perfect-information game is defined to be an imperfect-information game.

For example, chess is a perfect-information game. Relevant information defining the game state in chess includes: (1) the configuration of chess pieces on the board, (2) the number of moves made since a pawn was moved or a piece was captured, (3) the en-passant capturing opportunities in the current game state, (4) the castling options left to both players, and (5) previous configurations with their en-passant capturing opportunities and castling options.

The information described here allows each player to determine the game state and its possible continuations, including en-passant capturing moves, castling moves, repetition of positions, and the status with respect to N-move rules. In practice, a player needs only three pieces of information: (1) the configuration of chess pieces, (2) the game score, i.e., all moves played since the start of the game, and (3) the official rules of chess. The combination of these three pieces of information allows a player to deduce all necessary information during a game.

Bridge is an example of an imperfect-information game. During the bidding phase of bridge, each player sees only her own cards, leaving her unaware of the distribution of the remaining 39 cards over her partner and her opponents. During the playing phase, each player sees her cards, those of the dummy and the cards already played, still leaving her unaware of the distribution of the remaining cards over the undisclosed hands.

Optimal play in a perfect-information game always consists of a pure strategy, while in imperfect-information games optimal play may require a mixed strategy. In a pure strategy, for each game state a single move can be determined, which leads to the game-theoretic value of the position. In a mixed strategy, optimal play requires a player to play a move $i$ with probability $p_i$, while at least two such $p_i$ are non-zero. For a discussion of pure and mixed strategies, we refer to von Neumann and Morgenstern (1944).
6.2.2 Convergence

The convergence property labels games as either converging, diverging or unchangeable. Before we can define these classifications, we introduce conversions in definition 6.1.

Definition 6.1 A move M from state A to state B is a conversion, if no configuration of pieces which could have occurred in any game leading to the configuration of pieces in A, can occur in a game continuing from state B.

Examples of conversions in chess are moving a pawn, or capturing a piece. In checkers, any move except for a non-capture move by a king is a conversion. For most games, the main conversions involve the addition (e.g., connect-four, go-moku, qubic and othello) or removal (e.g., chess, checkers, bridge) of pieces from play.

We may divide the state space of all legal positions of a game into disjoint classes, where each class contains all positions with the same number of pieces on the board. Let us define a directed graph G in which each class is a node, and an arc exists between class A and class B if and only if a position P exists in A such that a move exists from P which leads to a position Q in B.

We can now define convergence using this notion of classes of positions. A game converges if for the majority of edges from A to B in G, the cardinality of A is larger than the cardinality of B. A game diverges if for the majority of edges from A to B in G, the cardinality of B is larger than the cardinality of A. A game is unchangeable if the game does not have conversions, or if it neither converges nor diverges.

An example of a converging game is checkers. The initial position in checkers consists of 24 men, while during the game the number of men decreases. After the first few captures, the number of legal checkers positions decreases as the number of pieces on the board decreases.

An example of a diverging game is othello. Each move in othello adds a piece to the board.
Except for the endgame, the number of legal positions increases as the number of stones on the board increases.

An example of an unchangeable game is shogi. Although shogi contains captures, there are no conversions in shogi. Captured pieces may be brought into play again by the player who captured the piece. As a result, the total number of pieces participating in a shogi game does not increase or decrease. Thus, shogi is an unchangeable game.

The relevance of the convergence property is that for converging games endgame databases (Thompson, 1986) can be created, while this is generally infeasible for diverging or unchangeable games.

6.2.3 Sudden death

The sudden-death property labels games as either sudden-death or fixed-termination. A sudden-death game may end abruptly by the creation of one of a prespecified set of patterns. A fixed-termination game lacks sudden-death patterns.

An example of a sudden-death game is go-moku: the game is terminated if one of the players has created a line of five stones in her color. Sudden-death games need not always terminate through the creation of a sudden-death pattern: go-moku is declared a draw when all 225 squares have been occupied without either player creating a winning pattern.

An example of a fixed-termination game is othello. Othello lasts until both players run out of moves or one of the players has no discs left on the board. In practice, games last between 55 and 60 moves. Even though a game might be decided within 15 moves by one player capturing all the discs of the opponent, such an anomaly is only of marginal relevance.

The sudden-death property is often important in restricting the search tree of a game. For games of high complexity (see section 6.2.4) the sudden-death element in combination with a clear advantage for one of the players may be the main property that allows the game to be solved. Examples are qubic and go-moku (both sudden-death games) described in chapters 4 and 5.

6.2.4 Complexity

The property complexity in relation to games is used to denote two different measures, which we name the state-space complexity and the game-tree complexity.

State-space complexity

The state-space complexity of a game is defined as the number of legal game positions reachable from the initial position of the game. While calculating the exact state-space complexity of games such as chess is hardly feasible, we present a method for calculating an approximation, using tic-tac-toe as an example.

A crude approximation to tic-tac-toe's state-space complexity is obtained through the notion that each of the nine squares can be occupied by cross, nought, or be empty. Thus, an upper bound to the state-space complexity is $3^{9} = 19\,683$. A sharper upper bound is obtained by noting that the number of crosses should equal the number of noughts, or exceed it by one. This results in an upper bound of 6 046. The exact state-space complexity, however, is obtained by observing that a position is illegal if a move has been added after a player has created three-in-a-row. Thus, positions containing a line of three noughts with nought to move, or a line of three crosses with cross to move must be excluded. The resulting 5 478 legal positions determine the state-space complexity of tic-tac-toe. The definition of the state-space complexity could be refined so that symmetrically equivalent positions are counted only once. We refrain from such a refinement.

Let us assume that we have established a superset of all legal positions of the game and the cardinality of that superset. Let us also assume that for each individual position of the superset we have an evaluation function which determines whether the position is legal. Using the combination of these two and a Monte-Carlo simulation, we may obtain an estimate of the true state-space complexity.
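The tic-tac-toe counts above are small enough to reproduce directly. The sketch below is our own illustration, not code from the thesis: it enumerates the positions reachable from the empty board (play stops once a line of three exists), counts the configurations in which crosses equal noughts or exceed them by one, and runs one Monte-Carlo estimate in the manner just described, using membership in the reachable set as the legality test. The sample size of 1000 and the seed are arbitrary choices.

```python
import random
from itertools import product
from typing import Optional, Set, Tuple

Board = Tuple[int, ...]  # 9 squares: 0 = empty, 1 = cross, 2 = nought

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def has_line(board: Board, player: int) -> bool:
    return any(all(board[i] == player for i in line) for line in LINES)


def reachable_positions() -> Set[Board]:
    """All positions reachable from the empty board, cross moving first."""
    start: Board = (0,) * 9
    seen = {start}
    frontier = [start]
    while frontier:
        board = frontier.pop()
        if has_line(board, 1) or has_line(board, 2):
            continue  # game over: no move may be added
        player = 1 if board.count(1) == board.count(2) else 2
        for i in range(9):
            if board[i] == 0:
                child = board[:i] + (player,) + board[i + 1:]
                if child not in seen:
                    seen.add(child)
                    frontier.append(child)
    return seen


def balanced_configurations() -> int:
    """Configurations where crosses equal noughts or exceed them by one."""
    return sum(
        1 for b in product(range(3), repeat=9)
        if b.count(1) - b.count(2) in (0, 1)
    )


def monte_carlo_estimate(legal: Set[Board], samples: int = 1000,
                         rng: Optional[random.Random] = None) -> float:
    """Fraction of random configurations that are legal, scaled by 3**9."""
    rng = rng or random.Random()
    hits = sum(
        tuple(rng.randrange(3) for _ in range(9)) in legal
        for _ in range(samples)
    )
    return hits / samples * 3 ** 9


legal = reachable_positions()
upper_bound = 3 ** 9
sharper_bound = balanced_configurations()
estimate = monte_carlo_estimate(legal, rng=random.Random(1994))
```

Enumeration reproduces the three figures quoted above; the Monte-Carlo estimate varies with the seed and, at 1000 samples, typically lands within a few hundred positions of the exact count.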
We performed 10 Monte-Carlo simulations, with a thousand samples per simulation, chosen from the superset of $3^{9}$ configurations mentioned above. For each simulation we determined the fraction of the positions which were legal. Multiplication of this fraction by the size of the superset, $3^{9}$, gave an estimated state-space complexity in our 10 simulations ranging from 4 920 to 5 983 with an average of 5 479, surprisingly close to the true state-space complexity.

The main application of the state-space complexity of a game is that it provides a bound to the complexity of games which can be solved through complete enumeration. With today's (1994) technology, where computer networks have access to Gigabytes of disk storage, the boundary of solvability by exhaustive enumeration lies at a state-space complexity of approximately $10^{11}$.

Game-tree complexity

Before we are able to define the game-tree complexity of a game, two auxiliary definitions are needed.

Definition 6.2 The solution depth of a node J is the minimal depth (in ply) of a full-width search sufficient to determine the game-theoretic value of J.

According to definition 6.2, the solution depth of a mate-in-n position in chess, $n \geq 1$, is $2n - 1$ ply.

Definition 6.3 The solution search tree of a node J is the full-width search tree with a depth equal to the solution depth of J.

As an example we consider a chess position J with white to move. White has 30 legal moves. For simplicity's sake, we assume that after each legal white move, black has 20 legal moves of which at least one mates white. Then, the solution search tree of J consists of J, the 30 children of J, and the 600 grandchildren of J.

Definition 6.4 The game-tree complexity of a game is the number of leaf nodes in the solution search tree of the initial position(s) of the game.

If J were the initial position of a game, its game-tree complexity would be 600.

While calculation of the exact game-tree complexity of games such as chess is hardly feasible, we can calculate a crude approximation as follows. Using tournament games, we can observe the average game length. Also, we may determine the average branching factor, either as a constant, or as a function of the depth in the game tree. The game-tree complexity can be approximated by the number of leaf nodes of the search tree with as depth the average game length (in ply), and as branching factor the average branching factor (per depth).

For instance, in tic-tac-toe, the average game length is close to nine ply, since most games end in a draw, which always takes exactly nine half-moves. The branching factor at level i in the game tree equals $9 - i$. Thus, the minimax search tree with depth 9 and branching factor $9 - i$ at level i consists of $9! = 362\,880$ terminal nodes, which is an estimate of the game-tree complexity of tic-tac-toe. Note that the game-tree complexity of a game may be larger than the state-space complexity, as the same position may occur at several different places in the game tree.

The game-tree complexity is an estimate of the size of a minimax search tree which must be built to solve the game. Thus, using optimally-ordered $\alpha$-$\beta$ search, we may expect to search a number of positions in the order of the square root of the game-tree complexity (Knuth and Moore, 1975).

As a guide to the perplexed, anticipating results duly credited in the following section, we present a graphical overview of the two complexities we distinguish in figure 6.1. For credits and sources see the discussions of the individual games.

Figure 6.1: Estimated game complexities. (Vertical axis: log10 of complexity; series: game-tree complexity and state-space complexity; games shown: nine men's morris, awari, checkers, connect-four, othello, qubic, Chinese chess, draughts, chess, go-moku, renju, go.)
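The approximation of section 6.2.4 amounts to multiplying the branching factors along one path of the average game length. A minimal sketch, with a helper name `leaf_count` of our own choosing, applied to the two worked examples of that section (the two-ply chess position J and tic-tac-toe):

```python
from math import isqrt, prod
from typing import Callable


def leaf_count(depth: int, branching: Callable[[int], int]) -> int:
    """Leaves of a tree whose nodes at level i all have branching(i) children."""
    return prod(branching(i) for i in range(depth))


# Chess position J: 30 white moves, each answered by 20 black moves.
solution_tree_leaves = leaf_count(2, lambda i: (30, 20)[i])

# Tic-tac-toe: depth 9, branching factor 9 - i at level i, i.e. 9!.
tic_tac_toe_leaves = leaf_count(9, lambda i: 9 - i)

# Order of magnitude suggested by the square-root rule for optimally
# ordered alpha-beta search (integer square root, for illustration only).
alpha_beta_order = isqrt(tic_tac_toe_leaves)
```

Passing the branching factor as a function of the level covers both the constant case (connect-four's factor of 4) and the depth-dependent case (64 - i for qubic, 225 - i for go-moku).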
6.3 The games of the Olympic List

In this section we discuss each of the games of the Olympic List individually. For each game, we describe (1) its properties, as introduced in section 6.2, (2) the currently strongest computer programs, (3) the techniques applied in these programs, and (4) the obstacles to progress in the game.

We have ordered the games of the Olympic List as follows. First, we discuss four solved games (qubic, connect-four, go-moku and nine men's morris) in the order in which they were solved. Second, we discuss the eight unsolved perfect-information games, in an order depending on the strengths of the currently strongest game-playing program: (1) stronger than the current world champion (awari and othello), (2) Grand Master strength or stronger (checkers, draughts and chess), (3) below Grand Master strength (Chinese chess, renju and go). Third, we discuss the three imperfect-information games of the Olympic List (scrabble, backgammon and bridge).

6.3.1 Qubic

Game properties

Qubic is a diverging, perfect-information game with sudden death. An upper bound to the state-space complexity of qubic is $3^{64} \approx 10^{30}$. To estimate the game-tree complexity, we assume an average game length of 20 ply. With $64 - i$ legal moves in a position at ply i, the game-tree complexity of qubic is approximately $64!/44! \approx 10^{34}$.

The state of the art

Qubic was the first game of the Olympic List to be solved. It was proved that the game is a win for the player to move first (Patashnik, 1980). The main game property responsible for qubic being solvable is sudden death. For details on the solution of qubic, we refer to chapter 4.

Techniques currently applied

Qubic was solved by Patashnik using a standard $\alpha$-$\beta$ search for determining the existence of winning threat sequences. All non-forced moves leading to the solution were made by hand, using expert knowledge.
Qubic has been solved again using db-search for determining the existence of winning threat sequences, and pn-search for making the non-forced moves, as described in chapter 4.

Obstacles to progress

It could be argued that qubic provides additional challenges beyond solving the game. For instance, one might want to determine the game-theoretic value of every legal position, or determine the shortest winning threat sequence from each position. However, we believe that with respect to human performance on qubic, all interesting problems within qubic have been solved. During the solution processes, no obstacles to progress have been discovered.

6.3.2 Connect-Four

Game properties

Connect-four is a diverging, perfect-information game with sudden death. Although at first sight the sudden death in connect-four may seem as important as in qubic, most games in connect-four are decided between moves 37 and 42, i.e., while filling the last column of the board. The state-space complexity of connect-four has been estimated at $10^{14}$ (Allis, 1988). With an estimated average game length of 36 ply, and an average branching factor of 4, the game-tree complexity of connect-four is approximately $4^{36} \approx 10^{21}$.

The state of the art

In September 1988, James Allen determined the game-theoretic value through a brute-force search (Allen, 1989): a win for the player to move first. A few weeks later, in October 1988, connect-four was solved through a knowledge-based approach, resulting in the tournament program victor (Allis, 1988; Uiterwijk et al., 1989a; Uiterwijk et al., 1989b). Recently John Tromp has calculated the game-theoretic value for all 8-ply connect-four positions (Tromp, 1993).

Techniques currently applied

Both Allen and Tromp used a sophisticated implementation of $\alpha$-$\beta$ search.
While Allen spent 300 hours of CPU time to determine the game-theoretic value of the position after 1: d1, Tromp's calculations took some 40,000 hours of CPU time for his (vastly) more complex task. Our knowledge-based solution initially took 350 hours of CPU time. However, adding a knowledge rule in combination with changing the search algorithm to pn-search has resulted in a program which solves connect-four in less than 25 CPU hours. All these experiments were performed on comparable hardware.

Obstacles to progress
The current version of victor, in combination with the 8-ply database created by Tromp, can be used to determine the game-theoretic value of almost any connect-four position within minutes. Furthermore, victor's knowledge-based component is able to provide us with an explanation of why a position is won. Therefore, we believe that no challenges remain within connect-four and no obstacles to progress have been discovered.

6.3.3 Go-moku

Game properties
Go-moku is a diverging, perfect-information game with sudden death. An upper bound to the state-space complexity is $3^{225} \approx 10^{105}$. To estimate the game-tree complexity, we assume an average game length of 30 ply. With $225 - i$ legal moves in a position at ply $i$, the game-tree complexity of go-moku is approximately $10^{70}$. For the professional variant of go-moku, with opening restrictions for black, the average game length will be somewhat larger, resulting in a higher game-tree complexity.

The state of the art
As described in chapter 5, two variants of go-moku without opening restrictions were solved in August 1992, proving that the game-theoretic value is a win for the player to move first. The current computer go-moku world champion (according to the rules of professional go-moku) is the program Vertex written by Shaposhnikov (Uiterwijk, 1992a). It is unclear at what performance level Vertex plays in relation to the strongest human players.
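The game-tree estimates for qubic and go-moku both rest on the same assumption: on a board of N cells, a position at ply i offers N - i legal moves, so a game of d ply contributes the product N(N-1)...(N-d+1) = N!/(N-d)! leaf nodes. The following Python sketch reproduces that arithmetic on a log scale. The function name `log10_falling_product` is chosen here for illustration and does not come from the thesis; the sketch also ignores early termination through sudden death, just as the estimates in the text do.

```python
import math


def log10_falling_product(cells: int, plies: int) -> float:
    """Return log10 of cells * (cells - 1) * ... * (cells - plies + 1).

    This equals log10(cells! / (cells - plies)!), the game-tree size
    under the assumption of (cells - i) legal moves at ply i.
    """
    return sum(math.log10(cells - i) for i in range(plies))


# Board sizes and average game lengths as quoted in the text.
qubic_exponent = log10_falling_product(64, 20)     # 64!/44!
gomoku_exponent = log10_falling_product(225, 30)   # 225!/195!

print(f"qubic:   about 10^{qubic_exponent:.1f}")
print(f"go-moku: about 10^{gomoku_exponent:.1f}")
```

The exponents come out between 34 and 35 for qubic and just under 70 for go-moku, in line with the rounded figures of $10^{34}$ and $10^{70}$ quoted above.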
Techniques currently applied
As described in chapter 5, the two variants of go-moku without opening restrictions were solved using a combination of db-search and pn-search. World champion Vertex is based on standard game-tree search techniques: a fixed-depth (16-ply) α-β search for the most-promising 14 moves in each position. Vertex has been provided with expert pattern knowledge and opening knowledge of two-fold world correspondence Renju champion Nosovsky.

Obstacles to progress
During the solution process of go-moku, it became apparent that through its tactical knowledge Victoria was able to suggest strong positional moves in many positions. In other words, many positionally strong moves could be explained through tactical calculations. We believe that a combination of db-search and pn-search, without the multi-move reply and other efficiency measures, can be implemented to outperform all human players in the search for deep winning threat trees. With positional benefits similar to those encountered during the solution process of free-style and standard go-moku, we conjecture that the best human players can be defeated at any variant of go-moku. It is also possible that standard techniques as applied in Vertex would prove sufficient for the task. Therefore, we conclude that no obstacles have been discovered in go-moku.

6.3.4 Nine men's morris

Game properties
Nine men's morris is a converging, perfect-information game. The game has a sudden-death element: if a player is unable to make a move, she loses. Even though this plays a role in practice, its influence on the game is much less than that of the main feature: closing mills and thereby capturing men of the opponent. Therefore, it seems more appropriate to classify nine men's morris as a fixed-termination game than as a sudden-death game.

The state-space complexity of nine men's morris, calculated by Gasser (1990), is the smallest of all games of the Olympic List: $10^{10}$. Nine men's morris' game-tree complexity is much larger. During the opening phase of the game, the branching factor is 16 on average. In the middle game and endgame, the branching factor ranges from 1 to over 50, resulting in our conservative estimate of 10 for the average branching factor. Setting the average game length at 50 ply (again a conservative estimate), the game-tree complexity of nine men's morris is calculated to be at least $10^{50}$.

The state of the art
Nine men's morris was solved in October 1993 by Ralph Gasser, proving that the game-theoretic value is a draw. In the years preceding the solution of the game, the program Bushy, also by Gasser, showed itself to be stronger than the best human players, as illustrated by its defeating the British champion by 5 to 1 in an exhibition match during the 2nd Computer Olympiad (Levy and Beal, 1991).

Techniques currently applied
Nine men's morris has been solved through the creation of databases by retrograde analysis for all positions which may occur during the middle game or endgame (Gasser, 1993). For the opening phase, which takes exactly 18 ply, a forward α-β search was applied.

Obstacles to progress
During the solution of nine men's morris through the application of standard search techniques, no obstacles to progress on the game have been discovered.

6.3.5 Awari

Game properties
Awari is a converging, perfect-information game with fixed termination. Only in rare circumstances may a player run out of moves early in the game, terminating it. The chances of this happening, however, are quite remote, which is why awari is not a sudden-death game. The state-space complexity of awari is calculated by Allis et al. (1991c) to be $10^{12}$. The game-tree complexity of awari, based on an average branching factor of 3.5 and an average game length of 60 ply, is estimated at $10^{32}$.

The state of the art
Although the lack of official human awari champions makes it difficult to prove, empirical evidence suggests that today's strongest awari program, Lithidion (Allis et al., 1991c), outperforms the strongest human players. Lithidion has lost games against human opponents, but in each of these cases the game revealed a serious bug in the program. All other games against human opponents were won, most by large margins.

We believe that awari will be the next game to be solved. Its state-space complexity is such that, using 2 terabytes of disk space, awari can be solved. It is only because solving awari is not a high-priority project that it will take several years and advances in technology before the hardware becomes available to solve the game through full enumeration. An approach similar to that applied to nine men's morris, i.e., endgame-database construction in combination with a forward search, may reduce the memory requirements for solving awari.

Techniques currently applied
For a detailed description of the techniques applied in today's strongest awari programs, we refer to section 2.4.3.

Obstacles to progress
Given the current strength of awari programs, and the impending solution of the game, no obstacles have been found in awari.

6.3.6 Othello

Game properties
Othello is a diverging, perfect-information game with fixed termination. The state-space complexity of othello has an upper bound of $3^{64} \approx 10^{30}$. Several legality tests, such as that the four center squares should not be empty and that the occupied squares must form a connected set, reduced the upper bound in a Monte-Carlo analysis to approximately $10^{28}$. To calculate the game-tree complexity of othello, we assume an average game length of 58 ply. With a conservative estimate of the average number of moves per position set at 10, we obtain a game-tree complexity of $10^{58}$.

The state of the art
Othello programs have played at the level of the human world champion since 1980. In that year the program The Moor won a game against the reigning world champion. Since then, programs have continued to improve. Currently, rating lists for othello players show that several programs clearly exceed the strongest human players in playing strength. Today's strongest program is Logistello by Michael Buro, which, among other tournaments, has won the 1st Paderborn othello tournament.

Techniques currently applied
All strong othello programs are based on standard game-playing techniques: (1) a deep α-β search; (2) a large opening database; (3) an endgame search determining the outcome of a game after approximately 36 ply; and (4) a finely-tuned evaluation function.

The chances that othello will be solved in the near future are extremely remote. The state-space complexity rules out the option of full enumeration, while the game-tree complexity renders a full-depth forward search impossible. The diverging nature of othello makes the creation of endgame databases infeasible. Finally, the fixed termination of othello rules out solving the game in a fashion similar to the solution of qubic and go-moku. Only if a so far unidentified structure in the game is discovered, resulting in knowledge rules which prove the value of nodes early in the game tree, may othello be solved in the coming decades.

Obstacles to progress
The strongest othello programs have already surpassed their human opponents. Even though solving the game is out of reach, human players do not possess knowledge or skill not shown by their artificial opponents. We conclude that no obstacles have been found in the research on othello.

6.3.7 Checkers

Game properties
Checkers is a converging, perfect-information game with fixed termination. In checkers a game is lost by a player who runs out of moves.
Although in exceptional cases this may happen while both players still have most of their pieces, in practice (almost) all of the opponent's pieces must be captured to win a game. The state-space complexity of checkers is estimated at $10^{18}$ (Schaeffer et al., 1991). The average branching factor is surprisingly low: 2.8, which is mostly due to the forced-capture rule (Schaeffer, 1993a). With an estimated average game length of 70 ply, we obtain a game-tree complexity of $10^{31}$.

The state of the art
As stated in section 1.1, Samuel's learning checkers program has, at least by some, been wrongfully credited with solving the game, which has clouded the history of the performance of checkers programs. Recent efforts by Schaeffer et al. (1992) have led to the development of a true world-championship level checkers program, named Chinook. Chinook has challenged the human world champion, Marion Tinsley, for his title. In a rather close match (4 wins, 2 losses and 33 draws), Tinsley successfully defended his title. A rematch is scheduled for August 1994 in Boston. With the extra efforts spent on Chinook (see below), it is not unlikely that 1994 will see a computer program become the strongest checkers entity in the world.

Techniques currently applied
Chinook consists of (1) a deep α-β searcher (averaging approximately 20 ply); (2) a fine-tuned evaluation function; (3) a large, man-made, computer-checked opening book; and (4) endgame databases comprising all endgame positions of 7 pieces or fewer, and all endgame positions of 4 pieces against 4. The influence of the endgame databases in checkers should not be underestimated. Due to forced captures in checkers, removing 16 men off the board may happen rather quickly.

With regard to solving checkers, we mention that full enumeration of the game is ruled out by the size of the state-space complexity.
A complete forward search, even if the game tree is perfectly ordered, is also out of reach of current technology. However, convergence in checkers has allowed the creation of large endgame databases, which decrease the size of the game tree significantly. Therefore, we do not rule out that the combination of forward search (either pn-search or α-β search) and endgame databases may prove sufficient to solve (some of the openings of) checkers, as stated by Schaeffer (1993a).

Obstacles to progress
While Chinook's strength is its deep tactical searches, combined with perfect endgame knowledge, its main weakness is that the value of each pattern not available in the evaluation function must be compensated for by search. In contrast, Tinsley's pattern knowledge is such that he knows of many positions for which a search of 50 or more plies is necessary to reveal the value of the position. Each such pattern corresponds to a weakness of the program with respect to human players. Although Chinook's tactical and endgame ability make up for most of the lack of pattern knowledge, it reveals traces of an obstacle to progress in checkers: the inability to gain experience from previous play. The suitability of checkers to alternative approaches, such as the brute-force approach applied by Chinook, shows that this experience obstacle has not prevented checkers programs from successfully challenging the strongest human players.

6.3.8 Draughts

Game properties
Draughts is a converging, perfect-information game with fixed termination, in many ways similar to checkers. The state-space complexity of draughts is significantly larger than that of checkers, and we have calculated an upper bound of $10^{30}$. The game-tree complexity of draughts is also larger than that of checkers. Conservatively estimating the average branching factor at 4, and the average game length at 90 ply, we obtain an estimated game-tree complexity of $4^{90} \approx 10^{54}$.

The state of the art
The strongest draughts program is Truus, written by Stef Keetman (Keetman, 1993). Truus' current level of play at tournament speed is ranked around the 40th position in the world. In speed draughts, Truus has beaten reigning world champion Alexei Tsjizjow once, and reached the 9th position in a tournament entered by almost all strong human players. Currently, Keetman works towards the goal of creating a tournament program able to defeat the human world champion. These efforts may improve Truus' level of play in the near future.

Techniques currently applied
Truus consists of (1) a deep α-β searcher (averaging a nominal depth of approximately 10 ply); (2) a fine-tuned evaluation function; (3) a large, man-made, computer-checked opening book; and (4) a set of about 1,000 tactical patterns, which Truus learned through automatic generalization. According to its author, Truus' undefeated record amongst draughts programs since 1990 is mostly due to its learning of tactical patterns (Keetman, 1993). In the near future, Truus' learning abilities will be extended to positional patterns, which have so far been hand-coded by the author.

The large state-space complexity, in combination with the large game-tree complexity, makes draughts unsolvable in the foreseeable future.

Obstacles to progress
Truus' strength is mostly based on its knowledge of tactical patterns and deep tactical searches. Although it has been argued by Keetman (1993) that tactical knowledge in draughts enhances positional play, positional knowledge is Truus' main weakness in comparison with human experts. As in checkers, each pattern not available in the evaluation function must be compensated for by search, revealing similar traces of an obstacle to progress: the inability to gain experience while playing the game.

6.3.9 Chess

Game properties
Chess is a converging, perfect-information game with sudden death.
While convergence and sudden death are major contributors to high-level play in games like qubic, go-moku and checkers, there is only a slight influence on tournament play in chess. Convergence in chess is slow, and a large majority of all chess games are decided long before endgame databases come into play. In chess practice the subgoal of obtaining material superiority often dominates the sudden-death goal of checkmate. Thus, both convergence and sudden death are less pronounced in chess than in games like checkers and draughts, or qubic and go-moku, respectively.

In our calculation of the state-space complexity of chess, we have included all states obtained through various minor promotions. Using rules to determine the number of possible promotions, given the number of pieces and pawns captured by either side, an upper bound of $5 \times 10^{52}$ was calculated. Not all of these positions will be legal, due to the king of the player who just moved being in check, or due to the position being unreachable through a series of legal moves. Therefore, we assume the true state-space complexity to be close to $10^{50}$. A state-space complexity of $10^{43}$, as mentioned by various authors (Schaeffer et al., 1991), is in our opinion too low an estimate. The game-tree complexity of chess, $10^{123}$, is based on an average branching factor of 35 and an average game length of 80 ply.

The state of the art
Today's strongest chess program is Deep Thought (Hsu, 1990). Its estimated Elo rating of 2550 ranks it between positions 100 and 150 on the world rating list. Current efforts to create Deep Blue, a parallel program consisting of 1000 Deep Thoughts, aim at surpassing the human world champion. Possibly as early as 1994 a new match between today's strongest computer chess player and one of the reigning world champions, Garry Kasparov, will be held. So far, all previous games between Kasparov and Deep Thought have been won by the human Grand Master (Van den Herik and Herschberg, 1989).
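For games where the number of legal moves does not shrink steadily with the ply count, the game-tree complexities in this section are obtained as $b^d$, with $b$ the average branching factor and $d$ the average game length in ply. A minimal Python sketch of that calculation follows, using the $(b, d)$ pairs quoted in the text for connect-four, checkers, draughts and chess. The helper name `log10_tree_size` and the dictionary layout are illustrative assumptions, not part of the thesis.

```python
import math


def log10_tree_size(branching: float, plies: int) -> float:
    """Return log10(branching ** plies) without forming the huge power."""
    return plies * math.log10(branching)


# (average branching factor, average game length in ply), as quoted in the text.
estimates = {
    "connect-four": (4, 36),
    "checkers": (2.8, 70),
    "draughts": (4, 90),
    "chess": (35, 80),
}

# Exponent of 10 for each game, truncated to an integer as in the text.
exponents = {
    game: math.floor(log10_tree_size(b, d)) for game, (b, d) in estimates.items()
}

for game, exponent in exponents.items():
    print(f"{game}: about 10^{exponent}")
```

Working on a log scale keeps the numbers small and makes the sensitivity of the estimate visible: each additional ply adds $\log_{10} b$ to the exponent, so for chess an error of ten ply in the assumed game length shifts the estimate by about fifteen orders of magnitude.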
Techniques currently applied
Most AI research on games has focused on chess. Several different approaches have been tried, ranging from purely knowledge-based (Reznitsky and Chudakoff, 1990) to purely brute-force (Hsu, 1990). Deep Thought consists of (1) a deep α-β searcher (averaging approximately 10 ply); (2) a fine-tuned evaluation function; (3) a move generator embedded in hardware; and (4) a large, man-made, computer-checked opening book.

Even though deep searches have had a large impact on the strength of today's chess programs, we should not ignore the contribution of the improved evaluation functions developed alongside the deeper searches. A strong example is Ed Schroder's 1992 world champion program, which compensates for one or more plies of search through a highly sophisticated evaluation function, manually fine-tuned through years of development and testing.

Obstacles to progress
In chess, just as in checkers, many strategic concepts known to human Grand Masters are based on gains achieved after a large number of moves. For many of these patterns, programs cannot compensate for their lack of knowledge by simply searching a few ply deeper. Again, but more clearly than in checkers and draughts, the contours of lack of experience as an obstacle to progress become visible. The extent to which this obstacle prevents programs from attaining dominance over their human counterparts through brute force alone is unclear. While some believe that it will still take decades before computers defeat the human world champion, others have stated that this event will occur before the year 2000 (Van den Herik, 1983).

6.3.10 Chinese chess

Chinese chess is similar to (Western) chess in many ways: (1) it is a converging, perfect-information game with sudden death; (2) its state-space complexity, at $10^{48}$, is similar to that of chess (at $10^{50}$); (3) the approaches to creating computer programs for playing Chinese chess have been similar to those for chess. Its game-tree complexity, estimated at $38^{95} \approx 10^{150}$ (Tsao et al., 1991), is somewhat larger than the game-tree complexity of chess, at $10^{123}$. In our opinion, the main reason why Chinese chess programs fall somewhat behind in their challenge of the stronger human players is the lesser amount of effort invested in Chinese-chess research.

6.3.11 Renju

Game properties
Renju (see also section 5.2.1) is a variant of go-moku, played by professional players. It is a diverging, perfect-information game with sudden death. Its state-space complexity and game-tree complexity are similar to those of go-moku.

The state of the art
In its purest form, without special opening rules restricting black (see section 5.2.1), we believe renju can be proved a first-player win, in the same way as go-moku has been solved. The main extension needed consists of the definition of special types of threats white can create, using squares forbidden to black (squares where black would create a double three, a double four or an overline). Using these extra threat types, white may be able to counter threat sequences which cannot be countered otherwise. Furthermore, potential winning threat sequences by black must be checked for the occupation by black of forbidden squares. Despite the extra complications in the program, and the somewhat enlarged solution complexity, we believe that renju should be solvable in at most ten times the effort required for the go-moku solution.

Professional renju, as described in section 5.2.1, is a game with virtually equal chances for both players. As go-moku could only be solved through black's opening advantage, we believe that professional renju will be unsolvable in the foreseeable future.
Today's strongest renju programs, such as Vertex by Shaposhnikov, are estimated to play at a level of 2 or 3 kyu (Ohta, 1993), which is the level of intermediate to strong club players.

Techniques currently applied
World champion Vertex is based on standard game-tree search techniques: a fixed-depth (16-ply) α-β search for the most-promising 14 moves in each position. Vertex has been provided with expert pattern knowledge and opening knowledge by two-fold world correspondence Renju champion Nosovsky.

Obstacles to progress
In go-moku we have seen that the availability of a strong tactical module allows a program to determine positionally strong moves: through refutation of positionally weak moves by tactically forced sequences, the positionally strong moves automatically emerge as the only options. In renju, a strong tactical module can be created using the same principles as applied to go-moku, albeit somewhat more complex. So far, it has not been shown that it is necessary to master deep positional knowledge as applied by human master players. In other words, so far no obstacles to progress in renju have been discovered.

6.3.12 Go

Game properties
Go is a diverging, perfect-information game with fixed termination. We remark that, in theory, go should be regarded as an unchangeable game, instead of a diverging game, as any legal state can be reached from any other legal state, if both players cooperate to this end. However, in practice, the board is slowly filled with stones until the board is divided into territories for both players. For practical purposes, therefore, go is a diverging game.

Go's state-space complexity, bounded by $3^{361} \approx 10^{172}$, is far larger than that of any of the other perfect-information games of the Olympic List. Its game-tree complexity, with an average branching factor of 250 and an average game length of 150 ply, is approximately $10^{360}$.

The state of the art
The strongest programs, such as Goliath by Mark Boon and Go-Intellect by Ken Chen, have achieved ratings roughly between 8 and 10 kyu (Boon, 1991; Chen, 1992), a level equivalent to weak club players. The low playing strength in comparison to human players cannot be attributed to a lack of interest by strong players or by financiers: both Mark Boon and Ken Chen have a go rating of 5 dan, while large sums of money can be won by the strongest go programs.

The explanation for the low playing strength of current go programs is found in the nature of the game. While the potential branching factor averages 250, human players only consider a small number of these moves, through extensive knowledge of patterns relevant to go. Similarly, while evaluating a position, humans determine the strengths and weaknesses of groups on the board with pattern knowledge. Thus, programs must either obtain pattern knowledge similar to that of human experts, or compensate for a lack of such knowledge through search or other means.

Techniques currently applied
We restrict our description to two-fold computer world champion Goliath, written by Mark Boon. Goliath's main strength is its evaluation function. As part of the evaluation function, heuristics determine the value of groups under attack, as well as the result of many forcing sequences, without having to analyze these sequences in detail. The evaluation function is used in a selective search, where moves are generated using pattern knowledge indicating candidate moves. A future version of Goliath, aimed at achieving a playing level of 5 kyu, is currently being developed.

Obstacles to progress
The main progress made by human go novices can be attributed to learning important patterns, in go terminology called good shape and bad shape. Furthermore, after each life-and-death attack, their pattern knowledge regarding the liveliness of each group on the go board is enhanced.
After playing a few hundred games, a novice go player will have acquired sufficient pattern knowledge to defeat today's strongest go programs. While lack of pattern knowledge is not unique to go (cf. checkers, draughts and chess), the main reason why it stands out in go is that deep search fails to mask the lack of pattern knowledge. As a result, in go, the experience obstacle is clearly visible.

6.3.13 Scrabble

Game properties
Scrabble is a diverging, imperfect-information game with fixed termination. The imperfect information in scrabble consists of not knowing the contents of the rack of the opponent and of the chance element in drawing tiles from the heap.

The state of the art
During our investigations we have not been able to determine the current level of the strongest scrabble programs. While some people stated that scrabble programs such as TSP by Jim Homan and Tyler by Alan Frank (the two competitors at the third Computer Olympiad) (Uljee, 1992) are stronger than the best human players, others believe that human players still have the edge.

Techniques currently applied
Scrabble as a family game may be best known for its potential for family disputes: while one player maintains that a word is valid, another may dispute it. At official scrabble tournaments, the set of legal words is strictly defined. Either the British Official Scrabble Words or the American Official Scrabble Players Dictionary determines the legal words. For words of nine or more letters, Webster's Ninth Collegiate is decisive. All strong scrabble programs have these dictionaries in memory.

Generally, a set of legal moves is selected, of which each move is evaluated according to (1) the number of points scored; (2) the remaining board position (i.e., the average score the opponent may obtain after the move); and (3) the potential of the letters remaining on the rack, in combination with the letters likely to be drawn from the heap.
The endgame of scrabble (i.e., once all letters from the heap have been drawn) is a perfect-information game. A standard forward search can be applied to such positions to determine optimal play for both sides.

Obstacles to progress
Scrabble programs have shown themselves capable of high-level play, even though relatively little research has been performed in this area. We believe that using existing techniques, scrabble programs will surpass their human opponents, if this is not already the case. Summarizing, no obstacles to progress in scrabble have been encountered.

6.3.14 Backgammon

Game properties
Backgammon is a converging, imperfect-information game with fixed termination. Although both players have access to all information determining the current state, dice determine the legal continuations. Not until a player is bearing her stones off or until the game has converted into a running game, are conversion moves made.

The state of the art
In 1980, the human world champion in backgammon, Luigi Villa, was beaten in a short match by the backgammon program bkg (Berliner, 1980). However, both the length of the match, and the fact that Villa seems not to have taken the match as seriously as he should have done, suggest that bkg may not have been truly stronger than the top human players of that time. Recently, Gerald Tesauro created the program TD-gammon, which narrowly lost a match against former world champion Bill Robertie: 40-39. Tesauro's investigations suggest that TD-gammon is significantly stronger than bkg (by approximately 0.35 points per game), while being close to current human world-champion level (Tesauro, 1993).

Techniques currently applied
While bkg has been created through expert knowledge, TD-gammon is a three-layer neural network, which is trained through the unsupervised TD(λ) learning algorithm. The input to TD-gammon consists of the board position in combination with some fairly basic backgammon knowledge.
From the input and a randomly initialized network, TD-gammon has trained itself on 1.5 million games of self-play, resulting in world-class level play (Tesauro, 1993). Using the neural network as the evaluation function, TD-gammon performs a 3-ply search. Doubling is handled by a separate algorithm, as is part of bearing off, for which an endgame database is used.

Obstacles to progress
Tesauro's work on TD-gammon indicates that a neural network is capable of capturing pattern knowledge in backgammon as well as the strongest human players. Therefore, we do not see obstacles which have become apparent through research on backgammon.

6.3.15 Bridge

Game properties
By declaring bridge to be a two-player game, it was possible to include it in the Olympic List. Arguments can be adduced for bridge being a two-player, three-player or four-player game. During the bidding phase, four players participate in the bidding. During the playing phase, three players participate, while the fourth player becomes the dummy. On the score card, two partnerships are recognized as the players in bridge. Like Blair et al. (1993), we have chosen to regard bridge as a two-player game. We remark that double-dummy bridge problems are two-player perfect-information games, while bridge problems assuming optimal counterplay can be regarded as two-player imperfect-information games. Finally, we mention that Blair et al. (1993) call the three- and four-player phases in bridge two-player games without perfect recall.

Restricting ourselves to the playing phase of bridge, it is a converging, imperfect-information game with fixed termination.

The state of the art
Instead of trying to master the whole game at once, several researchers have concentrated on single aspects, such as Lindelof (1983), who developed a special bidding system for computer programs, and Berlekamp (1963), who created a double-dummy analyzer.
Recently, Schoo (1992) has created a program which determines optimal play in single suits. Despite progress on parts of bridge, the strength of today's best bridge programs may at best be called amateur level. An example of a leading bridge program is Bridge Baron by Tom Throop and Tony Guilfoyle, winner of the bridge tournament at the second and third Computer Olympiads.

Techniques currently applied
Bridge Baron consists of knowledge rules which determine what to bid and the information each bid contains. A major problem not yet solved is interpreting the bids of the opponents when they are using vastly different bidding systems. Knowledge rules containing standard playing patterns form the basis for the playing phase, in combination with search. The heuristic nature of the patterns is a source of errors, as shown in a deciding hand in the final of the third Computer Olympiad (Throop and Guilfoyle, 1992).

Except for double-dummy problems and single-suit problems, exhaustive search has so far not been successful, notwithstanding predictions by Levy (1989) that a world-champion level program based on brute-force search could be created with today's technology.

Obstacles to progress
The main reason for the slow progress on bridge seems to be the inability of programs to truly understand the vague information they are processing. Instead, programs are taught a bidding system by specifying for each bid the hands for which the bid may be applicable, and the information transferred by the bid. The creation of a bidding program in this way suffers from the knowledge-acquisition bottleneck (Feigenbaum, 1979). Furthermore, extracting information from the bidding phase for use during the playing phase has proved to be rather difficult. Novice human players learning to play bridge experience similar problems.
However, through experience, they learn to interpret bids, judge hands, and transfer information gained during bidding to the playing phase. We believe that the experience obstacle blocks progress in bridge.

6.4 Reviewing the problem statement

In section 1.4, we formulated the problem statement, consisting of two questions. To find an answer to the questions in the problem statement, we formulated three research questions. In this section we summarize the answers found to the three research questions (section 6.4.1) and review the problem statement (section 6.4.2).

6.4.1 The research questions

In this section, we summarize the answers found to the three research questions of section 1.4. We discuss each of the questions separately.

Solvable games

The first research question reads: 'Which games can be solved and what techniques may contribute to the solution?' With respect to the first part of the question, solvable games, we have found the following answer.

1. Four games (qubic, connect-four, go-moku and nine men's morris) have been solved.
2. Awari and renju without opening restrictions will be solved in the near future.
3. Checkers is a likely candidate for solution in the future.

With respect to the second part of the question, contributing techniques, we have found the following answer.

1. For qubic, go-moku and renju, db-search has been, or will be, a contributor to finding winning threat sequences.
2. For qubic, connect-four, go-moku, renju and checkers, pn-search has been, or may be, a contributor to performing a forward search to solve the game.
3. For nine men's morris, awari and checkers, retrograde analysis has been, or will be, a contributor to creating endgame databases which reduce the size of the search tree necessary to solve the game.
4. In connect-four, applying knowledge rules to determine the game-theoretic value of game positions has proved to be successful.
5.
Variants of α-β search have proved effective as contributors to the solution of qubic, connect-four and nine men's morris, while they may aid in solving checkers.

Outperforming the best human players

The second research question reads: 'For which games can we create programs outperforming the best human players in the near future, and what techniques contribute to their performance?' With respect to the first part of the question, outperforming the best human players, we have found the following answers (we ignore the games listed in the answers to the first research question).

1. Today's othello programs are stronger than the best human players.
2. Today's draughts, backgammon and scrabble programs are close to world-champion level. Expected progress in the near future, possibly just by technological advances, seems sufficient to outperform the best human players.
3. For chess, Chinese chess and (professional) renju, current techniques may prove sufficient to obtain world-champion level, although it is rather difficult to predict when the last human hurdle will be taken.

With respect to the second part of the question, contributing techniques, we have found the following answer.

1. The most important techniques for obtaining high-level tournament programs have been sophisticated variants of α-β search with fine-tuned static evaluation functions. They are a contributing factor in othello, draughts, chess, Chinese chess and professional renju.
2. Db-search in combination with pn-search may prove a contributing factor for professional renju.
3. Neural networks are the basis for the high performance level in backgammon.

Human superiority

The third research question reads: 'In which games will humans continue to reign in the near future (say, at least the next decade) and what are the main obstacles to progress for computer programs?' With respect to the first part of the question, human superiority, we have found the following answer.

1.
For chess, Chinese chess and (professional) renju it is unclear whether the seemingly inevitable defeat of the strongest human players will take place within the coming decade.
2. For bridge and go the current performance level as well as the obstacles to progress suggest that humans will remain superior for at least the coming decade, if not for much longer.

With respect to the second part of the question, we have found that the main obstacle to progress apparent in several games, but most clearly in bridge and go, is the lack of an ability to gain experience.

6.4.2 The problem statement

Through the answers to the three research questions, as presented in section 6.4.1, we are now able to discuss the questions raised in the problem statement. As an answer to the first question, concerning new AI techniques applicable to other domains, we have found in the course of our research two new search techniques, pn-search and db-search. Pn-search is applicable to AND/OR trees (see chapter 2), and can thus be applied outside the area of games. Db-search is a single-agent search (see chapter 3), for which we have presented examples including production systems. The applicability of db-search to problems outside the domains discussed in this thesis needs to be investigated in the future. Clearly, as challenges remain within the domain of games, with bridge and go as specific examples, new AI techniques may be developed through further investigation of these games.

                  Predicted program strengths in the year 2000
Solved               Over Champion   World Champion   Grand Master    Amateur
connect-four         checkers        chess            Chinese chess   go
qubic                renju           draughts         bridge
nine men's morris    othello
go-moku              scrabble
awari                backgammon

Table 6.1: Predictions for the Olympic Games in the year 2000
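The remark that pn-search applies to any AND/OR tree can be illustrated with a minimal sketch of the proof-number and disproof-number bookkeeping. It is not the search algorithm of chapter 2: it only recomputes the numbers for a fixed tree, and it assumes the usual convention that an unevaluated leaf receives proof and disproof number 1. The names `Node` and `set_numbers` are hypothetical and chosen for this illustration only.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

INF = math.inf


@dataclass
class Node:
    is_or: bool                    # True: OR node, False: AND node
    value: Optional[bool] = None   # True = proved, False = disproved, None = unknown
    children: List["Node"] = field(default_factory=list)
    proof: float = 1
    disproof: float = 1


def set_numbers(node: Node) -> None:
    """Recompute proof and disproof numbers bottom-up for a fixed AND/OR tree."""
    if node.value is True:             # solved: nothing left to prove
        node.proof, node.disproof = 0, INF
    elif node.value is False:          # refuted: nothing left to disprove
        node.proof, node.disproof = INF, 0
    elif not node.children:            # unevaluated leaf (assumed initialization)
        node.proof, node.disproof = 1, 1
    else:
        for child in node.children:
            set_numbers(child)
        proofs = [c.proof for c in node.children]
        disproofs = [c.disproof for c in node.children]
        if node.is_or:                 # one proved child suffices to prove an OR node
            node.proof, node.disproof = min(proofs), sum(disproofs)
        else:                          # all children must be proved for an AND node
            node.proof, node.disproof = sum(proofs), min(disproofs)


# An OR root with an AND child (three unknown leaves) and one unknown leaf.
and_child = Node(is_or=False, children=[Node(is_or=True) for _ in range(3)])
root = Node(is_or=True, children=[and_child, Node(is_or=False)])
set_numbers(root)
print(root.proof, root.disproof)
```

Nothing in the sketch refers to moves or players; only the AND/OR structure is used, which is why the technique carries over to domains outside games.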
As an answer to the second question, concerning obstacles emerging through investigation of games, we have found a single obstacle, apparent in several games, but most pronounced in bridge and go: the lack of an ability to gain experience. The ability to gain experience is based on learning and flexibility. Flexibility is necessary to generalize while learning, and to recognize the applicability of patterns learned. While these concepts are not at all new revelations, we believe that their importance in relation to our research consists of showing that even without other interfering obstacles, such as common-sense knowledge, gaining experience is an obstacle in itself. We believe that to overcome such an obstacle, a recommended approach is to research it in separation from other known obstacles. Stated differently, we believe that bridge and go are suitable test beds for investigating the nature of the experience obstacle.

In conclusion, we state that our research has contributed two new search techniques which may be applied in AI, as well as some additional insight into the importance of one obstacle to game research.

6.5 Predictions

6.5.1 Future playing strength

In 1990 we predicted the strength of computer programs in the year 2000 for each of the games of the Olympic List (Allis et al., 1991a). These predictions have been reproduced in table 6.1. In 1990, we were only aware of the solution to connect-four, even though qubic had been solved over a decade before. Currently, four of the five games listed as predicted to be solved in 2000 are solved. In the Over Champion category (i.e., significantly stronger than the human world champion), renju is listed. If we were to recreate table 6.1 today, we would put renju without opening restrictions in the Solved category, while we would put professional renju in the Grand Master category. The Over Champion entry should thus be regarded as a compromise between these two.
Of the five games in the Over Champion category, currently only othello is known to have achieved true Over Champion level. To be at world-champion level means having a rating close to that of the human world champion. For both games mentioned (chess and draughts), an official rating system exists, which makes it possible to check such a claim. Equivalent to such a rating would be a close match over a large number of games. Thus, Chinook is considered by us to be of world-champion level in checkers. The main reason for listing Chinese chess at Grand Master level, instead of at world-champion level, is the little effort invested in comparison with chess. Therefore, we believe that progress in Chinese chess will keep trailing several years behind that of chess. The bridge entry at Grand Master level in retrospect seems somewhat optimistic. Had we introduced a Master level, this is where we would categorize it with our 1994 knowledge. However, having to choose between amateur level and Grand Master level, we opted for the latter. Finally, the go entry speaks for itself. In go terminology, the term amateur may be ambiguous. To be clear, any dan rating in the year 2000 for computer programs (even amateur dan ratings) would be above our current expectations.

6.5.2 The future of games

Even where computers have failed to achieve perfection, which we see as solving the game, they may succeed at the simpler task of outwitting human beings. In table 6.1, we predict that for the majority of the games of the Olympic List computers will have the advantage over their human opponents before the turn of the century. This being so, we nevertheless argue that all games will continue to be played at all levels, from youngsters enjoying tic-tac-toe to Grand Masters competing in chess tournaments for titles and money. Neither known game-theoretic values nor the availability of silicon opponents of superior strength will extinguish man's urge to compete.
It has also been argued that, once a program of over-champion strength exists, programs will cease to improve. Not so: while human beings construct programs, competition among programmers will see to it that programs will continue to rise in strength. We therefore conclude: all games will survive at all levels.

Appendix A

Domain-specific solution to DLP

In this appendix we describe the algorithm triangle to determine the solution to an instance of the double-letter puzzle (DLP). Triangle has storage complexity in the order of $n^2$ and time complexity in the order of $n^3$. To simplify the description of triangle, we index the letters in the axiom of DLP from $0$ to $n-1$, where $n$ is the length of the axiom. We define a substring of the axiom as any range of letters from a start index $i$ to an end index $j$, with $0 \le i \le j \le n-1$.

Triangle uses a triangular array of $\frac{1}{2}n(n+1)$ entries, where each entry can store any subset of $\{a, b, c, d, e\}$. Rows in the array represent start indices, and columns represent end indices, i.e., row $i$ consists of column entries $i$ to $n-1$. In the triangular array, triangle stores for each substring of the axiom the single letters to which that substring can be reduced. After finishing this task for all substrings, the solution to DLP is found in the entry representing the substring with start index $0$ and end index $n-1$, which represents the whole axiom.

The triangular array is filled in $n$ steps. First, the entries with the start index equal to the end index (entries $[i,i]$, for $0 \le i \le n-1$) are initialized to the singleton set containing the letter at position $i$ in the axiom. The other $\frac{1}{2}n(n-1)$ entries are not initialized. Second, we concentrate on entries representing substrings of two letters (i.e., entries $[i,i+1]$, for $0 \le i \le n-2$). In general, the value of $[i,i+1]$ can be determined by looking at the sets at table entries $[i,i]$ and $[i+1,i+1]$.
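The filling procedure of this appendix, including the general rule for entry $[p,q]$ that it goes on to state, can be sketched in a few lines of Python. This is a hedged illustration rather than the author's implementation. The rules of DLP are defined in section 3.2, so the sketch assumes that the alphabet a to e is treated cyclically (predecessor of a is e, successor of e is a), which is what the entry be for the substring aa in Figure A.1 suggests. The names `neighbours`, `triangle` and `solve_dlp` are hypothetical, and a full $n \times n$ array is used for simplicity instead of a triangular one.

```python
from typing import List, Set

ALPHABET = "abcde"  # assumption: cyclic five-letter alphabet, inferred from Figure A.1


def neighbours(letters: Set[str]) -> Set[str]:
    """Predecessors and successors (cyclic) of every letter in `letters`."""
    result: Set[str] = set()
    for letter in letters:
        k = ALPHABET.index(letter)
        result.add(ALPHABET[(k - 1) % len(ALPHABET)])
        result.add(ALPHABET[(k + 1) % len(ALPHABET)])
    return result


def triangle(axiom: str) -> List[List[Set[str]]]:
    """table[p][q] (p <= q) holds the letters to which axiom[p..q] can be reduced."""
    n = len(axiom)
    table: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i in range(n):                      # step 1: substrings of one letter
        table[i][i] = {axiom[i]}
    for length in range(2, n + 1):          # steps 2..n: longer substrings
        for p in range(n - length + 1):
            q = p + length - 1
            s: Set[str] = set()
            for i in range(p, q):           # every split into [p,i] and [i+1,q]
                s |= table[p][i] & table[i + 1][q]
            table[p][q] = neighbours(s)
    return table


def solve_dlp(axiom: str) -> Set[str]:
    """Letters to which the whole axiom can be reduced (entry [0, n-1])."""
    if not axiom:
        raise ValueError("the axiom must contain at least one letter")
    return triangle(axiom)[0][len(axiom) - 1]


print(sorted(solve_dlp("aabdcbbdcaa")))  # ['a', 'b', 'c', 'e']
```

The three nested loops give the time complexity in the order of $n^3$ and the table gives the storage complexity in the order of $n^2$ stated above; the printed result agrees with entry $[0,10]$ of Figure A.1.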
The intersection $S$ of the sets at entries $[i,i]$ and $[i+1,i+1]$ indicates pairs of equal letters which can be replaced by the predecessor or successor of the letters in $S$. These predecessors and successors are then stored at the entry $[i,i+1]$. Third, we determine the value of the entries representing substrings of three letters (i.e., entries $[i,i+2]$, for $0 \le i \le n-3$). To determine the value of $[i,i+2]$ we must look at the intersection $S_1$ of the sets at entries $[i,i]$ and $[i+1,i+2]$, and at the intersection $S_2$ of the sets at entries $[i,i+1]$ and $[i+2,i+2]$. The union of $S_1$ and $S_2$ determines the letters whose predecessors and successors are included in entry $[i,i+2]$. In general, entry $[p,q]$ is the set of predecessors and successors of the letters in

$$S = \bigcup_{i=p}^{q-1} \bigl([p,i] \cap [i+1,q]\bigr).$$

Figure A.1 depicts the array of entries created to solve the instance aabdcbbdcaa of DLP (the example of section 3.2). The set of letters stored in entry $[0,n-1]$ yields the solution. As mentioned in section 3.2, only d is not a solution.

       0    1    2    3    4    5    6    7    8    9    10
  0    a    be   ac   -    -    -    bd   ce   bd   -    abce
  1         a    -    -    -    -    -    -    -    -    -
  2              b    -    -    -    -    -    -    -    -
  3                   d    -    -    ce   -    ce   -    ad
  4                        c    -    bd   ce   bd   -    ac
  5                             b    ac   -    -    -    -
  6                                  b    -    -    -    -
  7                                       d    -    -    -
  8                                            c    -    -
  9                                                 a    be
 10                                                      a

Figure A.1: Solution to instance aabdcbbdcaa of DLP (rows: start index, columns: end index).

Summary

In this thesis "intelligent" games are investigated from the perspective of Artificial Intelligence (AI) research. Games were selected in which, at least partially, human expert players outperformed their artificial opponents. By investigating a game, we envision at least two possible outcomes. If we achieve a playing strength sufficient to defeat the best human players, analysis of the means which led to this improvement may uncover new AI techniques.
If the playing strength keeps falling short of that of the best human players, even after prolonged attempts, a better understanding of the problems inherent in playing the game at a high level may be acquired. We remark that there is a possibility that the results do not lead to progress (i.e., no new AI techniques and no better understanding of the inherent problems). In the first case, the improvement may be due to entirely domain-specific techniques which cannot be generalized to AI techniques. In the second case, we may find that we have difficulty in isolating the problems from our failed attempts. By investigating a representative set of games, the probability increases that new AI techniques are developed or insight into problems hindering progress is obtained.

For our investigations, we have selected a set of games called the Olympic List, consisting of: awari, backgammon, bridge, chess, Chinese chess, checkers, connect-four, draughts, go, go-moku, nine men's morris, othello, qubic, renju and scrabble.

The research is in two parts. First, we have investigated three games which we believed could be solved: awari, qubic and go-moku. Games can be solved if it is possible to determine strategies leading to the best possible result for both players. For qubic and go-moku we have been able to find strategies which guarantee a win for the first player. For awari this has not yet been achieved, but we did create a program that outperforms the strongest human players. Analysis and generalization of the methods used in solving qubic and go-moku resulted in two new AI techniques: the search techniques proof-number search (pn-search) and dependency-based search (db-search). Awari is close to its solution, indeed so close that we believe that extant techniques suffice to solve it.
Second, for each game of the Olympic List we have investigated whether the difference in playing skill of human beings and computer programs gives us reason to believe that there is an intrinsic obstacle to progress. We have found that, based on insufficient flexibility and learning ability, an experience obstacle exists. This obstacle is particularly conspicuous in bridge and go. We conjecture that, while such obstacles exist in the games domain, these same obstacles will stand in the way of progress in other domains.

This thesis consists of six chapters. In chapter 1, the relevance of investigating games is discussed, leading to the formulation of a problem statement and three research questions. In chapter 2, pn-search is defined. It is shown that pn-search traverses a set of state spaces much more efficiently than alternative search algorithms; awari serves to provide an example. In chapter 3, db-search is defined, a search algorithm that traverses a state space significantly reduced when compared to traditional search algorithms. It is shown that under clearly defined conditions the reduced state space is complete, which means that it contains all solutions present in the original state space. The potential of db-search is demonstrated on an example domain. In chapter 4, it is demonstrated how pn-search and db-search solved qubic. Similarly, in chapter 5 it is demonstrated how pn-search and db-search combined solved go-moku. In chapter 6 all games of the Olympic List are investigated, resulting in, among others, a prediction of the playing strengths of the strongest computer programs in the year 2000 and a discussion of the future of games in our society.

Samenvatting

This thesis describes research into games within the framework of Artificial Intelligence. The starting point was games in which the strongest human players still had the upper hand over their artificial opponents, at least in some respects.
Such investigations can lead to at least two useful outcomes. When the gap with the top human players is closed completely, analysis of the way in which this was achieved may lead to the discovery of new AI techniques. When, even after prolonged attempts, the human level proves unattainable, analysis of the problems encountered may lead to the discovery of general obstacles to progress in Artificial Intelligence. It is of course also possible that the gap with human players in a particular game is closed, but that already existing techniques can be used, or that the techniques used are entirely specific to that game and will find no more general application. It could also happen that prolonged attempts to analyze the problems encountered lead nowhere. By investigating a representative set of games, we consider it likely that research on a number of them will lead to new insights. This set, the Olympic List, consists of: awari, backgammon, bridge, Chinese chess, checkers, draughts, go, go-moku, nine men's morris, othello, qubic, renju, chess, scrabble and connect-four.

In the research we first concentrated on three games which could possibly be solved: awari, qubic and go-moku. These are games for which it seems possible to prove statements about strategies leading to the best attainable result for both players. For qubic and go-moku we have been able to establish a strategy which guarantees a win for the first player. For awari we have not yet come that far; a program has, however, been created that plays more strongly than top human players. Analysis and generalization of the methods which led to the solution of qubic and go-moku have yielded two new AI techniques, namely the search techniques proof-number search (pn-search) and dependency-based search (db-search).
Awari is on the verge of being solved. We therefore believe that existing techniques will prove sufficient for this purpose.

Next, for each game of the Olympic List we examined to what extent the difference between the playing levels of humans and computers gives reason to suppose that an important obstacle stands in the way of progress. We have found that, in particular, the fact that computer programs are insufficiently able to gain relevant experience, through lack of flexibility and learning ability, leads in some games to a substantial deficit relative to human players. This shortcoming is most evident in bridge and go. We suspect that as long as such obstacles stand in the way of progress in bounded research areas such as games, those same obstacles will hinder progress in other research areas.

The thesis consists of six chapters. In chapter 1 the possible products of research into games are described. A problem statement is formulated, as well as three research questions. In chapter 2 pn-search is defined. By means of experiments on awari it is shown that pn-search examines a certain type of search space considerably more efficiently than alternative search algorithms. In chapter 3 db-search is defined, a search algorithm that considerably reduces the search space examined by traditional search techniques. It is shown that under precisely defined conditions the search space reduced by db-search is complete, which means that it contains all solutions of the original space. The potential of db-search is illustrated by means of an example. In chapter 4 it is demonstrated how pn-search and db-search solved qubic, while chapter 5 describes the solving of go-moku with pn-search and db-search.
In chapter 6 all games of the Olympic List are examined, resulting in, among other things, a prediction of the playing strength of the best computer programs in the year 2000 and of the future of games in our society.

Curriculum Vitae

Name: L. Victor Allis
Date of birth: May 19, 1965
Place of birth: Gemert, The Netherlands
Nationality: Dutch
Married to: Petra Allis-Meinsma
Daughter: Cindy
Email: victor@cs.vu.nl

Education

Sept '77–Aug '83: Hermann Wesselink College, Amstelveen.
Sept '83–Oct '88: Vrije Universiteit, Amsterdam. Master's degree (with honors).
Jan '90–Aug '93: University of Limburg, Maastricht. Ph.D. student in Artificial Intelligence. Supervisor: H.J. van den Herik.

Work Experience

Sept '85–June '87: Teaching assistant at the Vrije Universiteit, Amsterdam.
April '88–Sept '88: Free-lance Computer Science Teacher, NOVI, Maarssen.
Jan '89–Nov '89: Programmer, Analyst, Project Leader at Advanced Management Systems, Takapuna, New Zealand.
Jan '90–Aug '93: Free-lance Computer Science Teacher, NOVI.
Sept '93–: Assistant professor of Artificial Intelligence at the Vrije Universiteit, Amsterdam.

Bibliography

[1] Allen J.D. (1989). A Note on the Computer Solution of Connect-Four. Heuristic Programming in Artificial Intelligence 1: the first computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 134–135. Ellis Horwood, Chichester, England. (163)
[2] Allis L.V. and Schoo P.N.A. (1992). Qubic Solved Again. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 192–204. Ellis Horwood, Chichester, England. (95, 97)
[3] Allis L.V. (1988). A Knowledge-Based Approach of Connect-Four. The Game is Solved: White wins. M.Sc. Thesis, Faculty of Mathematics and Computer Science, Vrije Universiteit, Amsterdam. (9, 60, 95, 163)
[4] Allis L.V., Van den Herik H.J., and Herschberg I.S. (1991a). Which Games Will Survive?
Heuristic Programming in Artificial Intelligence 2: the second computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 232–243. Ellis Horwood, Chichester, England. (182)
[5] Allis L.V., Van der Meulen M., and Van den Herik H.J. (1991b). Conspiracy-Number Search. Advances in Computer Chess 6 (ed. D.F. Beal), pp. 73–95. Ellis Horwood, Chichester, England. (60, 62)
[6] Allis L.V., Van der Meulen M., and Van den Herik H.J. (1991c). Databases in Awari. Heuristic Programming in Artificial Intelligence 2: the second computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 73–86. Ellis Horwood, Chichester, England. (46, 47, 166)
[7] Allis L.V., Van den Herik H.J., and Huntjens M.P.H. (1993). Go-Moku Solved by New Search Techniques. Proceedings of the 1993 AAAI Fall Symposium on Games: Planning and Learning. AAAI Press Technical Report FS93-02, Menlo Park, CA. (95, 97)
[8] Allis L.V., Van der Meulen M., and Van den Herik H.J. (1994). Proof-Number Search. Artificial Intelligence, Vol. 66, No. 1, pp. 91–124. (62, 95)
[9] Anantharaman T.S., Campbell M.S., and Hsu F.-h. (1989). Singular Extensions: Adding Selectivity to Brute-Force Searching. Artificial Intelligence, Vol. 43, No. 1, pp. 99–109. (47)
[10] Beal D.F. (1984). Mixing Heuristic and Perfect Evaluations: Nested Minimax. ICCA Journal, Vol. 7, No. 1, pp. 10–15. (46)
[11] Beasley J.D. (1985). The Ins & Outs of Peg Solitaire. Oxford University Press, Oxford. (6)
[12] Berlekamp E.R. (1963). Programs for Double-Dummy Bridge Problems - A New Strategy for Mechanical Game Playing. Journal of the Association for Computing Machinery, Vol. 10, No. 4, pp. 357–364. (178)
[13] Berlekamp E.R., Conway J.H., and Guy R.K. (1982). Winning Ways for your mathematical plays II. Academic Press, London. (5)
[14] Berliner H.J. (1979). The B* Tree Search Algorithm: A Best-First Proof Procedure. Artificial Intelligence, Vol. 12, pp. 23–40. (16, 62)
[15] Berliner H.J. (1980). Backgammon Computer Program Beats World Champion. Artificial Intelligence, Vol. 14, pp. 205–220.
(177)
[16] Blair J.R.S., Mutchler D., and Liu C. (1993). Games with Imperfect Information. Proceedings of the 1993 AAAI Fall Symposium on Games: Planning and Learning, pp. 59–67. AAAI Press Technical Report FS93-02, Menlo Park, CA. (178)
[17] Boon M. (1991). Overzicht van de ontwikkeling van een go spelend programma. M.Sc. Thesis, University of Amsterdam, The Netherlands. (174)
[18] Breuker D.M., Allis L.V., and Van den Herik H.J. (1994). How to Mate: Applying Proof-Number Search. Advances in Computer Chess 7, pp. 251–272. (60, 61)
[19] Buchanan B.C. and Shortliffe E.H. (1984). Rule-Based Expert Programs: the MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, MA. (2)
[20] Campbell M.S. and Marsland T.A. (1983). A Comparison of Minimax Tree Search Algorithms. Artificial Intelligence, Vol. 20, No. 4, pp. 347–367. (15, 61)
[21] Carroll C.M. (1975). The Great Chess Automaton. Dover Publications, Inc., New York. (1)
[22] Charniak E. (1978). On the Use of Framed Knowledge in Language Comprehension. Artificial Intelligence, Vol. 11, pp. 225–265. (3)
[23] Chen K. (1992). Attack and Defense. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 146–156. Ellis Horwood Ltd, Chichester. (174)
[24] De Groot A.D. (1965). Thought and Choice in Chess. Mouton Publishers, The Hague-Paris-New York. Translation, with additions, of a Dutch Ph.D. thesis from 1946. Second edition 1978. (3, 128)
[25] Deledicq A. and Popova A. (1977). Wari et Solo: le jeu de calcul African. CEDIC, Paris. (43)
[26] Dreyfus H.L. (1980). Why Computers Can't Be Intelligent. Creative Computing, Vol. 6, No. 3, pp. 72–78. (4)
[27] Feigenbaum E.A. (1979). Themes and Case Studies of Knowledge Engineering. Expert Systems in the Micro-Electronic Age (ed. D. Michie), pp. 3–25. Edinburgh University Press, Edinburgh, Scotland. (3, 179)
[28] Fikes R.E. and Nilsson N.J. (1971).
STRIPS: A new approach to the application of theorem proving to artificial intelligence. Artificial Intelligence, Vol. 1, No. 2. (65)
[29] Gasser R. (1990). Heuristic Search and Retrograde Analysis: their application to Nine Men's Morris. Diploma thesis, Swiss Federal Institute of Technology, Zurich. (165)
[30] Gasser R. (1991). Endgame Database Compression for Humans and Machines. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 180–191. Ellis Horwood, Chichester, England. (9)
[31] Gasser R. (1993). Personal Communication. (166)
[32] Gnodde J. (1993). Aïda, New Search Techniques Applied to Othello. M.Sc. Thesis, University of Leiden, The Netherlands. (39, 60)
[33] Greenblatt R.D., Eastlake III D.E., and Crocker S.D. (1967). The Greenblatt Chess Program. Proceedings of the Fall Joint Computing Conference, pp. 801–810. San Francisco. (39, 75)
[34] Hall M.R. and Loeb D.E. (1992). Thoughts on Programming a Diplomat. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 123–145. Ellis Horwood Ltd, Chichester. (6)
[35] Hart P.E., Nilsson N.J., and Raphael B. (1968). A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on SSC, Vol. 4. (17, 63)
[36] Hart P.E., Nilsson N.J., and Raphael B. (1972). Correction to 'A Formal Basis for the Heuristic Determination of Minimum Cost Paths'. SIGART Newsletter, Vol. 37. (63)
[37] Hofstadter D.R. (1979). Gödel, Escher, Bach: an Eternal Golden Braid. Basic Books, New York. (13)
[38] Hofstadter D.R. (1985). Metamagical Themas. Bantam Books, Toronto. (6)
[39] Hsu F.H. (1990). Large Scale Parallelization of Alpha-Beta Search: An Algorithmic Architectural Study with Computer Chess. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, USA. (171, 172)
[40] Keetman S. (1993). Personal Communication. (170)
[41] Klingbeil N. and Schaeffer J. (1988).
Search Strategies for Conspiracy Numbers. Canadian Artificial Intelligence Conference, pp. 133–139. (60)
[42] Klingbeil N. (1989). Search Strategies for Conspiracy Numbers. M.Sc. Thesis, University of Alberta, Edmonton, Alberta, Canada. (31, 60)
[43] Knuth D.E. and Moore R.W. (1975). An Analysis of Alpha-Beta Pruning. Artificial Intelligence, Vol. 6, No. 4, pp. 293–326. (15, 160)
[44] Knuth D.E. (1969). The Art of Computer Programming, Vol. 2. Addison-Wesley Publishing Company. (p. 192 in the second (1981) edition). (8)
[45] Korf R.E. (1985). Depth-First Iterative-Deepening: an Optimal Admissible Tree Search. Artificial Intelligence, Vol. 27, pp. 97–109. (6)
[46] Levy D.N.L. and Beal D.F. (eds.) (1989). Heuristic Programming in Artificial Intelligence: the first computer olympiad. Ellis Horwood, Chichester, England. (6, 156)
[47] Levy D.N.L. and Beal D.F. (eds.) (1991). Heuristic Programming in Artificial Intelligence 2: the second computer olympiad. Ellis Horwood, Chichester, England. (6, 46, 128, 156, 165)
[48] Levy D.N.L. (1989). The Million Pound Bridge Program. Heuristic Programming in Artificial Intelligence: the first computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 95–103. Ellis Horwood, Chichester, England. (178)
[49] Lindelof E.T. (1983). COBRA - The Computer Designed Bidding System. London, Gollancz. (178)
[50] Lister L. and Schaeffer J. (1994). An Analysis of the Conspiracy Numbers Algorithm. Computers and Mathematics with Applications, Vol. 27, No. 1, pp. 41–64. (60)
[51] Marr D. (1977). Artificial Intelligence - A Personal View. Artificial Intelligence, Vol. 9, pp. 37–48. (3)
[52] McAllester D.A. (1988). Conspiracy Numbers for Min-Max Search. Artificial Intelligence, Vol. 35, pp. 287–310. (16, 60)
[53] Michalski R.S., Carbonell J.G., and Mitchell T.M. (1983). Machine Learning: An Artificial Intelligence Approach, Vol. 1. Tioga, Palo Alto, CA. (3)
[54] Michalski R.S., Carbonell J.G., and Mitchell T.M. (1986).
Machine Learning: An Artificial Intelligence Approach, Vol. 2. Morgan Kaufmann, Los Altos, CA. (3)
[55] Michie D. (1982). Information and Complexity in Chess. Advances in Computer Chess 3 (ed. M.R.B. Clarke), pp. 139–143. Pergamon Press, Oxford. (4)
[56] Newell A., Shaw J.C., and Simon H.A. (1957). Preliminary Description of General Problem Solving Program-I (GPS-I). Report CIP Working Paper 7. (2)
[57] Nilsson N.J. (1971). Problem Solving Methods in Artificial Intelligence. McGraw-Hill, New York. (15)
[58] Nilsson N.J. (1980). Principles of Artificial Intelligence. Tioga, Palo Alto, CA. (14, 63)
[59] Ohta T. (1993). Personal communication. (173)
[60] Palay A.J. (1982). The B* tree search algorithm - new results. Artificial Intelligence, Vol. 19, pp. 145–163. (63)
[61] Patashnik O. (1980). Qubic: 4x4x4 Tic-Tac-Toe. Mathematics Magazine, Vol. 53, pp. 202–216. (95, 96, 109, 110, 112, 116, 119, 162)
[62] Reinefeld A. (1994). A Minimax Algorithm Faster than Alpha-Beta. Advances in Computer Chess 7 (eds. H.J. Van den Herik, I.S. Herschberg, and J.W.H.M. Uiterwijk), pp. 237–250. University of Limburg, Maastricht. (15)
[63] Reznitsky A. and Chudakoff M. (1990). Pioneer: A Chess Program Modelling a Chess Master's Mind. International Computer Chess Association Journal, Vol. 13, No. 4, pp. 175–195. (171)
[64] Sakata G. and Ikawa W. (1981). Five-In-A-Row. Renju. The Ishi Press, Inc., Tokyo. (122, 123, 129, 149)
[65] Samuel A.L. (1959). Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, Vol. 3, No. 3. (2)
[66] Samuel A.L. (1967). Some Studies in Machine Learning Using the Game of Checkers II. Recent Progress. IBM Journal of Research and Development, Vol. 11, No. 6. (2)
[67] Schaeffer J. (1989). Conspiracy Numbers. Artificial Intelligence, Vol. 43, No. 1, pp. 67–84. (16, 60, 61)
[68] Schaeffer J., Culberson J., Treloar N., Knight B., Lu P., and Szafron D. (1991). Reviving the Game of Checkers.
Heuristic Programming in Artificial Intelligence 2: the second computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 119-136. Ellis Horwood Ltd., Chichester, England. (2, 168, 171)
[69] Schaeffer J., Culberson J., Treloar N., Knight B., Lu P., and Szafron D. (1992). A World Championship Caliber Checkers Program. Artificial Intelligence, Vol. 53, pp. 273-289. (2, 168)
[70] Schaeffer J. (1993a). Personal Communication. (168, 169)
[71] Schaeffer J. (1993b). A Re-Examination of Brute-Force Search. Proceedings of AAAI Fall Symposium on Games: Planning and Learning, pp. 51-58. AAAI Press Technical Report FS93-02, Menlo Park, CA. (5)
[72] Schaeffer J. (1994). Personal Communication. (30)
[73] Schijf M. (1993). Proof-Number Search and Transpositions. M.Sc. Thesis, University of Leiden, The Netherlands. (39, 40, 41, 42)
[74] Schijf M., Allis L.V., and Uiterwijk J.W.H.M. (1994). Proof-Number Search and Transpositions. ICCA Journal, Vol. 17, No. 2, pp. 63-74. (39)
[75] Schoo P.N.A. (1992). Optimal Play in a Single Bridge Suit. Personal Communication. (178)
[76] Shortliffe E.H. (1976). MYCIN: Computer-based Medical Consultations. Based on a PhD thesis, Stanford University, Stanford, CA, 1974. (2)
[77] Stiller L. (1989). Parallel Analysis of Certain Endgames. ICCA Journal, Vol. 12, No. 2, pp. 55-64. (9)
[78] Stockman G. (1979). A Minimax Algorithm Better than Alpha-beta? Artificial Intelligence, Vol. 12, pp. 179-196. (15, 61)
[79] Tesauro G. (1993). TD-Gammon, A Self-Teaching Backgammon Program, Achieves Master-Level Play. AAAI Technical report FS9302 Games: Planning and Learning, pp. 19-23. AAAI Press Technical Report FS93-02, Menlo Park, CA. (177)
[80] Thompson K. (1982). Computer Chess Strength. Advances in Computer Chess 3 (ed. M.R.B. Clarke), pp. 55-56. Pergamon Press. (5)
[81] Thompson K. (1986). Retrograde Analysis of Certain Endgames. ICCA Journal, Vol. 9, No. 3, pp. 131-139. (9, 158)
[82] Throop T. and Guilfoyle T. (1992). A Thrilling Hand.
Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 27-28. Ellis Horwood Ltd, Chichester. (178)
[83] Tromp J.T. (1993). Aspects of Algorithms and Complexity. University of Amsterdam. Ph.D. Thesis. (163)
[84] Tsao Kuo-Ming, Li Horng, and Hsu Shun-Chin (1991). Design and Implementation of a Chinese Chess Program. Heuristic Programming in Artificial Intelligence 2: the second computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 108-118. Ellis Horwood Ltd, Chichester. (172)
[85] Uiterwijk J.W.H.M., Van den Herik H.J., and Allis L.V. (1989a). A Knowledge-Based Approach to Connect-Four. The Game is Over: White to Move Wins! Heuristic Programming in Artificial Intelligence: the first computer olympiad (eds. D.N.L. Levy and D.F. Beal), pp. 113-133. Ellis Horwood Ltd, Chichester. (9, 163)
[86] Uiterwijk J.W.H.M., Van den Herik H.J., and Allis L.V. (1989b). A Knowledge-Based Approach to Connect-Four. The Game is Over: White to Move Wins! Report CS 89-04, Department of Computer Science, Faculty of General Sciences, University of Limburg. (163)
[87] Uiterwijk J.W.H.M. (1992a). Go-Moku still far from Optimality. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 47-50. Ellis Horwood Ltd, Chichester. (122, 164)
[88] Uiterwijk J.W.H.M. (1992b). Knowledge and Strategies in Go-Moku. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 165-179. Ellis Horwood Ltd, Chichester. (126)
[89] Uljee I.H. (1992). Letters beyond Numbers. Heuristic Programming in Artificial Intelligence 3: the third computer olympiad (eds. H.J. Van den Herik and L.V. Allis), pp. 63-66. Ellis Horwood Ltd, Chichester. (175)
[90] Van den Herik H.J. and Allis L.V. (eds.) (1992). Heuristic Programming in Artificial Intelligence 3: the third computer olympiad. Ellis Horwood, Chichester, England. (6, 46, 128, 156)
[91] Van den Herik H.J. and Herschberg I.S. (1985). The Construction of an Omniscient Endgame Data Base. ICCA Journal, Vol. 8, No. 2, pp. 66-87. (9)
[92] Van den Herik H.J. and Herschberg I.S. (1989). Champ meets Champ. ICCA Journal, Vol. 12, No. 4. (171)
[93] Van den Herik H.J. (1983). Computerschaak, Schaakwereld en Kunstmatige Intelligentie. Academic Service, 's-Gravenhage. (4, 172)
[94] Van den Herik H.J. (1991). Kunnen computers rechtspreken? Gouda Quint BV, Arnhem. (2)
[95] Van der Meulen M. (1990). Conspiracy-Number Search. ICCA Journal, Vol. 13, No. 1, pp. 3-14. (16, 60, 61)
[96] von Neumann J. and Morgenstern O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton. (157)
[97] Winston P.H. (1992). Artificial Intelligence. Addison Wesley, Reading, MA. 3rd edition. (5)
[98] Witmans P.A. (1994). Personal communication. (117)

Index

Symbols
αβ-cn search, 62
α-β search, 8, 15, 39, 43, 46
15-puzzle, 6

A
A*, 63
add set, 65
adversary agent, 99, 102, 130
ancestor, 79
Artificial Intelligence, 1
attribute, 65, 71, 102
automorphism, 97, 99, 144
awari, 6, 9, 10, 16, 31, 33, 37, 43-51, 59-61, 95, 162, 166, 167, 179, 180, 182, 188
awele, 43

B
B*, 16, 62
backgammon, 6, 162, 176, 177, 180-182, 187
best-first search, 15, 18, 62
breadth-first search, 14, 66, 67, 91
bridge, 4-6, 156, 157, 162, 177-179, 181-183, 187, 188
broken three, 125
bug, 118, 152

C
checkers, 2, 4-6, 34, 37, 157, 162, 169-172, 175, 179, 180, 182, 183, 187
chess, 1, 4-6, 8, 16, 34, 37, 39, 41, 49, 60, 61, 156, 157, 159, 160, 162, 171, 172, 175, 180-183, 187
Chinese chess, 6, 41, 162, 172, 180-183, 187
class, 74
closed attacker three, 104
closed defender three, 104
cn-search, 16, 60
collisions, 49
combination stage, 85, 104, 105
common-sense knowledge, 3
complexity, 155, 156
Computer Olympiad, 6, 46, 97
connect-four, 6, 9, 39, 40, 60, 61, 95, 157, 161, 163, 164, 179, 180, 182, 187
convergence, 155-157
conversion, 39, 157
CPU time, 50, 150
current node, 32

D
DAG, 39
database, 148
db-search, 10, 67, 97, 122
DCG, 39
defender four, 104
delete set, 65
depend on, 78
dependency stage, 85, 104, 105
depth-first search, 14, 15, 29, 66, 67, 74, 91
diplomacy, 6
directional search, 15
disproof number, 19, 21
disproof set, 19
divergence, 157
double four, 123, 131
double threat, 98, 127
double three, 123, 131
double-letter puzzle, 68, 88
draughts, 2, 6, 31, 34, 37, 162, 169-172, 175, 180-183, 187

E
effort, 33
endgame database, 45
evaluation
    delayed, 17, 26
    immediate, 17, 26, 33
evaluation function, 107
execution time, 31
experience, 169, 170, 172, 175, 179, 181, 182
extension, 73, 141
extension set, 141

F
five, 125
fixed termination, 158
four, 125, 144
four-three, 131
free-style go-moku, 124, 129, 131, 141-143, 148, 150, 153

G
Gödel code, 49
game property, 155
game-theoretic value, 43
game-tree complexity, 158, 160
game-tree search, 60
games
    non-trivial, 6
    skill, 6
    solving, 7
    two-player, 5
    well-known, 6
    zero-sum, 6
General Problem Solver, 2
give-away chess, 6, 33-35, 60
global refutation, 139
go, 4-6, 41, 123, 162, 174, 175, 181-183, 187, 188
go-moku, 6, 9, 10, 33, 38-40, 43, 60, 90, 92, 93, 95-97, 121-126, 128-133, 135, 137, 141, 143, 144, 147-153, 155, 157, 158, 161, 164, 165, 167, 171, 174, 179, 180, 182, 187, 188
goal square, 137
goal state, 72, 103, 135
good shape, 129
graph, 39
group, 97, 99, 109

H
heuristic, 139, 140, 144
hex, 8
human expert, 128

I
imperfect information, 156
implicit threat, 147
initial state, 72, 103, 135
intuition, 3
iterative deepening, 15

K
key class, 75
key operator, 75, 106
knowledge representation, 13

L
learning, 3
level, 85
life, 5
line of five, 133
line of seven, 133
line of six, 133

M
mancala, 43
merge, 79, 104
meta-move, 102, 104, 132
meta-operator, 80, 106
micro world, 4
mixed strategy, 157
mobility, 108
monotonicity, 76, 77, 103, 135, 137
Monte-Carlo simulation, 159
most-proving node, 22
MST*, 64
mu-puzzle, 13, 14
multiple-stone reply, 131
MYCIN, 2

N
nim, 6, 8
nine men's morris, 6, 9, 161, 165, 166, 179, 180, 182, 187
node
    and, 17, 143
    child, 17
    developing, 17, 18, 33
    evaluation, 17, 27
    expansion, 17
    frontier, 17
    internal, 17
    leaf, 17
    or, 17, 143
    parent, 17
    solved, 29
    terminal, 17
    traversals, 31
    type, 17
    value, 17, 18
non-uniformity, 60, 95
null-move heuristic, 147

O
Olympic List, 6, 9, 96, 121
open attacker three, 104
open defender three, 104, 106
operator, 65, 71, 102, 133, 141
othello, 6, 37, 38, 60, 157, 158, 162, 167, 168, 180-183, 187
overline, 123, 141

P
parent, 78
path, 73
peg solitaire, 6
perfect information, 7, 155, 156
ply, 5
pn-search, 10, 16, 143
    assumptions, 19, 22, 32, 38
poker, 6
position
    easy, 52
    hard, 52
potential winning threat sequence, 130
precede, 78
precondition set, 65
prisoners' dilemma, 6
problem solving, 13
problem statement, 7, 155, 179, 181
production system, 65, 81
proof number, 19, 20
proof set, 19
pure strategy, 156

Q
qubic, 6, 9, 10, 39, 40, 43, 60, 90, 92, 93, 95-97, 99, 101-104, 106-110, 114, 116, 118, 119, 121, 122, 148, 155, 157, 158, 161-163, 167, 171, 179, 180, 182, 187, 188

R
redundancy, 76, 77, 103, 135, 137
related square, 145, 147
related-squares heuristic, 147
relevant parent, 79
reliability, 118, 152
renju, 6, 97, 122-124, 162, 164, 173, 174, 179-183, 187
repetition, 40
reply, 130, 131
representation, 65
research question, 7, 95, 155, 179
Rubik's cube, 6

S
scrabble, 6, 162, 175, 176, 180, 182, 187
search, 13, 65
shogi, 157, 158
single-agent search, 63, 99, 102, 126, 130, 135
singularity, 76, 77, 103, 135, 137
solution, 73
solution depth, 160
solution search tree, 160
solved, 2, 7, 9, 95, 96, 121, 148, 150, 153, 158, 159, 161-167, 179, 182
    strongly, 7-9
    ultra-weakly, 7-9
    weakly, 7-9
SSS*, 15, 61
standard go-moku, 124, 129, 141-143, 148, 150, 151, 153
state
    atomic, 65
    structured, 65
state space, 66, 71
state-space complexity, 9, 158, 159
straight four, 125
strategic move, 116
sudden death, 155, 156, 158
support, 78

T
temporary black, 124
temporary white, 124
test position, 51
threat, 101, 130, 145
threat category, 136
threat sequence, 101, 130
threat tree, 126, 127
threat-space search, 97, 122
three, 125, 144
tic-tac-toe, 6, 97, 122, 159, 160, 183
transposition, 39, 74
transposition table, 48, 75
tree
    and/or, 15, 16, 18, 107, 143
    broad, 38
    deep, 38
    disproved, 18
    game, 15
    model, 17
    narrow, 38
    non-uniform, 16
    proved, 18
    shallow, 38
    single-agent, 15
    solved, 18, 55
    uniform, 15
Triangle, 91, 185
Turk, 1

U
unchangeability, 157

W
wari, 43
weak methods, 5
winning threat sequence, 101, 106, 130, 139
winning threat tree, 127, 139
winning threat variation, 127
working memory, 14, 16, 29
...