Journal of Machine Learning Research 7 (2006) 877-917
Submitted 6/05; Revised 12/05; Published 5/06

Evolutionary Function Approximation for Reinforcement Learning

Shimon Whiteson SHIMON@CS.UTEXAS.EDU
Peter Stone PSTONE@CS.UTEXAS.EDU
Department of Computer Sciences
University of Texas at Austin
1 University Station, C0500
Austin, TX 78712-0233

Editor: Georgios Theocharous

Abstract

Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary function approximation, a novel approach to automatically selecting function approximator representations that enable efficient individual learning. This method evolves individuals that are better able to learn. We present a fully implemented instantiation of evolutionary function approximation which combines NEAT, a neuroevolutionary optimization technique, with Q-learning, a popular TD method. The resulting NEAT+Q algorithm automatically discovers effective representations for neural network function approximators. This paper also presents on-line evolutionary computation, which improves the on-line performance of evolutionary computation by borrowing selection mechanisms used in TD methods to choose individual actions and using them in evolutionary computation to select policies for evaluation. We evaluate these contributions with extended empirical studies in two domains: 1) the mountain car task, a standard reinforcement learning benchmark on which neural network function approximators have previously performed poorly and 2) server job scheduling, a large probabilistic domain drawn from the field of autonomic computing. The results demonstrate that evolutionary function approximation can significantly improve the performance of TD methods and on-line evolutionary computation can significantly improve evolutionary methods. This paper also presents additional tests that offer insight into what factors can make neural network function approximation difficult in practice.

Keywords: reinforcement learning, temporal difference methods, evolutionary computation, neuroevolution, on-line learning

1. Introduction

In many machine learning problems, an agent must learn a policy for selecting actions based on its current state. Reinforcement learning problems are the subset of these tasks in which the agent never sees examples of correct behavior. Instead, it receives only positive and negative rewards for the actions it tries. Since many practical, real world problems (such as robot control, game playing, and system optimization) fall in this category, developing effective reinforcement learning algorithms is critical to the progress of artificial intelligence....
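
The abstract describes on-line evolutionary computation as borrowing an action-selection mechanism from TD methods and using it to decide which policy in the population to evaluate next. The following is a minimal sketch of that idea, not the paper's implementation. It assumes epsilon-greedy selection as the borrowed mechanism, and the names select_policy, evaluate_generation, and run_episode, as well as the default epsilon, are hypothetical choices made for illustration.

```python
import random


def select_policy(avg_fitness, epsilon, rng):
    """Epsilon-greedy choice of the next policy to evaluate (assumed mechanism)."""
    if rng.random() < epsilon:
        # Explore: pick any population member uniformly at random.
        return rng.randrange(len(avg_fitness))
    # Exploit: pick the member with the best average reward so far.
    return max(range(len(avg_fitness)), key=lambda i: avg_fitness[i])


def evaluate_generation(policies, run_episode, episodes, epsilon=0.25, seed=0):
    """Spend `episodes` evaluations on one generation of policies.

    run_episode is a hypothetical callable that runs one episode with the
    given policy and returns its total reward.
    Returns (average reward per policy, number of episodes per policy).
    """
    rng = random.Random(seed)
    n = len(policies)
    totals = [0.0] * n
    counts = [0] * n
    avg = [0.0] * n
    for episode in range(episodes):
        if episode < n:
            i = episode  # initial pass: evaluate every policy once
        else:
            i = select_policy(avg, epsilon, rng)
        reward = run_episode(policies[i])
        totals[i] += reward
        counts[i] += 1
        avg[i] = totals[i] / counts[i]
    return avg, counts
```

In this sketch, policies that look promising receive most of the evaluation episodes, so less reward is lost on weak policies while the generation is being scored. The averages returned at the end would serve as the fitness estimates for the evolutionary step.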