Team-Partitioned, Opaque-Transition Reinforcement Learning

Peter Stone and Manuela Veloso
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
{pstone,veloso}@cs.cmu.edu
http://www.cs.cmu.edu/~{pstone,mmv}

In RoboCup-98: Robot Soccer World Cup II, M. Asada and H. Kitano (eds.), Springer Verlag, Berlin, 1999.

Abstract

In this paper, we present a novel multi-agent learning paradigm called team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL introduces the concept of using action-dependent features to generalize the state space. In our work, we use a learned action-dependent feature space. TPOT-RL is an effective technique to allow a team of agents to learn to cooperate towards the achievement of a specific goal. It is an adaptation of traditional RL methods that is applicable in complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities. Multi-agent scenarios are opaque-transition, as team members are not always in full communication with one another and adversaries may affect the environment. Hence, each learner cannot rely on having knowledge of future state transitions after acting in the world. TPOT-RL enables teams of agents to learn effective policies with very few training examples even in the face of a large state space with large amounts of hidden state. The features chiefly responsible for this are: dividing the learning task among team members, using a very coarse, action-dependent feature space, and allowing agents to gather reinforcement directly from observation of the environment. TPOT-RL is fully implemented and has been tested in the robotic soccer domain, a complex, multi-agent framework. This paper presents the algorithmic details of TPOT-RL as well as empirical results demonstrating the effectiveness of the developed multi-agent learning approach with learned features.

1 Introduction

Reinforcement learning (RL) is an effective paradigm for training an artificial agent to act in its environment in pursuit of a goal. RL techniques rely on the premise that an agent's action policy affects its overall reward over time. As surveyed in [Kaelbling, Littman, & Moore, 1996], several popular RL techniques use dynamic programming to enable a single agent to learn an effective control policy as it traverses a stationary (Markovian) environment. Dynamic programming requires that agents have or learn at least an approximate model of the state transitions resulting from their actions. Q-values encode future rewards attainable from neighboring states. A single agent can keep track of state transitions as its actions move it from state to state. This paper focuses on teams of agents learning to collaborate towards a common goal in adversarial environments.
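To make the contrast concrete, the sketch below shows the standard single-agent, dynamic-programming-style setting summarized above: a tabular Q-learning update in which the learner directly observes each state transition resulting from its action. The toy chain environment, reward values, and parameter settings are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of tabular single-agent Q-learning, the kind of
# dynamic-programming-style RL discussed above. The toy chain
# environment, rewards, and hyperparameters are illustrative
# assumptions, not taken from the paper.
import random

NUM_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = [-1, +1]      # move left or right along the chain
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

# Q-table: Q[s][i] encodes the future reward attainable by taking
# action ACTIONS[i] in state s and acting greedily thereafter.
Q = [[0.0 for _ in ACTIONS] for _ in range(NUM_STATES)]

def step(state, action):
    """Single-agent, fully observable transition: the learner sees the
    successor state directly, unlike the opaque transitions TPOT-RL
    is designed to handle."""
    next_state = max(0, min(NUM_STATES - 1, state + action))
    reward = 1.0 if next_state == NUM_STATES - 1 else 0.0
    done = next_state == NUM_STATES - 1
    return next_state, reward, done

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            a_idx = random.randrange(len(ACTIONS))
        else:
            a_idx = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
        next_state, reward, done = step(state, ACTIONS[a_idx])
        # Standard Q-learning update: move Q(s,a) toward the observed
        # reward plus the discounted value of the best successor action.
        target = reward + GAMMA * max(Q[next_state])
        Q[state][a_idx] += ALPHA * (target - Q[state][a_idx])
        state = next_state

print(Q)  # learned values increase toward the goal state
```

TPOT-RL departs from this setting: the learning task is partitioned across team members, each agent uses a coarse, action-dependent feature space, and an agent cannot observe the state transitions that follow its actions, so reinforcement must be gathered from long-term observation of the environment rather than from known successor states.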