blackboxes-sosp03 - Performance Debugging for Distributed...

Performance Debugging for Distributed Systems of Black Boxes

Marcos K. Aguilera, Jeffrey C. Mogul, Janet L. Wiener
HP Labs, Palo Alto
{Marcos.Aguilera,Jeff.Mogul,Janet.Wiener}@hp.com

Patrick Reynolds
Duke University
reynolds@cs.duke.edu

Athicha Muthitacharoen
MIT Lab for Computer Science
athicha@lcs.mit.edu

ABSTRACT

Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor performance. The problem becomes much harder when systems are composed of “black-box” components: software from many different (perhaps competing) vendors, usually without source code available. Typical solutions-provider employees are not always skilled or experienced enough to debug these systems efficiently. Our goal is to design tools that enable modestly-skilled programmers (and experts, too) to isolate performance bottlenecks in distributed systems composed of black-box nodes.

We approach this problem by obtaining message-level traces of system activity, as passively as possible and without any knowledge of node internals or message semantics. We have developed two very different algorithms for inferring the dominant causal paths through a distributed system from these traces. One uses timing information from RPC messages to infer inter-call causality; the other uses signal-processing techniques. Our algorithms can ascribe delay to specific nodes on specific causal paths. Unlike previous approaches to similar problems, our approach requires no modifications to applications, middleware, or messages.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging - distributed debugging, testing tools

General Terms
Algorithms, Performance, Measurement

Keywords
Performance debugging, black box systems, distributed systems, performance analysis

The order of author names is random.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SOSP'03, October 19–22, 2003, Bolton Landing, New York, USA.
Copyright 2003 ACM 1-58113-757-5/03/0010 ...$5.00.

1. INTRODUCTION

Many commercially-important systems, especially Web-based applications, are composed of a number of communicating components. These are often structured as distributed systems, with components running on different processors or in different processes. For example, a multi-tiered system might start with requests from Web clients that flow through a Web-server front-end and then to a Web “application server,” which in turn makes calls to a database server, and perhaps additional services (authentication, name service, credit-card authorization, customer relationship management, etc.).