nsdi06preprint - In Proc. 3rd Symp. on Networked Systems...

In Proc. 3rd Symp. on Networked Systems Design and Implementation (NSDI), San Jose, CA, May, 2006

Pip: Detecting the Unexpected in Distributed Systems

Patrick Reynolds*, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat
*Duke University; UC San Diego; HP Labs, Palo Alto

Abstract

Bugs in distributed systems are often hard to find. Many bugs reflect discrepancies between a system’s behavior and the programmer’s assumptions about that behavior. We present Pip, an infrastructure for comparing actual behavior and expected behavior to expose structural errors and performance problems in distributed systems. Pip allows programmers to express, in a declarative language, expectations about the system’s communications structure, timing, and resource consumption. Pip includes system instrumentation and annotation tools to log actual system behavior, and visualization and query tools for exploring expected and unexpected behavior. Pip allows a developer to quickly understand and debug both familiar and unfamiliar systems.

We applied Pip to several applications, including FAB, SplitStream, Bullet, and RanSub. We generated most of the instrumentation for all four applications automatically. We found the needed expectations easy to write, starting in each case with automatically generated expectations. Pip found unexpected behavior in each application, and helped to isolate the causes of poor performance and incorrect behavior.

1 Introduction

Distributed systems exhibit more complex behavior than applications running on a single node. For instance, a single logical operation may touch dozens of nodes and send hundreds of messages. Distributed behavior is also more varied, because the placement and order of events can differ from one operation to the next. Bugs in distributed systems are therefore hard to find, because they may affect or depend on many nodes or specific sequences of behavior.
In this paper, we present Pip, a system for automatically checking the behavior of a distributed system against a programmer’s expectations about the system. Pip classifies system behaviors as valid or invalid, groups behaviors into sets that can be reasoned about, and presents overall behavior in several forms suited to discovering or verifying the correctness of system behavior.

Bugs in distributed systems can affect structure, performance, or both. A structural bug results in processing or communication happening at the wrong place or in the wrong order. A performance bug results in processing taking too much or too little of any important resource. For example, a request that takes too long may indicate a bottleneck, while a request that finishes too quickly may indicate truncated processing or some other error. Pip supports expressing expectations about both structure and performance and so can find a wide variety of bugs.

We wrote Pip for three broad types of users:

This note was uploaded on 12/08/2011 for the course CS 525 taught by Professor Gupta during the Spring '08 term at University of Illinois, Urbana Champaign.
