friday-nsdi07 - Friday Global Comprehension for Distributed...

Friday: Global Comprehension for Distributed Replay

Dennis Geels*, Gautam Altekar, Petros Maniatisφ, Timothy Roscoe, Ion Stoica
*Google, Inc., University of California at Berkeley, φIntel Research Berkeley, ETH Zürich

Abstract

Debugging and profiling large-scale distributed applications is a daunting task. We present Friday, a system for debugging distributed applications that combines deterministic replay of components with the power of symbolic, low-level debugging and a simple language for expressing higher-level distributed conditions and actions. Friday allows the programmer to understand the collective state and dynamics of a distributed collection of coordinated application components.

To evaluate Friday, we consider several distributed problems, including routing consistency in overlay networks, and temporal state abnormalities caused by route flaps. We show via micro-benchmarks and larger-scale application measurement that Friday can be used interactively to debug large distributed applications under replay on common hardware.

1 Introduction

Distributed applications are complex, hard to design and implement, and harder to validate once deployed. The difficulty derives from the distribution of application state across many distinct execution environments, which can fail individually or in concert, span large geographic areas, be connected by brittle network channels, and operate at varying speeds and capabilities. Correct operation is frequently a function not only of single-component behavior, but also of the global collection of states of multiple components. For instance, in a message routing application, individual routing tables may appear correct while the system as a whole exhibits routing cycles, flaps, wormholes or other inconsistencies.
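The routing example can be made concrete with a small sketch of a global predicate. This is an illustration only, not Friday's interface: the function name `find_routing_cycle` and the snapshot format (a mapping from each node to its chosen next hop for a single destination) are assumptions introduced here. Every entry in such a snapshot can look reasonable in isolation, yet following the entries across nodes may reveal a cycle.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot format: for one destination, each node's next hop.
# None means the node delivers locally (it believes it is the destination).
NextHops = Dict[str, Optional[str]]


def find_routing_cycle(next_hops: NextHops) -> Optional[List[str]]:
    """Return the nodes on one routing cycle, or None if there is none."""
    done = set()  # nodes already shown to end in delivery or a dead end
    for start in next_hops:
        path: List[str] = []          # nodes visited on this walk, in order
        position: Dict[str, int] = {}  # node -> index in path
        node: Optional[str] = start
        while node is not None and node in next_hops and node not in done:
            if node in position:
                # We came back to a node on the current walk: a cycle.
                return path[position[node]:]
            position[node] = len(path)
            path.append(node)
            node = next_hops[node]
        done.update(path)
    return None


# Each node forwards to a live neighbour, but a, b and c forward in a loop.
snapshot = {"a": "b", "b": "c", "c": "a", "d": None}
print(find_routing_cycle(snapshot))
```

The check needs the state of all nodes at once, which is why it cannot be expressed as an assertion inside any single component.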
To face this difficulty, ideally a programmer would be able to debug the whole application, inspecting the state of any component at any point during a debugging execution, or even creating custom invariant checkers on global predicates that can be globally evaluated continuously as the system runs. In the routing application example, a programmer would be able to program her debugger to check continuously that no routing cycles exist across the running state of the entire distributed system, as easily as she can read the current state of program variables in typical symbolic debuggers.

Friday, the system we present in this paper, is a first step towards realizing this vision. Friday (1) captures the distributed execution of a system, (2) replays the captured execution trace within a symbolic debugger, and (3) extends the debugger's programmability for complex predicates that involve the whole state of the replayed system. To our knowledge, this is the first replay-based debugging system for unmodified distributed applications that can track arbitrary global invariants at the fine granularity of source symbols. Capture and replay in

This note was uploaded on 12/08/2011 for the course CS 525 taught by Professor Gupta during the Spring '08 term at University of Illinois, Urbana Champaign.
