osdi2008-chess - Finding and Reproducing Heisenbugs in...

Finding and Reproducing Heisenbugs in Concurrent Programs

Madanlal Musuvathi, Microsoft Research
Shaz Qadeer, Microsoft Research
Thomas Ball, Microsoft Research
Gerard Basler, ETH Zurich
Piramanayagam Arumuga Nainar, University of Wisconsin, Madison
Iulian Neamtiu, University of California, Riverside

Abstract

Concurrency is pervasive in large systems. Unexpected interference among threads often results in “Heisenbugs” that are extremely difficult to reproduce and eliminate. We have implemented a tool called CHESS for finding and reproducing such bugs. When attached to a program, CHESS takes control of thread scheduling and uses efficient search techniques to drive the program through possible thread interleavings. This systematic exploration of program behavior enables CHESS to quickly uncover bugs that might otherwise have remained hidden for a long time. For each bug, CHESS consistently reproduces an erroneous execution manifesting the bug, thereby making it significantly easier to debug the problem. CHESS scales to large concurrent programs and has found numerous bugs in existing systems that had been tested extensively prior to being tested by CHESS. CHESS has been integrated into the test frameworks of many code bases inside Microsoft and is used by testers on a daily basis.

1 Introduction

Building concurrent systems is hard. Subtle interactions among threads and the timing of asynchronous events can result in concurrency errors that are hard to find, reproduce, and debug. Stories are legend of so-called “Heisenbugs” [18] that occasionally surface in systems that have otherwise been running reliably for months. Slight changes to a program, such as the addition of debugging statements, sometimes drastically reduce the likelihood of erroneous interleavings, adding frustration to the debugging process.

The main contribution of this paper is a new tool called CHESS for systematic and deterministic testing of concurrent programs. When attached to a concurrent program, CHESS takes complete control over the scheduling of threads and asynchronous events, thereby capturing all the interleaving nondeterminism in the program. This provides two important benefits. First, if an execution results in an error, CHESS has the capability to reproduce the erroneous thread interleaving. This substantially improves the debugging experience. Second, CHESS uses systematic enumeration techniques [10, 37, 17, 31, 45, 22] to force every run of the program along a different thread interleaving. Such a systematic exploration greatly increases the chances of finding errors in existing tests. More importantly, there is no longer a need to artificially “stress” the system, such as increasing the number of threads, in order to get interleaving coverage, a common and recommended practice in testing concurrent systems. As a result, CHESS can find in simple configurations errors that would otherwise only show up in more complex configurations.

To build a systematic testing tool for real-world con-
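
The two benefits named in the introduction, systematic enumeration of interleavings and deterministic replay of a failing one, can be illustrated with a toy sketch. This is not CHESS's implementation or API: it assumes a hypothetical model in which each "thread" is a Python generator that yields at every scheduling point, and the names `worker`, `run`, `schedules`, and `find_failing_schedules` are invented for illustration only.

```python
def worker(shared):
    """Hypothetical thread body: a non-atomic increment of a shared counter.

    Each `yield` marks a scheduling point where the scheduler may switch threads.
    """
    tmp = shared["counter"]        # step 1: read the shared value
    yield
    shared["counter"] = tmp + 1    # step 2: write back the incremented value
    yield


def run(schedule):
    """Execute one interleaving deterministically.

    `schedule` is a tuple of thread ids; each entry runs one step of that thread.
    Running the same schedule again always produces the same result.
    """
    shared = {"counter": 0}
    threads = [worker(shared), worker(shared)]
    for tid in schedule:
        next(threads[tid])
    return shared["counter"]


def schedules(remaining, prefix=()):
    """Enumerate every interleaving, given the steps remaining per thread."""
    if not any(remaining):
        yield prefix
        return
    for tid, left in enumerate(remaining):
        if left:
            rest = list(remaining)
            rest[tid] -= 1
            yield from schedules(rest, prefix + (tid,))


def find_failing_schedules(expected=2):
    """Return all schedules whose final counter differs from the expected value."""
    return [s for s in schedules([2, 2]) if run(s) != expected]


if __name__ == "__main__":
    failing = find_failing_schedules()
    print("failing schedules:", failing)
    # Replaying a recorded schedule reproduces the lost update every time.
    print("replay of first failure gives counter =", run(failing[0]))
```

Even with only two threads and two steps each, exhaustive enumeration exposes the lost-update bug without any stress testing, and the recorded schedule serves as a reproducible witness. A real tool has to intercept actual threads and synchronization primitives rather than cooperative generators, which this sketch does not attempt.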

This note was uploaded on 02/24/2012 for the course CSE 503, taught by Professor David Notkin during the Winter '11 term at the University of Washington.
