Finding and Reproducing Heisenbugs in Concurrent Programs
Piramanayagam Arumuga Nainar
University of Wisconsin, Madison
University of California, Riverside
Concurrency is pervasive in large systems. Unexpected
interference among threads often results in “Heisenbugs”
that are extremely difficult to reproduce and eliminate.
We have implemented a tool called CHESS for finding
and reproducing such bugs. When attached to a program,
CHESS takes control of thread scheduling and uses efficient
search techniques to drive the program through
possible thread interleavings. This systematic exploration
of program behavior enables CHESS to quickly
uncover bugs that might otherwise have remained hidden
for a long time. For each bug, CHESS consistently
reproduces an erroneous execution manifesting the bug,
thereby making it significantly easier to debug the problem.
CHESS scales to large concurrent programs and
has found numerous bugs in existing systems that had
been tested extensively prior to being tested by CHESS.
CHESS has been integrated into the test frameworks of
many code bases inside Microsoft and is used by testers
on a daily basis.
Building concurrent systems is hard. Subtle interactions
among threads and the timing of asynchronous events
can result in concurrency errors that are hard to find,
reproduce, and debug. Stories are legend of so-called
“Heisenbugs” that occasionally surface in systems
that have otherwise been running reliably for months.
Slight changes to a program, such as the addition of
debugging statements, sometimes drastically reduce the
likelihood of erroneous interleavings, adding frustration
to the debugging process.
The main contribution of this paper is a new tool called
CHESS for systematic and deterministic testing of concurrent
programs. When attached to a concurrent program,
CHESS takes complete control over the scheduling
of threads and asynchronous events, thereby capturing
the interleaving nondeterminism in the program. This
provides two important benefits. First, if an execution
results in an error, CHESS can reproduce
the erroneous thread interleaving, which substantially
improves the debugging experience. Second, CHESS uses
systematic enumeration techniques [10, 37, 17, 31, 45,
22] to force every run of the program along a different
thread interleaving. Such systematic exploration
greatly increases the chances of finding errors in existing
tests. More importantly, there is no longer a need
to artificially “stress” the system, for example by increasing
the number of threads, in order to get interleaving coverage,
even though such stressing is a common and recommended
practice in testing concurrent systems. As a result, CHESS
can find, in simple configurations, errors that would
otherwise show up only in more complex configurations.
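
The two benefits described above, systematic enumeration and deterministic replay, can be illustrated with a toy model. The sketch below is not the CHESS implementation; it assumes two cooperative "threads" modeled as Python generators, each performing a non-atomic increment, and the names `increment`, `run`, `all_schedules`, and `explore` are hypothetical. A schedule is simply the order in which thread steps are taken, so enumerating schedules enumerates interleavings, and re-running a schedule reproduces the same outcome.

```python
from itertools import permutations

def increment(state):
    """A toy 'thread': a non-atomic increment split into two steps."""
    local = state["counter"]       # step 1: read the shared counter
    yield                          # scheduling point between read and write
    state["counter"] = local + 1   # step 2: write back the stale-or-fresh value
    yield

def run(schedule):
    """Execute one interleaving; `schedule` lists which thread takes each step."""
    state = {"counter": 0}
    threads = [increment(state), increment(state)]
    for tid in schedule:
        next(threads[tid])         # advance the chosen thread by one step
    return state["counter"]

def all_schedules():
    """Every distinct ordering of two steps from thread 0 and two from thread 1."""
    return sorted(set(permutations([0, 0, 1, 1])))

def explore():
    """Return the schedules under which an update is lost (counter != 2)."""
    return [s for s in all_schedules() if run(s) != 2]

if __name__ == "__main__":
    failing = explore()
    print(f"{len(failing)} of {len(all_schedules())} schedules lose an update")
    # Replaying a failing schedule deterministically reproduces the bug.
    print("replay", failing[0], "->", run(failing[0]))
```

Even with only two threads and two steps each, exhaustive enumeration exposes the lost update without adding threads or load, which is the point made above about simple configurations. A real tool must additionally intercept actual synchronization operations and cope with a far larger schedule space, neither of which this sketch attempts.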
To build a systematic testing tool for real-world con-