Using Execution History to Solve Concurrency Problems
Senior Member, IEEE
, and J.D. Slingwine
The problems of synchronization overhead, contention, and deadlock can pose serious
challenges to those designing and implementing parallel programs.
Therefore, many researchers
have proposed parallel update disciplines that greatly reduce these problems in restricted but
commonly occurring situations, for example, read-mostly data structures [7, 10, 11, 13, 18].
However, these proposals rely either on garbage collectors [10, 11], termination of all processes
currently using the data structure, or expensive explicit tracking of all processes accessing the
data structure [7, 18].
These mechanisms are inappropriate in many cases, such as within many
operating-system kernels and server applications.
This paper proposes a novel and extremely
efficient mechanism, called read-copy update, and compares its performance to that of conventional
locking primitives under conditions of both low and high contention in read-intensive data structures.
Index Terms: Shared memory, mutual exclusion, reader-writer locking, performance, contention.
Synchronization overhead will continue to be
a concern in parallel
implementation because increases in CPU-core
instruction-execution rate are expected
to continue to outstrip reductions in global
latency for large-scale multiprocessors [3, 6,
This trend will cause global lock and
synchronization operations to continue
becoming more costly relative to instructions
that manipulate local data.
Lock operations are particularly troublesome
when the locks are used to guard read-mostly
data structures. In this common special case,
reading processes pay a heavy penalty to
guard against very rare events.
To see how
rare these events can be, consider the
following two examples.
The first example is a routing table for a
system connected to the Internet.
Internet routing protocols process routing
changes at most every minute or so.
Therefore, a system transmitting at the low
rate of 100 packets per second would need to
perform a routing-table update at most once
per 6,000 packets, for an update fraction
less than 10^-3.
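
The arithmetic behind the routing-table example can be checked with a short sketch. It assumes, as in the text, at most one routing change per minute and a transmit rate of 100 packets per second. The function name `update_fraction` and its parameters are illustrative only and do not come from the paper.

```python
def update_fraction(packets_per_second: float, seconds_between_updates: float) -> float:
    """Return the fraction of operations that are updates.

    Assumes exactly one update per interval and one read-side
    lookup per packet (an illustrative simplification).
    """
    packets_per_update = packets_per_second * seconds_between_updates
    return 1.0 / packets_per_update


# Routing-table example: 100 packets/s, one routing change per 60 s.
routing = update_fraction(packets_per_second=100, seconds_between_updates=60)
print(f"packets per update: {100 * 60}")
print(f"update fraction:    {routing:.2e}")
```

One update per 6,000 packets gives a fraction of about 1.7 x 10^-4, which is below the 10^-3 bound. A higher packet rate with the same update interval only makes updates rarer relative to reads.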
The second example is a system with 100
mirrored disks, each of which has an MTBF