Programming Techniques
R. Morris, Editor
Reducing the Retrieval Time of Scatter Storage Techniques
Richard P. Brent,
IBM Thomas J. Watson Research Center
A new method for entering and retrieving
information in a hash table is described. The method is
intended to be efficient if most entries are looked up several
times. The expected number of probes to look up an entry,
predicted theoretically and verified by Monte Carlo
experiments, is considerably less than for other comparable
methods if the table is nearly full. An example of a possible
Fortran implementation is given.
Key Words and Phrases: address calculation, content
addressing, file searching, hash addressing, hash code,
linear probing, linear quotient method, scatter storage,
searching, symbol table
CR Categories: 3.7, 3.73, 3.74, 4.1, 4.9
1. Introduction
Scatter storage (hash coding) techniques are used to
minimize the time required to enter and retrieve information
in tables. Rather similar techniques can be used
for internal tables, such as the symbol tables of compilers
and assemblers, and large files which are stored
on random-access devices such as disks or drums.
Some of these techniques are described in an excellent
survey paper [5] and more recently in [1, 2, and 6].

Copyright © 1973, Association for Computing Machinery, Inc.
General permission to republish, but not for profit, all or part
of this material is granted, provided that reference is made to this
publication, to its date of issue, and to the fact that reprinting
privileges were granted by permission of the Association for Computing
Machinery.
Author's present address: Computer Centre, Australian National
University, P.O. Box 4, Canberra, ACT 2600, Australia.

Our aim is to describe a method for entering information
so that subsequent retrievals are very efficient.
Suppose that each item consists of an identifying name
or key, which may be regarded as an integer, and an
associated value. If $m$ keys $k_1, \ldots, k_m$ are stored at
addresses $a(k_1), \ldots, a(k_m)$ in a table $T$ of length
$n \ge m$ (i.e. $T(a(k_i)) = k_i$ for $i = 1, \ldots, m$) and a
key $k$ is given, the problem is to determine efficiently
whether $k$ is in $T$, and if so, to find $a(k)$. In order to
compare the efficiency of different algorithms, we count
the number of fetches of elements of $T$, i.e. probes, that
they require.
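The problem statement above can be made concrete with a minimal sketch. It uses plain linear probing (one of the comparable methods named in the key words), not the new method this paper describes. The hash function `key % n`, the `EMPTY` marker, and the function names `insert` and `lookup` are assumptions chosen for illustration; keys are assumed to be non-negative integers.

```python
from typing import List, Optional, Tuple

EMPTY = None  # assumed marker for an unused slot of T


def insert(table: List[Optional[int]], key: int) -> int:
    """Store key by linear probing and return its address a(key)."""
    n = len(table)
    addr = key % n  # assumed hash function, for illustration only
    for _ in range(n):
        if table[addr] is EMPTY or table[addr] == key:
            table[addr] = key
            return addr
        addr = (addr + 1) % n  # step to the next slot, wrapping around
    raise OverflowError("table is full")


def lookup(table: List[Optional[int]], key: int) -> Tuple[Optional[int], int]:
    """Return (a(key) or None if absent, number of probes used)."""
    n = len(table)
    addr = key % n
    probes = 0
    for _ in range(n):
        probes += 1  # one probe = one fetch of an element of T
        if table[addr] == key:
            return addr, probes
        if table[addr] is EMPTY:
            return None, probes  # an empty slot ends an unsuccessful search
        addr = (addr + 1) % n
    return None, probes  # table is full and key is absent


# Usage: three keys that all hash to address 3 in a table of length 7.
T: List[Optional[int]] = [EMPTY] * 7
for k in (10, 17, 24):
    insert(T, k)
print(lookup(T, 10))  # found at its home address
print(lookup(T, 24))  # found only after stepping past two collisions
print(lookup(T, 31))  # absent
```

In this sketch the cost of retrieving a key depends on where it ended up at insertion time, which is the quantity the probe count is meant to measure.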
In practical applications it usually happens that most
entries in the table are looked up several times. Bell and
Kaman [2] found that their hashing routine was entered
10,988 times, but with only 735 different keys,
when a typical COBOL program was compiled. As a
more extreme example, a table of opcode mnemonics
or reserved words may be built up once and thereafter
used purely for retrieval [1]. Thus it is very important to
minimize the number of probes required to look up keys
which are already in the table. The number of probes
required to look up (and perhaps insert) keys which are
not already there is not so important.
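As a rough illustration of the figures quoted from [2], assume (this is not stated in the source) that each of the 735 distinct keys is inserted on its first appearance and that every later call is a successful lookup. Then

\[
\frac{10{,}988}{735} \approx 14.9 \ \text{calls per key}, \qquad
\frac{10{,}988 - 735}{10{,}988} = \frac{10{,}253}{10{,}988} \approx 0.93 .
\]

Under that assumption, about 93% of the calls to the hashing routine are retrievals of keys already in the table.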