Programming Techniques
R. Morris, Editor

Reducing the Retrieval Time of Scatter Storage Techniques

Richard P. Brent, IBM Thomas J. Watson Research Center

A new method for entering and retrieving information in a hash table is described. The method is intended to be efficient if most entries are looked up several times. The expected number of probes to look up an entry, predicted theoretically and verified by Monte Carlo experiments, is considerably less than for other comparable methods if the table is nearly full. An example of a possible Fortran implementation is given.

Key Words and Phrases: address calculation, content addressing, file searching, hash addressing, hash code, linear probing, linear quotient method, scatter storage, searching, symbol table

CR Categories: 3.7, 3.73, 3.74, 4.1, 4.9

Copyright © 1973, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Author's present address: Computer Centre, Australian National University, P.O. Box 4, Canberra, ACT 2600, Australia.

1. Introduction

Scatter storage (hash coding) techniques are used to minimize the time required to enter and retrieve information in tables. Rather similar techniques can be used for internal tables, such as the symbol tables of compilers and assemblers, and large files which are stored on random-access devices such as disks or drums. Some of these techniques are described in an excellent survey paper [5] and more recently in [1, 2, 6].

Our aim is to describe a method for entering information so that subsequent retrievals are very efficient. Suppose that each item consists of an identifying name or key, which may be regarded as an integer, and an associated value.
If $m$ keys $k_1, \ldots, k_m$ are stored at addresses $a(k_1), \ldots, a(k_m)$ in a table $T$ of length $n \ge m$ (i.e. $T(a(k_i)) = k_i$ for $i = 1, \ldots, m$) and a key $k$ is given, the problem is to determine efficiently whether $k$ is in $T$, and if so, to find $a(k)$. In order to compare the efficiency of different algorithms, we count the number of fetches of elements of $T$, i.e. probes, that they require.

In practical applications it usually happens that most entries in the table are looked up several times. Bell and Kaman [2] found that their hashing routine was entered 10,988 times, but with only 735 different keys, when a typical COBOL program was compiled. As a more extreme example, a table of opcode mnemonics or reserved words may be built up once and thereafter used purely for retrieval [1]. Thus it is very important to minimize the number of probes required to look up keys which are already in the table. The number of probes required to look up (and perhaps insert) keys which are not already there is not so important.

The idea of our method, which is described in detail in Section 2, is to take more care than usual when keys are inserted, in an attempt to reduce the number of probes required for subsequent lookups. Although we
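
The probe-counting cost measure defined in the Introduction can be illustrated with a small sketch. It is not the method of this paper (which is given in Section 2 and not reproduced here); it assumes plain linear probing with the hypothetical hash function `key mod n`, and the names `insert`, `lookup` and `EMPTY` are chosen only for illustration. A probe is counted each time an element of the table is fetched.

```python
# Illustrative sketch only: open addressing with linear probing.
# Assumptions (not from the paper): hash address = key mod n, step = +1.

EMPTY = None


def insert(table, key):
    """Store key in the first free slot on its probe sequence; return a(key)."""
    n = len(table)
    addr = key % n
    for _ in range(n):
        if table[addr] is EMPTY or table[addr] == key:
            table[addr] = key
            return addr
        addr = (addr + 1) % n
    raise OverflowError("table is full")


def lookup(table, key):
    """Return (a(key) or None, probes), where each fetch of table[addr] is one probe."""
    n = len(table)
    addr = key % n
    for probes in range(1, n + 1):
        entry = table[addr]  # one probe
        if entry == key:
            return addr, probes
        if entry is EMPTY:
            return None, probes
        addr = (addr + 1) % n
    return None, n


# A table of length n = 7 holding m = 4 keys, three of which collide at address 3.
table = [EMPTY] * 7
for k in (10, 17, 24, 5):
    insert(table, k)

print(lookup(table, 10))  # (3, 1): found at its hash address
print(lookup(table, 24))  # (5, 3): two collisions before the key is found
print(lookup(table, 3))   # (None, 5): unsuccessful search
```

With the Bell and Kaman figures quoted above, each distinct key is looked up about 15 times on average (10,988 / 735), so the probe count of a key such as 24 in this example would be paid on every one of those retrievals, whereas any extra work at insertion time is paid once.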

This note was uploaded on 01/31/2010 for the course CSE 101 taught by Professor Staff during the Fall '08 term at UCSD.
