Hashing:
We have seen various data structures (e.g., binary trees, AVL trees, splay trees, skip lists)
that can perform the dictionary operations insert(), delete() and find(). We know that
these data structures provide O(log n) time access. It is unreasonable to ask any sort of
tree-based structure to do better than this, since storing n keys in a binary tree requires
height at least Omega(log n). Thus one is inclined to think that it is impossible to do better.
Remarkably, there is a better method, at least if one is willing to consider expected-case
rather than worst-case performance.
Hashing is a method that performs all the dictionary operations in O(1) (i.e., constant)
expected time, under some assumptions about the hash function being used. Hashing is
considered so good that in contexts where just these operations are being performed, hashing
is the method of choice (e.g., symbol tables for compilers are almost always implemented
using hashing).
Tree-based data structures are generally preferred in the following situations:

- When storing data on secondary storage (e.g., using B-trees),
- When knowledge of the relative order of elements is important (e.g., if a find() fails, I
  may want to know the nearest key. Hashing cannot help us with this.)
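To illustrate the second point, here is a small sketch of a nearest-key query on an ordered structure (a sorted list stands in for a balanced tree here; the keys and the helper `nearest` are made up for the example). A hash table scatters its keys, so it cannot answer this kind of query.

```python
import bisect

# An ordered structure (a sorted list standing in for a balanced tree)
# supports "nearest key" queries; a hash table cannot, since it scatters keys.
keys = [3, 8, 15, 27, 42]           # kept in sorted order

def nearest(sorted_keys, x):
    """Return the stored key closest to x (ties go to the smaller key)."""
    i = bisect.bisect_left(sorted_keys, x)
    # The nearest key is either the predecessor or the successor of x.
    candidates = sorted_keys[max(0, i - 1):i + 1]
    return min(candidates, key=lambda k: abs(k - x))

print(nearest(keys, 20))   # 15 -- a failed find() still yields the closest key
```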
The idea behind hashing is very simple. We have a table containing m entries. We select a
hash function h(x), which is an easily computable function that maps a key x to a "virtually
random" index in the range [0..m-1]. We will then attempt to store the key at index h(x)
in the table. Of course, it may be that different keys are mapped to the same location. This
is called a collision. We need to consider how collisions are to be handled, but observe that if
the hash function does a good job of scattering the keys around the table, then the chances
of a collision occurring at any index of the table are about the same. As long as the table size
is at least as large as the number of keys, then we would expect the number of keys that
map to the same cell to be small.
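The scheme above can be sketched as follows. This is a minimal illustration only: the notes have not yet specified how collisions are handled, so this sketch assumes separate chaining (each table entry holds a list of the keys that hash to it), and the table size m = 11 is arbitrary.

```python
# Minimal hash table sketch with separate chaining (an assumed collision
# strategy; the table size m is arbitrary for this example).
class HashTable:
    def __init__(self, m=11):
        self.m = m                            # number of table entries
        self.table = [[] for _ in range(m)]   # one bucket (list) per index

    def _h(self, x):
        # Hash function: map key x to an index in [0..m-1].
        return hash(x) % self.m

    def insert(self, x):
        bucket = self.table[self._h(x)]
        if x not in bucket:                   # avoid duplicate keys
            bucket.append(x)

    def find(self, x):
        return x in self.table[self._h(x)]

    def delete(self, x):
        bucket = self.table[self._h(x)]
        if x in bucket:
            bucket.remove(x)

t = HashTable(m=11)
t.insert(15)
t.insert(26)          # 15 mod 11 == 26 mod 11 == 4: a collision, shared bucket
print(t.find(15))     # True
t.delete(15)
print(t.find(15))     # False
print(t.find(26))     # True -- the colliding key survives in the same bucket
```

With a good hash function and m at least as large as the number of keys, each bucket stays short, so each operation touches only a constant number of keys in expectation.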
Hashing is quite a versatile technique. One way to think about hashing is as a means of
implementing a content-addressable array. We know that arrays can be addressed by an integer
index. But it is often convenient to have a lookup table in which the elements are addressed
by a key value, which may be of any discrete type: strings, for example, or integers drawn from
such a large range of values that devising an array of this size would be impractical. Note that
hashing is not usually used for continuous data, such as floating-point values, because similar
keys like 3.14159 and 3.14158 may be mapped to entirely different locations.
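The content-addressable idea can be sketched with string keys. The hash function below is a standard polynomial (Horner-style) hash; the multiplier 31, the table size m = 13, and the stored value are all arbitrary choices for this example, not anything the notes prescribe.

```python
# Sketch: addressing a plain array by string content via a hash function.
m = 13                         # arbitrary small table size for the example

def h(key):
    # Horner-style polynomial hash of the characters, reduced mod m.
    v = 0
    for ch in key:
        v = (31 * v + ord(ch)) % m
    return v

table = [None] * m
table[h("radix")] = "definition of radix"
print(table[h("radix")])       # look up by key content, not by integer index
```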
There are two important issues that need to be addressed in the design of any hashing system.
Spring '10 · MITIN · Lecture Notes CMSC