Hashing
Contents
- Introduction
  - Test Yourself #1
- Lookup, Insert, and Delete
- Choosing the Hashtable Size
- Choosing the Hash Function
  - Test Yourself #2
- Summary
Introduction
Recall that for a balanced tree (e.g., a 2-3 tree), the insert, lookup, and delete operations can be
performed in time logarithmic in the number of values stored in the tree. Can we do better? Yes! We
can use a technique called hashing, which takes linear time (O(N) for N stored values) in the worst case
but has expected time O(1)!
The basic idea is to store values (unique keys plus perhaps some associated data) in an array,
computing each key's position in the array as a function of its value. This takes advantage of the fact
that we can do a subscripting operation on an array in constant time.
For example, suppose we want to store information about 50 employees, each of whom has a unique
ID number. The ID numbers start with 100, and the highest ID number for any employee is 200 (i.e.,
there are a total of 101 possible ID numbers, all in the range 100 to 200). In this case, we can use an
array of size 101, and we can store the information about the employee with ID number k in array[k-100].
The insert, lookup, and delete operations will all be constant time. The only disadvantage is
that some of the array entries will be empty (i.e., will contain null to indicate that no information is
stored there), so some space will be wasted.
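The employee example can be sketched in Python as follows. This is a minimal illustration rather than code from these notes: the class name EmployeeTable, its method names, and the use of a string as each employee's "information" are all hypothetical. Python's None plays the role of null for an empty slot.

```python
# Direct-address table for the employee example: IDs 100..200 map to
# indices 0..100 of a 101-slot array via index = k - 100.
# EmployeeTable and its method names are hypothetical, chosen for illustration.

MIN_ID = 100
MAX_ID = 200


class EmployeeTable:
    def __init__(self):
        # One slot per possible ID; None marks an empty slot.
        self.slots = [None] * (MAX_ID - MIN_ID + 1)

    def _index(self, emp_id):
        # Reject IDs outside 100..200, which have no slot in the array.
        if not MIN_ID <= emp_id <= MAX_ID:
            raise KeyError(emp_id)
        return emp_id - MIN_ID

    def insert(self, emp_id, info):
        # Constant time: a single array store.
        self.slots[self._index(emp_id)] = info

    def lookup(self, emp_id):
        # Constant time: a single array read; None means "not present".
        return self.slots[self._index(emp_id)]

    def delete(self, emp_id):
        # Constant time: overwrite the slot with None.
        self.slots[self._index(emp_id)] = None

    def empty_slots(self):
        # Count the wasted (unused) array entries.
        return sum(1 for s in self.slots if s is None)


table = EmployeeTable()
for emp_id in range(100, 200, 2):   # 50 employees: 100, 102, ..., 198
    table.insert(emp_id, f"employee {emp_id}")

print(table.lookup(142))                      # employee 142
print(table.lookup(143))                      # None
print(len(table.slots), table.empty_slots())  # 101 51
```

With 50 employees stored in 101 slots, 51 entries stay empty, which is the wasted space described above.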
Before we go on, here is some terminology:
- The array is called the hashtable.
- We will refer to the size of the array as TABLE_SIZE.
- The function that maps a key value to the array index in which that key (and its associated
  data) will be stored is called the hash function. For this example, the key is the employee's ID
  number, and the hash function is: hash(k) = k - 100.
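The three terms can be tied together in a few lines of Python. This sketch is illustrative only: the function is named hash_fn (a hypothetical name) to avoid shadowing Python's built-in hash, and the check at the end confirms that every legal ID maps to a valid, distinct index. That property holds here only because the range of possible keys is exactly as large as the table.

```python
# The terminology above, instantiated for the employee example.
TABLE_SIZE = 101                  # size of the hashtable (the array)
hashtable = [None] * TABLE_SIZE   # the hashtable itself; None = empty slot


def hash_fn(k):
    """Hash function for employee IDs: hash(k) = k - 100 (hypothetical name)."""
    return k - 100


# Every legal ID (100..200) lands on a valid index, and no two IDs share one.
indices = [hash_fn(k) for k in range(100, 201)]
assert all(0 <= i < TABLE_SIZE for i in indices)
assert len(set(indices)) == len(indices)   # no two keys collide

print(min(indices), max(indices))  # 0 100
```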
Now, think about the problem of storing information about students based on their ID numbers. The
problem is that student ID numbers are in a large range (student IDs have 10 digits, so there are
$10^{10}$ possible values). In this case, it is probably not practical to use an array with one place for each
possible value.
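A rough estimate shows why. Assuming, purely for illustration, that each array slot takes 8 bytes (a plausible size for a reference on a 64-bit machine), an array with one slot per possible student ID would need:

```latex
10^{10}\ \text{slots} \times 8\ \tfrac{\text{bytes}}{\text{slot}}
  = 8 \times 10^{10}\ \text{bytes} = 80\ \text{GB}
```

That is on the order of 80 gigabytes before any student information is stored, and almost all of those slots would be empty.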
The solution is to use a "reasonable" sized array (more about this later), and to use a hash function
that maps ID numbers to places in that array. If we can find a hash function that, given only a small
Page 1 of 5
Hashing
2008/3/27
http://pages.cs.wisc.edu/~cs367-1/topics/Hashing/