Algorithms
Lecture 7: Hash Tables
Calvin:
There! I finished our secret code!
Hobbes:
Let’s see.
Calvin:
I assigned each letter a totally random number, so the code will be hard to
crack. For letter “A”, you write 3,004,572,688. “B” is 28,731,569½.
Hobbes:
That’s a good code all right.
Calvin:
Now we just commit this to memory.
Calvin:
Did you finish your map of our neighborhood?
Hobbes:
Not yet. How many bricks does the front walk have?
— Bill Watterson, “Calvin and Hobbes” (August 23, 1990)
7 Hash Tables
7.1 Introduction
A hash table is a data structure for storing a set of items, so that we can quickly determine whether an
item is or is not in the set. The basic idea is to pick a hash function $h$ that maps every possible item $x$
to a small integer $h(x)$. Then we store $x$ in slot $h(x)$ in an array. The array is the hash table.
Let’s be a little more specific. We want to store a set of $n$ items. Each item is an element of some
finite set $U$ called the universe; we use $u$ to denote the size of the universe, which is just the number
of items in $U$. A hash table is an array $T[0 .. m-1]$, where $m$ is another positive integer, which we call
the table size. Typically, $m$ is much smaller than $u$. A hash function is any function of the form
$$h \colon U \to \{0, 1, \ldots, m-1\},$$
mapping each possible item in $U$ to a slot in the hash table. We say that an item $x$ hashes to the slot
$T[h(x)]$.
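As a minimal sketch of these definitions, suppose the universe is the set of integers $\{0, 1, \ldots, u-1\}$ and we take $h(x) = x \bmod m$. Both the choice of universe and this particular hash function are assumptions made only for illustration; the notes have not yet fixed either.

```python
# Hypothetical setup: U = {0, ..., u-1}, table size m much smaller than u.
u = 1_000_000   # size of the universe (assumed)
m = 10          # table size (assumed)

def h(x: int) -> int:
    """Map an item x in U to a slot index in {0, ..., m-1}."""
    return x % m

T = [None] * m  # the hash table: m slots, all initially empty

x = 123_457
T[h(x)] = x     # x hashes to the slot T[h(x)]
```

Here every item lands in one of only $m = 10$ slots, even though there are a million possible items.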
Of course, if $u = m$, then we can always just use the trivial hash function $h(x) = x$. In other words,
use the item itself as the index into the table. This is called a direct access table, or more commonly, an
array. In most applications, though, the universe of possible keys is orders of magnitude too large for
this approach to be practical. Even when it is possible to allocate enough memory, we usually need to
store only a small fraction of the universe. Rather than wasting lots of space, we should make $m$ roughly
equal to $n$, the number of items in the set we want to maintain.
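A direct access table can be sketched in a few lines. The tiny universe $\{0, \ldots, 15\}$ and the function names below are hypothetical, chosen only to make the idea concrete.

```python
u = 16            # assumed tiny universe {0, ..., 15}
T = [False] * u   # one slot per possible item, so m = u

def insert(x: int) -> None:
    T[x] = True   # trivial hash function h(x) = x

def contains(x: int) -> bool:
    return T[x]

def delete(x: int) -> None:
    T[x] = False
```

Every operation takes constant time, but the table needs one slot for each of the $u$ possible items no matter how few are actually stored.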
What we’d like is for every item in our set to hash to a different position in the array. Unfortunately,
unless $m = u$, this is too much to hope for, so we have to deal with collisions. We say that two items
$x$ and $y$ collide if they have the same hash value: $h(x) = h(y)$. Since we obviously can’t store two items
in the same slot of an array, we need to describe some methods for resolving collisions. The two most
common methods are called chaining and open addressing.
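To make collisions concrete, the following sketch groups a handful of integers by hash value and reports the slots that receive more than one item. The hash function $h(x) = x \bmod m$ and the helper name `collisions` are assumptions for illustration only.

```python
from collections import defaultdict

m = 10  # assumed table size

def h(x: int) -> int:
    return x % m  # assumed hash function

def collisions(items):
    """Return {slot: items} for every slot that more than one item hashes to."""
    slots = defaultdict(list)
    for x in items:
        slots[h(x)].append(x)
    return {i: xs for i, xs in slots.items() if len(xs) > 1}
```

With this choice of $h$, the items 3 and 13 collide, because $h(3) = h(13) = 3$.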
7.2 Chaining
In a chained hash table, each entry $T[i]$ is not just a single item, but rather (a pointer to) a linked list of
all the items that hash to $T[i]$. Let $\ell(x)$ denote the length of the list $T[h(x)]$. To see if an item $x$ is in
the hash table, we scan the entire list $T[h(x)]$. The worst-case time required to search for $x$ is $O(1)$ to
compute $h(x)$ plus $O(1)$ for every element in $T[h(x)]$, or $O(1 + \ell(x))$ overall. Inserting and deleting $x$
also take $O(1 + \ell(x))$ time.
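The following is a minimal sketch of a chained hash table, not a definitive implementation. The class and method names are hypothetical, and the hash function (Python's built-in `hash` reduced modulo $m$) is an assumption, since the notes do not fix one here.

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, item, next=None):
        self.item = item
        self.next = next

class ChainedHashTable:
    def __init__(self, m: int):
        self.m = m
        self.T = [None] * m          # T[i] is the head of the list for slot i

    def _h(self, x) -> int:
        return hash(x) % self.m      # assumed hash function

    def search(self, x) -> bool:
        node = self.T[self._h(x)]    # O(1) to find the list
        while node is not None:      # O(1) per element scanned
            if node.item == x:
                return True
            node = node.next
        return False

    def insert(self, x) -> None:
        if not self.search(x):       # scan first to avoid duplicates
            i = self._h(x)
            self.T[i] = Node(x, self.T[i])   # push onto the front of the list

    def delete(self, x) -> None:
        i = self._h(x)
        prev, node = None, self.T[i]
        while node is not None:
            if node.item == x:
                if prev is None:
                    self.T[i] = node.next    # x was at the head of the list
                else:
                    prev.next = node.next    # unlink x from the middle or end
                return
            prev, node = node, node.next
```

Each operation computes the slot once and then walks a single list, which matches the $O(1 + \ell(x))$ bound.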