Introduction to Algorithms
Lecture 5

Recap
• Show that $\Theta(n \log n)$ is the best possible running time for a comparison-based sorting algorithm.
• Design an algorithm that sorts in linear time.
• Order statistics

Today’s topics
• Direct-access tables
• Hash tables
• Hash functions
  – Universal hashing
• Perfect hashing
• Open addressing

Data Structures
• Role of data structures:
  – Encapsulate data
  – Support certain operations (e.g., INSERT, DELETE, SEARCH)
• Our focus: efficiency of the operations
• Algorithms vs. data structures

Symbol-table problem
Symbol table $T$ holding $n$ records:
[Figure: a record $x$, consisting of the field $key[x]$ and other fields containing satellite data.]
Operations on $T$:
• INSERT(T, x)
• DELETE(T, x)
• SEARCH(T, k)
How should the data structure $T$ be organized?

Direct-access table
IDEA: Suppose that the set of keys is $K \subseteq \{0, 1, \ldots, m-1\}$, and keys are distinct. Set up an array $T[0 \,..\, m-1]$:
$$T[k] = \begin{cases} x & \text{if } k \in K \text{ and } key[x] = k, \\ \text{NIL} & \text{otherwise.} \end{cases}$$
Then, operations take $\Theta(1)$ time.
Problem: The range of keys can be large:
• 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys),
• character strings (even larger!).
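The direct-access idea can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the names `Record`, `DirectAccessTable`, `insert`, `delete`, and `search` are hypothetical, and Python's `None` stands in for NIL.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Record:
    """A record x: key[x] plus satellite data (hypothetical layout)."""
    key: int
    data: Any = None


class DirectAccessTable:
    """Direct-access table T[0 .. m-1] for distinct keys drawn from {0, 1, ..., m-1}."""

    def __init__(self, m: int) -> None:
        # T[k] holds the record whose key is k, or None (NIL) if there is no such record.
        self.slots: List[Optional[Record]] = [None] * m

    def insert(self, x: Record) -> None:
        # Θ(1): the key itself is the array index.
        self.slots[x.key] = x

    def delete(self, x: Record) -> None:
        # Θ(1): clear the slot indexed by key[x].
        self.slots[x.key] = None

    def search(self, k: int) -> Optional[Record]:
        # Θ(1): return T[k], which is None (NIL) if no record has key k.
        return self.slots[k]


T = DirectAccessTable(10)
x = Record(3, "satellite data for key 3")
T.insert(x)
print(T.search(3) is x)  # True
print(T.search(4))       # None
```

The array has one slot per possible key, so it uses $\Theta(m)$ memory no matter how few records are stored; for 64-bit keys that is exactly what becomes infeasible.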

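Hash functions
Solution: Use a hash function $h$ to map the universe $U$ of all keys into $\{0, 1, \ldots, m-1\}$:
[Figure: the keys $k_1, \ldots, k_5$ of the key set $K \subseteq U$ are mapped by $h$ into the slots $0, \ldots, m-1$ of table $T$; $h(k_1)$, $h(k_3)$, and $h(k_4)$ are distinct slots, while $h(k_2) = h(k_5)$.]
When a record to be inserted maps to an already occupied slot in $T$, a collision occurs.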
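To make the notion of a collision concrete, the hypothetical helper below groups keys by the slot `h` sends them to and reports every slot that receives more than one key. The choice `h(k) = k % m` in the usage line is only an illustrative assumption; any function into {0, 1, …, m–1} can be passed in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_collisions(keys: Iterable[int], h: Callable[[int], int], m: int) -> Dict[int, List[int]]:
    """Return {slot: keys} for each slot of an m-slot table that more than one key maps to."""
    by_slot: Dict[int, List[int]] = defaultdict(list)
    for k in keys:
        i = h(k)
        if not 0 <= i < m:
            raise ValueError(f"h({k}) = {i} is outside {{0, ..., {m - 1}}}")
        by_slot[i].append(k)
    # Keep only the slots that more than one key landed in.
    return {i: ks for i, ks in by_slot.items() if len(ks) > 1}


m = 10
print(find_collisions([12, 5, 23, 8, 22], lambda k: k % m, m))  # {2: [12, 22]}
```

Five keys in ten slots do not force a collision, but once more than $m$ keys are inserted one is unavoidable by the pigeonhole principle.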

Resolving collisions by chaining
• Records in the same slot are linked into a list.
[Figure: slot $i$ of table $T$ points to the list 49 → 86 → 52, since $h(49) = h(86) = h(52) = i$.]
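A minimal sketch of chaining, assuming integer keys. Each slot holds a Python list in place of the linked list on the slide, and the default hash function (the division method, `k % m`) is an illustrative assumption; the class and method names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple


class ChainedHashTable:
    """Hash table that resolves collisions by chaining (Python lists stand in for linked lists)."""

    def __init__(self, m: int, h: Optional[Callable[[int], int]] = None) -> None:
        self.m = m
        # Illustrative default hash function: h(k) = k mod m.
        self.h = h if h is not None else (lambda k: k % m)
        # table[i] is the chain of (key, value) pairs whose keys hash to slot i.
        self.table: List[List[Tuple[int, Any]]] = [[] for _ in range(m)]

    def insert(self, key: int, value: Any) -> None:
        # Assumes the key is not already present; appending takes O(1) time.
        self.table[self.h(key)].append((key, value))

    def search(self, key: int) -> Optional[Any]:
        # Scan only the chain in slot h(key); returns None if the key is absent.
        for k, v in self.table[self.h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key: int) -> None:
        # Remove the first pair with this key from its chain, if present.
        chain = self.table[self.h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:
                del chain[idx]
                return


T = ChainedHashTable(10)
for k in (49, 59, 9):  # all three keys hash to slot 9 when m = 10
    T.insert(k, f"record {k}")
print(T.table[9])      # [(49, 'record 49'), (59, 'record 59'), (9, 'record 9')]
print(T.search(59))    # record 59
T.delete(59)
print(T.search(59))    # None
```

This `delete` takes a key and scans the chain. When DELETE(T, x) is given the record itself and chains are doubly linked lists, deletion can instead be done in $O(1)$ time. Also note that `search` cannot distinguish a missing key from a stored value of `None`; a fuller version would return the pair or raise an exception.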

Analysis of chaining
We make the assumption of simple uniform hashing:
• Each key $k \in K$ is equally likely to be hashed to any slot of table $T$, independent of where other keys are hashed.
Let $n$ be the number of keys in the table, and let $m$ be the number of slots. Define the load factor of $T$ to be
$$\alpha = n/m = \text{average number of keys per slot.}$$
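A small simulation can illustrate the load factor. It assumes that simple uniform hashing can be modeled by choosing each key's slot uniformly at random and independently with Python's `random` module; the function name is hypothetical. Whatever the individual chain lengths turn out to be, they sum to $n$, so their average is exactly $\alpha = n/m$.

```python
import random
from typing import List


def simulate_chain_lengths(n: int, m: int, seed: int = 0) -> List[int]:
    """Chain lengths after hashing n keys into m slots, each key to a uniformly random slot."""
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1  # independent, uniform slot choice per key
    return lengths


n, m = 1000, 250
lengths = simulate_chain_lengths(n, m)
alpha = n / m
print(alpha, sum(lengths) / m)  # 4.0 4.0
print(max(lengths))             # individual chains vary around α
```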

Search cost
Expected time to search for a record with a given key $= \Theta(1 + \alpha)$, where the 1 accounts for applying the hash function and accessing the slot, and the $\alpha$ accounts for searching the list.
Expected search time $= \Theta(1)$ if $\alpha = O(1)$, or equivalently, if $n = O(m)$.
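The $\Theta(1 + \alpha)$ bound can be checked numerically for an unsuccessful search. This sketch reuses the random-slot model of simple uniform hashing and assumes a simple cost model (an illustration, not the lecture's): 1 unit to apply the hash function and access the slot, plus 1 unit per list element scanned. Because the absent key is equally likely to hash to any slot, the expected cost is the average of 1 + (chain length) over all $m$ slots, which equals $1 + n/m = 1 + \alpha$. The function name is hypothetical.

```python
import random


def expected_unsuccessful_search_cost(n: int, m: int, seed: int = 0) -> float:
    """Expected cost of searching for an absent key in a chained table with n keys and m slots.

    Assumed cost model: 1 for hashing and accessing the slot, plus 1 per chain element scanned.
    """
    rng = random.Random(seed)
    lengths = [0] * m
    for _ in range(n):
        lengths[rng.randrange(m)] += 1  # simple uniform hashing, modeled by random slots
    # The absent key hashes to each slot with probability 1/m and scans that whole chain.
    return sum(1 + length for length in lengths) / m


# Growing n in proportion to m keeps α = 2, so the cost stays constant: Θ(1) when n = O(m).
for m in (10, 100, 1000, 10000):
    print(m, expected_unsuccessful_search_cost(2 * m, m))  # cost is 3.0 = 1 + α every time
```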

Choosing a hash function
The assumption of simple uniform hashing is hard to
guarantee, but several common techniques tend to
work well in practice as long as their deficiencies
can be avoided.
Desiderata:
• A good hash function should distribute the keys uniformly into the slots of the table.
• Regularity in the key distribution should not affect this uniformity.

Division method
Assume all keys are integers, and define $h(k) = k \bmod m$.
Deficiency: Don’t pick an $m$ that has a small divisor $d$. A preponderance of keys that are congruent modulo $d$ can adversely affect uniformity.
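The effect of a small divisor can be seen directly. In this illustrative sketch the keys are all multiples of 4 (a regular pattern chosen as an assumption), and $m = 12$ shares the divisor 4 with them, so only slots 0, 4, and 8 are ever used; with the prime $m = 13$ the same keys spread over all 13 slots. The helper names are hypothetical.

```python
from typing import Iterable, List


def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def slots_used(keys: Iterable[int], m: int) -> List[int]:
    """Sorted list of the distinct slots that the keys hash to."""
    return sorted({division_hash(k, m) for k in keys})


keys = range(0, 400, 4)           # 100 keys, all congruent to 0 modulo 4
print(slots_used(keys, 12))       # [0, 4, 8]: only 3 of 12 slots used
print(len(slots_used(keys, 13)))  # 13: every slot used
```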

Division method
Extreme deficiency: If $m = 2^r$, then the hash doesn’t even depend on all the bits of $k$:
• If $k = 1011000111011010_2$ and $r = 6$, then $h(k) = 011010_2$.
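The example can be reproduced in Python. With $m = 2^r$, $k \bmod m$ equals the bitwise AND of $k$ with $m - 1$, i.e., the $r$ low-order bits of $k$, so flipping any higher bit leaves the hash unchanged. The variable names are illustrative.

```python
k = 0b1011000111011010   # the key from the slide
r = 6
m = 2 ** r               # m = 64

h = k % m                # division method
assert h == k & (m - 1)  # same as keeping only the r low-order bits
print(format(h, "06b"))  # 011010

# Flipping a high-order bit of k does not change h(k) at all.
k_flipped = k ^ (1 << 15)
print(k_flipped % m == h)  # True
```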


- Fall '05
- Rudolf Fleischer