Lecture 12: Hashing

Hashing
Art Covert, 11/03/2005
Topics:
 Hash functions: key-to-memory mapping; compute the memory address from the key (without comparing keys)
 Collision resolution

Hashing
To look up the record with key KEY:
 1. Compute h(KEY) = ADDRESS.
 2. Check ADDRESS for KEY.
Ignoring collisions, a lookup takes O(1) time.
[Figure: KEY → hash function h(KEY) → ADDRESS]
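The two lookup steps can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the slides: the table is a Python list, each occupied slot holds a (key, record) pair, and the names `lookup` and `h` are hypothetical. Collisions are not handled yet.

```python
from typing import Callable, Optional

def lookup(table: list, key: int, h: Callable[[int], int]) -> Optional[tuple]:
    """Hypothetical helper: find the (key, record) pair stored for key, or None."""
    address = h(key)           # step 1: compute the address from the key
    entry = table[address]     # direct access: no key comparisons to reach the slot
    if entry is not None and entry[0] == key:
        return entry           # step 2: the slot holds KEY, so the lookup succeeds
    return None                # empty slot, or a different key lives there
```

For example, with a 10-slot table where slot 1 holds (10001, "Moe") and h(k) = k % 10, looking up 10001 returns that pair, while looking up 10011 also lands on slot 1 but fails the key check and returns None.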
Motivation
So far we have seen structures that support the following operations: Insert, Delete, Search. Can we design a data structure in which these operations take constant time?
Idea: given a key, if we can compute its address (index) in memory, then search takes O(1) time by direct access. Can we define the address of a key as a function of the key? This is the idea of hashing.

Motivation
Say we have a group of students in a computer science department, each with a unique ID:
 Moe: 10001
 Larry: 10002
 Curly: 10003
Is there some way to store these entries in a fixed-size array so that we can find any of them in O(1) time?
Inserting: Hash Table
Use a simple hash function h(stuid) = stuid mod ARRAY_SIZE, with ARRAY_SIZE = 10:
 h(<Moe, 10001>) = 10001 % 10 = 1
 h(<Larry, 10002>) = 10002 % 10 = 2
 h(<Curly, 10003>) = 10003 % 10 = 3
Now when we search for Moe, we find him by computing h(10001) = 1 and looking directly in slot 1.
Table:
 0: (empty)
 1: Moe
 2: Larry
 3: Curly
 4-9: (empty)
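A minimal Python sketch of this slide, assuming a 10-slot list where each slot holds a (name, stuid) pair or None. The names `insert` and `search` are hypothetical, and `insert` deliberately has no collision handling: it simply overwrites whatever is in the slot.

```python
ARRAY_SIZE = 10  # table size used on the slide

def h(stuid: int) -> int:
    """Hash a student ID to a slot index: stuid mod ARRAY_SIZE."""
    return stuid % ARRAY_SIZE

table = [None] * ARRAY_SIZE  # each slot holds a (name, stuid) pair or None

def insert(name: str, stuid: int) -> None:
    table[h(stuid)] = (name, stuid)  # no collision check: overwrites the slot

def search(stuid: int):
    entry = table[h(stuid)]          # jump straight to the computed slot
    if entry is not None and entry[1] == stuid:
        return entry[0]              # return the student's name
    return None                      # nothing stored for this ID

for name, stuid in [("Moe", 10001), ("Larry", 10002), ("Curly", 10003)]:
    insert(name, stuid)
```

After the loop, `search(10001)` returns "Moe" after a single slot access. Calling `insert("Art", 10008)` places Art in slot 8, since 10008 % 10 = 8.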

Inserting
Insert a student Art with ID 10008: h(<Art, 10008>) = 10008 % 10 = 8
Table:
 0: (empty)
 1: Moe
 2: Larry
 3: Curly
 4-7: (empty)
 8: Art
 9: (empty)
Collisions: what if h(key1) = h(key2)?
After inserting Art (h(<Art, 10008>) = 8), insert a student Shemp with ID 10023:
 h(<Shemp, 10023>) = 10023 % 10 = 3
Slot 3 is already occupied by Curly, so Shemp collides with Curly. What should we do?
Table:
 0: (empty)
 1: Moe
 2: Larry
 3: Curly
 4-7: (empty)
 8: Art
 9: (empty)
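The collision can at least be detected before any data is lost. The Python sketch below is illustrative only: the `CollisionError` exception and this `insert` signature are hypothetical, and the sketch just reports the collision rather than resolving it. Re-inserting the same ID is treated as an update.

```python
ARRAY_SIZE = 10

def h(stuid: int) -> int:
    return stuid % ARRAY_SIZE

class CollisionError(Exception):
    """Raised when the target slot already holds a different key."""

def insert(table: list, name: str, stuid: int) -> int:
    address = h(stuid)
    occupant = table[address]
    if occupant is not None and occupant[1] != stuid:
        # h(key1) == h(key2) but key1 != key2: a collision
        raise CollisionError(
            f"{name} ({stuid}) collides with {occupant[0]} "
            f"({occupant[1]}) at slot {address}")
    table[address] = (name, stuid)  # empty slot, or same ID: store/update
    return address

table = [None] * ARRAY_SIZE
for name, stuid in [("Moe", 10001), ("Larry", 10002),
                    ("Curly", 10003), ("Art", 10008)]:
    insert(table, name, stuid)
```

Calling `insert(table, "Shemp", 10023)` now raises `CollisionError` with the message "Shemp (10023) collides with Curly (10003) at slot 3", and Curly's entry is left untouched.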

General possibilities
When looking up KEY at ADDRESS = h(KEY):
 Exactly one stored key hashes to ADDRESS: compare it with KEY, and we are done.
 No stored key hashes to ADDRESS: the search fails.
 Multiple stored keys hash to ADDRESS: if ADDRESS specifies a bin, search that bin for KEY. A bin can be a list, an array, or a disk track.
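Taking the "bin is a list" option, each address can hold a Python list of every (key, record) pair that hashes there; this approach is usually called separate chaining. The sketch below assumes h(k) = k mod m as on the earlier slides, and the class and method names are hypothetical. All three cases above fall out of one loop over the bin.

```python
from typing import List, Optional, Tuple

class ChainedHashTable:
    """Each address names a bin (a Python list) holding every key that hashes there."""

    def __init__(self, m: int = 10) -> None:
        self.m = m
        self.bins: List[List[Tuple[int, str]]] = [[] for _ in range(m)]

    def _h(self, key: int) -> int:
        return key % self.m

    def insert(self, key: int, record: str) -> None:
        bin_ = self.bins[self._h(key)]
        for i, (k, _) in enumerate(bin_):
            if k == key:             # key already present: replace its record
                bin_[i] = (key, record)
                return
        bin_.append((key, record))   # new key (possibly colliding): add to the bin

    def search(self, key: int) -> Optional[str]:
        bin_ = self.bins[self._h(key)]
        for k, record in bin_:       # empty bin: loop never runs, search fails
            if k == key:             # one or more keys: compare until found
                return record
        return None
```

With Moe, Larry, Curly, Art, and Shemp inserted into `ChainedHashTable(10)`, bin 3 holds both Curly and Shemp, and `search(10023)` returns "Shemp" after comparing against at most two keys.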
A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m memory bins. It must be

