
HASHING

Contents

- Introduction
    - Test Yourself #1
- Lookup, Insert, and Delete
- Choosing the Hashtable Size
- Choosing the Hash Function
    - Test Yourself #2
- Summary

Introduction

Recall that for a balanced tree (e.g., a 2-3 tree), the insert, lookup, and delete operations can be performed in time logarithmic in the number of values stored in the tree. Can we do better? Yes! We can use a technique called hashing, whose expected time for these operations is O(1)! Its worst case is no better than a tree's: logarithmic at best (if keys that land in the same array position are kept in a balanced search tree), and linear if they are kept in a linked list.

The basic idea is to store values (unique keys plus perhaps some associated data) in an array, computing each key's position in the array as a function of its value. This takes advantage of the fact that we can do a subscripting operation on an array in constant time.

For example, suppose we want to store information about 50 employees, each of whom has a unique ID number. The ID numbers start with 100, and the highest ID number for any employee is 200 (i.e., there are a total of 101 possible ID numbers, all in the range 100 to 200). In this case, we can use an array of size 101, and we can store the information about the employee with ID number k in array[k - 100]. The insert, lookup, and delete operations will all take constant time. The only disadvantage is that some of the array entries will be empty (i.e., will contain null to indicate that no information is stored there), so some space will be wasted: with 50 employees, 51 of the 101 entries are unused.

Before we go on, here is some terminology:

- The array is called the hashtable.
- We will refer to the size of the array as TABLE_SIZE.
- The function that maps a key value to the array index in which that key (and its associated data) will be stored is called the hash function.

For this example, the key is the employee's ID number, and the hash function is:

    hash(k) = k - 100

Now, think about the problem of storing information about students based on their ID numbers.
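Before looking at why student IDs are harder, here is a minimal sketch of the employee table just described. It is only an illustration under stated assumptions: the names MIN_ID, MAX_ID, hash_id, insert, lookup, and delete are hypothetical, the employee records are made up, and Python's None stands in for null.

```python
# Direct-address table for the employee example: IDs run from 100 to 200.
# All names and records here are hypothetical, chosen for illustration.
MIN_ID = 100
MAX_ID = 200
TABLE_SIZE = MAX_ID - MIN_ID + 1  # 101 possible IDs, so 101 slots

table = [None] * TABLE_SIZE  # None marks an empty entry


def hash_id(k):
    """hash(k) = k - 100, the hash function from the text."""
    if not MIN_ID <= k <= MAX_ID:
        raise ValueError(f"ID {k} is outside {MIN_ID}..{MAX_ID}")
    return k - MIN_ID


def insert(k, data):
    table[hash_id(k)] = data  # one subscript operation: constant time


def lookup(k):
    return table[hash_id(k)]  # None if nothing is stored for k


def delete(k):
    table[hash_id(k)] = None


insert(100, "Ada")
insert(157, "Grace")
insert(200, "Edsger")
delete(157)
print(lookup(100), lookup(157), lookup(200))
```

Because every possible ID has its own slot, no two keys can ever land in the same position; the cost is the unused slots.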
The problem is that student ID numbers are in a large range (student IDs have 10 digits, so there are 10^10 possible values). In this case, it is probably not practical to use an array with one place for each possible value. The solution is to use a "reasonable" sized array (more about this later), and to use a hash function that maps ID numbers to places in that array. If we can find a hash function that, given only a small set of ID numbers, is likely to map each ID number to a different place in the array, then we still have fast lookup, insert, and delete operations, without requiring a huge array!

Here's an example: Suppose we decide to use an array of size 10, and we use the hash function:

    hash(ID) = (sum of digits in ID) mod 10

Here are some ID numbers, the sums of their digits, and the array indexes to which they hash:

    [table not reproduced in this copy]

Note that we have a problem: both the second and the fourth ID have the same hash value (8). This is called a collision. How can we store both keys in array[8]? The answer is that we can make the array an array of linked lists, or an array of search trees, so that in case of collisions (if multiple keys ...

Source: Hashing, http://pages.cs.wisc.edu/~cs367-1/topics/Hashing/ (2008/3/27)
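The digit-sum hash function and the array-of-lists idea described above can be sketched as follows. This is an illustration only, not the course's own code: the five ID numbers are illustrative values chosen so that the second and fourth both hash to 8 (they are not taken from the source's table), the function names are hypothetical, and a Python list stands in for each linked list.

```python
# Hashtable with one list ("bucket") per array entry, so that keys with
# the same hash value can share a slot. Names and IDs are illustrative.
TABLE_SIZE = 10

table = [[] for _ in range(TABLE_SIZE)]


def hash_id(student_id):
    """hash(ID) = (sum of digits in ID) mod 10."""
    return sum(int(d) for d in str(student_id)) % TABLE_SIZE


def insert(key, data):
    bucket = table[hash_id(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:              # key already present: replace its data
            bucket[i] = (key, data)
            return
    bucket.append((key, data))


def lookup(key):
    for k, data in table[hash_id(key)]:
        if k == key:
            return data
    return None                   # key is not in the table


def delete(key):
    bucket = table[hash_id(key)]
    bucket[:] = [(k, d) for k, d in bucket if k != key]


# Illustrative 10-digit IDs; the second and fourth collide.
ids = [9014638161, 9103287648, 4757414352, 8377690440, 9031397831]
for n, sid in enumerate(ids):
    insert(sid, f"student {n}")

for sid in ids:
    print(sid, sum(int(d) for d in str(sid)), hash_id(sid))
```

With these IDs, lookup still finds both colliding keys because each one is compared against the entries in bucket 8; the price is that operations on a bucket take time proportional to its length rather than constant time.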

This note was uploaded on 03/27/2008 for the course CS 367 taught by Professor Marvin Solomon during the Spring '08 term at Wisconsin.
