Algorithms Lecture 12: Hash Tables [Fa’10]

Calvin: There! I finished our secret code!
Hobbes: Let’s see.
Calvin: I assigned each letter a totally random number, so the code will be hard to crack. For letter “A”, you write 3,004,572,688. “B” is 28,731,569½.
Hobbes: That’s a good code all right.
Calvin: Now we just commit this to memory.
Calvin: Did you finish your map of our neighborhood?
Hobbes: Not yet. How many bricks does the front walk have?
— Bill Watterson, “Calvin and Hobbes” (August 23, 1990)

12 Hash Tables

12.1 Introduction

A hash table is a data structure for storing a set of items, so that we can quickly determine whether an item is or is not in the set. The basic idea is to pick a hash function h that maps every possible item x to a small integer h(x). Then we store x in slot h(x) in an array. The array is the hash table.

Let’s be a little more specific. We want to store a set of n items. Each item is an element of some finite¹ set U called the universe; we use u to denote the size of the universe, which is just the number of items in U. A hash table is an array T[0 .. m − 1], where m is another positive integer, which we call the table size. Typically, m is much smaller than u. A hash function is any function of the form

    h : U → {0, 1, ..., m − 1},

mapping each possible item in U to a slot in the hash table. We say that an item x hashes to the slot T[h(x)].

Of course, if u = m, then we can always just use the trivial hash function h(x) = x. In other words, use the item itself as the index into the table. This is called a direct access table, or more commonly, an array. In most applications, though, the universe of possible keys is orders of magnitude too large for this approach to be practical. Even when it is possible to allocate enough memory, we usually need to store only a small fraction of the universe.
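To make these definitions concrete, here is a minimal Python sketch. It assumes the universe U is the set of integers 0 through u − 1. The modular function `h` is only one illustrative choice of hash function, and the names `DirectAccessTable`, `insert`, and `contains` are hypothetical; none of them come from the lecture.

```python
def h(x: int, m: int) -> int:
    """One illustrative hash function: map item x to a slot in {0, ..., m-1}."""
    return x % m


class DirectAccessTable:
    """Direct access table for the universe {0, ..., u-1}: the case m = u, h(x) = x.

    Assumes every item x satisfies 0 <= x < u.
    """

    def __init__(self, u: int) -> None:
        self.slots = [False] * u  # one slot for every possible item

    def insert(self, x: int) -> None:
        self.slots[x] = True  # the item itself is the index

    def contains(self, x: int) -> bool:
        return self.slots[x]


table = DirectAccessTable(u=16)
table.insert(5)
print(table.contains(5), table.contains(6))  # True False
print(h(21, 8))  # 5, a slot index between 0 and 7
```

The direct access table needs one slot per element of the universe, which is exactly the memory cost that makes it impractical when u is large.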
Rather than wasting lots of space, we should make m roughly equal to n, the number of items in the set we want to maintain. What we’d like is for every item in our set to hash to a different position in the array. Unfortunately, unless m = u, this is too much to hope for, so we have to deal with collisions. We say that two items x and y collide if they have the same hash value: h(x) = h(y). Since we obviously can’t store two items in the same slot of an array, we need to describe some methods for resolving collisions. The two most common methods are called chaining and open addressing.

12.2 Chaining

In a chained hash table, each entry T[i] is not just a single item, but rather (a pointer to) a linked list of all the items that hash to T[i]. Let ℓ(x) denote the length of the list T[h(x)]. To see if an item x is in the hash table, we scan the entire list T[h(x)]. The worst-case time required to search for x is O(1) to ...

¹ This finiteness assumption is necessary for several of the technical details to work out, but can be ignored in practice.
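The chaining scheme from Section 12.2 can be sketched as follows. This is a minimal illustration under stated assumptions: items are non-negative integers, the hash function is `x % m` (an arbitrary choice for the example), and Python lists stand in for the linked lists. The class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining: slot i holds every stored item x with h(x) = i."""

    def __init__(self, m: int) -> None:
        self.m = m
        # T[0 .. m-1]; each slot starts as an empty chain
        self.table = [[] for _ in range(m)]

    def _h(self, x: int) -> int:
        # Illustrative hash function only; any map into {0, ..., m-1} would do
        return x % self.m

    def contains(self, x: int) -> bool:
        # Scan the entire chain T[h(x)]
        return x in self.table[self._h(x)]

    def insert(self, x: int) -> None:
        chain = self.table[self._h(x)]
        if x not in chain:  # the table stores a set, so skip duplicates
            chain.append(x)

    def chain_length(self, x: int) -> int:
        # The quantity written l(x) in the text: length of the chain T[h(x)]
        return len(self.table[self._h(x)])


t = ChainedHashTable(m=4)
for item in (3, 7, 10):
    t.insert(item)
print(t.contains(7), t.contains(11))  # True False
print(t.chain_length(3))  # 2
```

Here 3 and 7 collide, since both hash to slot 3 when m = 4, so a search for either one scans a chain of length 2. The item 11 also hashes to slot 3, so the unsuccessful search for it scans that same chain.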
 Spring '11
 Smith
 hash function, uniform hashing assumption
