Robert Sedgewick and Kevin Wayne • Copyright © 2006 • http://www.Princeton.EDU/~cos226

4.2 Hashing

Optimize Judiciously

Reference: Effective Java by Joshua Bloch.

"More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." - William A. Wulf

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald E. Knuth

"We follow two rules in the matter of optimization:
Rule 1: Don't do it.
Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution." - M. A. Jackson

Hashing: Basic Plan

Save items in a key-indexed table. The index is a function of the key.
- Hash function: a method for computing a table index from a key.
- Collision resolution strategy: an algorithm and data structure to handle two keys that hash to the same index.

Classic space-time tradeoff.
- No space limitation: trivial hash function with the key as the address.
- No time limitation: trivial collision resolution with sequential search.
- Limitations on both time and space: hashing (the real world).

Choosing a Good Hash Function

Idealistic goal: scramble the keys uniformly (a thoroughly researched problem).
- Efficiently computable.
- Each table position equally likely for each key.

Ex: Social Security numbers.
- Bad: first three digits. These are assigned in chronological order within a given geographic region (e.g., 573 = California, 574 = Alaska).
- Better: last three digits.

Ex: date of birth.
- Bad: birth year.
- Better: birthday.

Ex: phone numbers.
- Bad: first three digits.
- Better: last three digits.

Hash Codes and Hash Functions

Hash code: a 32-bit int (between -2147483648 and 2147483647).
Hash function: an int between 0 and M-1, used as an array index.
- Bug. Don't use (code % M) as an array index: in Java the remainder takes the sign of code, so a negative hash code yields a negative index.
- Subtle bug. Don't use (Math.abs(code) % M) as an array index: Math.abs(-2147483648) overflows and returns -2147483648, so the index can still be negative.
- OK. Safe to use ((code & 0x7fffffff) % M) as an array index: masking off the sign bit gives a value between 0 and 2147483647, so the remainder lies between 0 and M-1.
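The three index expressions above can be compared side by side. The sketch below is illustrative only: the class name IndexDemo and the table size M = 8191 are assumptions chosen for the demonstration, and it simply evaluates each expression on a positive hash code, a negative one, and Integer.MIN_VALUE (-2147483648).

```java
public class IndexDemo {
    // Assumed table size for the demo; any positive M shows the same sign behavior.
    static final int M = 8191;

    // Bug: Java's % keeps the sign of the dividend, so a negative code gives a negative index.
    static int buggyIndex(int code) {
        return code % M;
    }

    // Subtle bug: Math.abs(Integer.MIN_VALUE) overflows and returns Integer.MIN_VALUE (still negative).
    static int subtleBuggyIndex(int code) {
        return Math.abs(code) % M;
    }

    // OK: clearing the sign bit yields a value in [0, 2147483647], so the result is in [0, M-1].
    static int safeIndex(int code) {
        return (code & 0x7fffffff) % M;
    }

    public static void main(String[] args) {
        int[] codes = { 3045982, -3045982, Integer.MIN_VALUE };
        for (int code : codes) {
            System.out.println(code + ": " + buggyIndex(code) + " "
                    + subtleBuggyIndex(code) + " " + safeIndex(code));
        }
    }
}
```

With M = 8191, running main should print:

    3045982: 7121 7121 7121
    -3045982: -7121 7121 1102
    -2147483648: -32 -32 0

Only the masked version stays non-negative on every input; the Math.abs version fails on exactly one value, which is what makes it a subtle bug.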
Example (M = 8191):

    String s = "call";
    int code = s.hashCode();   // 3045982
    int hash = code % M;       // 7121 (safe here only because code is positive)

Implementing Hash Code in Java

API for hashCode().
- Return an int.
- If x.equals(y), then x and y must have the same hash code.
- Repeated calls to x.hashCode() must return the same value.

Default implementation (inherited from Object): typically based on the memory address of x.
Customized implementations: String, URL, Integer, Date.
User-defined implementations: tricky to get right; a black art.

Designing a Good Hash Code

Java 1.5 string library.
- Equivalent to $h = 31^{L-1} \cdot s_0 + \cdots + 31^{2} \cdot s_{L-3} + 31 \cdot s_{L-2} + s_{L-1}$, where $s_i$ is the Unicode value of the ith character of s and the arithmetic is 32-bit int arithmetic (overflow wraps around).
- Horner's method hashes a string of length L in O(L) time.

    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < length(); i++)
            hash = (31 * hash) + s[i];   // s[i]: ith character of s
        return hash;
    }

Ex: for String s = "call", s.hashCode() returns 3045982. With the Unicode values 'a' = 97, 'b' = 98, 'c' = 99, ..., 'l' = 108:

$3045982 = 99 \cdot 31^3 + 97 \cdot 31^2 + 108 \cdot 31^1 + 108 \cdot 31^0$

Designing a Bad Hash Code

Java 1.1 string library....
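Returning to the Java 1.5 scheme on the Designing a Good Hash Code slide, the equivalence between the polynomial and Horner's method can be checked against the library. This is a minimal sketch, assuming two hypothetical helpers (hornerHash and polynomialHash, neither part of the String API); both rely on Java int arithmetic wrapping modulo 2^32, so they agree with String.hashCode() even when the sum overflows.

```java
public class HornerHash {
    // Hypothetical helper: Horner's method, hash = 31 * hash + s_i, one pass over L characters.
    static int hornerHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    // Hypothetical helper: evaluates 31^(L-1)*s_0 + ... + 31*s_(L-2) + s_(L-1) term by term.
    // Recomputing each power of 31 costs O(L^2) multiplications in total.
    static int polynomialHash(String s) {
        int L = s.length();
        int hash = 0;
        for (int i = 0; i < L; i++) {
            int power = 1;
            for (int j = 0; j < L - 1 - i; j++) {
                power *= 31;   // wraps on overflow, same as the Horner form
            }
            hash += power * s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        String s = "call";
        System.out.println(hornerHash(s));      // 3045982
        System.out.println(polynomialHash(s));  // 3045982
        System.out.println(s.hashCode());       // 3045982
    }
}
```

Both helpers compute the same value; the Horner form needs only one multiply and one add per character, which is why it gives the O(L) bound.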