Hashing

Hashing: We have seen various data structures (e.g., binary trees, AVL trees, splay trees, skip lists) that can perform the dictionary operations insert(), delete(), and find(). These data structures provide O(log n) time access. It is unreasonable to ask any tree-based structure to do better than this, since storing n keys in a binary tree requires height at least Omega(log n). One is therefore inclined to think it is impossible to do better. Remarkably, there is a better method, at least if one is willing to consider expected-case rather than worst-case performance. Hashing is a method that performs all the dictionary operations in O(1) (i.e., constant) expected time, under some assumptions about the hash function being used. Hashing is considered so good that, in contexts where only these operations are being performed, it is the method of choice (e.g., symbol tables for compilers are almost always implemented using hashing). Tree-based data structures are generally preferred in the following situations:

- When storing data on secondary storage (e.g., using B-trees).
- When knowledge of the relative order of elements is important (e.g., if a find() fails, we may want to know the nearest key; hashing cannot help us with this).

The idea behind hashing is very simple. We have a table containing m entries. We select a hash function h(x), which is an easily computable function that maps a key x to a "virtually random" index in the range [0..m-1]. We will then attempt to store the key at index h(x) in the table. Of course, it may be that different keys are mapped to the same location. This is called a collision. We need to consider how collisions are to be handled, but observe that if the hash function does a good job of scattering the keys around the table, then the chances of a collision occurring are about the same at every index of the table.
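The scheme above can be sketched as a small table that resolves collisions by chaining. This is a minimal illustrative sketch, not a prescribed implementation: the table size m = 11, Python's built-in hash() reduced modulo m as h(x), and the chaining strategy are all assumptions made for the example.

```python
class ChainedHashTable:
    """A minimal hash table sketch supporting insert(), delete(), find()."""

    def __init__(self, m=11):
        self.m = m                           # number of table entries (assumed size)
        self.table = [[] for _ in range(m)]  # one chain (list) per entry

    def _index(self, key):
        # h(x): map the key to a "virtually random" index in [0..m-1]
        return hash(key) % self.m

    def insert(self, key, value):
        bucket = self.table[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # a collision just lengthens the chain

    def find(self, key):
        for k, v in self.table[self._index(key)]:
            if k == key:
                return v
        return None                          # failed find

    def delete(self, key):
        idx = self._index(key)
        self.table[idx] = [(k, v) for (k, v) in self.table[idx] if k != key]
```

If h scatters keys well, each chain stays short on average, which is what gives the O(1) expected time for all three operations.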
As long as the table size is at least as large as the number of keys, we would expect the number of keys that map to the same cell to be small. Hashing is quite a versatile technique. One way to think about hashing is as a means of implementing a content-addressable array. We know that arrays can be addressed by an integer index, but it is often convenient to have a look-up table in which the elements are addressed by a key value of any discrete type: strings, for example, or integers drawn from such a large range of values that devising an array of this size would be impractical. Note that hashing is not usually used for continuous data, such as floating-point values, because similar keys such as 3.14159 and 3.14158 may be mapped to entirely different locations. There are two important issues that need to be addressed in the design of any hashing system.
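The content-addressing idea can be illustrated with a concrete hash function for strings. The polynomial (Horner-style) hash and the table size m = 101 below are common textbook choices assumed for this sketch, not ones specified in these notes.

```python
def string_hash(s, m=101):
    """Map a string key to an index in [0..m-1] via a base-31 polynomial hash."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % m   # Horner's rule, reduced modulo m at each step
    return h

# Discrete keys (strings) become "addresses" into a table of size m:
slots = {name: string_hash(name) for name in ("insert", "delete", "find")}

# Nearby continuous values, written out as keys, can land in unrelated
# slots -- which is why hashing is rarely used for floating-point data:
a = string_hash("3.14159")
b = string_hash("3.14158")
```

Here the keys "3.14159" and "3.14158" differ in a single character, yet nothing about the hash function keeps their indices close together.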

This note was uploaded on 10/31/2010 for the course EE 423 taught by Professor Mitin during the Spring '10 term at SUNY Buffalo.
