7_hashtables

7_hashtables - CS161 - Hashtables David Kauchak B-Tree demo...


CS161 - Hashtables
David Kauchak

B-Tree demo - http://people.ksp.sk/~kuko/bak/index.html

What data structures have we seen so far:
- arrays - get and set particular indices in constant time
- linked list - insert and delete in constant time
- stack - LIFO
- queue - FIFO
- heap - extract max/min in logarithmic time
- BST - search in logarithmic time (when balanced)
- B-Trees - search on disk in a logarithmic number of disk accesses

Hashtables: constant time insertion and search (and deletion in some cases), i.e. Search(S, x), Insert(S, x) and Delete(S, x) are $O(1)$.

Applications
- Is $x \in S$?
- I use them all the time
- compilers
- databases
- search engines?
- any time you'd like to store and retrieve non-sequential data
- sparse data: save memory over an array

Key/Data pair - For anything we're trying to store in a hashtable we need a key that is a number. This raises two related issues:
1. Are we using the entire data as the key or just some representative data? For example, a bank may have full information about a client (name, address, phone number, birth date, social security number), but only use one piece of data to index the user, like the name or social security number.
2. The key into a hash table needs to be a number. If the data we're storing is numerical, then we're fine. Other times, we need to do a transformation, e.g. for strings.

Why do we need hashtables? Why not just use arrays?

[Picture: the universe of keys and the array]

It is very common that the set of possible values, denoted $U$, is very large. This can be challenging because
1. If we want to directly index the items, the size of the array may be prohibitively large, e.g. for strings.
2. Even if we could generate an array large enough, the data may be very sparse, so directly indexing the items would be very wasteful.
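To make the two issues above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name string_to_key and the base-256 encoding are hypothetical choices, not something the notes prescribe. It turns a string into a number, then shows how large a directly indexed array would have to be for even short strings.

```python
def string_to_key(s: str, base: int = 256) -> int:
    """Interpret the string's UTF-8 bytes as the digits of a base-256 number.

    Hypothetical helper: one of many ways to turn a string into a number.
    """
    key = 0
    for byte in s.encode("utf-8"):
        key = key * base + byte
    return key


print(string_to_key("ab"))  # 97 * 256 + 98 = 24930

# Direct indexing needs one array slot per possible key. For strings of
# exactly 8 bytes the universe already has 256**8 = 2**64 possible keys.
universe_size = 256 ** 8
stored_keys = 1000  # assumed number of strings we actually store
print(universe_size)                # 18446744073709551616
print(stored_keys / universe_size)  # fraction of slots that would be used
```

With only 1000 keys stored, almost every slot of such an array would sit empty, which is the sparsity problem described above.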
Let $n$ be the number of keys and $m$ be the size of the array. We define the load factor as

$\alpha = n/m$

- If $\alpha$ is small, then the amount of unused, wasted space is large
- Similarly, if the range of possible keys is large, then $m$ must be large

Hash function - A hash function is a function that maps the universe of keys $U$ to the slots in the hash table, i.e.

$h : U \to \{0, 1, \ldots, m-1\}$

Collisions
- A collision occurs when $h(x) = h(y)$, but $x \neq y$
- We can try to minimize this by having a good hash function, but because $|U| > m$ collisions are inevitable

Collision resolution by chaining
- The hashtable consists of an array of linked lists
- If a collision occurs, add the element into the linked list at that location. Specifically, if two elements $x$ and $y$ are inserted where $h(x) = h(y)$, then the hashtable entry...
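The ideas above (hash function, collisions, chaining, load factor) can be sketched in a few lines of Python. This is a rough sketch under assumptions, not the notes' own implementation: the class name ChainedHashTable is hypothetical, the hash function is assumed to be h(k) = k mod m, and Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining (illustrative sketch)."""

    def __init__(self, m: int = 8):
        self.m = m  # number of slots
        self.n = 0  # number of keys currently stored
        # One chain per slot; Python lists stand in for linked lists.
        self.slots = [[] for _ in range(m)]

    def _h(self, key: int) -> int:
        # Assumed hash function: maps any integer key into 0..m-1.
        return key % self.m

    def insert(self, key: int, value) -> None:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key: int):
        # Only the chain in slot h(key) has to be scanned.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key: int) -> bool:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self) -> float:
        return self.n / self.m  # alpha = n/m


table = ChainedHashTable(m=4)
table.insert(3, "x")
table.insert(7, "y")  # 7 % 4 == 3 % 4, so this collides with key 3
print(table.search(3), table.search(7))  # x y
print(table.load_factor())               # 0.5
```

Keys 3 and 7 land in the same slot, so both end up in that slot's chain, and a search walks the chain until it finds the matching key.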

This document was uploaded on 05/25/2011.
