Lecture 15 Indexing 2

Hash-based indexes
- Hash-based indexes are best for equality selections. They cannot support range searches.
- Static and dynamic hashing techniques exist, with trade-offs for dynamic data.

Hash-Based Indexes
- Good for equality selections.
- Index is a collection of buckets.
  - Bucket = primary page plus zero or more overflow pages.
  - Buckets contain data entries.
- Hashing function h: h(k) = bucket of data entries of the search key value k.
  - No need for “index entries” in this scheme.
[Figure: search key value k is fed to h, which selects one of the buckets (labeled 1, 2, 3, ..., N-1).]
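The bucket idea above can be sketched as follows. This is a minimal in-memory illustration, not the lecture's implementation: the class name `HashIndex`, its methods, the bucket count of 4, and the use of Python's built-in `hash` as a stand-in for h are all assumptions. It shows why an equality selection touches a single bucket while a range search has to scan every bucket.

```python
class HashIndex:
    """Toy hash index: a fixed collection of buckets holding data entries."""

    def __init__(self, num_buckets=4):
        # Each bucket is a list of (key, rid) data entries.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # h(k) picks the bucket directly; no separate index entries are needed.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, rid):
        self._bucket_for(key).append((key, rid))

    def lookup(self, key):
        # Equality selection: only one bucket is examined.
        return [rid for k, rid in self._bucket_for(key) if k == key]

    def range_search(self, low, high):
        # Hashing scatters neighbouring keys, so every bucket must be scanned.
        return sorted(
            rid for bucket in self.buckets for k, rid in bucket if low <= k <= high
        )


index = HashIndex()
for key in range(10):
    index.insert(key, f"rid{key}")
```

Calling `index.lookup(7)` inspects one bucket only, whereas `index.range_search(3, 5)` walks all of them, which is the cost the slide alludes to.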
Static Hashing
- h(k) mod N = bucket to which the data entry with key k belongs. Two different keys (k1 ≠ k2) can lead to the same bucket.
- Static: the number of buckets (N) is fixed.
  - Main pages are allocated sequentially and never de-allocated.
  - Overflow pages are added if needed.
[Figure: key → h → h(key) mod N selects one of the primary bucket pages 0 to N-1; overflow pages are attached to primary pages.]
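A small sketch of static hashing with overflow pages, under stated assumptions: pages hold at most `PAGE_CAPACITY` entries (2 here, chosen only to force overflow quickly), keys are non-negative integers, and h(key) is simply the key itself. The names `Page` and `StaticHashFile` are hypothetical. The `search` method also counts how many pages it reads, to make the cost of an overflow chain visible.

```python
PAGE_CAPACITY = 2  # assumed number of data entries per page


class Page:
    def __init__(self):
        self.entries = []     # data entries stored on this page
        self.overflow = None  # next overflow page in the chain, if any


class StaticHashFile:
    def __init__(self, num_buckets):
        # N is fixed at creation time; primary pages are never de-allocated.
        self.n = num_buckets
        self.primary = [Page() for _ in range(num_buckets)]

    def insert(self, key):
        page = self.primary[key % self.n]  # h(key) = key here, an assumption
        # Walk the chain until a page with free space is found.
        while len(page.entries) >= PAGE_CAPACITY:
            if page.overflow is None:
                page.overflow = Page()  # allocate an overflow page on demand
            page = page.overflow
        page.entries.append(key)

    def search(self, key):
        """Return (found, pages_read) for an equality search."""
        page, pages_read = self.primary[key % self.n], 0
        while page is not None:
            pages_read += 1
            if key in page.entries:
                return True, pages_read
            page = page.overflow
        return False, pages_read


f = StaticHashFile(num_buckets=4)
for key in [0, 4, 8, 12, 16, 1]:  # 0, 4, 8, 12, 16 all collide in bucket 0
    f.insert(key)
```

In this run, bucket 0 ends up with one primary page and two overflow pages, so finding key 16 costs three page reads while finding key 1 costs one.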

Static Hashing (Contd.)
- The hash function works on the search key field of record r. It must distribute values over the range 0 ... N-1.
  - h(key) mod N = (a * key + b) mod N usually works well.
  - a and b are constants; a lot is known about how to tune h.
- Buckets contain data entries.
- Long overflow chains can develop and degrade performance.
  - Extendible Hashing: a dynamic technique to fix this problem.
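To illustrate why the constants a and b matter, the sketch below counts how many keys land in each bucket for two choices of a. The bucket count N = 8, the key set 0..63, and both pairs of constants are assumptions picked for illustration, not values from the lecture. When a shares a factor with N, the keys collapse onto a few buckets, which is exactly the situation that produces long overflow chains.

```python
from collections import Counter

N = 8  # number of buckets (assumed)


def make_hash(a, b, n=N):
    # Returns a function computing (a * key + b) mod n.
    return lambda key: (a * key + b) % n


keys = range(64)

good = make_hash(a=3, b=1)  # a is coprime to N: residues are only permuted
bad = make_hash(a=4, b=1)   # a shares a factor with N: buckets collapse

good_counts = Counter(good(k) for k in keys)
bad_counts = Counter(bad(k) for k in keys)

print("good:", sorted(good_counts.items()))
print("bad: ", sorted(bad_counts.items()))
```

With these inputs the first function spreads the 64 keys evenly over all 8 buckets, while the second uses only 2 of them.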
Extendible Hashing
- Situation: a bucket (primary page) becomes full. Why not re-organize the file by doubling the number of buckets?
  - Reading and writing all pages is expensive!
  - Idea: use a directory of pointers to buckets, and double the number of buckets by (1) doubling the directory and (2) splitting just the bucket that overflowed!
  - The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
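The directory-doubling idea can be sketched as below. This is one common way to realize it, not necessarily the scheme the lecture goes on to present. The assumptions: keys are distinct non-negative integers, h(key) is the key itself, the directory is indexed by the low-order bits of h(key), and each bucket records a local depth so that the directory is doubled only when the overflowing bucket already uses all directory bits. The names `Bucket` and `ExtendibleHash` are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth, capacity):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.capacity = capacity
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.capacity = capacity
        # Directory of pointers to buckets, indexed by low-order bits of h(key).
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _slot(self, key):
        # h(key) = key is an assumption; keep the last global_depth bits.
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        # Assumes distinct keys; many duplicates would never separate on a split.
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < bucket.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # (1) Double the directory: the new half points to the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # (2) Split only the bucket that overflowed.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth, self.capacity)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = image
        # Redistribute the old entries plus the new key between the two buckets.
        pending, bucket.keys = bucket.keys + [key], []
        for k in pending:
            self.insert(k)  # may trigger a further split if keys still collide


table = ExtendibleHash(capacity=2)
for key in [0, 2, 4, 1, 8]:
    table.insert(key)
```

After these five inserts the directory has doubled twice (8 slots) but only 4 distinct buckets exist, because the bucket holding key 1 was never split and is simply shared by several directory slots.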

This note was uploaded on 03/27/2012 for the course CLASSICS 122 taught by Professor Smith during the Spring '12 term at UMass (Amherst).
