Notes05 - CS 245: Database System Principles Notes 5:...

CS 245: Database System Principles
Notes 5: Hashing and More
Steven Whang

Hashing
key → h(key) → one of the buckets. Each bucket is typically one disk block.

Two alternatives
(1) key → h(key) → bucket that stores the records themselves.
(2) key → h(key) → bucket that stores index records (key, pointer to the record).
• Alt (2) is used for a "secondary" search key.

Example hash function
• Key = 'x1 x2 … xn', an n-byte character string
• Have b buckets
• h: add x1 + x2 + … + xn, then compute the sum modulo b

This may not be the best function… Read Knuth Vol. 3 if you really need to select a good function.

Good hash function: the expected number of keys/bucket is the same for all buckets.
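The example hash function can be sketched in Python as below. This is a minimal illustration, assuming the key's n bytes are its UTF-8 encoding; `sum_mod_hash` and `bucket_counts` are hypothetical names, not part of any library. Counting keys per bucket on a sample of keys is one quick way to check the "good hash function" criterion.

```python
from typing import Iterable, List


def sum_mod_hash(key: str, b: int) -> int:
    """Return a bucket number in 0..b-1: (x1 + x2 + ... + xn) mod b.

    Assumption: the key's n bytes are its UTF-8 encoding.
    """
    return sum(key.encode("utf-8")) % b


def bucket_counts(keys: Iterable[str], b: int) -> List[int]:
    """Count how many keys hash to each of the b buckets."""
    counts = [0] * b
    for k in keys:
        counts[sum_mod_hash(k, b)] += 1
    return counts


if __name__ == "__main__":
    sample = ["abc", "cab", "bca", "abd"]
    # All three permutations of "abc" have the same byte sum (294), so they collide.
    print(bucket_counts(sample, 4))  # [0, 0, 3, 1]
```

Because the sum ignores byte order, every permutation of the same characters lands in the same bucket, which is one reason this may not be the best function.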
Within a bucket:
• Do we keep keys sorted?
• Yes, if CPU time is critical and inserts/deletes are not too frequent.

Next: an example to illustrate inserts, overflows, and deletes.

EXAMPLE: insertion (2 records/bucket, buckets 0-3)
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
  Bucket 0: d
  Bucket 1: a, c
  Bucket 2: b
  Bucket 3: (empty)
Then h(e) = 1: bucket 1 is already full, so e goes into an overflow block chained to bucket 1.

EXAMPLE: deletion
(Figure: buckets 0-3 holding keys a through g, some in overflow blocks.)
Delete: e, f. After a deletion, maybe move "g" up from the overflow block into the freed slot.

Rule of thumb:
• Try to keep space utilization between 50% and 80%.
  Utilization = (# keys used) / (total # keys that fit)
• If < 50%, we are wasting space.
• If > 80%, overflows become significant; how significant depends on how good the hash function is and on the number of keys/bucket.

How do we cope with growth?
• Overflows and reorganizations
• Dynamic hashing
  - Extensible
  - Linear
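To make the insert/overflow/delete example concrete, here is a minimal Python sketch of a static hash file with chained overflow blocks. It assumes each bucket is a list of blocks (the first is the primary block, the rest form its overflow chain) and that the hash function is passed in; the example uses a hypothetical lookup table reproducing h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1. Deletion repacks the chain so overflow keys move up into freed slots, which is one way to realize "maybe move g up". Utilization counts every allocated block, primary or overflow, as space keys could fit in; that reading of "total # keys that fit" is an assumption.

```python
from typing import Callable, List

Block = List[str]  # one disk block holding up to `capacity` keys


class StaticHashFile:
    """Fixed number of buckets; full buckets spill into chained overflow blocks."""

    def __init__(self, num_buckets: int, capacity: int, h: Callable[[str], int]):
        self.capacity = capacity
        self.h = h
        # Each bucket starts as a single, empty primary block.
        self.buckets: List[List[Block]] = [[[]] for _ in range(num_buckets)]

    def insert(self, key: str) -> None:
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: allocate a new overflow block

    def delete(self, key: str) -> bool:
        n = self.h(key)
        keys = [k for block in self.buckets[n] for k in block]
        if key not in keys:
            return False
        keys.remove(key)
        # Repack the chain: keys move up into freed slots and empty
        # overflow blocks are released (the primary block is always kept).
        cap = self.capacity
        self.buckets[n] = [keys[i:i + cap] for i in range(0, len(keys), cap)] or [[]]
        return True

    def utilization(self) -> float:
        """# keys used / total # keys that fit in all allocated blocks."""
        used = sum(len(block) for chain in self.buckets for block in chain)
        fit = sum(len(chain) for chain in self.buckets) * self.capacity
        return used / fit


if __name__ == "__main__":
    example_h = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # hypothetical h from the example
    f = StaticHashFile(num_buckets=4, capacity=2, h=example_h.__getitem__)
    for k in "abcde":
        f.insert(k)
    print(f.buckets)        # [[['d']], [['a', 'c'], ['e']], [['b']], [[]]]
    print(f.utilization())  # 0.5  (5 keys in 5 blocks of 2)
    f.delete("a")
    print(f.buckets[1])     # [['c', 'e']]  (e moved up, overflow block freed)
```

Repacking the whole chain is simple but rewrites every block in it; a real system might instead move only one record up, trading compactness for fewer block writes.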
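Extensible hashing: two ideas
(a) Use the first i of the b bits output by the hash function h(K) (e.g. h(K) = 00110101); i grows over time.
(b) Use a directory: h(K)[i] → pointer to bucket.

Example: h(k) is 4 bits; 2 keys/bucket. (Numbers in parentheses are each bucket's local depth.)
Start with i = 1:
  0 → [0001] (1)
  1 → [1001, 1100] (1)
Insert 1010: bucket 1 is full, so it splits and the directory doubles to i = 2:
  00, 01 → [0001] (1)
  10 → [1001, 1010] (2)
  11 → [1100] (2)

Example continued
Insert 0111, 0000: 0111 fits in [0001]; 0000 then overflows that bucket, and since its local depth (1) is less than i (2), it splits without growing the directory:
  00 → [0000, 0001] (2)
  01 → [0111] (2)
  10 → [1001, 1010] (2)
  11 → [1100] (2)

Example continued
Insert 1001: bucket 10 is full and its local depth equals i = 2, so the directory doubles to i = 3 (000 … 111) and the bucket splits:
  100 → [1001, 1001] (3)
  101 → [1010] (3)
  The other buckets keep local depth 2, each pointed to by two directory entries.

Extensible hashing: deletion
• No merging of blocks, or
• Merge blocks and cut the directory if possible (reverse the insert procedure).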
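The extensible-hashing example can be replayed with the Python sketch below. It is a minimal illustration under stated assumptions, not a reference implementation: keys are represented directly by their 4-bit hash values (as in the example), the directory is indexed by the first i bits of h(K), each bucket records its local depth, and `ExtensibleHash`, `Bucket`, and their methods are hypothetical names. Deletion follows the simpler "no merging of blocks" option. A bucket that still overflows once all 4 bits are in use would need an overflow block; the sketch just raises an error in that case.

```python
from typing import List


class Bucket:
    def __init__(self, depth: int):
        self.depth = depth            # local depth: prefix bits shared by all keys here
        self.keys: List[int] = []


class ExtensibleHash:
    def __init__(self, nbits: int = 4, capacity: int = 2):
        self.nbits = nbits            # b: bits output by h(K)
        self.capacity = capacity      # keys per bucket
        self.i = 1                    # global depth: bits used to index the directory
        self.directory: List[Bucket] = [Bucket(1), Bucket(1)]

    def _prefix(self, hv: int, bits: int) -> int:
        """First `bits` bits of the nbits-bit hash value hv."""
        return hv >> (self.nbits - bits)

    def lookup(self, hv: int) -> bool:
        return hv in self.directory[self._prefix(hv, self.i)].keys

    def insert(self, hv: int) -> None:
        while True:
            bucket = self.directory[self._prefix(hv, self.i)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(hv)
                return
            if bucket.depth == self.nbits:
                raise OverflowError("all hash bits in use; an overflow block is needed")
            if bucket.depth == self.i:
                # Double the directory: entry p becomes entries 2p and 2p+1,
                # both still pointing at the same bucket.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(bucket)       # then retry the insert

    def _split(self, bucket: Bucket) -> None:
        j = bucket.depth + 1          # new local depth for both halves
        zero, one = Bucket(j), Bucket(j)
        for hv in bucket.keys:
            # Bit j (counting from the left) decides which half a key moves to.
            half = one if (hv >> (self.nbits - j)) & 1 else zero
            half.keys.append(hv)
        # Repoint every directory entry that referenced the old bucket.
        for idx, b in enumerate(self.directory):
            if b is bucket:
                self.directory[idx] = one if (idx >> (self.i - j)) & 1 else zero

    def delete(self, hv: int) -> None:
        """'No merging of blocks' option: remove the key, keep blocks and directory."""
        bucket = self.directory[self._prefix(hv, self.i)]
        if hv in bucket.keys:
            bucket.keys.remove(hv)


if __name__ == "__main__":
    eh = ExtensibleHash()
    for hv in (0b0001, 0b1001, 0b1100):   # starting state, i = 1
        eh.insert(hv)
    eh.insert(0b1010)                      # directory doubles: i = 2
    eh.insert(0b0111)
    eh.insert(0b0000)                      # bucket split, i stays 2
    eh.insert(0b1001)                      # directory doubles: i = 3
    for idx, b in enumerate(eh.directory):
        print(format(idx, "03b"), b.depth, [format(k, "04b") for k in b.keys])
```

Doubling the directory by repeating each entry works because the index uses the leading bits: an old entry p covers exactly the new entries 2p and 2p+1, so only the bucket being split needs its pointers updated.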

This document was uploaded on 03/08/2011.