This preview shows page 1. Sign up to view the full content.
Unformatted text preview: ECS 165B: Database System Implementa6on Lecture 10 UC Davis April 19, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: A taste of database theory, Part 2: containment, equivalence, and minimiza6on of conjunc6ve queries Today: Overview of DavisDB project, Part 2: indexing Short lecture Reading Chapter 10 of Ramakrishnan and Gehrke (or Chapter 12 of Silberschatz et al) Announcements Thanks for your hard work on Part 1!!! Hopefully, it will get easier from here. Part 2 of project out tonight, due Sunday, 5/2 @ 11:59pm Quiz #1 in class next Wednesday DavisDB, Part 2: Indexing Second part of project: indexing component Provides classes and methods for managing persis6ng indices on data in unordered heap files (i.e., record files) Like RecordManager, uses page files underneath Sits side-by-side with RecordManager on top of PageFileManager Indexing structure we'll use: B+ tree (with some simplifica6ons) Recall: B+ Trees We already covered B+ trees in depth in Lectures 4 and 6 Root
17 Example B+ Tree After Inserting 8* 5 13 24 30 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 3 alterna6ves for storing records (records themselves, ids, or lists of ids) v Notice that root was split, leading to increase in Inser6ons: need to split nodes when they become full height. Dele6ons: example, erge nodes when they become less than half- v In this need to mwe can avoid split by re-distributing full entries; however, this is usually not done in practice.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14 B+ Trees and Page Files Example B+ Tree After Inserting 8*
pageNo Root a page 17 pageNo 5 13 a page 24 30 pageNo 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* a page Will also need root was split, leading ito Record Min height. v Notice that a header page, just as n increase anager (with pageNo of root node, perhaps some sta6s6cs, etc) v In this example, we can avoid split by re-distributing entries; however, this is usually not done in practice. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14 Simplifica6ons Only need to support-agribute index (recall that B+ tree may in general index several agributes) Dele6ons: you may use tombstones instead of merging/redistribu6on When an entry is deleted, it is replaced by a special marker indica6ng an empty slot (which may be reused later) Tree nodes are never deleted or merged Extra credit for implemen6ng full textbook dele6on algorithm (with merging/redistribu6on) --- tricky! May help reduce I/Os when index is used subsequently for answering queries No special support for bulk loading Which of Three Alterna6ves? Alterna6ve 1 (keep record itself in tree): don't do this; you should keep record ids, not records Alterna6ve 2 (<key,rid>) or Alterna6ve 3 (<key, list of rids>): either is OK; think about the tradeoffs before coding Note, cannot assume all rids for a given key will fit on one page Handling Duplicates If using Alterna6ve 2 (<key, rid>), may have duplicate key entries in internal nodes If using Alterna6ve 3 (<key, list of rids>),
have to worry about variable-length list of rids and page overflow Allowed simplica6on (described in R&G): include record id in the key no duplicates, by construc6on! hence <key, rid> becomes the key downside: index uses more space (=> more I/Os) upside: dele6ons are faster (don't have to scan the duplicates) How it All Fits Together Three main classes: IndexManager, IndexHandle, and IndexScan IndexManager: create/delete/open/close B+ tree indices IndexHandle: insert/delete records IndexScan: perform comparison-based scans Headers and skeleton classes will be added to your repositories tonight Coding Tips "Use the debugger, not prins!" true in general; but every rule has an excep6on You may find it helpful to write an IndexHandle::dumpTree() method to output a human-readable picture of your B+ tree For dele6ons, can assume at most one scan is open; dele6ons can occur during the scan, of a certain form: you need to allow dele6ons of some/all entries sa6sfying a condi6on Get inser6ons working first, then scans, then dele6ons, then dele6ons during scans Keep it simple Let's Go to the Doxygen Docs! Already online at the usual place: hgp://www.cs.ucdavis.edu/~green/courses/ecs165b/docs ...
View Full Document
This note was uploaded on 04/29/2010 for the course ECS 152 taught by Professor Mr. during the Spring '10 term at University of Great Falls.
- Spring '10