Lecture4 - ECS 165B: Database System Implementa6on Lecture...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: ECS 165B: Database System Implementa6on Lecture 4 UC Davis April 5, 2010 Acknowledgements: based on slides by Raghu Ramakrishnan and Johannes Gehrke. Presenta6on of RCS based on slides by David Matuszek. Class Agenda Last 6me: Quick look at DavisDB Record Manager component Start review of indexing Today: Project-related logis6cs: subversion, code submission Finish indexing review; tree-structured indices in depth Reading: Chapter 10 of Ramarkrishnan & Gehkre (or Chapter 12 of Silberschatz et al.) Announcements Subversion repositories have been created, but with incorrect team assignments (Excel snafu)... Will be fixed soon... Code submission procedure has been finalized Textbook on reserve in Shields star6ng tomorrow Office hours tomorrow @11am (TJ), Wednesday @11am (Mingmin) Why use revision control systems? Scenario 1: Your program is working You change "just one thing" Your program breaks You change it back Your program is s6ll broken why? Has this ever happened to you? Why use revision control systems (2)? Your program worked well enough yesterday You made a lot of improvements last night... but you haven't gogen them to work yet You need to turn in your program now Has this ever happened to you? Revision control for teams Scenario: You change one part of a program -- it works Your co-worker changes another part -- it works You put them together -- it doesn't work Some change in one part must have broken something in the other part What were all the changes? 6 Revision Control for Teams (2) Scenario: You make a number of improvements to a class Your co-worker makes a number of different improvements to the same class How can you merge these changes? 7 Revision control systems A revision control system (aka version control system) does these things: Keeps mul6ple (older and newer) versions of source code, headers, etc Requests comments regarding every change Displays differences between versions Detect/resolve conflicts Many systems out there: sccs, rcs, cvs, Visual SourceSafe, svn Most popular in the past: cvs Most popular nowadays: svn 8 Subversion commands svn checkout (aka svn co) check out code from repository svn add add a new file/directory to the repository svn delete delete a file/directory from the repository svn commit commit local changes to repository svn diff view differences wrt current or old version svn status see local changes svn info get info about repository svn help list all commands See h*p://subversion.3gris.org Graphical front-ends: TortoiseSVN (Windows), RapidSVN (crossplatform), Subclipse (eclipse plug-in) Visual diffs, easier browsing of history, ... 9 Logis6cs: Repository Access Follow direc6ons on hgp://www.cs.ucdavis.edu/~green/courses/ecs165b/project.html [green@pc12 ~]$ svn co file:///home/cs165b/CSIF-Proj/ cs165b-0/svn/trunk/DavisDB! A DavisDB/RecordFileHandle.h! A DavisDB/FileHandle.h! A DavisDB/PageFileManager.cpp! A DavisDB/RecordManager.cpp! ...! A DavisDB/submit.sh! A DavisDB/CMakeLists.txt! A DavisDB/writeup.txt! A DavisDB/Common.h! [green@pc12 ~]$ ! Logis6cs: Repository Access Must tell repository about new files! [green@pc12 ~/DavisDB]$ svn add Foo.cpp Foo.h! A Foo.cpp! A Foo.h! [green@pc12 ~/DavisDB]$ svn commit -m ""! Adding Foo.cpp! Adding Foo.h! Transmitting file data ..! Committed revision 84.! To get changes from your teammate: [chenmi@pc10 ~/DavisDB]$ svn update! Logis6cs: Submitng Your Homework [green@pc12 DavisDB]$ ./submit.sh Usage: submit.sh <hw#> where <hw#> is a number in the range [1,5] Submits your project component by tagging the current version of your subversion repository as the submiged version. It may be executed mul3ple 3mes for the same <hw#>. The most recently submiged version is the one that will be used for grading (and its 6mestamp will determine any late penal6es). This script must be run from your subversion DavisDB directory. A{er submitng, the script will also run a test build of your project, by checking out the submiged version into a temporary directory and execu6ng "cmake ." then "make". Logis6cs: Submitng Your Homework (2) [green@pc12 DavisDB]$ ./submit.sh 1! Submitting HW1...! Submission successful.! Running a test build on the submitted code...! Test build successful.! Indexing Review Sparse vs. Dense Indices Not described in R&G; see Silberschatz et al Dense index: has search key value and data entry for every record in the file fast record lookup Sparse index: has search key values for only some records in the file (but all data entries) less space, reduced maintenance costs essen6ally, Alterna6ve 1 Cost of O Cost of Operations pera6ons (a) Scan (1) Heap BD (b) Equality (c ) Range (d) Insert (e) Delete Search +D (2) Sorted BD Dlog 2B Dlog 2 B + Search Search # matches + BD +BD (3) Clustered 1.5BD Dlog F 1.5B Dlog F 1.5B Search Search + # matches + D +D (4) Unclustered BD(R+0.15) D(1 + Dlog F 0.15B D(3 + Search Tree index log F 0.15B) + # matches log F 0.15B) + 2D (5) Unclustered BD(R+0.1 2D BD 4D Search Hash index 25) + 2D 0.5BD BD 2D B = # data pages; R = # of records per page; D = (average) 6me to * Several assumptions underlie these (rough) estimates! read or write a disk pages High-order bit: no one file organiza3on is uniformly superior in all situa3ons You will need to memorize this en6re matrix for Quiz #1 Just kidding Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 18 Tree-Structured Indices reading: Chapter 10 of Ramakrishnan and Gehrke / Chapter 12 of Silberschatz et al Introduc6on As for any index, 3 alterna6ves for data entries k* Data record with key value k <k, rid of data record with search key value k> <k, list of rids of data records with search key k> Choice is orthogonal to the indexing technique used to locate data entries k* Tree-structured indexing techniques support both range searches and equality searches ISAM ("Indexed sequenBal access method"): sta6c structure; B +-tree: dynamic, adjusts gracefully under inser6ons and dele6ons Range Range Searches Searches ``Find all students with .0" "Find all students with gpa > 3gpa > 3.0'' If dIf data sorted file, do binary search to find first such student, then ata is in is in sorted file, do binary search to find first such student, scan to find others then scan to find others. Cost of binary search can igh Cost of binary search can be quite hbe quite high. Simple idea: Create an le Simple idea: create an "index" fi`index' file. k1 k2 kN Index File Page 1 Page 2 Page 3 Page N Data File * can do binarysearch on (smaller) index file! file! Can do binary search on (smaller) index Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 3 ISAM P 0 index entry K P K 2 P K Pm 1 1 2 m Index file may still be quite large. But we can apply the idea repeatedly! Non-leaf Pages Leaf Pages Overflow page Primary pages 4 * Leaf pages contain data entries. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Comments on ISAM Data Pages File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. Overflow pages Index entries: <search key value, page id>; they `direct' search for data entries, which are in leaf pages. Search: Start at root; use key comparisons to go to leaf. Cost ! log F N ; F = # entries/index pg, N = # leaf pgs Insert: Find leaf data entry belongs to, and put it there. Delete: Find and remove from leaf; if empty overflow page, de-allocate. * Index Pages Static tree structure: inserts/deletes affect only leaf pages. 5 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Example ISAM Tree Each node can hold 2 entries; no need for `next-leaf-page' pointers. (Why?) Root 40 20 33 51 63 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 6 After Inserting 23*, 48*, 41*, 42* ... Root Index Pages 20 33 51 63 40 Primary Leaf Pages 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* Overflow Pages 23* 48* 41* 42* Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 7 ... Then Deleting 42*, 51*, 97* Root 40 20 33 51 63 10* 15* 20* 27* 33* 37* 40* 46* 55* 63* 23* 48* 41* * Note that 51* appears in index levels, but not in leaf! Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 8 B+ Tree: Most Widely Used Index Insert/delete at log F N cost; keep tree heightbalanced. (F = fanout, N = # leaf pages) Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree. Supports equality and range-searches efficiently. Index Entries (Direct search) Data Entries ("Sequence set") Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 9 Example B+ Tree Search begins at root, and key comparisons direct it to a leaf (as in ISAM). Search for 5*, 15*, all data entries >= 24* ... Root 13 17 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* * Based on the search for 15*, we know it is not in the tree! Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10 B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. average fanout = 133 Typical capacities: Height 4: 1334 = 312,900,700 records Height 3: 1333 = 2,352,637 records Can often hold top levels in buffer pool: Level 1 = 1 page = 8 Kbytes Level 2 = 133 pages = 1 Mbyte Level 3 = 17,689 pages = 133 MBytes Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11 ...
View Full Document

Ask a homework question - tutors are online