Lecture2 - ECS 165B: Database System Implementa6on Lecture...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: ECS 165B: Database System Implementa6on Lecture 2 UC Davis March 31, 2010 Acknowledgements: design of course project for this class borrowed from CS 346 @ Stanford's RedBase project, developed by Jennifer Widom, and used with permission. Slides based on earlier ones by Raghu Ramakrishnan, Johannes Gehrke, Jennifer Widom, Bertram Ludaescher, and Michael Gertz. Class Agenda Last 6me: Logis6cs and course overview Introduc6on to the DavisDB project Start file and buffer management review (Chapter 9 of textbook) Today: Finish file and buffer management review File and buffer management in DavisDB Reading: Chapter 9 of Ramarkrishnan & Gehkre (or Chapter 11 of Silberschatz et al.) Announcements Teaching assistant: Mingmin Chen (michen@ucdavis.edu) Office hours: Wednesdays, 11:00-11:50am, 055 Kemper Hall Please send your team requests to Mingmin by email (or edit the online spreadsheet) by end of day today! - We will finalize teams and set up your subversion repositories tomorrow Project overview posted! hdp://www.cs.ucdavis.edu/~green/courses/ecs165b/project.html Project Part I will be posted to web page tomorrow, due 4/11 File and Buffer Management, Part II Disk Space Management Lowest layer of DBMS sogware manages space on disk Higher levels call upon this layer to: allocate / de-allocate a page read / write a page Request for a sequence of pages must be sa6sfied by alloca6ng the pages sequen6ally on disk! Higher levels don't need to know how this is done, or how free space is managed Simplifying assump6on in DavisDB: no requests for sequences; pages are accessed one at a 6me Part of student extension? (Part 5 of project) Buffer Management n a DBMS Buffer Management in ia DBMS Page Requests from Higher Levels BUFFER POOL disk page free frame MAIN MEMORY DISK DB choice of frame dictated by replacement policy Data must be in RAM for DBMS to operate on it! Data must be in RAM for DBMS to operate on it! Table of <frame# , pageid> pairs is maintained. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke maintained Table of <frameNo, pageNo> pairs is 12 When a Page is Requested... If requested page is not in pool: Choose a frame for replacement If frame is dirty, write it to disk Read requested page into chosen frame Pin the page and return its address If requests can be predicted (e.g., sequen6al scans), pages can be pre-fetched several pages at a 6me Again, opportunity ignored in DavisDB for simplicity More on Buffer Management Requestor of page must unpin it, and indicate whether page has been modified Dirty bit is used for this Page in pool may be requested many 6mes A pin count is used. A page is a candidate for replacement iff its pin count = 0 Concurrency control and recovery may entail addi6onal I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later...) No concurrency control or recovery in DavisDB (good topic for student extension!) Buffer Replacement Policy Frame is chosen for replacement by a replacement policy: Least-recently-used (LRU), Clock, MRU, etc DavisDB uses LRU Policy can have big impact on # of I/O's; depends on the access pa<ern Sequen>al flooding: nasty situa6on caused by LRU + repeated page scans # buffer frames < # pages in file means each page request causes an I/O. MRU much beder in this situa6on (but not in all situa6ons, of course). DBMS vs. OS File System OS does disk space and buffer management; why not let the OS manage these tasks? Differences in OS support: portability issues Some limita6ons, e.g., files can't span disks Buffer management in DBMS requires ability to: pin a page in buffer pool, force a page to disk (important for implemen6ng concurrency control and recovery) adjust replacement policy, and pre-fetch pages based on access paderns in typical DB opera6ons Record Formats: Fixed Length Record Formats: Fixed-Length F1 L1 F2 L2 F3 L3 F4 L4 Base address (B) Address = B+L1+L2 Informa6on about field types same for all records in a file; stored in system catalogs Information about field types same for all Finding i'th field requires storedecord records in a file; scan of r in system catalogs. DavisDB uses fixed-length records Finding i'th field requires scan of record. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Record Formats: Variable Length Two alterna6ve formats formatsis fixed): is fixed): Two alternative (# fields (# fields F1 F2 F3 F4 Record Formats: Variable-Length 4 Field Count $ $ $ $ Fields Delimited by Special Symbols F1 F2 F3 F4 Array of Field Offsets Second offers direct access to i'th fismall directorytorage of eld, efficient s overhead. of nulls (special don't know value); nulls (special don't know value); small directory overhead Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke * Second offers direct access to i'th field, efficient storage 18 Page Formats: Fixed Length Records Page Formats: Fixed-Length Records Slot 1 Slot 2 Slot 1 Slot 2 Free Space Slot N Slot M N PACKED number of records 1 . . . 0 1 1M M ... 3 2 1 UNPACKED, BITMAP number of slots ... Slot N ... Record id = <page id, slot #>. In first alterna6ve, moving alternative, moving records for free ecord records for free space management changes rspace id; may management not be acceptable. changes rid; may not be acceptable. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19 * Record id = <page id, slot # >. In first Page Formats: Variable-Length Records Page Formats: Variable Length Records Rid = (i,N) Rid = (i,2) Rid = (i,1) Page i 20 N ... 16 2 24 N 1 # slots SLOT DIRECTORY Pointer to start of free space Can move records on page without changing rid; Can move records on fixed-length records too. so, attractive for page without changing record id; so, * Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke adrac6ve for fixed-length records too! 20 Files of Records Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records. FILE: a collec6on of pages, each containing a collec6on of records. Must support: insert/delete/modify record read a par6cular record (specified using record id) scan all records (possibly with some condi6ons on the records to be retrieved) Unordered (Heap) Files Simplest file structure contains records in no par6cular order As file grows and shrinks, disk pages are allocated and de- allocated To support record-level opera6ons, we must: keep track of the pages in a file keep track of free space on pages keep track of the records on a page There are many alterna6ves for keeping track of this Heap Heap File Implemented as a LList File Implemented as a ist Data Page Header Page Data Page Data Page Data Page Pages with Free Space Data Page Data Page Full Pages The header page heap file name must name must The header page id and id and Heap file be stored someplace someplace. be stored Each page contains two "pointers" (page ids) plus data Each page contains 2 `pointers' plus data. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23 Heap Heap Using a Page Directory File File Using a Page Directory Header Page Data Page 1 Data Page 2 DIRECTORY Data Page N The entry for a page can include the number number The entry for a page can include the of free bytes on the pfree bytes on the page. of age The directory is a cis a collection of lpages; linked The directory ollec6on of pages; inked list implementa6on is just one ajust one alternative. list implementation is lterna6ve Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Much smaller than linked list of all heap fiHF ages! Much smaller than linked list of all le p pages! 24 System Catalogs For each index: structure (e.g., B+-tree) and search key fields For each rela6on name, file name, file structure (e.g., heap file) adribute name and type, for each adribute index name, for each index integrity constraints For each view: view name and defini6on Plus sta6s6cs, authoriza6on, buffer pool size, etc Catalogs are themselves stored as rela>ons! Attr_Cat(attr_name, rel_name, or Adributes Example: System Catalog Table f type, position) attr_name attr_name rel_name type position sid name login age gpa fid fname sal rel_name Attribute_Cat Attribute_Cat Attribute_Cat Attribute_Cat Students Students Students Students Students Faculty Faculty Faculty type string string string integer string string string integer real string string real position 1 2 3 4 1 2 3 4 5 1 2 3 26 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Summary Disks provide cheap, non-vola6le, but slow storage Random access, but cost depends on loca6on of page on disk; important to arrange data sequen6ally to minimize seek delays DavisDB isn't very smart about this Buffer manager brings pages into RAM Page stays in RAM un6l released by requestor Wriden to disk when frame chosen for replacement (which is some 6me ager requestor releases the page) Choice of frame to replace based on replacement policy Tries to pre-fetch several pages at a 6me DavisDB doesn't worry about this Summary (Con6nued) File layer keeps track of pages in a file, and supports abstrac6on of a collec6on of records. Pages with free space iden6fied using linked list or directory structure (similar to how pages in file are kept track of) Indexes support efficient retrieval of records based on the values in some fields Catalog rela6ons store informa6on about rela6ons, indices, and views. (Informa>on that is common to all records in a given collec>on.) File and Buffer Management in DavisDB File and Buffer Management in DavisDB User results commands Command Parser (given) es ueri q Query Engine (4) index scans get metadata System Manager (3) read/write/scan records indic es create files, read/ write pages Indexing (2) Record Manager (1) data, metadata Disk Space Manager (given) Buffer Manager (given) OS File System + User Extension (5) Paged File Component (Provided) Paged File Component has two func6ons: provides in-memory buffer pool of pages/frames performs low-level file I/O at the granularity of pages Overview will be posted tomorrow: hdp://www.cs.ucdavis.edu/~green/courses/ecs165b/pageFile.html For now, see Doxygen docs: hdp://www.cs.ucdavis.edu/~green/courses/ecs165b/docs/annotated.html Where it all begins: PageFileManager... PageFileManager Your code will create one instance of this class Manages the buffer pool of in-memory pages allocate/de-allocate "scratch" pages coordinates with file handle objects to bring pages to/from disk uses LRU replacement policy Used to create/open/close/remove page files Returns FileHandle object to manage pages within a file FileHandle Returned by PageFileManager, used to: allocate/de-allocate pages in the file pages iden6fied by logical page number rather than physical offset mark page as dirty force page to disk scan pages in file Coding Tip: Don't Forget to Free Memory! DBMS is a long-running process; memory leaks are unacceptable Every new must have a matching delete With some coding discipline, can avoid many problems When possible, put new and delete close together in the code, so that a human can easily verify correctness Memory must always be freed, even when handling excep6onal condi6ons Use tools like valgrind to track down memory leaks We will check for memory leaks when grading your projects Coding Tip: Pinning/Unpinning Pages Whenever you access a page, you must remember to unpin it ager you're done (else you leak the page) Best coding pracDce: do both tasks nearby, ideally in the same func6on, so that correctness can easily be verified FileHandle* file;! PageHandle page;! ReturnCode code = file->getFirstPage(&page);! if (code == RC_OK) {! !// ... do stuff with page ...! !file->unpinPage(page.pageNo);! }! Same goes for memory alloca6on/de-alloca6on make it easy to match every new with its corresponding delete ...
View Full Document

Ask a homework question - tutors are online