Lecture13 - ECS 165B: Database System Implementa6on Lecture...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: ECS 165B: Database System Implementa6on Lecture 13 UC Davis April 26, 2010 Acknowledgements: por6ons based on slides by Raghu Ramakrishnan and Johannes Gehrke. Class Agenda Last 6me: Finish with external sor6ng Physical join operators Today: More physical operators: selec6on, projec6on, duplicate elimina6on, aggregates Quiz review Reading Chapter 14 of Ramakrishnan and Gehrke (or Chapter 13 of Silberschatz et al) Announcements Grades for Part 1: out tonight Quiz #1 in class on Wednesday Evalua6on of Rela6onal Opera6ons, cont. Selec6ons Selec6on above another operator: just evaluate the selec6on condi6on & R Selec6on above a source rela6on (common case): can use an index, if available & R S S Using an Index for Selec6ons Cost depends on # of qualifying tuples ("selec6vity") and clustering Cost of finding qualifying data entries (typically small) plus cost of retrieving records (could be large w/o clustering) Important refinement for unclustered indices: 1. Find qualifying data entries 2. Sort the record ids of the data records to be retrieved 3. Fetch record ids in order. This ensures that each data page is looked at just once (though # of such pages likely to be higher than with clustering) Complex Selec6ons (1) select ... ! from ...! where day<8/9/94 and bid=5 and sid=3! conjunc6on of =,<,etc terms First approach: find the most selec+ve access path, retrieve tuples using it, then apply any remaining selec6on condi6ons Most selec+ve access path: an index or file scan that we es6mate will require the fewest page I/Os Selec6on terms matching this index reduce the number of tuples retrieved from disk; remaining terms filter the retrieved tuples, but do not affect # of tuples fetched E.g., a B+ tree index on day can be used; then, bid=5 and sid=3 must be checked for each retrieved tuple. Or, a hash index on <bid,sid> could be used; day<8/9/94 must then be checked Complex Selec6ons (2) select ... ! from ...! where day<8/9/94 and bid=5 and sid=3! Second approach: use intersec+on of record ids (if we have 2 or more matching, non-clustered indices) Get sets of rids of data records using each matching index Then intersect these sets of rids (we'll discuss intersec6on soon) Retrieve the records and apply any remaining selec6on terms e.g., if we have a B+ tree index on day and another on sid, both using Alterna6ve (2), we can retrieve rids of records sa6sfying day<8/9/94 using the first, rids of records sa6sfying sid=3 using the second, intersect, retrieve records and check bid=5! Duplicate-Elimina6ng Projec6on (1) select distinct R.sid, S.bid! from Reserves R! First approach: use modified external merge-sort Modify Pass 0 of external sort to eliminate unwanted fields, to save space. Thus, runs of about 2B pages are produced (heapsort), but tuples in runs are smaller than input tuples. Size ra6o depends on # and size of fields that are dropped. Modify merging passes to eliminate duplicates. Thus, # of result tuples smaller than input. Difference depends on # of duplicates. Cost: in pass 0, read original rela6on (size M), write out same number of smaller tuples. In merging passes, fewer tuples wrijen out in each pass. Using Reserves example, 1000 input pages reduced to 250 in Pass 0 if size ra6o is 0.25. Duplicate-Elimina6ng Projec6on (2) select distinct R.sid, S.bid! from Reserves R! Second approach: use modified external hash sort Par++oning phase: read R using one input buffer. For each tuple, discard unwanted fields, apply hash func6on h on all fields to choose one of B-1 output buffers. Result is B-1 par66ons (of tuples with no unwanted fields). Tuples from different par66ons guaranteed to be dis6nct. Duplicate elimina+on phase: for each par66on, read and build in- memory hash table, using hash func6on h' ( h) on all fields, while discarding duplicates. If par66on does not fit in memory, can apply hash-based projec6on algorithm recursively to this par66on Cost: for par66on, read R, write out each tuple, but with fewer fields. This is read in next phase. Discussion of Projec6on Merge-sort based approach is the standard; bejer handling of skew and (as a bonus) result is sorted If an index on the rela6on contains all wanted ajributes in its search key, can do index-only scan Apply projec6on techniques to data entries (much smaller!) If an ordered (i.e., tree) index contains all wanted ajributes as prefix of search key, can do even bejer: Retrieve data entries in order (index-only scan), discard unwanted fields, compare adjacent tuples to check for duplicates Set Opera6ons intersect and cross-product: special cases of join union (dis6nct) and except are similar; we'll do union! Sort-based approach to union: Sort both rela6ons (on all ajributes) Scan sorted rela6ons and merge them, discarding duplicates Alterna+ve: merge runs from Pass 0 for both rela6ons Hash-based approach to union: Par66on R and S using hash func6on h (on all ajributes) For each S-par66on, build in-memory hash table (using h'), scan corresponding R-par66on and add tuples to table while discarding duplicates Aggregate Operators (avg, min, etc) select avg(S.age)! from Sailors S! where S.rating > 2! Without grouping: In general, requires scanning the rela6on Given index whose search key includes all ajributes in the select or where clauses, can do index-only scan Aggregate Operators (2) select rating, min(S.age)! from Sailors S! group by rating! With grouping: Sort on group-by ajributes, then scan rela6on and compute aggregate for each group. (Can improve upon this by combining sor6ng and aggregate computa6on.) Similar approach based on hashing on group-by ajributes Given tree index whose search key includes all ajributes in select, where, and group-by clauses, can do index-only scan; if group-by ajributes form prefix of search key, can retrieve data entries/tuples in group-by order Impact of Buffering If several opera6ons are execu6ng concurrently, es6ma6ng the # of available buffer pool pages is guesswork Repeated access paBerns interact with buffer replacement policy e.g., inner rela6on is scanned repeatedly in Simple Nested Loops Join. With enough buffer pages to hold inner, replacement policy does not majer. Otherwise, MRU is best, LRU is worst (sequen+al flooding). Does replacement policy majer for Block Nested Loops? What about Index Nested Loops? Summary A virtue of rela6onal DBMSs: queries are composed of a few basic operators; the implementa6on of these operators can (and must!) be carefully tuned Many alterna6ve implementa6on techniques for each operator; no universally superior technique for most operators Must consider available alterna6ves for each opera6on in a query and choose best one based on system sta6s6cs, etc. This is part of the broader task of op6mizing a query composed of several operators. Quiz #1 (Wednesday): Whirlwind Review What Have We Covered So Far in this Class? Buffer and file management Indexing External sor6ng Query evalua6on But not query op6miza6on Rela6onal calculus, rela6onal algebra, conjunc6ve queries DavisDB Parts 1 and 2; subversion Quiz will cover all of these topics! Study aids: (1) slides, (2) project documenta6on, (3) review ques6ons in textbook: Ch 8, 9, 10, 12, 13, 14 Buffer and File Management Buffer pool pinning replacement policies Record formats fixed-length versus variable-length Page formats for fixed-length versus variable-length records Record files / unordered heap files Schemes for keeping track of free space DavisDB Part 1 What are the roles of... PageFileManager FileHandle RecordManager RecordFileHandle RecordFileScan Indexing Clustered versus unclustered Alterna6ves for unclustered indices (alts. (2) vs (3)) Hash-based indices Tree-structured indices ISAM versus B+ tree B+ trees: lookups, range searches, inser6ons and dele6ons DavisDB Part 2 What are the roles of... IndexManager IndexHandle IndexScan External Sor6ng External merge sort (2-way, n-way, ...) Hash-based external sort Using B+ trees to sort Query Evalua6on Logical versus physical operators Join: Simple Nested Loops Join Index Nested Loops Join Block Nested Loops Join Sort-Merge Join Hash Join Selec6on: Index-based selec6on Projec6on, union, etc: duplicate elimina6on Rela6onal calculus, rela6onal algebra, conjunc6ve queries Rela6onal calculus: logic-based query language Expressive equivalence of rela6onal calculus and rela6onal algebra What is a conjunc6ve query (SQL, rela6onal calculus, rela6onal algebra) Miscellaneous Subversion! ...
View Full Document

Ask a homework question - tutors are online