05_btree

05_btree - CPS216: Dataintensive Computing Systems...

Info iconThis preview shows page 1. Sign up to view the full content.

View Full Document Right Arrow Icon
This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview: CPS216: Dataintensive Computing Systems Operators for Data Access Shivnath Babu 1 Problem Relation: Employee (ID, Name, Dept, ...) 10 M tuples (Filter) Query: SELECT * FROM Employee WHERE Name = "Bob" Solution #1: Full Table Scan Storage: Query plan: Employee relation stored in contiguous blocks Scan the entire relation, output tuples with Name = "Bob" Cost: Size of each record = 100 bytes Size of relation = 10 M x 100 = 1 GB Time @ 20 MB/s 1 Minute 3 Solution #2 Storage: Employee relation sorted on Name attribute Binary search Query plan: 4 Solution #2 Cost: Size of a block: 1024 bytes Number of records per block: 1024 / 100 = 10 Total number of blocks: 10 M / 10 = 1 M Blocks accessed by binary search: 20 Total time: 20 ms x 20 = 400 ms 5 Solution #2: Issues Filters on different attributes: SELECT * FROM Employee WHERE Dept = "Sales" Inserts and Deletes 6 Indexes Data structures that efficiently evaluate a class of filter predicates over a relation Class of filter predicates: Single or multiattributes (indexkey attributes) Range and/or equality predicates (Usually) independent of physical storage of relation: Multiple indexes per relation 7 Indexes Disk resident Large to fit in memory Persistent Relation updates costlier Query cheaper Updated when indexed relation updated 8 Problem Relation: Employee (ID, Name, Dept, ...) (Filter) Query: SELECT * FROM Employee WHERE Name = "Bob" SingleAttribute Index on Name that supports equality predicates Name Roadmap Motivation SingleAttribute Indexes: Overview Orderbased Indexes Hashbased Indexes (May cover in future) MultiAttribute Indexes (Chapter 14 GMUW, May cover in future) 10 BTrees Extensible Hashing Linear Hashing Single Attribute Index: General Construction A a1 a2 ai an B b1 b2 bi bn Single Attribute Index: General Construction A a1 A = val A > low A < high B b1 b2 a1 a2 ai an a2 ai an bi bn Exceptions Sparse Indexes Require specific physical layout of relation Example: Relation sorted on indexed attribute More efficient 13 Single Attribute Index: General Construction Textbook: Dense Index a1 A = val A > low A < high A a1 a2 ai an B b1 b2 a2 ai an bi bn 14 Single Attribute Index: General Construction a1 A = val A > low A < high a2 ai an How do we organize (attribute, pointer) pairs? Idea: Use dictionary data structures Issue: Disk resident? Roadmap Motivation SingleAttribute Indexes: Overview Orderbased Indexes BTrees Hashbased Indexes Extensible Hashing Linear Hashing MultiAttribute Indexes 16 BTrees Adaptation of search tree data structure 23 trees Supports range predicates (and equality) 17 Use Binary Search Tree Directly? 71 32 16 54 83 74 92 16 32 54 71 74 83 92 Use Binary Search Tree Directly? Store records of type <key, leftptr, rightptr, dataptr> Remember position of root Question: will this work? Yes But we can do better! 19 Use Binary Search Tree Directly? Number of keys: 1 M Number of levels: log (2^20) = 20 Total cost index lookup: 20 random disk I/O 20 x 20 ms = 400 ms BTree: less than 3 random disk I/O 20 BTree vs. Binary Search Tree k k1 k2 k3 k40 1 Random I/O prunes tree by 40 1 Random I/O prunes tree by half 21 BTree Example 15 36 57 63 76 87 92 100 22 BTree Example 63 36 84 91 15 36 57 63 76 87 92 100 null 23 Meaning of Internal Node 84 91 key < 84 84 key < 91 91 key 24 BTree Example 63 36 84 91 15 36 57 63 76 87 92 100 null 25 Meaning of Leaf Nodes 63 76 Next leaf pointer to record 63 pointer to record 76 26 Equality Predicates 63 key = 87 36 84 91 15 36 57 63 76 87 92 100 null 27 Equality Predicates 63 key = 87 36 84 91 15 36 57 63 76 87 92 100 null 28 Equality Predicates 63 key = 87 36 84 91 15 36 57 63 76 87 92 100 null 29 Equality Predicates 63 key = 87 36 84 91 15 36 57 63 76 87 92 100 null 30 Range Predicates 63 57 key < 95 36 84 91 15 36 57 63 76 87 92 100 null 31 Range Predicates 63 57 key < 95 36 84 91 15 36 57 63 76 87 92 100 null 32 Range Predicates 63 57 key < 95 36 84 91 15 36 57 63 76 87 92 100 null 33 Range Predicates 63 57 key < 95 36 84 91 15 36 57 63 76 87 92 100 null 34 Range Predicates 63 57 key < 95 36 84 91 15 36 57 63 76 87 92 100 null 35 Range Predicates 63 57 key < 95 36 84 91 15 36 57 63 76 87 92 100 null 36 General BTrees Fixed parameter: n Number of keys: n Number of pointers: n + 1 37 BTree Example 63 n = 2 84 91 36 15 36 57 63 76 87 92 100 null 38 General BTrees Fixed parameter: n Number of keys: n Number of pointers: n + 1 All leaves at same depth All (key, record pointer) in leaves 39 BTree Example 63 n = 2 84 91 36 15 36 57 63 76 87 92 100 null 40 General BTrees: Space related constraints Use at least Root: 2 pointers Internal: Leaf: (n+1)/2 pointers (n+1)/2 pointers to data 41 n=3 Max Internal 5 15 21 15 Min Leaf 31 42 56 31 42 42 Leaf Nodes n key slots (n+1) pointer slots 43 Leaf Nodes unused n key slots (n+1) pointer slots k1 k 2 k 3 ... ... ... ... ... km next leaf record of k m 44 record of k1 record of k 2 Leaf Nodes m n key slots (n+1) pointer slots (n+1) 2 ... km ... unused k1 k 2 k 3 ... ... ... next leaf record of k m 45 record of k1 record of k 2 Internal Nodes n key slots (n+1) pointer slots 46 Internal Nodes unused n key slots (n+1) pointer slots k1 k 2 k 3 km key < k 1 k key < k 1 2 k key m 47 Internal Nodes (n+1) (m+1) 2 unused n key slots (n+1) pointer slots k1 k 2 k 3 km key < k 1 k key < k 1 2 k key m 48 Root Node (m+1) 2 unused n key slots (n+1) pointer slots k1 k 2 k 3 km key < k 1 k key < k 1 2 k key m 49 Limits Why the specific limits (n+1)/2 and (n+1)/2 ? Why different limits for leaf and internal nodes? Can we reduce each limit? Can we increase each limit? What are the implications? 50 ...
View Full Document

This document was uploaded on 01/17/2012.

Ask a homework question - tutors are online