07_qp_joins(1)

07_qp_joins(1) - CPS216: Data-Intensive Computing Systems...

Info iconThis preview shows pages 1–16. Sign up to view the full content.

View Full Document Right Arrow Icon
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu
Background image of page 1

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Roadmap • A simple operator: Nested Loop Join • Preliminaries – Cost model – Clustering – Operator classes • Operator implementation (with examples from joins) – Scan-based – Sort-based – Using existing indexes – Hash-based • Buffer Management • Parallel Processing
Background image of page 2
• NLJ (conceptually) for each r R1 do for each s R2 do if r.C = s.C then output r,s pair Nested Loop Join (NLJ) B C a 10 a 20 b 10 d 30 C D 10 cat 40 dog 15 bat 20 rat R1 R2
Background image of page 3

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Nested Loop Join (contd.) • Tuple-based • Block-based • Asymmetric
Background image of page 4
- Basic algorithm - Scan-based (e.g., NLJ) - Sort-based - Using existing indexes - Hash-based (building an index on the fly) - Memory management - Tradeoff between memory and #IOs - Parallel processing Implementing Operators
Background image of page 5

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Roadmap • A simple operator: Nested Loop Join • Preliminaries – Cost model – Clustering – Operator classes • Operator implementation (with examples from joins) – Scan-based – Sort-based – Using existing indexes – Hash-based • Buffer Management • Parallel Processing
Background image of page 6
Operator Cost Model • Simplest: Count # of disk blocks read and written during operator execution • Extends to query plans – Cost of query plan = Sum of operator costs • Caution: Ignoring CPU costs
Background image of page 7

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Assumptions • Single-processor-single-disk machine – Will consider parallelism later • Ignore cost of writing out result – Output size is independent of operator implementation • Ignore # accesses to index blocks
Background image of page 8
Parameters used in Cost Model B(R) = # blocks storing R tuples T(R) = # tuples in R V(R,A) = # distinct values of attr A in R M = # memory blocks available
Background image of page 9

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Roadmap • A simple operator: Nested Loop Join • Preliminaries – Cost model – Clustering – Operator classes • Operator implementation (with examples from joins) – Scan-based – Sort-based – Using existing indexes – Hash-based • Buffer Management • Parallel Processing
Background image of page 10
Notions of clustering • Clustered file organization ….. • Clustered relation ….. • Clustering index R1 R2 S1 S2 R3 R4 S3 S4 R1 R2 R3 R4 R5 R5 R7 R8
Background image of page 11

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Clustering Index Tuples with a given value of the search key packed in as few blocks as possible A index 10 10 35 19 19 19 19 42 37
Background image of page 12
Examples T(R) = 10,000 B(R) = 200 If R is clustered , then # R tuples per block = 10,000/200 = 50 Let V(R,A) = 40 If I is a clustering index on R.A, then # IOs to access σ R.A = “a” (R) = 250/50 = 5 If I is a non-clustering index on R.A, then # IOs to access σ R.A = “a” (R) = 250 ( > B(R))
Background image of page 13

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Operator Classes Tuple-at-a-time Full-relation Unary Select Sort Binary Difference
Background image of page 14
Roadmap • A simple operator: Nested Loop Join • Preliminaries – Cost model – Clustering – Operator classes • Operator implementation (with examples from joins) – Scan-based – Sort-based – Using existing indexes – Hash-based • Buffer Management • Parallel Processing
Background image of page 15

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Image of page 16
This is the end of the preview. Sign up to access the rest of the document.

Page1 / 77

07_qp_joins(1) - CPS216: Data-Intensive Computing Systems...

This preview shows document pages 1 - 16. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online