07_qp_joins

07_qp_joins - CPS216 Data-Intensive Computing Systems Query Execution(Sort and Join operators Shivnath Babu Roadmap A simple operator Nested Loop

Info iconThis preview shows pages 1–17. Sign up to view the full content.

View Full Document Right Arrow Icon
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu
Background image of page 1

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Roadmap A simple operator: Nested Loop Join Preliminaries Cost model Clustering Operator classes Operator implementation (with examples from joins) Scan-based Sort-based Using existing indexes Hash-based Buffer Management Parallel Processing
Background image of page 2
NLJ (conceptually) for each r R1 do for each s R2 do if r.C = s.C then output r,s pair Nested Loop Join (NLJ) B C a 10 a 20 b 10 d 30 C D 10 cat 40 dog 15 bat 20 rat R1 R2
Background image of page 3

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Nested Loop Join (contd.) Tuple-based Block-based Asymmetric
Background image of page 4
- Basic algorithm - Scan-based (e.g., NLJ) - Sort-based - Using existing indexes - Hash-based (building an index on the fly) - Memory management - Tradeoff between memory and #IOs - Parallel processing Implementing Operators
Background image of page 5

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Roadmap A simple operator: Nested Loop Join Preliminaries Cost model Clustering Operator classes Operator implementation (with examples from joins) Scan-based Sort-based Using existing indexes Hash-based Buffer Management Parallel Processing
Background image of page 6
Operator Cost Model Simplest: Count # of disk blocks read and written during operator execution Extends to query plans Cost of query plan = Sum of operator costs Caution: Ignoring CPU costs
Background image of page 7

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Assumptions Single-processor-single-disk machine Will consider parallelism later Ignore cost of writing out result Output size is independent of operator implementation Ignore # accesses to index blocks
Background image of page 8
Parameters used in Cost Model B(R) = # blocks storing R tuples T(R) = # tuples in R V(R,A) = # distinct values of attr A in R M = # memory blocks available
Background image of page 9

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Roadmap A simple operator: Nested Loop Join Preliminaries Cost model Clustering Operator classes Operator implementation (with examples from joins) Scan-based Sort-based Using existing indexes Hash-based Buffer Management Parallel Processing
Background image of page 10
Notions of clustering Clustered file organization ….. Clustered relation ….. Clustering index R1 R2 S1 S2 R3 R4 S3 S4 R1 R2 R3 R4 R5 R5 R7 R8
Background image of page 11

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Clustering Index Tuples with a given value of the search key packed in as few blocks as possible A index 10 10 35 19 19 19 19 42 37
Background image of page 12
Examples T(R) = 10,000 B(R) = 200 If R is clustered , then # R tuples per block = 10,000/200 = 50 Let V(R,A) = 40 If I is a clustering index on R.A, then # IOs to access σ R.A = “a” (R) = 250/50 = 5 If I is a non-clustering index on R.A, then # IOs to access σ R.A = “a” (R) = 250 ( > B(R))
Background image of page 13

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Operator Classes Tuple-at-a-time Full-relation Unary Select Sort Binary Difference
Background image of page 14
Roadmap A simple operator: Nested Loop Join Preliminaries Cost model Clustering Operator classes Operator implementation (with examples from joins) Scan-based Sort-based Using existing indexes Hash-based Buffer Management Parallel Processing
Background image of page 15

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Implementing Tuple-at-a-time Operators One pass algorithm: Scan Process tuples one by one Write output Cost = B(R) Remember: Cost = # IOs, and we ignore the cost to write output
Background image of page 16
Image of page 17
This is the end of the preview. Sign up to access the rest of the document.

This document was uploaded on 01/17/2012.

Page1 / 77

07_qp_joins - CPS216 Data-Intensive Computing Systems Query Execution(Sort and Join operators Shivnath Babu Roadmap A simple operator Nested Loop

This preview shows document pages 1 - 17. Sign up to view the full document.

View Full Document Right Arrow Icon
Ask a homework question - tutors are online