distinct values of the attribute a
• Estimating the cost:
• Important in optimization (next lecture)
• Compute I/O cost only
• We compute the cost to read the tables
• We don’t compute the cost to write the result (because of pipelining)

Optimization fundamentally means searching for the best plan. Since different query plans have different costs, Selinger observed that cost can be used to choose among them and so optimize the query. The parameters suited to this are listed on the slide.

Sorting

** This is different from the sorting algorithms we have learned in our data structures courses.
• Two-pass multi-way merge sort
• Step 1:
• Read M blocks at a time, sort, write
• Result: have runs of length M on disk
• Step 2:
• Merge M − 1 runs at a time, write to disk
• Result: have runs of length M(M − 1) ≈ M²
• Cost: 3B(R), Assumption: B(R) ≤ M²

For example, suppose we want to sort R with M = 1000 blocks of memory and B(R) = 20,000 blocks. First we read R. Memory can hold only 1000 blocks at a time, so Step 1 produces 20 sorted runs (1000 blocks each, 20,000 in total). Step 2 then merges the 20 runs in a single pass, which is possible because 20 ≤ M − 1. We are assuming B(R) ≤ M². If this is not satisfied, we need more merge passes, which makes the sort more expensive. The cost is 3B(R) when the assumption holds: read R, write the runs, and read the runs back for the merge.
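The two steps above can be sketched in a few lines of Python. This is a minimal in-memory simulation, not the lecture's code: the function name `two_pass_sort`, the representation of a block as a Python list of keys, and the `io` counter are all assumptions made for illustration. It counts one I/O per block read or written and, as in the notes, does not count writing the final result.

```python
import heapq

def two_pass_sort(blocks, M):
    """Simulate a two-pass multi-way merge sort.

    blocks: the relation R as a list of blocks, each block a list of keys
            (hypothetical representation, for illustration only).
    M:      number of blocks that fit in memory.
    Returns (sorted_keys, io), where io counts block reads and writes.
    """
    B = len(blocks)
    # Two passes suffice only if Step 1 leaves at most M - 1 runs.
    if B > M * (M - 1):
        raise ValueError("B(R) > M(M - 1): more than two passes are needed")

    io = 0
    runs = []
    # Step 1: read M blocks at a time, sort them in memory, write the run.
    for i in range(0, B, M):
        chunk = blocks[i:i + M]
        io += len(chunk)                       # read the chunk
        runs.append(sorted(key for blk in chunk for key in blk))
        io += len(chunk)                       # write the sorted run

    # Step 2: merge all runs at once; every block is read exactly once.
    io += B
    result = list(heapq.merge(*runs))          # final write is not counted
    return result, io

# Five blocks of two keys each, memory for three blocks.
blocks = [[5, 3], [9, 1], [4, 8], [2, 7], [6, 0]]
out, io = two_pass_sort(blocks, M=3)
print(out)
print(io)  # 15, i.e. 3 * B(R) with B(R) = 5
```

With the figures from the example (M = 1000, B(R) = 20,000, reading "20kb" as 20,000 blocks), the same accounting gives 20 runs and 3 × 20,000 = 60,000 I/Os.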
Tuples are clustered or unclustered.

Scanning Tables

If a table is clustered, we mean its tuples are contiguous on disk. Thus, reading the whole table costs B(R) I/Os.
• The table is clustered (i.e. blocks consist only of records from this table):
• Table scan: if we know wh...
This note was uploaded on 01/28/2014 for the course CS 411 taught by Professor Staff during the Fall '08 term at University of Illinois, Urbana Champaign.