lecture14 - Query Optimization Fundamentals of Database...

Info iconThis preview shows pages 1–13. Sign up to view the full content.

View Full Document Right Arrow Icon
Query Optimization Fundamentals of Database Systems: ch 15 Database System Concepts : ch 14
Background image of page 1

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Query Optimization Why ? Many different ways of executing a given query Huge differences in cost Example: select * from person where ssn = “123” Size of person = 1GB Sequential Scan: Takes 1GB / (20MB/s) = 50s Use an index on SSN (assuming one exists): Approx 4 Random I/Os = 40ms
Background image of page 2
Query Optimization Equivalent relational expressions Drawn as a tree List the operations and the order
Background image of page 3

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Query Optimization Execution plans Evaluation expressions annotated with the methods used
Background image of page 4
Query Optimization Steps: Generate all possible execution plans for the query Figure out the cost for each of execution plans Choose the best Not done exactly as listed above Too many different execution plans for that Typically interleave all of these into a single efficient search algorithm
Background image of page 5

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Query Optimization Steps: Generate all possible execution plans for the query First generate all equivalent expressions Then consider all annotations for the operations Figure out the cost for each of them Compute cost for each operation Using the formulas discussed before One problem: How do we know the number of result tuples Choose the best ) ( 2500 account balance
Background image of page 6
A Simple Case Queries with only selections on a single relation with no indexes select * from person where substr(name, 1, 1) in [‘A’, ‘B’, ‘C’] and 100ns zipcode = 94720 and 1ns date-of-birth > to_date('1978/05/31', 'yyyy/mm/dd') 1000ns Relation contains: 1,000,000 tuples CPU Costs
Background image of page 7

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
A Simple Case Possible execution plan: For each tuple Evaluate substr predicate If true, Evaluate zipcode predicate If true, evaluate date-of-birth predicate If true, output the tuple 6 different possibilities
Background image of page 8
A Simple Case Compute cost of each possibility substr() zipcode date-of-birth Need some more information selectivity : fraction of tuples expected to pass the predicates Let selectivity (substr predicate) = 3/26 Let selectivity(zipcode predicate) = 1/100 And, selectivity(date-of-birth predicate) = 1/3 How are selectivities computed ? Must keep track of some additional information about the relations
Background image of page 9

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
A Simple Case Compute cost of each possibility substr() zipcode date-of-birth Given that: Cost of the above plan = 1,000,000 * 100ns (Substr) + 1,000,000 * 3/26 * 1ns (Zipcode) + 1,000,000 * 3/26 * 1/100 * 1000ns (data-of-birth) = approx 100.5 ms Cost of the plan: zipcode substr() date-of-birth: Approx 12.92 ms About a factor of 10 better.
Background image of page 10
A Simple Case General algorithm: Don’t need to check all n! Possibilities Sort the predicates in the increasing order by: 1 – selectivity(predicate) cost of the predicate
Background image of page 11

Info iconThis preview has intentionally blurred sections. Sign up to view the full version.

View Full DocumentRight Arrow Icon
Query Optimization General case: Need: A way to enumerate all plans
Background image of page 12
Image of page 13
This is the end of the preview. Sign up to access the rest of the document.