# lecture14 - Query Optimization Fundamentals of Database...

This preview shows pages 1–13. Sign up to view the full content.

Query Optimization Fundamentals of Database Systems: ch 15 Database System Concepts : ch 14

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
Query Optimization Why ? Many different ways of executing a given query Huge differences in cost Example: select * from person where ssn = “123” Size of person = 1GB Sequential Scan: Takes 1GB / (20MB/s) = 50s Use an index on SSN (assuming one exists): Approx 4 Random I/Os = 40ms
Query Optimization Equivalent relational expressions Drawn as a tree List the operations and the order

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
Query Optimization Execution plans Evaluation expressions annotated with the methods used
Query Optimization Steps: Generate all possible execution plans for the query Figure out the cost for each of execution plans Choose the best Not done exactly as listed above Too many different execution plans for that Typically interleave all of these into a single efficient search algorithm

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
Query Optimization Steps: Generate all possible execution plans for the query First generate all equivalent expressions Then consider all annotations for the operations Figure out the cost for each of them Compute cost for each operation Using the formulas discussed before One problem: How do we know the number of result tuples Choose the best ) ( 2500 account balance
A Simple Case Queries with only selections on a single relation with no indexes select * from person where substr(name, 1, 1) in [‘A’, ‘B’, ‘C’] and 100ns zipcode = 94720 and 1ns date-of-birth > to_date('1978/05/31', 'yyyy/mm/dd') 1000ns Relation contains: 1,000,000 tuples CPU Costs

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
A Simple Case Possible execution plan: For each tuple Evaluate substr predicate If true, Evaluate zipcode predicate If true, evaluate date-of-birth predicate If true, output the tuple 6 different possibilities
A Simple Case Compute cost of each possibility substr() zipcode date-of-birth Need some more information selectivity : fraction of tuples expected to pass the predicates Let selectivity (substr predicate) = 3/26 Let selectivity(zipcode predicate) = 1/100 And, selectivity(date-of-birth predicate) = 1/3 How are selectivities computed ? Must keep track of some additional information about the relations

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
A Simple Case Compute cost of each possibility substr() zipcode date-of-birth Given that: Cost of the above plan = 1,000,000 * 100ns (Substr) + 1,000,000 * 3/26 * 1ns (Zipcode) + 1,000,000 * 3/26 * 1/100 * 1000ns (data-of-birth) = approx 100.5 ms Cost of the plan: zipcode substr() date-of-birth: Approx 12.92 ms About a factor of 10 better.
A Simple Case General algorithm: Don’t need to check all n! Possibilities Sort the predicates in the increasing order by: 1 – selectivity(predicate) cost of the predicate

This preview has intentionally blurred sections. Sign up to view the full version.

View Full Document
Query Optimization General case: Need: A way to enumerate all plans
This is the end of the preview. Sign up to access the rest of the document.