p373-markl - Consistently Estimating the Selectivity of...

Consistently Estimating the Selectivity of Conjuncts of Predicates

V. Markl¹, N. Megiddo¹, M. Kutsch², T.M. Tran³, P. Haas¹, U. Srivastava⁴

¹ IBM Almaden Research Center, ² IBM Germany, ³ IBM Silicon Valley Lab, ⁴ Stanford University
{marklv, megiddo, peterh}@almaden.ibm.com, kutschm@de.ibm.com, minhtran@us.ibm.com, usriv@stanford.edu

Abstract

Cost-based query optimizers need to estimate the selectivity of conjunctive predicates when comparing alternative query execution plans. To this end, advanced optimizers use multivariate statistics (MVS) to improve information about the joint distribution of attribute values in a table. The joint distribution for all columns is almost always too large to store completely, and the resulting use of partial distribution information raises the possibility that multiple, non-equivalent selectivity estimates may be available for a given predicate. Current optimizers use ad hoc methods to ensure that selectivities are estimated in a consistent manner. These methods ignore valuable information and tend to bias the optimizer toward query plans for which the least information is available, often yielding poor results. In this paper we present a novel method for consistent selectivity estimation based on the principle of maximum entropy (ME). Our method efficiently exploits all available information and avoids the bias problem. In the absence of detailed knowledge, the ME approach reduces to standard uniformity and independence assumptions. Our implementation using a prototype version of DB2 UDB shows that ME improves the optimizer’s cardinality estimates by orders of magnitude, resulting in better plan quality and significantly reduced query execution times.

1. Introduction

Estimating the selectivity of predicates has always been a challenging task for a query optimizer in a relational database management system.
A classic problem has been the lack of detailed information about the joint frequency distribution of attribute values in the table of interest. Perhaps ironically, the additional information now available to modern optimizers has in a certain sense made the selectivity-estimation problem even harder. Specifically, consider the problem of estimating the selectivity $s_{1,2,\ldots,n}$ of a conjunctive predicate of the form $p_1 \wedge p_2 \wedge \cdots \wedge p_n$, where each $p_i$ is a simple predicate (also called a Boolean Factor, or BF) of the form “column op literal”. Here column is a column name, op is a relational comparison operator such as “=”, “>”, or “LIKE”, and literal is a literal in the domain of the column; some examples of simple predicates are ‘make = “Honda”’ and ‘year > 1984’. By the selectivity of a predicate p, we mean, as usual, the fraction of rows in the table that satisfy p.
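The definitions above, and the abstract's remark that ME reduces to independence when nothing more is known, can be made concrete with a small Python sketch. The toy table, the predicate list, and the helpers `selectivity`, `estimate` and `max_entropy` are hypothetical illustrations, not taken from the paper or from DB2. The ME part uses iterative proportional fitting (iterative scaling), a standard way to find the maximum-entropy distribution over the $2^n$ truth assignments of $n$ predicates subject to known conjunct selectivities; it is shown as one possible solver under these assumptions, not as the paper's implementation.

```python
from itertools import product

# Hypothetical toy table: each row is (make, year).
ROWS = [
    ("Honda", 1990), ("Honda", 1982), ("Toyota", 1995), ("Honda", 2001),
    ("Ford", 1980), ("Toyota", 1983), ("Honda", 1999), ("Ford", 1992),
]

# Simple predicates (Boolean Factors) of the form "column op literal".
# Indices are 0-based here, so PREDICATES[0] plays the role of p_1.
PREDICATES = [
    lambda row: row[0] == "Honda",  # make = 'Honda'
    lambda row: row[1] > 1984,      # year > 1984
]


def selectivity(rows, preds):
    """Fraction of rows that satisfy the conjunction of all predicates in preds."""
    return sum(all(p(row) for p in preds) for row in rows) / len(rows)


def estimate(prob, subset):
    """Selectivity of the conjunct over the predicate indices in subset under prob."""
    return sum(p for x, p in prob.items() if all(x[i] for i in subset))


def max_entropy(n, known, iters=1000):
    """Iterative proportional fitting over the 2**n truth assignments of n predicates.

    known maps a non-empty tuple of predicate indices S to the known selectivity
    s_S of the conjunct over S. Targets are assumed to lie strictly between 0 and 1
    and to be mutually consistent (hypothetical preconditions of this sketch).
    """
    atoms = list(product((0, 1), repeat=n))
    # Start from the uniform distribution, the unconstrained maximum-entropy one.
    prob = {x: 1.0 / len(atoms) for x in atoms}
    for _ in range(iters):
        for subset, target in known.items():
            mass = estimate(prob, subset)
            # Rescale atoms inside and outside the conjunct so its mass equals target.
            for x in atoms:
                if all(x[i] for i in subset):
                    prob[x] *= target / mass
                else:
                    prob[x] *= (1.0 - target) / (1.0 - mass)
    return prob


if __name__ == "__main__":
    s1 = selectivity(ROWS, [PREDICATES[0]])
    s2 = selectivity(ROWS, [PREDICATES[1]])
    print("true s_{1,2}:", selectivity(ROWS, PREDICATES))
    print("independence estimate:", s1 * s2)
    # With only single-predicate selectivities known, ME yields the product.
    me = max_entropy(2, {(0,): s1, (1,): s2})
    print("ME estimate, singles only:", estimate(me, (0, 1)))
```

On this toy table the true conjunct selectivity is 3/8 = 0.375, while the independence estimate (and the singles-only ME estimate) is 0.5 × 0.625 = 0.3125. Supplying $s_{1,2}$ as an additional known statistic, as a multivariate statistic would, lets the ME solution reproduce it exactly.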