p289-zhang - Statistical Learning Techniques for Costing...

Statistical Learning Techniques for Costing XML Queries

Ning Zhang 1, Peter J. Haas 2, Vanja Josifovski 2, Guy M. Lohman 2, Chun Zhang 2

1 University of Waterloo, 200 University Ave. W., Waterloo, ON, Canada
2 IBM Almaden Research Center, 650 Harry Road, San Jose, CA, USA

[email protected] phaas,vanja,lohman,[email protected]

Abstract

Developing cost models for query optimization is significantly harder for XML queries than for traditional relational queries. The reason is that XML query operators are much more complex than relational operators such as table scans and joins. In this paper, we propose a new approach, called Comet, to modeling the cost of XML operators; to our knowledge, Comet is the first method ever proposed for addressing the XML query costing problem. As in relational cost estimation, Comet exploits a set of system catalog statistics that summarizes the XML data; the set of "simple path" statistics that we propose is new, and is well suited to the XML setting. Unlike the traditional approach, Comet uses a new statistical learning technique called "transform regression" instead of detailed analytical models to predict the overall cost. Besides rendering the cost estimation problem tractable for XML queries, Comet has the further advantage of enabling the query optimizer to be self-tuning, automatically adapting to changes over time in the query workload and in the system environment. We demonstrate Comet's feasibility by developing a cost model for the recently proposed XNav navigational operator. Empirical studies with synthetic, benchmark, and real-world data sets show that Comet can quickly obtain accurate cost estimates for a variety of XML queries and data sets.

1 Introduction

Management of XML data, especially the processing of XPath queries [5], has been the focus of considerable research and development activity over the past few years.
A wide variety of join-based, navigational, and hybrid XPath processing techniques are now available; see, for example, [3, 4, 11, 25]. Each of these techniques can exploit structural and/or value-based indexes. An XML query optimizer can therefore choose among a large number of alternative plans for processing a specified XPath expression. As in the traditional relational database setting, the optimizer needs accurate cost estimates for the XML operators in order to choose a good plan.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005
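
To make the learning-based costing idea concrete, the following is a minimal sketch, not the Comet implementation. It assumes a cost model can be fit from observed (feature vector, cost) pairs and then used to rank candidate plans. Ordinary least squares stands in for transform regression, which this excerpt does not describe. The feature names (`nodes_visited`, `results_returned`), the plan names, and all numbers are hypothetical.

```python
# Illustrative sketch only. Ordinary least squares is a stand-in for the
# paper's transform regression; features, plans, and numbers are hypothetical.

def fit_least_squares(rows, costs):
    """Fit cost ~ w0 + w1*x1 + ... + wk*xk by solving the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept term
    n = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, costs))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict_cost(weights, features):
    """Evaluate the fitted linear model on one feature vector."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))

# Hypothetical training data: (nodes_visited, results_returned) -> observed cost.
training_rows = [(10, 1), (20, 5), (40, 2), (80, 10)]
training_costs = [7.1, 12.5, 22.2, 43.0]
weights = fit_least_squares(training_rows, training_costs)

# Hypothetical candidate plans for one query, described by the same features.
candidate_plans = {"plan_a": (30, 3), "plan_b": (25, 20)}
best_plan = min(candidate_plans,
                key=lambda p: predict_cost(weights, candidate_plans[p]))
```

In a sketch like this, refitting the weights as new executions are observed is what would let the estimates adapt to changes in workload or environment, which is the self-tuning property the abstract attributes to Comet.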