p349-paparizos - Pattern tree algebras: sets or sequences?...


Pattern tree algebras: sets or sequences?

Stelios Paparizos*, H. V. Jagadish*
University of Michigan, Ann Arbor, MI, USA
{spapariz, jag}@umich.edu

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

Abstract

XML and XQuery semantics are very sensitive to the order of the produced output. Although pattern-tree based algebraic approaches are becoming more and more popular for evaluating XML, there is no universally accepted technique which can guarantee both a correct output order and a choice of efficient alternative plans. We address the problem using hybrid collections of trees that can be either sets or sequences or something in between. Each such collection is coupled with an Ordering Specification that describes how the trees are sorted (full, partial or no order). This provides us with a formal basis for developing a query plan having parts that maintain no order and parts with partial or full order. It turns out that duplicate elimination introduces some of the same issues as order maintenance: it is expensive and a single collection type does not always provide all the flexibility required to optimize this properly. To solve this problem we associate with each hybrid collection a Duplicate Specification that describes the presence or absence of duplicate elements in it. We show how to extend an existing bulk tree algebra, TLC [12], to use Ordering and Duplicate specifications and produce correctly ordered results. We also suggest some optimizations enabled by the flexibility of our approach, and experimentally demonstrate the performance increase due to them.

1 Introduction

XML means many things to many people, and gets used in a variety of ways. The formal semantics of XML and XQuery require ordering, yet many database-style applications could not care less about order. This leaves the query processing engine designer in a quandary: should order be maintained, as required by the semantics, irrespective of the additional cost; or can order be ignored for performance reasons. What we would like is an engine where we pay the cost to maintain order when we need it, and do not incur this overhead when it is not necessary.

In fact, we would like even more. Even when ordered final results are required, it may not be necessary to maintain order at each intermediate step. Exploiting this flexibility, provided the required order can eventually be established (or recovered), can lead to significant performance benefits....
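
The idea of a collection that carries its own ordering and duplicate information, so that sorting and duplicate elimination are paid for only when needed, can be illustrated with a minimal Python sketch. This is not the TLC algebra or the paper's formal definitions: the names `HybridCollection`, `Ordering`, `unordered_union`, `ensure_document_order`, and `ensure_distinct` are hypothetical, a tree is reduced to a `(document_position, label)` pair, and the "specifications" are reduced to an enum and a boolean flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple

# Assumption for illustration only: a "tree" is a (document_position, label) pair.
Tree = Tuple[int, str]


class Ordering(Enum):
    NONE = "none"        # collection behaves like a set or bag
    PARTIAL = "partial"  # something in between
    FULL = "full"        # collection behaves like a sequence


@dataclass(frozen=True)
class HybridCollection:
    trees: Tuple[Tree, ...]
    ordering: Ordering          # hypothetical stand-in for an Ordering Specification
    may_have_duplicates: bool   # hypothetical stand-in for a Duplicate Specification


def select(coll: HybridCollection, predicate: Callable[[Tree], bool]) -> HybridCollection:
    """Filtering keeps relative order and cannot introduce duplicates."""
    kept = tuple(t for t in coll.trees if predicate(t))
    return HybridCollection(kept, coll.ordering, coll.may_have_duplicates)


def unordered_union(left: HybridCollection, right: HybridCollection) -> HybridCollection:
    """Concatenate without merging: cheap, but the result has no known
    order and may contain duplicates, so both flags are downgraded."""
    return HybridCollection(left.trees + right.trees, Ordering.NONE, True)


def ensure_distinct(coll: HybridCollection) -> HybridCollection:
    """Eliminate duplicates only if the flag says they may be present."""
    if not coll.may_have_duplicates:
        return coll
    # dict.fromkeys keeps the first occurrence and preserves the current order.
    return HybridCollection(tuple(dict.fromkeys(coll.trees)), coll.ordering, False)


def ensure_document_order(coll: HybridCollection) -> HybridCollection:
    """Sort only if full order has not already been established."""
    if coll.ordering is Ordering.FULL:
        return coll
    return HybridCollection(tuple(sorted(coll.trees)), Ordering.FULL,
                            coll.may_have_duplicates)


# Intermediate steps run unordered; order is recovered once, at the end.
a = HybridCollection(((3, "c"), (1, "a")), Ordering.NONE, False)
b = HybridCollection(((2, "b"), (1, "a")), Ordering.NONE, False)
result = ensure_document_order(ensure_distinct(unordered_union(a, b)))
print(result.trees)  # ((1, 'a'), (2, 'b'), (3, 'c'))
```

In this sketch each operator updates the flags of its output, and the `ensure_*` functions consult those flags before doing any expensive work, so calling `ensure_document_order` on an already ordered collection returns it unchanged.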