Efficient Evaluation of XQuery over Streaming Data

Xiaogang Li, Gagan Agrawal
Department of Computer Science and Engineering
Ohio State University, Columbus OH 43210
{xgli,agrawal}@cse.ohio-state.edu

Abstract

With the growing popularity of XML and the emergence of the streaming data model, processing queries over streaming XML has become an important topic. This paper presents a new framework and a set of techniques for processing XQuery over streaming data. Compared to existing work on supporting XPath/XQuery over data streams, we make the following three contributions:

1. We propose a series of optimizations which transform XQuery queries so that they can be correctly executed with a single pass on the dataset.

2. We present a methodology for determining when an XQuery query, possibly after the transformations we introduce, can be correctly executed with only a single pass on the dataset.

3. We describe a code generation approach which can handle XQuery queries with user-defined aggregates, including recursive functions. We aggressively use static analysis and generate executable code, i.e., we do not require a query plan to be interpreted at runtime.

We have evaluated our implementation using several XMark benchmarks and three other XQuery queries driven by real applications. Our experimental results show that, compared to Qizx/Open, Saxon, and Galax, our system: 1) is at least 25% faster on XMark queries with small datasets, 2) is significantly faster on XMark queries with larger datasets, 3) is at least one order of magnitude faster on the queries driven by real applications because, unlike other systems, we can transform them to execute with a single pass, and 4) executes queries efficiently on large datasets where other systems often have memory overflows.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

1 Introduction

XML is a flexible exchange format that has gained popularity for representing many classes of data, including structured documents, heterogeneous and semi-structured records, data from scientific experiments and simulations, and digitized images, among others. As a result, querying XML documents has received much attention. At the same time, a new model of data processing has also emerged in the database community. In this data model, data arrives in the form of continuous streams, usually from data collection instruments or a long-running computer simulation....
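
To make the idea of single-pass evaluation over streaming XML concrete, the following is a minimal hand-written Python sketch, not the code the paper's system generates. It computes an average over a stream of elements while reading the input only once. The function name single_pass_average and the element names site, item, and price are hypothetical, chosen only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def single_pass_average(stream, tag):
    """Average the numeric text of every <tag> element, reading the stream once.

    Illustrative sketch only: `tag` and the document layout are assumptions,
    not taken from the paper.
    """
    total = 0.0
    count = 0
    # iterparse yields an element when its end tag has been read, so the
    # aggregate is updated incrementally instead of after building a full tree.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            total += float(elem.text)
            count += 1
        # Drop the text and children of the finished element. The emptied
        # element stays attached to its parent, so memory use is reduced
        # but not strictly constant.
        elem.clear()
    return total / count if count else None


if __name__ == "__main__":
    doc = (
        b"<site>"
        b"<item><price>10</price></item>"
        b"<item><price>20.5</price></item>"
        b"<item><name>x</name></item>"
        b"</site>"
    )
    print(single_pass_average(io.BytesIO(doc), "price"))
```

The sketch keeps only a running sum and a count, which is the kind of state a single-pass aggregate needs. A query that must revisit earlier elements could not be written this way without buffering or rewriting.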
This note was uploaded on 03/01/2010 for the course ICT ... taught by Professor ... during the Three '10 term at University of Sydney.