p157-ivanova - Customizable Parallel Execution of...


Customizable Parallel Execution of Scientific Stream Queries

Milena Ivanova, Tore Risch
Department of Information Technology
Uppsala University, Sweden
{milena.ivanova, tore.risch}@it.uu.se

Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Abstract

Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates. Using a generic template we define two partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides operators for parallel execution of query functions by reducing the size of stream data units using application dependent functions as parameters. By contrast, window distribute provides operators for customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribution is better otherwise.

1 Introduction

In order to explore information from very high volume raw data generated by scientific instruments, such as satellites, on-ground antennas, and simulators, scientists need to perform a wide range of analyses over the data streams. Complex analyses are presently done off-line on data stored on disk using hard-coded predefined processing of the data. The off-line processing creates large backlogs of unanalyzed data and the high volume produced by scientific instruments can even be too large to store and process [13, 14]. Furthermore, off-line data processing prevents timely analysis after interesting natural events occurred.

We address these problems by utilizing an extensible stream database system, GSDM, where scientists can specify in a flexible way analyses as on-line distributed continuous queries (CQs) over the streams...
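
The difference between the two partitioning strategies named in the abstract can be illustrated with a small sketch. This is not GSDM's operator interface, which the preview does not show. The function names, the list-of-floats window representation, the strided split function, and the round-robin routing policy are all assumptions made for illustration only.

```python
from typing import Callable, List

# Hypothetical representation: one stream data unit (window) is a list of floats.
Window = List[float]


def strided_split(window: Window, n: int) -> List[Window]:
    """Assumed application-dependent split function: part i takes every n-th element."""
    return [window[i::n] for i in range(n)]


def window_split(window: Window, n: int,
                 split_fn: Callable[[Window, int], List[Window]]) -> List[Window]:
    """Window split (sketch): one window becomes n smaller sub-windows.

    The split function is passed in as a parameter, mirroring the abstract's
    "application dependent functions as parameters".
    """
    return split_fn(window, n)


def window_distribute(windows: List[Window], n: int) -> List[List[Window]]:
    """Window distribute (sketch): entire windows are routed to n workers.

    Round-robin routing is an assumption; each window keeps its full size.
    """
    per_worker: List[List[Window]] = [[] for _ in range(n)]
    for i, w in enumerate(windows):
        per_worker[i % n].append(w)
    return per_worker


if __name__ == "__main__":
    stream = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    # Each worker receives half of the first window.
    print(window_split(stream[0], 2, strided_split))
    # Each worker receives whole windows, alternating between workers.
    print(window_distribute(stream, 2))
```

In this sketch, window split shrinks the data unit each worker sees, whereas window distribute leaves each unit intact and only changes which worker receives it.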

This note was uploaded on 03/01/2010 for the course ICT ... taught by Professor ... during the Three '10 term at University of Sydney.
