
Customizable Parallel Execution of Scientific Stream Queries

Milena Ivanova, Tore Risch
Department of Information Technology
Uppsala University, Sweden
{milena.ivanova, tore.risch}@it.uu.se

Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Abstract

Scientific applications require processing high-volume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are described as high-level data flow distribution templates. Using a generic template we define two partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides operators for parallel execution of query functions by reducing the size of stream data units using application dependent functions as parameters. By contrast, window distribute provides operators for customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribution is better otherwise.

1 Introduction

In order to explore information from very high volume raw data generated by scientific instruments, such as satellites, on-ground antennas, and simulators, scientists need to perform a wide range of analyses over the data streams. Complex analyses are presently done off-line on data stored on disk using hard-coded predefined processing of the data. The off-line processing creates large backlogs of unanalyzed data and the high volume produced by scientific instruments can even be too large to store and process [13, 14]. Furthermore, off-line data processing prevents timely analysis after interesting natural events occurred.

We address these problems by utilizing an extensible stream database system, GSDM¹, where scientists can specify in a flexible way analyses as on-line distributed continuous queries (CQs) over the streams. Since the target scientific applications have high volume data and expensive computations, GSDM has been designed with a distributed and parallel architecture to provide scalability for both data volumes and computations.
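
The two partitioning strategies named in the abstract can be contrasted with a minimal Python sketch. It is an illustration under stated assumptions, not GSDM's actual operators or API: the names `window_split`, `window_distribute`, `halves`, `energy`, and `round_robin` are hypothetical, a window is modeled as a plain list of floats, and the per-node work is run sequentially here where a real system would execute it on separate nodes.

```python
from typing import Callable, List

Window = List[float]  # assumption: one stream data unit is a list of samples


def window_split(
    window: Window,
    split_fn: Callable[[Window], List[Window]],
    query_fn: Callable[[Window], float],
    merge_fn: Callable[[List[float]], float],
) -> float:
    """Split ONE window into smaller parts, run the query on each part,
    then merge the partial results. split_fn and merge_fn stand in for the
    application dependent functions passed as parameters."""
    parts = split_fn(window)
    # Each partial computation would run on its own node; sequential here.
    partials = [query_fn(part) for part in parts]
    return merge_fn(partials)


def window_distribute(
    windows: List[Window],
    n_workers: int,
    route_fn: Callable[[int, Window], int],
) -> List[List[Window]]:
    """Route ENTIRE windows to workers without reducing their size.
    route_fn is the customizable distribution policy."""
    queues: List[List[Window]] = [[] for _ in range(n_workers)]
    for index, window in enumerate(windows):
        queues[route_fn(index, window) % n_workers].append(window)
    return queues


# Hypothetical application functions used only for illustration.
def halves(window: Window) -> List[Window]:
    mid = len(window) // 2
    return [window[:mid], window[mid:]]


def energy(window: Window) -> float:
    return sum(x * x for x in window)


def round_robin(index: int, window: Window) -> int:
    return index


if __name__ == "__main__":
    print(window_split([1.0, 2.0, 3.0, 4.0], halves, energy, sum))
    print(window_distribute([[1.0], [2.0], [3.0]], 2, round_robin))
```

In this sketch, window split shrinks the unit of work each node sees, at the cost of a split and a merge step, while window distribute keeps every window whole and only decides which node receives it.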