P61-raman - Parallel Querying with Non-Dedicated Computers Vijayshankar Raman Wei Han Inderpal Narang IBM Almaden Research Center 650 Harry Road


Parallel Querying with Non-Dedicated Computers

Vijayshankar Raman, Wei Han, Inderpal Narang
IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120
{ravijay,whan,narang}@us.ibm.com

Abstract

We present DITN, a new method of parallel querying based on dynamic outsourcing of join processing tasks to non-dedicated, heterogeneous computers. In DITN, partitioning is not the means of parallelism. Data layout decisions are taken outside the scope of the DBMS, and handled within the storage software; query processors see a “Data In The Network” image. This allows gradual scaleout as the workload grows, by using non-dedicated computers.

A typical operator in a parallel query plan is Exchange [7]. We argue that Exchange is unsuitable for non-dedicated machines because it poorly addresses node heterogeneity, and is vulnerable to failures or load spikes during query execution. DITN uses an alternate intra-fragment parallelism where each node executes an independent select-project-join-aggregate-group by block, with no tuple exchange between nodes. This method cleanly handles heterogeneous nodes, and adapts well during execution to node failures or load spikes.

Initial experiments suggest that DITN performs competitively with a traditional configuration of dedicated machines and well-partitioned data for at least up to 10 processors. At the same time, DITN gives significant flexibility in terms of gradual scaleout and handling of heterogeneity, load bursts, and failures.
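The idea of independent per-node blocks with no tuple exchange can be illustrated with a small sketch. This is not the DITN implementation, which this preview does not describe in detail; it is a toy model under stated assumptions. The tables (ORDERS, CUSTOMERS), the function names (run_block, coordinate, flaky_worker), the chunk size, and the re-queue-on-failure policy are all hypothetical, chosen only to show how work handed out on demand lets a failed node's share be picked up by another node.

```python
from collections import Counter, deque

# Hypothetical toy data: a fact table of (customer_id, amount) rows and a
# dimension table mapping customer_id -> region.
ORDERS = [(1, 10), (2, 20), (1, 5), (3, 7), (2, 3), (3, 1), (1, 4), (2, 6)]
CUSTOMERS = {1: "east", 2: "west", 3: "east"}


def run_block(order_chunk, customers):
    """Run one independent select-project-join-aggregate block on one chunk.

    The chunk is joined against the whole dimension table, and a partial
    SUM(amount) GROUP BY region is returned. No tuples are sent to other
    workers.
    """
    partial = Counter()
    for customer_id, amount in order_chunk:
        region = customers.get(customer_id)  # join probe
        if region is not None:
            partial[region] += amount
    return partial


def flaky_worker(order_chunk, customers):
    """Hypothetical worker that always fails, simulating a node failure."""
    raise RuntimeError("simulated node failure")


def coordinate(orders, customers, workers, chunk_size=2):
    """Hand out chunks on demand and merge the partial aggregates.

    `workers` maps an assumed worker name to a callable with the same
    signature as run_block. A worker that raises RuntimeError is dropped,
    and its chunk is put back in the queue for another worker.
    """
    pending = deque(
        orders[i:i + chunk_size] for i in range(0, len(orders), chunk_size)
    )
    alive = dict(workers)
    total = Counter()
    while pending:
        if not alive:
            raise RuntimeError("no workers left")
        for name in list(alive):
            if not pending:
                break
            chunk = pending.popleft()
            try:
                total.update(alive[name](chunk, customers))  # merge partial sums
            except RuntimeError:
                pending.append(chunk)  # re-queue the chunk for another worker
                del alive[name]        # stop using the failed worker
    return dict(total)


result = coordinate(ORDERS, CUSTOMERS, {"a": run_block, "b": flaky_worker})
print(result)
```

Because each block returns only a partial aggregate, the coordinator can merge results in any order, and a chunk lost to a failed worker can simply be rerun elsewhere. A real system would dispatch chunks concurrently and size them to each node's speed; the sequential loop here only keeps the sketch deterministic.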
1 Introduction

Parallel query processing [4] has evolved from being a research idea (e.g., Gamma, XPRS [5, 8]) to being a standard feature provided by most DBMS vendors (e.g., Tandem, Teradata, Oracle, Informix XPS, and DB2 [20, 22, 14, 2]). The parallelism in these systems is highly scalable, with vendors reporting good speedup with even 1000s of parallel nodes [4].

Traditionally, parallel query systems have been classified as shared nothing (SN), shared memory (SMP), and shared disk (SD) [19]. But a common characteristic of all these three types is that they rely on dedicated processors, pre-configured and pre-assigned for the parallel query task.

For SMP systems (and SMP nodes in SD/SN systems), this coupling is done in hardware. In SD systems, the compute nodes are often connected by specialized interconnects to a shared storage. SN systems are more loosely coupled due to a cluster-architecture....

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005

This note was uploaded on 03/01/2010 for the course ICT ... taught by Professor ... during the Three '10 term at University of Sydney.
