hhhhhh - Parallel Computing Patterns for Grid Workflows...

Parallel Computing Patterns for Grid Workflows

Cesare Pautasso, Gustavo Alonso
Department of Computer Science
Swiss Federal Institute of Technology (ETHZ)
ETH Zentrum, 8092 Zürich, Switzerland
{pautasso,alonso}@inf.ethz.ch

Abstract

Whereas a consensus has been reached on defining the set of workflow patterns for business process modeling languages, no such patterns exist for workflows applied to scientific computing on the Grid. By looking at different kinds of parallelism, in this paper we identify a set of workflow patterns related to parallel and pipelined execution. The paper presents how these patterns can be represented in different Grid workflow languages and discusses their implications for the design of the underlying workflow management and execution infrastructure. A preliminary classification of these patterns is introduced by surveying how they are supported by several existing advanced scientific and Grid workflow languages.

1 Introduction

Scientific and Grid workflow languages [16, 23, 42] make use of techniques such as massively parallel execution and pipeline processing [18] to provide scientists with powerful modeling primitives and language constructs. These primitives are used to implement parallel task execution while retaining the characteristic high abstraction level of workflow languages.

For example, the notion of data flow used in scientific workflows is a natural representation for simple data processing pipelines. It has the advantage that parallel execution of independent tasks is modeled for free [19]. Pure data flow, however, is not expressive enough to model either branches and merges in the execution path or iterative behavior [26]. This is why workflow languages typically focus on the control flow primitives rather than on the data flow aspects. An example of this focus is the existing literature on control flow patterns [37]. In scientific applications, however, data flow [34] and parallel computing patterns play a crucial role, not only in terms of design-time modeling but also in important performance optimization aspects related to run-time execution at large scale.

In this paper, as a first step to better understand the relationship between parallel computing and scientific workflows, we define a set of language patterns. These patterns can be classified in two broad categories: Parallel Execution and Pipelined Execution (Table 1). Parallel execution patterns include: 1) Simple parallelism, where tasks lacking control flow dependencies are executed in parallel; 2) Data parallelism, a form of single instruction multiple data (SIMD) parallelism [15] with three variants: static, dynamic and adaptive. Pipelined execution patterns include: 3) best effort pipelines, where intermediate results are dropped if downstream tasks are not ready to process them; 4) blocking pipelines, where a form of flow control is used to stop tasks that are located upstream from busy ones; 5) buffered
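The difference between the best effort and blocking pipeline patterns defined above can be illustrated with a bounded queue placed between an upstream and a downstream task. The following Python sketch is not taken from the paper: the function names, the queue capacities, and the multiply-by-ten downstream task are hypothetical choices made only for illustration. In the best effort variant the upstream task never waits and drops results when the queue is full; in the blocking variant it is stalled until the downstream task frees a slot.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking the end of the stream


def produce_best_effort(items, q):
    """Upstream task that never waits: a result is dropped if q is full."""
    dropped = []
    for item in items:
        try:
            q.put_nowait(item)
        except queue.Full:
            dropped.append(item)
    return dropped


def produce_blocking(items, q):
    """Upstream task that is stopped while the downstream task is busy."""
    for item in items:
        q.put(item)  # blocks until a slot is free (flow control)
    q.put(DONE)


def consume(q, out):
    """Downstream task: processes items in arrival order until DONE."""
    while True:
        item = q.get()
        if item is DONE:
            break
        out.append(item * 10)


def run_blocking(items, capacity=1):
    """Blocking pipeline: every intermediate result reaches downstream."""
    q = queue.Queue(maxsize=capacity)
    out = []
    worker = threading.Thread(target=consume, args=(q, out))
    worker.start()
    produce_blocking(items, q)
    worker.join()
    return out


def run_best_effort_idle_downstream(items, capacity=2):
    """Best effort pipeline in the extreme case where the downstream task
    is not ready at all while the upstream task runs (single-threaded, so
    the outcome is deterministic)."""
    q = queue.Queue(maxsize=capacity)
    dropped = produce_best_effort(items, q)
    delivered = []
    while not q.empty():
        delivered.append(q.get_nowait())
    return delivered, dropped


if __name__ == "__main__":
    print(run_blocking([1, 2, 3, 4, 5]))
    print(run_best_effort_idle_downstream([1, 2, 3, 4, 5]))
```

In this sketch the queue capacity is the only tuning parameter: the blocking variant trades upstream throughput for completeness, while the best effort variant keeps the upstream task running at full speed at the cost of lost intermediate results.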

This note was uploaded on 09/28/2009 for the course CS 525 taught by Professor Rjyosy during the Winter '09 term at Central Mich..
