
College of Information Technology
Master Program in Scientific Computing
Scientific Computing II (SCOM6301)
Introduction to Parallel Computing

Lecture 5: Implementation Styles and Examples
(Chapter 3: Decomposing Programs for Parallelism)

Implementation styles

- Iterative loops
- Recursive traversal of tree-like data structures

Iterative loops

- Parallel loop programming: on shared-memory systems; be aware of data races (dependences)
- SPMD programming (single-program, multiple-data)
- Recursive task programming: not considered at this stage

Parallel loop programming

To parallelize a loop, we assign different iterations, or different blocks of iterations, to different processors. On shared-memory systems, this decomposition is usually coded as some kind of PARALLEL DO loop. (Runnable sketches of this loop style, the critical-region version, and the SPMD version below are collected at the end of these notes.)

Example 1

    DO I = 1, N
       A(I) = A(I) + C
    ENDDO

Is there data sharing in this code? No: each iteration reads and writes only its own element A(I), so the iterations are independent.

Example 2

    DO I = 1, N
       A(I) = A(I+1) + C
    ENDDO

Is there data sharing in this code? Yes: iteration I reads A(I+1), which iteration I+1 overwrites, so running the iterations in parallel risks a write-after-read race.

Our focus in loop parallelism is the discovery of loops that have no races.

Example 3

    SUM = 0.0
    DO I = 1, N
       R = F(B(I),C(I))   ! an expensive computation
       SUM = SUM + R
    ENDDO

Are there races here? Yes: the variable SUM is written and read on every iteration.

Assuming that the computation of the function F is expensive, how can we gain from parallelism in this case? We can gain some speed if we compute the values of F in parallel and then update SUM in the order in which those computations finish. To make this work, we must ensure that only one processor updates SUM at a time and that each update finishes before the next is allowed to begin.

Critical regions, code segments that can be executed by only one processor at a time, are used for this purpose on shared-memory systems. A possible realization of the parallel version:

    SUM = 0.0
    PARALLEL DO I = 1, N
       R = F(B(I),C(I))   ! an expensive computation
       BEGIN CRITICAL REGION
          SUM = SUM + R
       END CRITICAL REGION
    ENDDO

SPMD programming

To perform the sum reduction above on a distributed-memory message-passing system, we need to rewrite the program to use explicit message passing. In an SPMD program, all of the processors execute the same code, but apply the code to different portions of the data.

Considerations for SPMD

- Scalar variables are typically replicated on all of the processors and redundantly computed (to identical values) on each processor.
- The programmer must insert explicit communication primitives in order to pass the shared data between processors.

The previous example as an SPMD program:

    ! This code is executed by all processors.
    ! MYSUM, MYFIRST, MYLAST, R, and I are private local variables.
    ! MYFIRST and MYLAST are computed separately on each processor to
    ! point to nonintersecting sections of B and C.
    ! GLOBALSUM is a global collective communication primitive.
    MYSUM = 0.0
    DO I = MYFIRST, MYLAST
       R = F(B(I),C(I))   ! an expensive computation
       MYSUM = MYSUM + R
    ENDDO
    SUM = GLOBALSUM(MYSUM)   ! the communication is built into GLOBALSUM

Assignment for next lecture

Implement the example in Section 3.2.5 (page 54) using JOMP, adopting the shared-memory model.

A Taste of Parallel Algorithms

A Motivating Example...
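Runnable sketches of the examples above

The lecture's PARALLEL DO and CRITICAL REGION notation is generic, so the sketches below are one possible rendering, assuming Fortran with OpenMP directives (compile with, e.g., gfortran -fopenmp); the array sizes and initial values are illustrative, not from the lecture. First, Example 1 as a parallel loop:

    ! Minimal sketch of Example 1; OpenMP is an assumption, since the
    ! lecture's PARALLEL DO is generic notation.
    PROGRAM LOOP_EXAMPLE
       IMPLICIT NONE
       INTEGER, PARAMETER :: N = 1000000
       REAL :: A(N), C
       INTEGER :: I
       A = 1.0
       C = 2.0
    !$OMP PARALLEL DO
       DO I = 1, N
          A(I) = A(I) + C   ! each iteration touches only its own A(I): no race
       ENDDO
    !$OMP END PARALLEL DO
       PRINT *, 'A(1) =', A(1)
    END PROGRAM LOOP_EXAMPLE

Because no iteration reads an element that another iteration writes, no synchronization is needed inside the loop.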
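The critical-region version of Example 3 can be sketched the same way. R is declared PRIVATE so that each thread has its own copy, and the contained function F (an illustrative stand-in, not the lecture's) replaces the expensive computation:

    ! Sketch of Example 3 with a critical region (OpenMP assumed).
    PROGRAM SUM_EXAMPLE
       IMPLICIT NONE
       INTEGER, PARAMETER :: N = 100000
       REAL :: B(N), C(N), SUM, R
       INTEGER :: I
       B = 1.0
       C = 2.0
       SUM = 0.0
    !$OMP PARALLEL DO PRIVATE(R)
       DO I = 1, N
          R = F(B(I), C(I))   ! the expensive computations run in parallel
    !$OMP CRITICAL
          SUM = SUM + R       ! only one thread updates SUM at a time
    !$OMP END CRITICAL
       ENDDO
    !$OMP END PARALLEL DO
       PRINT *, 'SUM =', SUM
    CONTAINS
       REAL FUNCTION F(X, Y)   ! illustrative stand-in for the expensive F
          REAL, INTENT(IN) :: X, Y
          F = X * Y
       END FUNCTION F
    END PROGRAM SUM_EXAMPLE

OpenMP's REDUCTION(+:SUM) clause would express the same computation without serializing every update: each thread accumulates a private partial sum, and the partial sums are combined once at the end.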
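Finally, the SPMD version. GLOBALSUM in the lecture is an abstract collective primitive; a natural realization on a message-passing system, assumed here, is MPI's all-reduce. The block decomposition that computes MYFIRST and MYLAST is likewise one simple illustrative choice:

    ! Sketch of the SPMD sum with MPI; MPI and the block decomposition
    ! are assumptions, and B(I)*C(I) stands in for R = F(B(I),C(I)).
    PROGRAM SPMD_SUM
       USE MPI
       IMPLICIT NONE
       INTEGER, PARAMETER :: N = 100000
       REAL :: B(N), C(N), MYSUM, SUM
       INTEGER :: I, MYFIRST, MYLAST, RANK, NPROCS, CHUNK, IERR
       CALL MPI_INIT(IERR)
       CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
       CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, IERR)
       B = 1.0   ! replicated data, computed to identical values everywhere
       C = 2.0
       ! MYFIRST and MYLAST select nonintersecting sections of B and C
       CHUNK   = (N + NPROCS - 1) / NPROCS
       MYFIRST = RANK * CHUNK + 1
       MYLAST  = MIN(N, MYFIRST + CHUNK - 1)
       MYSUM = 0.0
       DO I = MYFIRST, MYLAST
          MYSUM = MYSUM + B(I) * C(I)
       ENDDO
       ! GLOBALSUM realized as an all-reduce: every process receives SUM
       CALL MPI_ALLREDUCE(MYSUM, SUM, 1, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, IERR)
       IF (RANK == 0) PRINT *, 'SUM =', SUM
       CALL MPI_FINALIZE(IERR)
    END PROGRAM SPMD_SUM

After compiling with mpif90 and running with, e.g., mpirun -np 4, every process ends up holding the same SUM, matching the replicated-scalar convention described above.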